Runway’s Gen-3 Alpha Creates AI Videos Up to 10 Seconds

Runway ML has introduced a new foundation model, Gen-3 Alpha, which the company says can generate high-quality, realistic scenes up to 10 seconds long from text prompts, still images or a video sample. Offering a variety of camera movements, Gen-3 Alpha will initially roll out to Runway’s paid subscribers, but the company plans to add a free version in the future. Runway says Gen-3 Alpha is the first of a new series of models trained on the company’s new large-scale multimodal infrastructure, which offers improvements “in fidelity, consistency, and motion over Gen-2,” released last year.

OpenAI Is Working on New Frontier Model to Succeed GPT-4

OpenAI has begun training a new flagship artificial intelligence model to succeed GPT-4, the technology currently associated with ChatGPT. The new model — which some are already calling GPT-5, although OpenAI hasn’t yet shared its name — is expected to take the company’s capabilities to the next level as it works toward artificial general intelligence (AGI), intelligence equal to or surpassing human cognitive abilities. The company also announced it has formed a new Safety and Security Committee two weeks after dissolving the old one upon the departure of OpenAI co-founder and Chief Scientist Ilya Sutskever.

Meta Advances Multimodal Model Architecture with Chameleon

Meta Platforms has unveiled its first natively multimodal model, Chameleon, which observers say could make it competitive with frontier model firms. Although Chameleon is not yet released, Meta says internal research indicates it outperforms the company’s own Llama 2 in text-only tasks and “matches or exceeds the performance of much larger models” including Google’s Gemini Pro and OpenAI’s GPT-4V in a mixed-modal generation evaluation “where either the prompt or outputs contain mixed sequences of both images and text.” In addition, Meta calls Chameleon’s image generation “non-trivial,” noting that’s “all in a single model.”

Google Ups AI Quotient with Search-Optimized Gemini Model

Google has infused search with more Gemini AI, adding expanded AI Overviews and more planning and research capabilities. “Ask whatever’s on your mind or whatever you need to get done — from researching to planning to brainstorming — and Google will take care of the legwork” culling from “a knowledge base of billions of facts about people, places and things,” explained Google and Alphabet CEO Sundar Pichai at the Google I/O developer conference. AI Overviews will roll out to all U.S. users this week. Coming soon are customizable AI Overview options that can simplify language or add more detail.

Anthropic Debuts Enterprise Plan, Free Claude App for iPhone

Anthropic has launched a paid tier catering to business customers as well as a free mobile app for iOS users featuring its chatbot Claude. The generative AI startup — which has backing from Amazon, Google and Salesforce — is positioning itself to compete with companies like OpenAI, Google and Microsoft that focus on enterprise plans for revenue while also offering individual plans. Anthropic’s Team plan starts at $30 per user per month, on par with competing enterprise products, and requires a minimum of five seats. Anthropic has been beta testing Team over the past few quarters in industries including legal, tech and healthcare.

Apple Unveils OpenELM Tech Optimized for Local Applications

The trend toward small language models that can efficiently run on a single device instead of requiring cloud connectivity has emerged as a focus for Big Tech companies involved in artificial intelligence. Apple has released the OpenELM family of open-source models as its entry in that field. OpenELM uses “a layer-wise scaling strategy” to efficiently allocate parameters within each layer of the transformer model, resulting in what Apple claims is “enhanced accuracy.” The “ELM” stands for “Efficient Language Models,” and one media outlet describes it as “the future of AI on the iPhone.”

Google Offers Public Preview of Gemini Pro for Cloud Clients

Google is moving its most powerful artificial intelligence model, Gemini 1.5 Pro, into public preview for developers and Google Cloud customers. Gemini 1.5 Pro includes what Google claims is a breakthrough in long context understanding, with the ability to process 1 million tokens of information, “opening up new possibilities for enterprises to create, discover and build using AI.” Gemini’s multimodal capabilities allow it to process audio, video, text, code and more, which when combined with long context, “enables enterprises to do things that just weren’t possible with AI before,” according to Google.

Apple’s ReALM AI Advances the Science of Digital Assistants

Apple has developed a large language model it says has advanced screen-reading and comprehension capabilities. ReALM (Reference Resolution as Language Modeling) is artificial intelligence that can see and read computer screens in context, according to Apple, which says it advances technology essential for a true AI assistant “that aims to allow a user to naturally communicate their requirements to an agent, or to have a conversation with it.” Apple claims that in a benchmark against GPT-3.5 and GPT-4, the smallest ReALM model performed “comparable” to GPT-4, with its “larger models substantially outperforming it.”

Apple Launches Open-Source Language-Based Image Editor

Apple has released MGIE, an open-source AI model that edits images using natural language instructions. MGIE, short for MLLM-Guided Image Editing, can also modify and optimize images. Developed in conjunction with the University of California, Santa Barbara, MGIE is Apple’s first AI model. The multimodal MGIE, which understands text and image input, also crops, resizes, flips, and adds filters based on text instructions using what Apple says is an easier instruction set than other AI editing programs, and is simpler and faster than learning a traditional program, like Apple’s own Final Cut Pro.

Conversational Chatbot Optimizes Google Ads, Search Results

Google’s multimodal Gemini large language model will offer chat capabilities that help advertisers build and scale Search campaigns within the Google Ads platform using natural language prompts. “We’ve been actively testing Gemini to further enhance our ads solutions, and we’re pleased to share that Gemini is now powering the conversational experience,” Google said, explaining the functionality is now available in beta to English-language advertisers in the U.S. and UK, and will roll out globally to all English-language advertisers over the next few weeks, with additional languages offered in the months ahead.

Unpacked: Samsung Intros Galaxy AI with Next Gen S Phones

During this week’s Unpacked event, Samsung introduced Galaxy AI, a suite of artificial intelligence tools designed for the new Galaxy S series smartphones — the Galaxy S24, Galaxy S24+, and Galaxy S24 Ultra. “AI amplifies nearly every experience on the Galaxy S24 series,” including real-time text and call translations, a powerful suite of creative tools in the ProVisual Engine and a new kind of “gestural search that lets users circle, highlight, scribble on or tap anything onscreen” to see related search results. The AI enhancements are largely enabled by a multiyear deal with Google and Qualcomm. Samsung also debuted a wearable accessory, the Galaxy Ring.

VideoPoet: Google Launches a Multimodal AI Video Generator

Google has unveiled a new large language model designed to advance video generation. VideoPoet is capable of text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. “The leading video generation models are almost exclusively diffusion-based,” Google says, citing Imagen Video as an example. Google finds this counterintuitive, since “LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities.” VideoPoet eschews the diffusion approach of relying on separately trained tasks in favor of integrating many video generation capabilities in a single LLM.

Microsoft Brings Meta’s Llama 2 to Azure Models as a Service

Microsoft has expanded its Models as a Service (MaaS) catalog for Azure AI Studio, building beyond the 40 models announced at the Microsoft Ignite event last month with the addition of the Llama 2 code generation model from Meta Platforms in public preview. In addition, GPT-4 Turbo with Vision has been added to accelerate generative AI and multimodal application development. Like Software as a Service (SaaS) and Infrastructure as a Service (IaaS), MaaS lets customers use AI models on demand over the web with easy setup and technical support.

Google Debuts Turnkey Gemini AI Studio for Developing Apps

Google is rolling out Gemini to developers, enticing them with tools including AI Studio, an easy-to-navigate Web-based platform that will serve as a portal to the multi-tiered Gemini ecosystem, beginning with Gemini Pro, with Gemini Ultra to come next year. The service aims to allow developers to quickly create prompts and Gemini-powered chatbots, providing access to API keys to integrate them into apps. They’ll also be able to access code, should projects require a full-featured IDE. The site is essentially a revamped version of what was formerly Google’s MakerSuite.

Google Announces the Launch of Gemini, Its Largest AI Model

Google is closing the year by heralding 2024 as the “Gemini era,” with the introduction of its “most capable and general AI model yet,” Gemini 1.0. This new foundation model is optimized for three different use-case sizes: Ultra, Pro and Nano. As a result, Google is releasing a new, Gemini-powered version of its Bard chatbot, available to English speakers in the U.S. and 170 global regions. Google touts Gemini as built from the ground up for multimodality, reasoning across text, images, video, audio and code. However, Bard will not yet incorporate Gemini’s ability to analyze sound and images.