Multimodal AI Archives - Page 2 of 3

OpenAI Pushes GPT-4o Customization with Free Token Offer

By Paula Parisi
August 27, 2024

OpenAI announced its newest model, GPT-4o, can now be customized. The company said that the ability to fine-tune the multimodal GPT-4o has been “one of the most requested features from developers.” Customization can move the model toward more specific structure and tone of responses or allow it to follow specific instruction sets geared toward individual use cases. Developers can now implement custom datasets, aiming for better performance at a lower cost. The ChatGPT maker is rolling out the welcome mat by offering 1 million training tokens per day “for free for every organization” through September 23. Continue reading OpenAI Pushes GPT-4o Customization with Free Token Offer

Google Rolls Out Its Gemini Live, Challenging ChatGPT Voice

By Paula Parisi
August 20, 2024

Google has released its AI assistant, Gemini Live, and is positioning it to replace Google Assistant on mobile. Gemini Live is rolling out on Android to subscribers of Gemini Advanced, which is part of the $20 monthly Google One AI Premium plan. Those consumers who purchase the new Pixel 9 Pro — which begins shipping this week — will get the assistant as part of a year of free access to Gemini Advanced, a $240 value, according to the company. Google claims that Gemini Live technology enables natural, flowing conversations with the AI assistant, putting “a sidekick in your pocket.” Continue reading Google Rolls Out Its Gemini Live, Challenging ChatGPT Voice

OpenAI Brings Advanced Voice Mode Feature to ChatGPT Plus

By Paula Parisi
August 5, 2024

OpenAI has released its new Advanced Voice Mode in a limited alpha rollout for select ChatGPT Plus users. The feature, which is being implemented for the ChatGPT mobile app on Android and iOS, aims for more natural dialogue with the AI chatbot. Powered by GPT-4o, which is multimodal, Advanced Voice Mode is said to be able to sense emotional inflections, including excitement, sadness or singing. According to an OpenAI post on X, the company plans to “continue to add more people on a rolling basis” so that everyone using ChatGPT Plus will have access to the new feature in the fall. Continue reading OpenAI Brings Advanced Voice Mode Feature to ChatGPT Plus

Tough EU Laws Prompt Meta, Apple to Withhold New Products

By Paula Parisi
July 19, 2024

U.S. tech companies are fighting back against what they feel are overly oppressive European Union regulations by withholding products from that market. Meta Platforms will not release its next Llama multimodal AI model there, along with future products. Apple last month said certain Apple Intelligence AI features will not be released in the EU. Previously, tech companies would accommodate regional laws by adapting global strategies so they could do business everywhere with the same products. Given the restrictions of the Digital Markets Act and other EU rules, Big Tech is signaling that may no longer be possible. Continue reading Tough EU Laws Prompt Meta, Apple to Withhold New Products

Apple Launches Public Demo of Its Multimodal 4M AI Model

By Paula Parisi
July 3, 2024

Apple has released a public demo of the 4M AI model it developed in collaboration with the Swiss Federal Institute of Technology Lausanne (EPFL). The technology debuts seven months after the model was first open-sourced, allowing informed observers the opportunity to interact with it and assess its capabilities. Apple says 4M was built by applying masked modeling to a single unified Transformer encoder-decoder “across a wide range of input/output modalities — including text, images, geometric and semantic modalities, as well as neural network feature maps.” Continue reading Apple Launches Public Demo of Its Multimodal 4M AI Model

Runway’s Gen-3 Alpha Creates AI Videos Up to 10-Seconds

By Paula Parisi
June 19, 2024

Runway ML has introduced a new foundation model, Gen-3 Alpha, which the company says can generate high-quality, realistic scenes of up to 10 seconds long from text prompts, still images or a video sample. Offering a variety of camera movements, Gen-3 Alpha will initially roll out to Runway’s paid subscribers, but the company plans to add a free version in the future. Runway says Gen-3 Alpha is the first of a new series of models trained on the company’s new large-scale multimodal infrastructure, which offers improvements “in fidelity, consistency, and motion over Gen-2,” released last year. Continue reading Runway’s Gen-3 Alpha Creates AI Videos Up to 10-Seconds

OpenAI Is Working on New Frontier Model to Succeed GPT-4

By Paula Parisi
May 30, 2024

OpenAI has begun training a new flagship artificial intelligence model to succeed GPT-4, the technology currently associated with ChatGPT. The new model — which some are already calling GPT-5, although OpenAI hasn’t yet shared its name — is expected to take the company’s compute to the next level as it works toward artificial general intelligence (AGI), intelligence equal to or surpassing human cognitive abilities. The company also announced it has formed a new Safety and Security Committee two weeks after dissolving the old one upon the departure of OpenAI co-founder and Chief Scientist Ilya Sutskever. Continue reading OpenAI Is Working on New Frontier Model to Succeed GPT-4

Meta Advances Multimodal Model Architecture with Chameleon

By Paula Parisi
May 28, 2024

Meta Platforms has unveiled its first natively multimodal model, Chameleon, which observers say can make it competitive with frontier model firms. Although Chameleon is not yet released, Meta says internal research indicates it outperforms the company’s own Llama 2 in text-only tasks and “matches or exceeds the performance of much larger models” including Google’s Gemini Pro and OpenAI’s GPT-4V in a mixed-modal generation evaluation “where either the prompt or outputs contain mixed sequences of both images and text.” In addition, Meta calls Chameleon’s image generation “non-trivial,” noting that’s “all in a single model.” Continue reading Meta Advances Multimodal Model Architecture with Chameleon

Google Ups AI Quotient with Search-Optimized Gemini Model

By Paula Parisi
May 16, 2024

Google has infused search with more Gemini AI, adding expanded AI Overviews and more planning and research capabilities. “Ask whatever’s on your mind or whatever you need to get done — from researching to planning to brainstorming — and Google will take care of the legwork” culling from “a knowledge base of billions of facts about people, places and things,” explained Google and Alphabet CEO Sundar Pichai at the Google I/O developer conference. AI Overviews will roll out to all U.S. users this week. Coming soon are customizable AI Overview options that can simplify language or add more detail. Continue reading Google Ups AI Quotient with Search-Optimized Gemini Model

Anthropic Debuts Enterprise Plan, Free Claude App for iPhone

By ETCentric Staff
May 6, 2024

Anthropic has launched a paid tier catering to business customers as well as a free mobile app for iOS users featuring its chatbot Claude. The generative AI startup — which has backing from Amazon, Google and Salesforce — is positioning itself to compete with companies like OpenAI, Google and Microsoft that focus on enterprise plans for revenue while also offering individual plans. Anthropic’s Team plan starts at $30 per user per month, on par with competing enterprise products, and requires a minimum of five seats. Anthropic has been beta testing Team over the past few quarters in industries including legal, tech and healthcare. Continue reading Anthropic Debuts Enterprise Plan, Free Claude App for iPhone

Apple Unveils OpenELM Tech Optimized for Local Applications

By ETCentric Staff
April 26, 2024

The trend toward small language models that can efficiently run on a single device instead of requiring cloud connectivity has emerged as a focus for Big Tech companies involved in artificial intelligence. Apple has released the OpenELM family of open-source models as its entry in that field. OpenELM uses “a layer-wise scaling strategy” to efficiently allocate parameters within each layer of the transformer model, resulting in what Apple claims is “enhanced accuracy.” The “ELM” stands for “Efficient Language Models,” and one media outlet couches it as “the future of AI on the iPhone.” Continue reading Apple Unveils OpenELM Tech Optimized for Local Applications

Google Offers Public Preview of Gemini Pro for Cloud Clients

By ETCentric Staff
April 11, 2024

Google is moving its most powerful artificial intelligence model, Gemini 1.5 Pro, into public preview for developers and Google Cloud customers. Gemini 1.5 Pro includes what Google claims is a breakthrough in long context understanding, with the ability to run 1 million tokens of information “opening up new possibilities for enterprises to create, discover and build using AI.” Gemini’s multimodal capabilities allow it to process audio, video, text, code and more, which when combined with long context, “enables enterprises to do things that just weren’t possible with AI before,” according to Google. Continue reading Google Offers Public Preview of Gemini Pro for Cloud Clients

Apple’s ReALM AI Advances the Science of Digital Assistants

By ETCentric Staff
April 4, 2024

Apple has developed a large language model it says has advanced screen-reading and comprehension capabilities. ReALM (Reference Resolution as Language Modeling) is artificial intelligence that can see and read computer screens in context, according to Apple, which says it advances technology essential for a true AI assistant “that aims to allow a user to naturally communicate their requirements to an agent, or to have a conversation with it.” Apple claims that in a benchmark against GPT-3.5 and GPT-4, the smallest ReALM model performed “comparable” to GPT-4, with its “larger models substantially outperforming it.” Continue reading Apple’s ReALM AI Advances the Science of Digital Assistants

Apple Launches Open-Source Language-Based Image Editor

By ETCentric Staff
February 9, 2024

Apple has released MGIE, an open-source AI model that edits images using natural language instructions. MGIE, short for MLLM-Guided Image Editing, can also modify and optimize images. Developed in conjunction with University of California Santa Barbara, MGIE is Apple’s first AI model. The multimodal MGIE, which understands text and image input, also crops, resizes, flips, and adds filters based on text instructions using what Apple says is an easier instruction set than other AI editing programs, and is simpler and faster than learning a traditional program, like Apple’s own Final Cut Pro. Continue reading Apple Launches Open-Source Language-Based Image Editor

Conversational Chatbot Optimizes Google Ads, Search Results

By Paula Parisi
January 25, 2024

Google’s multimodal Gemini large language model will offer chat capabilities that help advertisers build and scale Search campaigns within the Google Ads platform using natural language prompts. “We’ve been actively testing Gemini to further enhance our ads solutions, and, we’re pleased to share that Gemini is now powering the conversational experience,” Google said, explaining the functionality is now available in beta to English language advertisers in the U.S., UK and will be rolling out globally to all English language advertisers over the next few weeks, with additional languages offered in the months ahead. Continue reading Conversational Chatbot Optimizes Google Ads, Search Results