Multimodal AI Archives - Page 3 of 3

Unpacked: Samsung Intros Galaxy AI with Next Gen S Phones

By Paula Parisi
January 19, 2024

During this week’s Unpacked event, Samsung introduced Galaxy AI, a suite of artificial intelligence tools designed for the new Galaxy S series smartphones — the Galaxy S24, Galaxy S24+, and Galaxy S24 Ultra. “AI amplifies nearly every experience on the Galaxy S24 series,” including real-time text and call translations, a powerful suite of creative tools in the ProVisual Engine and a new kind of “gestural search that lets users circle, highlight, scribble on or tap anything onscreen” to see related search results. The AI enhancements are largely enabled by a multiyear deal with Google and Qualcomm. Samsung also debuted a wearable accessory, the Galaxy Ring. Continue reading Unpacked: Samsung Intros Galaxy AI with Next Gen S Phones

VideoPoet: Google Launches a Multimodal AI Video Generator

By Paula Parisi
December 22, 2023

Google has unveiled a new large language model designed to advance video generation. VideoPoet is capable of text-to-video, image-to-video, video stylization, video inpainting and outpainting, and video-to-audio. “The leading video generation models are almost exclusively diffusion-based,” Google says, citing Imagen Video as an example. Google finds this counter intuitive, since “LLMs are widely recognized as the de facto standard due to their exceptional learning capabilities across various modalities.” VideoPoet eschews the diffusion approach of relying on separately trained tasks in favor of integrating many video generation capabilities in a single LLM. Continue reading VideoPoet: Google Launches a Multimodal AI Video Generator

Microsoft Brings Meta’s Llama 2 to Azure Models as a Service

By Paula Parisi
December 20, 2023

Microsoft has expanded its Models as a Service (MaaS) catalog for Azure AI Studio, building beyond the 40 models announced at the Microsoft Ignite event last month with the addition of the Llama 2 code generation model from Meta Platforms in public preview. In addition, GPT-4 Turbo with Vision has been added to accelerate generative AI and multimodal application development. Similar to things like Software as a Service (SaaS) and Infrastructure as a Service (IaaS), MaaS lets customers use AI models on-demand over the web with easy setup and technical support. Continue reading Microsoft Brings Meta’s Llama 2 to Azure Models as a Service

Google Debuts Turnkey Gemini AI Studio for Developing Apps

By Paula Parisi
December 15, 2023

Google is rolling out Gemini to developers, enticing them with tools including AI Studio, an easy-to-navigate Web-based platform that will serve as a portal to the multi-tiered Gemini ecosystem, beginning with Gemini Pro, with Gemini Ultra to come next year. The service aims to allow developers to quickly create prompts and Gemini-powered chatbots, providing access to API keys to integrate them into apps. They’ll also be able to access code, should projects require a full featured IDE. The site is essentially a revamped version of what was formerly Google’s MakerSuite. Continue reading Google Debuts Turnkey Gemini AI Studio for Developing Apps

Google Announces the Launch of Gemini, Its Largest AI Model

By Paula Parisi
December 8, 2023

Google is closing the year by heralding 2024 as the “Gemini era,” with the introduction of its “most capable and general AI model yet,” Gemini 1.0. This new foundation model is optimized for three different use-case sizes: Ultra, Pro and Nano. As a result, Google is releasing a new, Gemini-powered version of its Bard chatbot, available to English speakers in the U.S. and 170 global regions. Google touts Gemini as built from the ground up for multimodality, reasoning across text, images, video, audio and code. However, Bard will not as yet incorporate Gemini’s ability to analyze sound and images. Continue reading Google Announces the Launch of Gemini, Its Largest AI Model

Woodpecker: Chinese Researchers Combat AI Hallucinations

By Paula Parisi
October 27, 2023

The University of Science and Technology of China (USTC) and Tencent YouTu Lab have released a research paper on a new framework called Woodpecker, designed to correct hallucinations in multimodal large language AI models. “Hallucination is a big shadow hanging over the rapidly evolving MLLMs,” writes the group, describing the phenomenon as when MLLMs “output descriptions that are inconsistent with the input image.” Solutions to date focus mainly on “instruction-tuning,” a form of retraining that is data and computation intensive. Woodpecker takes a training-free approach that purports to correct hallucinations from the basis of the generated text. Continue reading Woodpecker: Chinese Researchers Combat AI Hallucinations

ChatGPT Goes Multimodal: OpenAI Adds Vision, Voice Ability

By Paula Parisi
October 11, 2023

OpenAI began previewing vision capabilities for GPT-4 in March, and the company is now starting to roll out the image input and output to users of its popular ChatGPT. The multimodal expansion also includes audio functionality, with OpenAI proclaiming late last month that “ChatGPT can now see, hear and speak.” The upgrade vaults GPT-4 into the multimodal category with what OpenAI is apparently calling GPT-4V (for “Vision,” though equally applicable to “Voice”). “We’re rolling out voice and images in ChatGPT to Plus and Enterprise users,” OpenAI announced. Continue reading ChatGPT Goes Multimodal: OpenAI Adds Vision, Voice Ability

Yasa-1: Startup Reka Launches New AI Multimodal Assistant

By Paula Parisi
October 11, 2023

Startup Reka AI is releasing in preview its first artificial intelligence assistant, Yasa-1. The multimodal AI is described as “a language assistant with visual and auditory sensors.” The year-old company says it “trained Yasa-1 from scratch,” including pretraining foundation models “from ground zero,” then aligning them and optimizing to its training and server infrastructures. “Yasa-1 is not just a text assistant, it also understands images, short videos and audio (yes, sounds too),” said Reka AI co-founder and Chief Scientist Yi Tay. Yasa-1 is available via Reka’s APIs and as docker containers for on-site or virtual private cloud deployment. Continue reading Yasa-1: Startup Reka Launches New AI Multimodal Assistant

Meta Plans Personality-Driven Chatbots to Boost Engagement

By Paula Parisi
August 3, 2023

Meta Platforms is amping up its AI play, with plans to launch a suite of personality-driven chatbots as soon as next month. The company has been developing the series of artificially intelligent character bots with a goal of using them to boost engagement with its social media brands by making them available to have “humanlike discussions” on platforms including Facebook, Instagram and WhatsApp. Internally dubbed “personas,” the chatbots simulate characters ranging from historical figures like Abraham Lincoln to a surfer dude that dispenses travel advice. Continue reading Meta Plans Personality-Driven Chatbots to Boost Engagement

Reka AI Raises $58 Million to Customize LLMs for Enterprise

By Paula Parisi
June 30, 2023

Based on the premise that it is impractical to deploy an all-purpose LLM for specific use cases, a group of researchers from Google, Baidu, DeepMind and Meta founded Reka AI in July 2022. A year later the company has emerged from stealth mode with news of $58 million in Series A funding led by DST Global and Radical Ventures. Strategic partner Snowflake Ventures also participated, along with angel investor Nat Friedman, former CEO of GitHub. The Sunnyvale, California-based startup says it is building “enterprise-grade state-of-the-art AI assistants for everyone, regardless of language and culture.” Continue reading Reka AI Raises $58 Million to Customize LLMs for Enterprise

Meta’s Open-Source ImageBind Works Across Six Modalities

By Paula Parisi
May 15, 2023

Meta Platforms has built and is open-sourcing ImageBind, an artificial intelligence that combines six modalities: audio, visual, text, thermal, movement and depth data. Currently a research project, it suggests a future in which AI models generate multisensory content. “ImageBind equips machines with a holistic understanding that connects objects in a photo with how they will sound, their 3D shape, how warm or cold they are, and how they move,” Meta says. In other words, ImageBind’s approach more closely approximates human thinking by training on the relationship between things rather than ingesting massive datasets so as absorb every possibility. Continue reading Meta’s Open-Source ImageBind Works Across Six Modalities

Microsoft’s Next Generation of Bing AI Interacts with Images

By Paula Parisi
May 8, 2023

Microsoft’s AI-powered Bing search engine has been drawing in excess of 100 million daily active users and logged half a billion chats. With OpenAI’s GPT-4 and DALL-E 2 models driving the action, it has also created over 200 million images since debuting in limited preview in February. Seeking to build on that momentum, Microsoft is adding new features and integrating Bing more tightly with its Edge browser. The company is also ditching its waitlist in a move to open preview. “We’re underway with the transformation of search,” CVP and consumer CMO Yusuf Mehdi said at a preview event last week. Continue reading Microsoft’s Next Generation of Bing AI Interacts with Images

Microsoft Unveils AI Model That Comprehends Image Content

By Paula Parisi
March 6, 2023

Microsoft researchers have unveiled Kosmos-1, a new AI model the company says analyzes images for content, performs visual text recognition, solves visual puzzles and passes visual IQ tests. It also understands natural language instructions. The new model is what’s known as multimodal AI, which means it uses different instruction sets, from text to audio and video. Mixing media is a key step in building artificial general intelligence (AGI) that can perform tasks in a manner approximating human performance. Examples from a Kosmos-1 research paper show it can effectively analyze images, answering questions about them. Continue reading Microsoft Unveils AI Model That Comprehends Image Content