By Paula Parisi, October 27, 2023
The University of Science and Technology of China (USTC) and Tencent YouTu Lab have released a research paper on a new framework called Woodpecker, designed to correct hallucinations in multimodal large language models (MLLMs). “Hallucination is a big shadow hanging over the rapidly evolving MLLMs,” writes the group, describing the phenomenon as when MLLMs “output descriptions that are inconsistent with the input image.” Solutions to date have focused mainly on “instruction-tuning,” a form of retraining that is data- and computation-intensive. Woodpecker instead takes a training-free approach that purports to correct hallucinations directly in the generated text. Continue reading Woodpecker: Chinese Researchers Combat AI Hallucinations
By Paula Parisi, October 11, 2023
Startup Reka AI is releasing in preview its first artificial intelligence assistant, Yasa-1. The multimodal AI is described as “a language assistant with visual and auditory sensors.” The year-old company says it “trained Yasa-1 from scratch,” including pretraining foundation models “from ground zero,” then aligning them and optimizing them for its training and server infrastructure. “Yasa-1 is not just a text assistant, it also understands images, short videos and audio (yes, sounds too),” said Reka AI co-founder and Chief Scientist Yi Tay. Yasa-1 is available via Reka’s APIs and as Docker containers for on-site or virtual private cloud deployment. Continue reading Yasa-1: Startup Reka Launches New AI Multimodal Assistant
By Paula Parisi, September 27, 2023
OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem. Continue reading OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search
By Paula Parisi, September 1, 2023
Google DeepMind and Google Cloud have teamed up to launch what they claim is an indelible AI watermark tool, which, if it works, would be an industry first. Called SynthID, the technique for identifying AI-generated images is being launched in beta. The technology embeds its digital watermark “directly into the pixels of an image, making it imperceptible to the human eye, but detectable for identification,” according to DeepMind. SynthID is being released to a limited number of Google’s Vertex AI customers using Imagen, Google’s text-to-image model that generates photorealistic images. Continue reading Google Introduces an AI Watermark That Cannot Be Removed
By Paula Parisi, August 9, 2023
Meta Platforms is releasing AudioCraft, a generative AI framework that creates “high-quality,” “realistic” audio and music from text prompts. AudioCraft consists of three models: MusicGen, AudioGen and EnCodec, all of which Meta announced it is open-sourcing. Released in June, MusicGen was trained on Meta-owned and licensed music, and generates music from text prompts, while AudioGen, which was trained on public domain samples, generates sound effects (like honking horns and barking dogs) from text prompts. The EnCodec decoder allows “higher quality music generation with fewer artifacts,” according to Meta. Continue reading Meta’s AudioCraft Turns Words into Music with Generative AI
By Paula Parisi, July 21, 2023
Apple is reportedly developing tools it could use to enter the artificial intelligence space, joining rivals such as Microsoft and Google, which have already released popular products. The Cupertino-based company is said to have built a framework for large language models, the technology that powers AI chatbots such as Google’s Bard and OpenAI’s ChatGPT. Called Ajax, the platform is the basis for what is referred to inside the company as Apple GPT. Though Apple has built automation into its products for some time, it could now be preparing to make a direct play for the generative AI market. Continue reading Apple Chatbot ‘Ajax’ Could Be Next Major Player in AI Space
By Paula Parisi, June 22, 2023
Vimeo is leveraging artificial intelligence to automate video editing. The company claims its new suite of AI tools enables the creation of “a fully produced video in minutes by generating scripts from text prompts, recording videos in one take, and editing content as easily as a Word doc.” Features include a built-in on-screen teleprompter for recording and the ability to quickly delete unwanted filler words (“ums” and “uhs”) and long pauses. The video hosting and sharing platform is rolling out the AI tools in July as part of its $20-per-month standard subscription. Continue reading Vimeo Says Its AI Makes Video as Easy to Edit as Word Docs
By Paula Parisi, June 15, 2023
Meta Platforms continues to make progress on its mission to develop artificial intelligence that can teach itself how the world works. Chief AI Scientist Yann LeCun has taken a special interest in the new model, called Image Joint Embedding Predictive Architecture, or I-JEPA, which learns by building an internal representation of the outside world and comparing abstract representations of images instead of comparing pixels. The approach is meant to let AI learn more like humans do, with the ability to figure out complex tasks and adapt to new situations. Continue reading Meta Develops Computer Vision AI That Learns Like Humans
By Paula Parisi, June 13, 2023
Google-backed AI startup Runway has released Gen-2, an early entry among commercially available text-to-video models. Gen-2 was previously available only in a waitlisted limited release, and its commercial availability is significant, since text-to-video is predicted to be the next big leap in artificial intelligence, following the explosion of AI-generated text and images. While Runway’s solution may not be ready to serve as a professional video tool, it marks the next step in the development of technology expected to impact media and entertainment. Filmmaker Joe Russo recently predicted that within the next two years, AI may have the ability to create feature films. Continue reading Runway Makes Next Advance in Consumer Text-to-Video AI
By Paula Parisi, May 23, 2023
Details are emerging about the text-based Twitter competitor being developed by Meta Platforms. What is being referred to internally as “Instagram’s new text-based app for conversations” will offer a feed of text posts of up to 500 characters that can include links, photos and videos. The move comes as alternatives including Bluesky, Cohost, Hive, Mastodon and Substack try to gain market share by luring disaffected Twitter users to their platforms. Instagram’s work-in-progress entry, codenamed “P92” and alternately referred to as “Barcelona,” may soon be interoperable with all of them. Continue reading Meta Testing Decentralized Instagram App as Rival to Twitter
By Paula Parisi, May 15, 2023
Meta Platforms has built and is open-sourcing ImageBind, an artificial intelligence model that combines six modalities: audio, visual, text, thermal, movement and depth data. Currently a research project, it suggests a future in which AI models generate multisensory content. “ImageBind equips machines with a holistic understanding that connects objects in a photo with how they will sound, their 3D shape, how warm or cold they are, and how they move,” Meta says. In other words, ImageBind’s approach more closely approximates human thinking by training on the relationships between things rather than ingesting massive datasets so as to absorb every possibility. Continue reading Meta’s Open-Source ImageBind Works Across Six Modalities
By Paula Parisi, May 8, 2023
Microsoft’s AI-powered Bing search engine has been drawing in excess of 100 million daily active users and logged half a billion chats. With OpenAI’s GPT-4 and DALL-E 2 models driving the action, it has also created over 200 million images since debuting in limited preview in February. Seeking to build on that momentum, Microsoft is adding new features and integrating Bing more tightly with its Edge browser. The company is also ditching its waitlist in a move to open preview. “We’re underway with the transformation of search,” CVP and consumer CMO Yusuf Mehdi said at a preview event last week. Continue reading Microsoft’s Next Generation of Bing AI Interacts with Images
By Paula Parisi, March 27, 2023
After several months of testing, Anthropic is making its AI chatbot Claude available for general release in two configurations: the high-performance Claude and a lighter, cheaper, faster option called Claude Instant. Anthropic was launched in 2021 by a pair of former OpenAI employees, and its Claude chatbots are competitors to that firm’s ChatGPT. Accessible through a chat interface and API in Anthropic’s developer console, Claude is being marketed as the product of training designed to produce more “helpful, honest, and harmless AI systems.” To that end, Anthropic says “Claude is much less likely to produce harmful outputs.” Continue reading Anthropic Takes Claude Chatbot Public After Months of Tests
By Paula Parisi, March 16, 2023
OpenAI has released GPT-4, which it says is a more powerful and reliable version of the artificial intelligence technology powering its viral ChatGPT chatbot. GPT-4 can analyze images and handle larger blocks of text and is generally “more creative and collaborative” than earlier iterations when it comes to things like composing songs, writing screenplays and mimicking a user’s authorial style. “GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities,” OpenAI says. GPT-4 is already driving the chatbot technology behind Microsoft’s Bing AI search engine, now in beta. Continue reading OpenAI Announces Official Launch of GPT-4 Multimodal Tech
By Paula Parisi, March 16, 2023
Google is readying an API and other enterprise tools for its Pathways Language Model (PaLM), a large language model similar to GPT, to encourage developers to create chatbots and other apps using the platform. PaLM is one of Google’s most advanced systems, with the capability to generate text, images, code, video and audio from natural language prompts. Much like OpenAI’s GPT series and the LLaMA family from Meta Platforms, it is suitable for a wide variety of general tasks. To facilitate PaLM’s use for specific tasks, Google is launching MakerSuite along with the PaLM API. Continue reading Google’s PaLM API, MakerSuite Coming to Select Developers