Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram.

Apple Unveils Progress in Multimodal Large Language Models

Apple researchers have gone public with new multimodal methods for training large language models using both text and images. The results are said to enable AI systems that are more powerful and flexible, which could have significant ramifications for future Apple products. These new models, which Apple calls MM1, support up to 30 billion parameters. The researchers identify multimodal large language models (MLLMs) as “the next frontier in foundation models,” which exceed the performance of LLMs and “excel at tasks like image captioning, visual question answering and natural language inference.”

Grok-1 Architecture Open-Sourced for General Release by xAI

Elon Musk’s xAI has released its Grok chatbot and open-sourced part of the underlying Grok-1 model architecture for any developer or entrepreneur to use for purposes including commercial applications. Musk unveiled Grok in November and announced that it would be publicly released this month. The chatbot itself is available to X social premium members, who can ask the cheeky AI questions and get answers with a snarky attitude inspired by “The Hitchhiker’s Guide to the Galaxy” sci-fi novel. The training for Grok’s foundation LLM is said to include X social posts.

AI Video Startup Haiper Announces Funding and Plans for AGI

London-based AI video startup Haiper has emerged from stealth mode with $13.8 million in seed funding and a platform that generates up to two seconds of HD video from text prompts or images. Founded by alumni from Google DeepMind, TikTok and various academic research labs, Haiper is built around a bespoke foundation model that aims to serve the needs of the creative community while the company pursues a path to artificial general intelligence (AGI). Haiper is offering a free trial of what is currently a web-based user interface similar to offerings from Runway and Pika.

GenAI Lets Snapchat+ Subscribers Create and Share Images

Snapchat+ is rolling out new artificial intelligence features that let subscribers use text prompts to create generative AI images to share with friends. In addition, the Dreams feature, which creates generative AI selfies, is now able to add your friends to those photos. Snapchat+ subscribers get one pack of 8 Dreams per month as part of their $3.99 monthly fee. An onscreen button labeled “AI” lets subscribers access the AI image generator to choose from a menu of prompts (including “sunny day at the beach” and “planet made of cheese”) or they can enter their own descriptions.

Threads Lets Users Delete Accounts Separate from Instagram

Threads, the Twitter competitor launched in July by Meta Platforms to record-breaking numbers, has added features that make it easier for users to separate their Threads feeds from Instagram and Facebook. Users can now delete their Threads accounts separately from Instagram, something that was not previously possible and that confounded users. Because those signing up for Threads were required to do so from either an existing or a new Instagram account, the two were entwined. Instagram/Threads CEO Adam Mosseri also announced that propagation of Threads posts to Instagram and Facebook can now be turned off, to keep discussions separate.

Meta’s WhatsApp Launches Voice Chat for Up to 128 People

Meta Platforms-owned instant messaging and VoIP service WhatsApp has updated its Voice Chat feature for mobile so it can now host group calls of up to 128 participants. Voice chats allow WhatsApp users to instantly talk live with members of a group chat while still being able to message within the group. The new feature, which is being compared to a Discord server, is being rolled out globally. The idea is to make Voice Chat less disruptive than group calling, which rings all group members. Voice chats can be quietly started with an in-chat bubble that users tap to join. The updated version will have end-to-end encryption by default.

Woodpecker: Chinese Researchers Combat AI Hallucinations

The University of Science and Technology of China (USTC) and Tencent YouTu Lab have released a research paper on a new framework called Woodpecker, designed to correct hallucinations in multimodal large language models (MLLMs). “Hallucination is a big shadow hanging over the rapidly evolving MLLMs,” writes the group, describing the phenomenon as when MLLMs “output descriptions that are inconsistent with the input image.” Solutions to date focus mainly on “instruction-tuning,” a form of retraining that is data- and computation-intensive. Woodpecker takes a training-free approach that purports to correct hallucinations from the basis of the generated text.

Yasa-1: Startup Reka Launches New AI Multimodal Assistant

Startup Reka AI is releasing in preview its first artificial intelligence assistant, Yasa-1. The multimodal AI is described as “a language assistant with visual and auditory sensors.” The year-old company says it “trained Yasa-1 from scratch,” including pretraining foundation models “from ground zero,” then aligning them and optimizing to its training and server infrastructures. “Yasa-1 is not just a text assistant, it also understands images, short videos and audio (yes, sounds too),” said Reka AI co-founder and Chief Scientist Yi Tay. Yasa-1 is available via Reka’s APIs and as Docker containers for on-site or virtual private cloud deployment.

OpenAI’s ChatGPT Upgraded with ‘Talk’ Tech, Image Search

OpenAI is experimenting with new voice and image capabilities in ChatGPT. According to the company, users can now “speak with ChatGPT and have it talk back,” thanks to an intuitive new interface that, in addition to facilitating voice conversations, will allow users to show ChatGPT an image to discuss. “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it,” OpenAI explains, alternatively suggesting you “snap pictures of your fridge and pantry to figure out what’s for dinner” or have it help with homework based on pictures of a math problem.

Google Introduces an AI Watermark That Cannot Be Removed

Google DeepMind and Google Cloud have teamed up to launch what they claim is an indelible AI watermark tool, which, if it works, would mark an industry first. Called SynthID, the technique for identifying AI-generated images is being launched in beta. The technology embeds its digital watermark “directly into the pixels of an image, making it imperceptible to the human eye, but detectable for identification,” according to DeepMind. SynthID is being released to a limited number of Google’s Vertex AI customers using Imagen, a Google text-to-image AI model that generates photorealistic images.

Meta’s AudioCraft Turns Words into Music with Generative AI

Meta Platforms is releasing AudioCraft, a generative AI framework that creates “high-quality,” “realistic” audio and music from text prompts. AudioCraft consists of three models: MusicGen, AudioGen and EnCodec, all of which Meta announced it is open-sourcing. Released in June, MusicGen was trained on Meta-owned and licensed music, and generates music from text prompts, while AudioGen, which was trained on public domain samples, generates sound effects (like honking horns and barking dogs) from text prompts. The EnCodec decoder allows “higher quality music generation with fewer artifacts,” according to Meta.

Apple Chatbot ‘Ajax’ Could Be Next Major Player in AI Space

Apple is reportedly developing tools it could use to enter the artificial intelligence space, joining rivals such as Microsoft and Google, which have already released popular products. In Cupertino, the company is said to have built a framework for large language models, which power AI-based chatbot offerings similar to Google’s Bard and OpenAI’s ChatGPT. Called Ajax, the platform is the basis for what is referred to inside the company as Apple GPT. Though Apple has built automation into its products for some time, it could now be preparing to make a direct play for the generative AI market.

Vimeo Says Its AI Makes Video as Easy to Edit as Word Docs

Vimeo is leveraging artificial intelligence to automate video editing, and says its new suite of AI tools enables the creation of “a fully produced video in minutes by generating scripts from text prompts, recording videos in one take, and editing content as easily as a Word doc.” Features include recording using a built-in onscreen teleprompter and the ability to quickly delete unwanted filler words (“ums” and “uhs”) and long pauses. The video hosting and sharing platform is rolling out the AI tools in July as part of the $20 per month standard subscription.

Meta Develops Computer Vision AI That Learns Like Humans

Meta Platforms continues to make progress on a mission to develop artificial intelligence that can teach itself how the world works. Chief AI Scientist Yann LeCun has taken a special interest in developing the new model, called Image Joint Embedding Predictive Architecture, or I-JEPA, which learns by building an internal representation of the outside world and analyzing abstract representations of images instead of comparing pixels. The approach allows AI tech to learn more like humans do, with their ability to figure out complex tasks and adapt to new situations.