By
Paula ParisiOctober 11, 2024
Hailuo, the free text-to-video generator released last month by the Alibaba-backed company MiniMax, has delivered its promised image-to-video feature. Founded by AI researcher Yan Junjie, the Shanghai-based MiniMax also has backing from Tencent. The model earned high marks for what has been called “ultra realistic” video, and MiniMax says the new image-to-video feature will improve output across the board as a result of “text-and-image joint instruction following,” which means Hailuo now “seamlessly integrates both text and image command inputs, enhancing your visuals while precisely adhering to your prompts.” Continue reading MiniMax’s Hailuo AI Rolls Out New Image-to-Video Capability
By
Paula ParisiOctober 3, 2024
OpenAI unveiled major updates at its DevDay conference with the focus largely on making AI more accessible, efficient and affordable. Included were four innovations: Vision Fine-Tuning in the API, Model Distillation, Prompt Caching and the public beta of Realtime API. The approach underscores OpenAI’s effort to empower its developer ecosystem even as it continues to compete for end-users in the enterprise space. The Realtime API gives developers the option of building “nearly real-time” speech-to-speech app experiences, selecting from among six OpenAI voices. Vision Fine-Tuning for GPT-4o enables customization of the model’s visual understanding of images and text. Continue reading OpenAI Showcases Latest Updates for Voice, Picture and More
By
Paula ParisiSeptember 26, 2024
Meta has secured rights to the voices of actors Judi Dench, Kristen Bell, John Cena and others for its Meta AI chatbot, a ChatGPT-like digital assistant that is part of the plan for conversational AI as part of the multimodal Llama 3.2. Also revealed at Meta Connect this week was Orion, “the most advanced glasses the world has ever seen,” queued up to become Meta’s “first consumer full holographic AR glasses,” though they won’t be available anytime soon. A low-priced Quest 3 mixed reality headset, the $299 Quest 3S, will be arriving in time for the holidays, however. Continue reading Meta Reveals Orion Concept Glasses, Celeb Voices, Quest 3S
By
Paula ParisiSeptember 26, 2024
As OpenAI gears up to become a for-profit company next year, it is releasing ChatGPT Advanced Voice Mode, which brings a humanlike conversation mode to ChatGPT 4o. All U.S. subscribers to ChatGPT Plus and Team plans will gain access to the new feature, which will also be made available to those paying for ChatGPT Edu and Enterprise plans in the coming weeks. The firm is also adding five new voices and allowing customers to save personalized instructions for the voice assistant, including memory behaviors. Concurrently, executives including CTO Mira Murati have resigned as the company pivots to commerciality. Continue reading OpenAI Rolls Out Advanced Voice Mode Feature for ChatGPT
By
Paula ParisiSeptember 18, 2024
Google announced the company is making its new AI assistant Gemini Live available free to all Android users. The move follows the feature’s release last month to Gemini Advanced subscribers. This general release will occur gradually, and only in English for the time being. Gemini Live lets users have a more natural, free-flowing conversation with their phones than was available through Google Assistant via the “Hey, Google” prompt. Gemini inquiries are meant to be conversational, eliciting a back and forth that queriers can interrupt, adding more detail or veering to another topic entirely. Continue reading Google Begins Rolling Out Gemini Live Free to Android Users
By
Paula ParisiJuly 18, 2024
Google has launched the beta version of its Gemini-powered Google Vids productivity app, which lets users create work-related video presentations that embed documents, slides, audio recordings and even additional videos into a timeline. Incorporated into Workspace Labs, Google’s AI preview space, Google says invited participants can use Vids to “build a narrative with high quality templates” or “get to a first draft faster.” Access to Google’s royalty-free stock content library and Vids recording studio means a project can be completed “without ever leaving Workspace,” according to the company. Continue reading Gemini Powering Google Vids Multimedia Presentation Builder
By
Paula ParisiJune 6, 2024
ElevenLabs has launched its text-to-sound generator Sound Effects for all users, available now at the company’s website. The new AI tool can create audio effects, short instrumental tracks, soundscapes and even character voices. Sound Effects “has been designed to help creators — including film and television studios, video game developers, and social media content creators — generate rich and immersive soundscapes quickly, affordably and at scale,” according to the startup, which developed the tool in partnership with Shutterstock, using its library of licensed audio tracks. Continue reading ElevenLabs Launches an AI Tool for Generating Sound Effects
By
ETCentric StaffMarch 21, 2024
Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram. Continue reading Deepgram’s Speech Portfolio Now Includes Human-Like Aura