Conversational AI Archives

AWS Updates Nova Reels and Adds Nova Sonic Voice Model

By Paula Parisi
April 10, 2025

Amazon has updated its Nova model series, with Nova Reel 1.1 now able to generate AI videos of up to two minutes and gaining a new ‘multi-shot’ feature. Announced in December, Nova Reel marked Amazon’s initial foray into generative video. AWS developer advocate Elizabeth Fuentes says that Nova Reel accommodates user prompts of up to 4,000 characters that can generate a series of six-second shots for a sequence totaling two minutes. The company also introduced the Nova Sonic real-time voice model that supports third-party enterprise development.

Midjourney Launches V7 Image Generator with Voice Prompts

By Paula Parisi
April 9, 2025

Generative AI program Midjourney has issued V7 in alpha, marking its first new model in almost a year. Notable updates include personalization turned on by default, which users must first set up (a process Midjourney says takes 5 minutes) and can then toggle on or off at any time. Another new flagship feature, Draft Mode, lets users render lower resolution images at “half the cost and 10 times the speed,” according to Midjourney, emphasizing “it’s so fast that we change the prompt bar to a ‘conversational mode’ when you’re using it on Web.” Draft Mode also supports voice prompts.

Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile

By Paula Parisi
March 28, 2025

Alibaba Cloud has released Qwen2.5-Omni-7B, a new AI model the company claims is efficient enough to run on edge devices like mobile phones and laptops. Boasting a relatively light 7-billion parameter footprint, Qwen2.5-Omni-7B understands text, images, audio and video and generates real-time responses in text and natural speech. Alibaba says its combination of compact size and multimodal capabilities is “unique,” offering “the perfect foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications.” One example would be using a phone’s camera to help a vision-impaired person navigate their environment.

OpenAI Pushes Conversational Agents with Three New Models

By Paula Parisi
March 24, 2025

OpenAI has debuted three new models for transcription and voice generation: gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts. The text-to-speech and speech-to-text AI models are designed to help developers create AI agents with highly customizable voices. OpenAI claims these models will power natural and responsive voice agents, moving AI out of the text-based communications stage and into intuitive spoken conversations. The suite outperforms existing solutions in accuracy and reliability, OpenAI says, especially with “accents, noisy environments, and varying speech speeds,” making it well suited for customer call centers and meeting notes.

AI Startup Sesame Develops Next Stage of Voice Generation

By Paula Parisi
March 7, 2025

Sesame, an AI startup from Oculus co-founder Brendan Iribe, has created a conversational voice model that many feel has achieved uncanny levels of authenticity. Drawing comparisons to the charismatic vocal centerpiece of the 2013 Warner Bros. film “Her,” Sesame seems to have achieved a new level of engagement among AI voice assistants. While some are describing the tech as “amazing,” others have expressed concern over its capabilities. “Our goal is to achieve ‘voice presence’ — the magical quality that makes spoken interactions feel real, understood and valued,” explains a blog post by Iribe and others.

New Samsung XR Headset Could Use Sony 4K Micro OLEDs

By Paula Parisi
March 7, 2025

Samsung shook things up at Mobile World Congress 2025 with a display of its Project Moohan XR headset, which CNBC confirms will be released this year. While the MWC display was just a teaser, and Samsung remained tight-lipped about its specs, the mirrored goggles generated a lot of coverage, including speculation that Samsung may deploy Sony’s 4K Micro OLEDs in the new device, increasing Moohan’s resolution over the Apple Vision Pro by nearly 2 million pixels per eye and offering superior color, too. Samsung worked with Qualcomm and Google to develop Moohan, which will use the new Android XR operating system.

Google Live Gets Computer Vision Screen Sharing This Month

By Paula Parisi
March 6, 2025

Google plans to launch video- and screen-sharing capabilities for Gemini Live by the end of the month as part of the Gemini app on Android, according to discussions coming out of Mobile World Congress in Barcelona this week. Previewed a year ago as Project Astra, the new functionality will allow Gemini Live to accept a video stream captured in real time by the phone’s camera and answer questions about the feed in a conversational way, based on voice input. It brings to mobile the live screen sharing that Gemini 2.0 currently offers desktop users.

Amazon’s AI-Powered Alexa+ is Agentic with Computer Vision

By Paula Parisi
February 27, 2025

Over a year after teasing a next-gen Alexa virtual assistant, Amazon is releasing an AI-powered version called Alexa+. The new personal assistant can do things like order groceries for the household, facilitate event planning, manage smart home utilities and security, and, of course, shop online. “She’s smarter, more conversational, more capable,” according to Amazon SVP of Devices & Services Panos Panay. Strategically priced to entice the AI-curious into Amazon membership, Alexa+ costs $20 per month as a standalone service or comes free with Amazon Prime ($15 per month or $139 per year).

Adobe Acrobat AI Assistant Can Now Read, Analyze Contracts

By Paula Parisi
February 6, 2025

Adobe has added intelligent contract capabilities to the Acrobat AI Assistant, which can now help interpret agreements for businesses and consumers. Adobe’s smart contract reader can help people understand those lengthy click-to-agree terms of service treatises that precede vendor websites and service agreements. “Customers open billions of contracts in Adobe Acrobat each month and AI can be a game changer in helping simplify their experience,” Adobe Document Cloud SVP Abhigyan Modi said in an announcement. For business users, Acrobat AI Assistant can identify key dates, prepare or review new agreements, and highlight changes.

Facebook, Instagram, WhatsApp Get Meta AI Memory Boost

By Paula Parisi
January 29, 2025

Meta is rolling out personalization updates to its Meta AI personal assistant. At the end of last year, the company introduced a feature that lets Meta AI remember what you’ve shared with it in one-on-one chats on WhatsApp and Messenger so it could produce more relevant responses. That feature will now be available to Meta AI on Facebook, Messenger and WhatsApp for iOS and Android in the U.S. and Canada. “Meta AI will only remember certain things you tell it in 1:1 conversations (not group chats), and you can delete its memories at any time,” explains the company.

CES: Google TV Integrates Gemini AI for a Conversational Feel

By Paula Parisi
January 9, 2025

Google TV is incorporating Gemini AI to make it easier to converse with a voice assistant and to generate helpful onscreen information. New Google TV devices will also feature an upgraded, Gemini-powered voice experience capable of handling more complex voice commands. “You and your family will be able to gather together and have a natural conversation with your TV,” Google announced at CES 2025, where it shared a preview of the new capabilities. The Gemini model also lets Google TV users create customized artwork, control smart home devices and get an overview of the day’s news.

Ray-Ban Meta Gets Live AI, RT Language Translation, Shazam

By Paula Parisi
December 18, 2024

Meta has added new features to Ray-Ban Metas in time for the holidays via a firmware update that makes the smart glasses “the gift that keeps on giving,” per Meta marketing. “Live AI” adds computer vision, letting Meta AI see and record what you see “and converse with you more naturally than ever before.” Along with Live AI, Live Translation is available for Meta Early Access members. Translation of Spanish, French or Italian will pipe through as English (or vice versa) in real time as audio in the glasses’ open-ear speakers. In addition, Shazam support is added for users interested in easily identifying songs.

Google Offers Spotify Extension for Gemini Mobile Ecosystem

By Paula Parisi
December 2, 2024

Google has added a Gemini extension that lets users link their Spotify accounts and leverage the AI for music search and discovery. Currently only for Android in English, the app accepts spoken and text prompts to select music by song, album, artist or playlist using “Play & Search.” Only Spotify Premium subscribers will be able to request and play specific tunes on demand. And while users will be able to use Gemini to activate existing playlists or pipe music themed to an activity or mood (like workouts or romantic meals), it cannot create a Spotify playlist or radio.

AI Search Wars Heat Up as OpenAI and Google Add Features

By Paula Parisi
November 4, 2024

The AI search wars are officially on, with Google giving Gemini access to its online answer engine just hours before OpenAI launched ChatGPT Search. Google is primarily targeting developers with its new feature, “Grounding with Google Search,” though the Alphabet company used the occasion to also tout its new search return template, AI Overviews. Launched last week, ChatGPT Search offers responses in real time using a conversational format. Initially, it is available only to ChatGPT Plus and Teams subscribers as well as those on the SearchGPT waitlist as part of ChatGPT’s existing interface.

Apple Ships iOS 18.1 with New Siri, Debuts iMac with M4 Chip

By Paula Parisi
October 30, 2024

Apple has revealed the new iMac, which features an M4 chip and Apple Intelligence in an ultra-thin case. Refreshed versions of all three MacBook Pros are also powered by M4 Pro and M4 Max chips. With the M4, the new iMac performs “up to 1.7x faster for daily productivity, and up to 2.1x faster for demanding workflows like photo editing and gaming, compared to iMac with M1,” Apple says, adding that the neural engine in the M4 makes “iMac the world’s best all-in-one for AI,” with bespoke Apple Intelligence. Apple released iOS 18.1, which also includes Apple Intelligence features, among them writing and summary tools and a “more natural and conversational Siri.”