DeepMind’s V2A Generates Music, Sound Effects, Dialogue

Google DeepMind has unveiled new research on AI tech it calls V2A (“video-to-audio”) that can generate soundtracks for videos. The initiative complements the wave of AI video generators from companies ranging from biggies like OpenAI and Alibaba to startups such as Luma and Runway, all of which require a separate app to add sound. V2A technology “makes synchronized audiovisual generation possible” by combining video pixels with natural language text prompts “to generate rich soundscapes for the on-screen action,” DeepMind writes, explaining that it can “create shots with a dramatic score, realistic sound effects or dialogue.”

Nokia Makes the First-Ever 3D Spatial Audio Cell Phone Call

Nokia made what it claims is “the world’s first immersive voice and audio call” using cell phones, made possible by the new 3GPP Immersive Voice and Audio Services (IVAS) codec that lets consumers hear 3D spatial sound in real time. The codec — which Nokia participated in crafting — is a major leap from today’s standard monophonic smartphone voice call experience and is part of the upcoming 5G Advanced standard. The innovation paves the way toward enhanced immersive spatial communications, extended reality and metaverse applications, says Nokia, explaining that it works across “any connected device,” including smartphones, tablets and PCs.

Stability AI Releases Free Sound FX Tool, Stable Audio Open

Stability AI has added another audio product to its lineup, releasing the open-source text-to-audio generator Stable Audio Open 1.0 for sound design. The new model can generate up to 47 seconds of samples and sound effects, including drum beats, instrument riffs, ambient sounds, foley and production elements. It also allows for adapting variations and changing the style of audio samples. Stability AI — best known for the image generator Stable Diffusion — in September released Stable Audio, a commercial product that can generate sophisticated music tracks of up to three minutes.

ElevenLabs Launches an AI Tool for Generating Sound Effects

ElevenLabs has launched its text-to-sound generator Sound Effects for all users, available now at the company’s website. The new AI tool can create audio effects, short instrumental tracks, soundscapes and even character voices. Sound Effects “has been designed to help creators — including film and television studios, video game developers, and social media content creators — generate rich and immersive soundscapes quickly, affordably and at scale,” according to the startup, which developed the tool in partnership with Shutterstock, using its library of licensed audio tracks.

AI Startup Suno Raises Funds to ‘Democratize Music Creation’

Music startup Suno, which leverages ChatGPT tech with the goal of emulating that app’s success in music, has raised $125 million in Series B funding, resulting in a valuation of $500 million. Founded by Harvard physics PhD turned tech entrepreneur Mikey Shulman, the company is being called “a rising star” in the realm of generative AI. Suno lets people generate original songs by using text prompts or lyrics, with the AI supplying the melodies and harmonies for fully formed compositions. “We started Suno to build a future where anyone can make music,” according to the company.

Sonos Rolls Out Its First Headphones, the $450 Bluetooth Ace

Sonos, the company that helped launch the Wi-Fi speaker market, is now branching into wireless over-ear headphones. The launch marks a much-anticipated and also inevitable move, considering the U.S. headset market was estimated to be almost $2.2 billion last year, nearly twice as large as the total for wireless speaker sales, according to market research firm Circana. Sonos Ace headphones have what is being called exceptional noise cancellation and feature Bluetooth connectivity and a Wi-Fi chip so they can be used in conjunction with the Sonos soundbar for a personal home-theater experience. They ship June 5 for $449.

Substack Creator Studio Bows with 10 Video Fellowship Slots

Substack is attempting to lure select TikTok posters to its publishing platform with the launch of Substack Creator Studio. Billed as “a fellowship for the next wave of video stars to turn their TikTok channels into Substack shows and communities,” the outlet says video-native creators will be able to forge a “more direct, intimate relationship with their audience” on Substack, while making money from subscriptions. Only 10 fellows will initially be selected and given access to consulting and production support from Adam Faze’s Gymnasium short-form studio, producer of the TikTok series “Boy Room.”

Adobe Considers Sora, Pika and Runway AI for Premiere Pro

Adobe plans to add generative AI capabilities to its Premiere Pro editing platform and is exploring the update with third-party AI technologies including OpenAI’s Sora, as well as models from Runway and Pika Labs, making it easier “to draw on the strengths of different models” within everyday workflows, according to Adobe. Editors will gain the ability to generate and add objects into scenes or shots, remove unwanted elements with a click, and even extend frames and footage length. The company is also developing a video model for its own Firefly AI for video and audio work in Premiere Pro.

Pimax Intros VR Headset with Switchable QLED, OLED Panels

Virtual reality firm Pimax has unveiled two new headsets. The Crystal Super is a high-resolution performance model that starts at $1,800, while the Crystal Light will carry a base list price of $700. The Crystal Super packs 29.5 million pixels and allows users to swap between QLED and micro-OLED panels, which Pimax claims is a first. The Crystal Light offers the same 16.6 million pixels as its Crystal predecessor, but at a more affordable price. At its annual Frontier virtual event, Pimax also shared the specs for its 60G Airlink module, designed for high-fidelity wireless PCVR using WiGig technology.

Sony Rolls Out Brighter, Better-Sounding Bravia TVs for 2024

Sony’s new line of Bravia televisions focuses on MiniLED display tech with the high-end Bravia 9. There is also the OLED-based Bravia 8, and the company is keeping 2023’s A95L QD-OLED in the mix. But the spotlight is on the LED backlighting system that Sony has spent several years refining, XR Backlight Master Drive, which can assert precise control over each pixel. Sony says the technology is comparable to the underpinnings of its professional mastering monitors. The XR Backlight Master Drive system allocates LED resources using purpose-built silicon created by Sony for its MiniLED TVs.

Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The premiere model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.”

IAB: U.S. Digital Advertising Hit Record $225 Billion Last Year

Internet advertising revenues hit a record $225 billion in the U.S. in 2023, a 7.3 percent increase, according to a PwC report for the Interactive Advertising Bureau (IAB). The connected TV and audio categories saw double-digit growth, as did spending on e-commerce platforms, classified as “retail media,” which rose 16.3 percent year-over-year, reaching $43.7 billion in 2023 as key players expanded their ad inventory. Video advertising revenue climbed 10.6 percent year-over-year, to $52.1 billion, with 42 percent of that revenue generated from CTV and OTT streaming.

Audio-First Social Platform Airchat Has Successful Relaunch

Airchat is the latest app to take tech leaders in Silicon Valley by storm. Described as a “combination of voice notes and Twitter,” Airchat lets you follow other users and scroll through posts — adding replies, likes and shares — but the twist is the content is generated through audio recordings the app then transcribes. Airchat ranked 27th on the App Store’s social networking chart, even though users must be invited to join. Launched last year by Naval Ravikant, founder of AngelList, and erstwhile Tinder product exec Brian Norgard, Airchat was just relaunched on iOS and Android.

Google Offers Public Preview of Gemini Pro for Cloud Clients

Google is moving its most powerful artificial intelligence model, Gemini 1.5 Pro, into public preview for developers and Google Cloud customers. Gemini 1.5 Pro includes what Google claims is a breakthrough in long context understanding, with the ability to run 1 million tokens of information, “opening up new possibilities for enterprises to create, discover and build using AI.” Gemini’s multimodal capabilities allow it to process audio, video, text, code and more, which, when combined with long context, “enables enterprises to do things that just weren’t possible with AI before,” according to Google.

Deepgram’s Speech Portfolio Now Includes Human-Like Aura

Deepgram’s new Aura software turns text into generative audio with a “human-like voice.” The 9-year-old voice recognition company has raised nearly $86 million to date on the strength of its Voice AI platform. Aura is an extremely low-latency text-to-speech voice AI that can be used for voice AI agents, the company says. Paired with Deepgram’s Nova-2 speech-to-text API, developers can use it to “easily (and quickly) exchange real-time information between humans and LLMs to build responsive, high-throughput AI agents and conversational AI applications,” according to Deepgram.