ElevenLabs Promotes Its Latest Advances in AI Audio Effects

“What if you could describe a sound and generate it with AI?” asks startup ElevenLabs, which set out to do just that and says it has succeeded. The two-year-old company explains it “used text prompts like ‘waves crashing,’ ‘metal clanging,’ ‘birds chirping,’ and ‘racing car engine’ to generate audio.” Best known for using machine learning to clone voices, the AI firm founded by Google and Palantir alums has not yet made its new text-to-sound model publicly available, but it began teasing the model this week by releasing online demos. Some see the technology as a natural complement to the latest wave of AI video generators.

“We were blown away by the Sora announcement but felt it needed something,” an ElevenLabs “coming soon” blog post reads, referring to the new OpenAI Sora text-to-video generator that created a media sensation last week. ElevenLabs has posted some demo clips on X that map its own text-to-sound effects over visuals from OpenAI’s Sora announcement.

Previously, ElevenLabs’ research efforts seemed focused on “AI to make audio and video content — from movies to podcasts — accessible across languages and geographies,” VentureBeat writes, noting that “the company has debuted a range of offerings to further this, including text-to-speech and speech-to-speech models that can produce AI speech from a given piece of content (text/audio/video) in 29 different languages whilst delivering natural voice and emotions (original speaker’s voice in speech-to-speech).”

Lauding AI video generation tools from companies including Runway, Pika and OpenAI, VentureBeat agrees that ElevenLabs’ new model completes the picture, so to speak, by providing what they lack: default audio, “allowing users to produce sound effects for their content by describing what they want” and thereby enhancing their work.

Right now, ElevenLabs is offering only a sign-up sheet (on the blog post) and has shared no projected beta test or release date. The company wanted to show off its development work in light of the rapid pace of advancement in the AI text-to-video field, ElevenLabs’ Luke Harries shared in a subsequent X post.

“If you thought Sora was impressive, now watch it with AI generated sound from ElevenLabs,” urges Tom’s Guide, writing that the UK-based ElevenLabs “was founded in 2022 and is seen as producing the most realistic synthetic voices, generating speech that is close enough to natural to be almost undetectable.”

Voice-cloning technology is already being used to create deepfakes that emulate politicians and market various products. Earlier this month, the FCC banned robocalls that use AI-generated voices.

In January, ElevenLabs was valued at $1 billion after an $80 million funding round co-led by investors including Andreessen Horowitz and former GitHub CEO Nat Friedman.
