D-ID Employs AI to Translate Videos into Multiple Languages

D-ID, a platform that uses AI to generate digital humans, has announced the general availability of D-ID Video Translate. The tool lets businesses and content creators automatically re-voice videos in multiple languages, “cloning the speaker’s voice and adapting their lip movements from a single upload.” For a limited time, D-ID is making Video Translate, which supports 30 languages, free to D-ID subscribers through the D-ID Studio or the company’s API. Languages include Arabic, Mandarin, Japanese, Hindi and Ukrainian, in addition to Spanish, German, French and Italian. A bulk translation option lets users run multiple translations simultaneously.

“A D-ID subscription starts at $56 per year for its cheapest plan and the smallest number of credits to use toward AI features and then goes up to $1,293 per year before shifting to enterprise pricing,” TechCrunch reports, pointing out that “the new AI video technology could help customers save on localization costs when scaling campaigns to a global audience in areas like marketing, entertainment, and social media.”

“As video content becomes increasingly central to digital communication, the importance of engaging with a multilingual audience has never been more significant,” Gil Perry, co-founder and CEO of D-ID, said in the company’s product announcement.

Thanks to artificial intelligence, automatic video translation has become a competitive field. YouTube, for one, has released a multi-language AI audio feature that helps creators reach a broader audience, and Spotify uses AI to let creators translate podcasts.

“Numerous companies also offer voice cloning or AI translation tools (or sometimes both), including those from Descript, ElevenLabs, Speechify … to name a few, as well as tools that let you create videos using AI avatars that can speak dozens of languages, like those from HeyGen, DeepBrain AI and others,” TechCrunch points out.

Wav2Lip, meanwhile, offers open-source lip-sync software that startups can use to build generalized lip-sync models.

TechCrunch says D-ID Video Translate is powered by Rosetta-1, the company’s proprietary model. Israel-based D-ID launched in 2017 and made a splash in early 2022 with a marketing-oriented generative video platform called Creative Reality Studio.

Video Translate “is currently available in beta on the web,” writes Gadgets 360, noting that the tool lets users “either upload a video file from their device or drag it from a cloud server.”

According to CTech, Video Translate works with videos between 10 seconds and five minutes long; in the outlet’s own test, translating a 30-second clip took about five minutes.
