Character.AI Introduces New Video Generator in Closed Beta

Character.AI, a platform offering AI chatbots for socializing and role play, has released a video generation model called AvatarFX in closed beta. Promising the ability to make photorealistic images “come to life — speak, sing and emote — all with the click of a button,” the technology combines audio and video to create a variety of visual styles and voices, from realistic 3D, including “non-human faces (like a favorite pet),” to 2D animations, according to the company. AvatarFX also has the ability “to maintain strong temporal consistency with face, hand and body movement” and can “power videos with multiple speakers.”

“AvatarFX distinguishes itself from competitors like OpenAI’s Sora because it isn’t solely a text-to-video generator. Users can also generate videos from preexisting images, allowing users to animate photos of real people,” reports TechCrunch.

A demo video embedded in a Character.AI blog post previews a wide range of capabilities, from anime and sci-fi to glossy high fashion and biologically human characters. The company did not disclose resolution or length parameters, but the footage provided is so convincing that one of TechCrunch’s first reactions was to ask “how this kind of tech could be leveraged for abuse,” such as deepfakes of celebrities or people users “know in real life.”

Character.AI says videos generated with AvatarFX will be watermarked so people know they’re not “real.” Additionally, it plans to block the generation of videos featuring minors and will run images of recognizable people through a filter that changes details to obscure their identity, according to TechCrunch, which concludes that “since AvatarFX is not widely available yet, there is no way to verify how well these safeguards work.”

Character.AI explains in the blog post how its flow-based models utilize a “parameter-efficient training pipeline” to generate “realistic lip, head, and body movement based on an audio sequence.”

Gadgets 360 reports that the base model was built from the ground up on a DiT (diffusion transformer) architecture and uses “a new inference strategy” that supports consistency, detail and expressiveness “even in longer duration videos.”
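For readers curious what audio-driven motion generation of this kind typically involves, the sketch below shows a minimal DiT-style transformer block in which frame tokens attend to one another (for temporal consistency) and cross-attend to audio tokens (so mouth and body motion can follow the soundtrack). The class names, dimensions and structure are illustrative assumptions for this article only; Character.AI has not published its implementation.

```python
# Illustrative sketch only: a minimal DiT-style block conditioning video-frame
# latents on audio features. Names and shapes are assumptions, not Character.AI code.
import torch
import torch.nn as nn


class AudioConditionedDiTBlock(nn.Module):
    """Self-attention over frame tokens, cross-attention to audio tokens, then an MLP.
    Stacking blocks like this inside a diffusion/flow model is one common way to map
    an audio sequence to lip, head, and body motion."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.norm_audio = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, frame_tokens: torch.Tensor, audio_tokens: torch.Tensor) -> torch.Tensor:
        # Self-attention lets frame tokens exchange information across time,
        # which is what keeps motion temporally consistent.
        x = frame_tokens
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention injects the audio sequence so generated motion
        # can track the speech or singing track.
        a = self.norm_audio(audio_tokens)
        x = x + self.cross_attn(self.norm2(x), a, a, need_weights=False)[0]
        return x + self.mlp(self.norm3(x))


if __name__ == "__main__":
    # Toy shapes: 48 frame tokens and 100 audio tokens, both embedded to 256 dims.
    block = AudioConditionedDiTBlock()
    frames = torch.randn(1, 48, 256)
    audio = torch.randn(1, 100, 256)
    print(block(frames, audio).shape)  # torch.Size([1, 48, 256])
```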

“The voices in these videos are generated by Character.AI’s own text-to-speech (TTS) technology, making conversations and singing sound natural and smooth,” writes Mint.

Those who want to try it can apply for early access, though Character.AI warns this phase is “currently limited to a small group of creators.”
