ByteDance’s Goku Video Model Is Latest in Chinese AI Streak
February 24, 2025
Barely two weeks after the launch of its OmniHuman-1 AI model, ByteDance has released Goku, a new artificial intelligence model designed to create photorealistic video featuring humanoid actors. Goku uses text prompts to create, among other things, realistic product videos without the need for human actors, a boon for ByteDance’s social media unit TikTok. Goku is open source, trained on a large dataset of roughly 36 million video-text pairs and 160 million image-text pairs. Goku’s debut is being received as more bad news for OpenAI in the form of added competition, but as a positive step for global enterprise.
Goku’s launch has raised the question of whether shutting China out of the AI race with trade restrictions and the like is a viable strategy. It “reinforces that model ‘regulation’ won’t work as the U.S. might have envisioned,” writes Forbes, noting that “open-source software cannot be easily restricted by trade barriers.”
The Goku model family is documented in an arXiv research paper by engineers at ByteDance, who worked in collaboration with the University of Hong Kong. The model is posted on GitHub at ByteDance’s official Goku page, with dozens of demos posted at the official project page, Saiyan World (a nod to the manga series “Dragon Ball”).
Goku T2V is the video model, which accepts both text and image input, while the text-to-image model, T2I, appears limited to text input.
While ByteDance hasn’t publicly disclosed output specs, saying only that it is “high-quality,” Decoder says the video demo clips posted at Saiyan World “are all four-second clips at 24 fps in 720p resolution.”
ByteDance claims Goku achieved a VBench score of 84.85 for text-to-video (which would suggest potentially as high as 1080p [1920×1080] at full-scale output). As of this week Goku wasn’t yet listed on the community-maintained VBench Leaderboard.
The Goku model family features transformer architectures with 2B and 8B parameters. Tom’s Guide says 8 billion parameters “is incredibly small for this kind of quality,” adding that Goku “is specifically targeting the advertising market, based no doubt on its massive back catalog of TikTok videos and shopping experiences.”
ByteDance asserts Goku could reduce video production costs for advertisers by up to 99 percent, according to Decoder, suggesting this could potentially impact a part of the creator economy that relies on “significant” payments for UGC product videos.
Goku is the latest in a lively string of Chinese generative AI models. Earlier this month, ByteDance unveiled OmniHuman-1, while January was punctuated by DeepSeek’s R1 and Janus-Pro. In Q4, Alibaba released Qwen2, updated last month to Qwen 2.5-Max. In October, the Alibaba-backed MiniMax made waves by adding free text-to-video capabilities to its Hailuo model.