arXiv Archives - ETCentric

ByteDance’s Goku Video Model Is Latest in Chinese AI Streak

By Paula Parisi
February 24, 2025

Barely two weeks after launching its OmniHuman-1 AI model, ByteDance has released Goku, a new artificial intelligence model designed to create photorealistic video featuring humanoid actors. Goku uses text prompts to create, among other things, realistic product videos without the need for human actors, a boon for ByteDance’s social media unit TikTok. Goku is open source and was trained on a large dataset of roughly 36 million video-text pairs and 160 million image-text pairs. Goku’s debut is being received as more bad news for OpenAI in the form of added competition, but as a positive step for global enterprise.

Researchers Create AI Technique to Generate Video Captions

By Debra Kaufman
February 13, 2020

Researchers at Microsoft Research Asia and the Harbin Institute of Technology have come up with a new technique that uses artificial intelligence to generate live video captions. In the past, technologists have used encoder-decoder models that didn’t model the interaction between videos and comments, resulting in mainly irrelevant comments. The new technique, based on a model that iteratively learns to capture the representations of audio, video and comments, outperforms current methods, according to the research team.

Study’s Fantasy Text-Based Game Tests AI Agents’ Abilities

By Debra Kaufman
March 12, 2019

Facebook AI Research, the Lorraine Research Laboratory in Computer Science and its Applications (LORIA), and University College London recently conducted a study to determine whether AI can navigate a fantasy text-based game, dubbed “LIGHT.” To examine the AI agents’ comprehension of the virtual world, the study investigated the so-called grounding dialogue, composed of the mutual knowledge, beliefs and assumptions that allow communication between two people. The large-scale, crowdsourced “LIGHT” environment allows AI and humans to interact.

Bell Labs: Lensless Camera Records Multiple View Images

By Rob Scott
June 26, 2013

Bell Labs is developing a new class of imaging device that does not require a lens, but instead uses a light-sensitive sensor to create a high-resolution image. A new technique known as compressive sensing minimizes redundancy to acquire data with carefully chosen measurements. The camera, which features only an aperture assembly and a sensor, records images that are never out of focus. Additionally, when using two pixels instead of one, it can create two different images of the scene.