By Paula Parisi, February 28, 2025
Alibaba has open-sourced its Wan 2.1 family of video- and image-generating AI models, heating up an already competitive space. The family's four models are said to produce “highly realistic” images and videos from text and image prompts. Since December, the company has also been previewing a new reasoning model, QwQ-Max, which it says will be open-sourced when fully released. The move comes after another Chinese AI company, DeepSeek, released its R1 reasoning model for free download and use, triggering demand for more open-source artificial intelligence. Continue reading “Highly Realistic Alibaba GenVid Models Are Available for Free.”
By Paula Parisi, February 7, 2025
Snap has created a lightweight AI text-to-image model that runs on-device and is expected to power some Snapchat mobile features in the months ahead. On an iPhone 16 Pro Max, the model can produce high-resolution images in approximately 1.4 seconds entirely on the phone, reducing computational costs. Snap says the research model “is the continuation of our long-term investment in cutting edge AI and ML technologies that enable some of today’s most advanced interactive developer and consumer experiences.” Among the Snapchat AI features the new model will enhance are AI Snaps and AI Bitmoji Backgrounds. Continue reading “Snap Develops a Lightweight Text-to-Video AI Model In-House.”
By Paula Parisi, February 6, 2025
ByteDance has developed a generative model that can turn a single photo into photorealistic video of humans in motion. Called OmniHuman-1, the multimodal system supports various visual and audio styles and can generate people singing, dancing, speaking and moving in a natural fashion. ByteDance says the new technology clears hurdles that hinder existing human video generators, obstacles like short play times and over-reliance on high-quality training data. The diffusion transformer-based OmniHuman addresses those challenges by mixing motion-related conditions into the training phase, a solution ByteDance researchers claim is new. Continue reading “ByteDance’s AI Model Can Generate Video from Single Image.”
By Paula Parisi, February 5, 2025
Cloudflare is making it easier to assess the authenticity of online images by adopting the Content Credentials system advanced by Adobe and embraced by many others. Images hosted using Cloudflare now integrate Content Credentials, ensuring the metadata remains intact. The system tracks ownership and subsequent modifications, including whether artificial intelligence was used to edit an image. With touchpoints to an estimated 20 percent of Internet traffic, connectivity firm Cloudflare substantially expands the reach of the Content Authenticity Initiative (CAI), founded in 2019. Continue reading “Cloudflare Joins CAI, Adds C2PA Image Authenticity Protocol.”
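Content Credentials work by cryptographically binding provenance metadata (creator, edit history, AI involvement) to the image content itself, so tampering is detectable. The real C2PA format stores signed manifests in a JUMBF container; the toy sketch below, with function names of my own invention, only illustrates the underlying tamper-detection idea using a plain SHA-256 digest.

```python
import hashlib


def attach_credentials(image_bytes: bytes, metadata: dict) -> dict:
    """Bind metadata to image content via a SHA-256 digest.

    Toy model of the idea only -- not the real C2PA container format,
    which uses signed JUMBF manifests.
    """
    digest = hashlib.sha256(image_bytes).hexdigest()
    return {"claim": metadata, "content_hash": digest}


def verify_credentials(image_bytes: bytes, manifest: dict) -> bool:
    """True only if the image bytes still match the recorded hash."""
    return hashlib.sha256(image_bytes).hexdigest() == manifest["content_hash"]
```

Verification fails the moment the pixels change, which is the property that lets a host like Cloudflare preserve and surface an image’s history.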
By Paula Parisi, February 5, 2025
Most people know Hugging Face as a resource-sharing community, but it also builds open-source applications and tools for machine learning. Its recent release of vision-language models small enough to run on smartphones, while outperforming competitors that rely on massive data centers, is being hailed as “a remarkable breakthrough in AI.” The new models, SmolVLM-256M and SmolVLM-500M, are optimized for “constrained devices” with less than 1GB of RAM, making them ideal for mobile devices and laptops, and convenient for anyone interested in processing large amounts of data cheaply and with a low-energy footprint. Continue reading “Hugging Face Has Developed Tiny Yet Powerful Vision Models.”
By Paula Parisi, January 30, 2025
Less than a week after sending tremors through Silicon Valley and across the media landscape with an affordable large language model called DeepSeek-R1, the Chinese AI startup behind that technology has debuted another new product: the multimodal Janus-Pro-7B, with an aptitude for image generation. Further mining the vein of efficiency that made R1 impressive to many, Janus-Pro-7B utilizes “a single, unified transformer architecture for processing.” Emphasizing “simplicity, high flexibility and effectiveness,” DeepSeek says Janus-Pro is positioned to be a frontrunner among next-generation unified multimodal models. Continue reading “DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro.”
By Paula Parisi, January 29, 2025
Meta is rolling out personalization updates to its Meta AI personal assistant. At the end of last year, the company introduced a feature that lets Meta AI remember what you’ve shared with it in one-on-one chats on WhatsApp and Messenger so it could produce more relevant responses. That feature will now be available to Meta AI on Facebook, Messenger and WhatsApp for iOS and Android in the U.S. and Canada. “Meta AI will only remember certain things you tell it in 1:1 conversations (not group chats), and you can delete its memories at any time,” explains the company. Continue reading “Facebook, Instagram, WhatsApp Get Meta AI Memory Boost.”
By Douglas Chan, January 8, 2025
The Eureka Park section at CES 2025 in Las Vegas is an exhibition area dedicated to thousands of startups and early-stage products from across the globe. Our reporting team visited the space organized specifically for Japanese startups and discovered a few that are developing innovative technologies with potential applications in 3D computer graphics modeling, XR and gaming. Among the standouts were Tokyo-based CalTa, which developed the digital twin platform Trancity, and Japanese telecom giant NTT Docomo, which exhibited its ongoing Feel Tech system. Continue reading “CES: Japanese Startups Showcase 3D Modeling, XR, Gaming.”
By Paul Bennun, January 7, 2025
Israeli startup PxE Holographic Imaging has developed a drop-in replacement sensor for any camera that holographically captures depth information without lidar or other added hardware. More specifically, it augments any existing sensor with this capability, so any sensor OEM’s product can be adapted. Imagine Face ID without an IR projector and sensor, a videoconference camera able to send a 3D image, or volumetric capture suddenly becoming more affordable. Extraordinarily, the physics appears to check out, and PxE demonstrated the technology to us at short and room-size range in its CES suite at The Venetian Las Vegas. Continue reading “CES: PxE Develops Camera Sensor That Captures Depth Info.”
By Paula Parisi, December 18, 2024
Attempting to stay ahead of OpenAI in the generative video race, Google announced Veo 2, which it says can output 4K (4096 x 2160) clips running more than two minutes. Competitor Sora can generate video of up to 20 seconds at 1080p. However, TechCrunch says Veo 2’s supremacy is “theoretical” since the model is currently available only through Google Labs’ experimental VideoFX platform, which is limited to videos of up to eight seconds at 720p. VideoFX is also waitlisted, but Google says it will expand access this week (with no comment on raising the caps). Continue reading “Veo 2 Is Unveiled Weeks After Google Debuted Veo in Preview.”
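To put the headline specs in perspective, a rough pixel-budget comparison of the two stated ceilings (the 24 fps frame rate is my assumption for illustration; neither company’s frame rate is given here):

```python
def clip_pixels(width: int, height: int, seconds: float, fps: int = 24) -> float:
    """Total raw pixels a clip must synthesize: resolution x frame count."""
    return width * height * seconds * fps


# Veo 2's stated ceiling: 4K (4096 x 2160) at roughly two minutes.
veo2_max = clip_pixels(4096, 2160, seconds=120)
# Sora's stated ceiling: 1080p at 20 seconds.
sora_max = clip_pixels(1920, 1080, seconds=20)

print(round(veo2_max / sora_max, 1))  # → 25.6
```

Under those assumptions, Veo 2’s maximum clip represents roughly 25 times the raw pixels of Sora’s, which is part of why TechCrunch calls the advantage theoretical until full-spec access opens up.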
By Paula Parisi, December 18, 2024
Pika Labs has updated its generative video model, releasing Pika 2.0 with more user control and customizability, the company says. Improvements include better “text alignment,” making it easier for the AI to follow through on intricate prompts. Enhanced motion rendering is said to deliver more “naturalistic movement” and better physics, including greater believability in transformations that tend toward the surreal, which has typically been a challenge for genAI tools. The biggest change may be “Scene Ingredients,” which lets users add their own images when building Pika-generated videos. Continue reading “Pika 2.0 Video Generator Adds Character Integration, Objects.”
By Paula Parisi, December 18, 2024
Elon Musk’s xAI has been rolling out an updated Grok-2 model that is now available free to all users of the X social platform. Prior to last week, the “unfiltered” chatbot, which debuted in November 2023, was available only by paid subscription. Now Grok is coming to X’s masses, but those on the free tier can ask the chatbot only 10 questions every two hours, while Premium and Premium+ users will “get higher usage limits and will be the first to access any new capabilities.” There is also now a Grok button on X that aims to encourage exploration. Continue reading “Grok-2 Chatbot Is Now Available Free to All Users of X Social.”
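The free tier’s cap of 10 questions per two hours is a classic sliding-window rate limit. A minimal sketch of that policy (the class and its interface are my own illustration; xAI has not published its implementation):

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per rolling `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.calls: deque = deque()  # timestamps of accepted requests

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the rolling window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


# Free tier: 10 questions per two-hour (7200-second) window.
free_tier = SlidingWindowLimiter(limit=10, window=7200)
```

An 11th request inside the window is refused; once the oldest accepted timestamp ages past two hours, capacity frees up again.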
By Paula Parisi, December 12, 2024
Ten months after its preview, OpenAI has officially released a Sora video model called Sora Turbo. Described as “hyperrealistic,” Sora Turbo generates clips of 10 to 20 seconds from text or image inputs. It outputs video in widescreen, vertical or square aspect ratios at resolutions from 480p to 1080p. The new product is being made available to ChatGPT Plus and Pro subscribers ($20 and $200 per month, respectively) but is not yet included with ChatGPT Team, Enterprise, or Edu plans, or available to minors. The company explains that Sora videos contain C2PA metadata indicating that they were generated by AI. Continue reading “OpenAI Releases Sora, Adding It to ChatGPT Plus, Pro Plans.”
By Paula Parisi, December 12, 2024
World Labs, the AI startup co-founded by Stanford AI pioneer Fei-Fei Li, has debuted a “spatial intelligence” system that can generate 3D worlds from a single image. Although the output is not photorealistic, the tech could be a breakthrough for animation companies and video game developers. Deploying what it calls Large World Models (LWMs), World Labs is focused on transforming 2D images into turnkey 3D environments with which users can interact. Observers say that interactivity is what sets World Labs’ technology apart from other AI companies’ 2D-to-3D offerings. Continue reading “World Labs AI Lets Users Create 3D Worlds from Single Photo.”
By Paula Parisi, November 6, 2024
Nvidia’s growing AI arsenal now includes an AI Blueprint for video search and summarization, which helps developers build visual AI agents that analyze video and image content. The agents can answer user questions, generate summaries and even enable alerts for specific scenarios. The new feature is part of Metropolis, Nvidia’s developer toolkit for building computer vision applications using generative AI. Globally, enterprises and public organizations increasingly rely on visual information. Cameras, IoT sensors and autonomous vehicles are ingesting visual data at high rates, and visual agents can help monitor and make sense of that flow. Continue reading “Nvidia’s AI Blueprint Develops Agents to Analyze Visual Data.”