Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile

Alibaba Cloud has released Qwen2.5-Omni-7B, a new AI model the company claims is efficient enough to run on edge devices like mobile phones and laptops. Boasting a relatively light 7-billion-parameter footprint, Qwen2.5-Omni-7B understands text, images, audio and video and generates real-time responses in text and natural speech. Alibaba says its combination of compact size and multimodal capabilities is “unique,” offering “the perfect foundation for developing agile, cost-effective AI agents that deliver tangible value, especially intelligent voice applications.” One example would be using a phone’s camera to help a vision-impaired person navigate their environment. Continue reading Alibaba’s Powerful Multimodal Qwen Model Is Built for Mobile

OpenAI Delivers Native GPT-4o Image Generator to ChatGPT

OpenAI has activated the multimodal image generation capabilities of GPT-4o, making it available to ChatGPT users on the Plus, Pro, Team and Free tiers. It replaces DALL-E 3 as the default image generator for the popular chatbot. GPT-4o’s accuracy with text, understanding of symbols and precision with prompts, combined with multimodal capabilities that allow the model to take cues from visual material, have transformed its image generation from largely unpredictable to “consistent and context-aware,” resulting in “a practical tool with precision and power,” claims OpenAI. Continue reading OpenAI Delivers Native GPT-4o Image Generator to ChatGPT

Google Debuts Next-Gen Reasoning Models with Gemini 2.5

Google has released what it calls its most intelligent AI model yet, Gemini 2.5. The first 2.5 model release, an experimental version of Gemini 2.5 Pro, is a next-gen reasoning model that Google says outperformed OpenAI o3-mini and Claude 3.7 Sonnet from Anthropic on common benchmarks “by meaningful margins.” Gemini 2.5 models “are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy,” according to Google. The new model comes just three months after Google released Gemini 2.0 with reasoning and agentic capabilities. Continue reading Google Debuts Next-Gen Reasoning Models with Gemini 2.5

Roblox Reveals Its Generative AI System Cube for 3D and 4D

San Mateo, California-based game developer Roblox has released a 3D object generator called Cube 3D, the first of several models the company plans to make available. Cube currently generates 3D models and environments from text, and in the future the company plans to add image inputs. Roblox says it is open-sourcing the tool, making it available to users on and off the platform. Cube will serve as the core generative AI system for Roblox’s 3D and 4D plans, the latter referring to interactive responsiveness. The launch coincides with the Game Developers Conference, running through Friday in San Francisco. Continue reading Roblox Reveals Its Generative AI System Cube for 3D and 4D

Baidu Releases New LLMs that Undercut Competition’s Price

Baidu has launched two new AI systems, the native multimodal foundation model Ernie 4.5 and the deep-thinking reasoning model Ernie X1. The latter supports features like generative imaging, advanced search and webpage content comprehension. Baidu is touting Ernie X1 as delivering performance comparable to another Chinese model, DeepSeek-R1, at half the price. Both Baidu models are available to the public, including individual users, through the Ernie website. Baidu, the dominant search engine in China, says its new models mark a milestone in both reasoning and multimodal AI, “offering advanced capabilities at a more accessible price point.” Continue reading Baidu Releases New LLMs that Undercut Competition’s Price

Pinterest AI Labeling Policy Unveiled as Q4 Earnings Top $1B

Popular social media platform Pinterest is now labeling generative AI content. The app, which earned a reputation as fertile ground for design inspiration related to hand-crafted goods and human artistry, has recently been plagued by an onslaught of “AI slop,” something its regular users have been complaining about on Reddit and to Pinterest directly. The GenAI content was often used to redirect people to spammy sites, according to a recent report. Pinterest’s labeling news coincides with an earnings report showing $1.15 billion in Q4 revenue, marking an 18 percent increase year-over-year. Continue reading Pinterest AI Labeling Policy Unveiled as Q4 Earnings Top $1B

Instagram Is Internally Testing Discord-Style Community Chat

Instagram is experimenting with a community chat feature that lets users gather in groups of up to 250. Meta’s photo- and video-sharing network is prototyping the feature internally, though external sources with knowledge of it are comparing it to Discord since it reportedly allows users to form chats around different topics and control who can join. While participation is said to be capped at 250 simultaneous users per community, everyone who joins can send messages. Instagram has rolled out a flurry of new features recently, among them a video tool called Edits and “profile cards” geared toward small businesses that want a more professional presence on the app. Continue reading Instagram Is Internally Testing Discord-Style Community Chat

Flora Is a New AI Interface Geared Toward Helping Creatives

Flora is a new software interface built by AI creatives for creative AI applications. Much like Apple reinvented the personal computer UI to make it feel natural for people who were not IT specialists, Flora aims to reframe the way designers and artists interact with generative AI. “AI tools make it easy to create, but lack creative control,” the startup’s founder Weber Wong says, opining that such tools have proven “great for making AI slop, but not for doing great creative work.” Wong’s goal is to make an AI interface everyone will find comfortable and intuitive, simplifying use and curating “the best text, image, and video models.” Continue reading Flora Is a New AI Interface Geared Toward Helping Creatives

Highly Realistic Alibaba GenVid Models Are Available for Free

Alibaba has open-sourced its Wan 2.1 video- and image-generating AI models, heating up an already competitive space. The Wan 2.1 family, which has four models, is said to produce “highly realistic” images and videos from text and images. The company has since December been previewing a new reasoning model, QwQ-Max, indicating it will be open-sourced when fully released. The move comes after another Chinese AI company, DeepSeek, released its R1 reasoning model for free download and use, triggering demand for more open-source artificial intelligence. Continue reading Highly Realistic Alibaba GenVid Models Are Available for Free

Snap Develops a Lightweight Text-to-Image AI Model In-House

Snap has created a lightweight AI text-to-image model that will run on-device and is expected to power some Snapchat mobile features in the months ahead. Running on an iPhone 16 Pro Max, the model can produce high-resolution images in approximately 1.4 seconds entirely on the phone, which reduces computational costs. Snap says the research model “is the continuation of our long-term investment in cutting edge AI and ML technologies that enable some of today’s most advanced interactive developer and consumer experiences.” Among the Snapchat AI features the new model will enhance are AI Snaps and AI Bitmoji Backgrounds. Continue reading Snap Develops a Lightweight Text-to-Image AI Model In-House

ByteDance’s AI Model Can Generate Video from Single Image

ByteDance has developed a generative model that can use a single photo to generate photorealistic video of humans in motion. Called OmniHuman-1, the multimodal system supports various visual and audio styles and can generate people singing, dancing, speaking and moving in a natural fashion. ByteDance says its new technology clears hurdles that hinder existing human video generators, such as short play times and over-reliance on high-quality training data. The diffusion transformer-based OmniHuman addressed those challenges by mixing motion-related conditions into the training phase, a solution ByteDance researchers claim is new. Continue reading ByteDance’s AI Model Can Generate Video from Single Image

Cloudflare Joins CAI, Adds C2PA Image Authenticity Protocol

Cloudflare is making it easier to assess the authenticity of online images by adopting the Content Credentials system advanced by Adobe and embraced by many others. Images hosted with Cloudflare now integrate Content Credentials, ensuring the provenance metadata remains intact. The credentials track ownership and subsequent modifications, including whether artificial intelligence was used to edit the images. With touchpoints to an estimated 20 percent of Internet traffic, connectivity firm Cloudflare substantively expands the reach of the Content Authenticity Initiative (CAI), founded in 2019. Continue reading Cloudflare Joins CAI, Adds C2PA Image Authenticity Protocol

Hugging Face Has Developed Tiny Yet Powerful Vision Models

Most people know Hugging Face as a resource-sharing community, but it also builds open-source applications and tools for machine learning. Its recent release of vision-language models small enough to run on smartphones while outperforming competitors that rely on massive data centers is being hailed as “a remarkable breakthrough in AI.” The new models, SmolVLM-256M and SmolVLM-500M, are optimized for “constrained devices” with less than about 1GB of RAM, making them ideal for mobile devices and laptops, and convenient for anyone who wants to process large amounts of data cheaply and with a low energy footprint. Continue reading Hugging Face Has Developed Tiny Yet Powerful Vision Models
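
For a sense of how lightweight these checkpoints are in practice, the sketch below loads one of them with the standard Hugging Face Transformers vision-language classes and captions a single local image. The checkpoint name, image path and prompt are illustrative assumptions; consult the SmolVLM model cards for the exact identifiers and recommended settings.

```python
# Minimal sketch: caption one image with a small SmolVLM checkpoint.
# The model ID and file name below are assumptions for illustration.
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("photo.jpg")  # any local image

# Build a chat-style prompt with an image placeholder, then generate a caption.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in one sentence."},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

At the 256M scale, inference like this fits comfortably in well under a gigabyte of memory, which is what makes laptop and phone deployment plausible.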

DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro

Less than a week after sending tremors through Silicon Valley and across the media landscape with an affordable large language model called DeepSeek-R1, the Chinese AI startup behind that technology has debuted another new product — the multimodal Janus-Pro-7B with an aptitude for image generation. Further mining the vein of efficiency that made R1 impressive to many, Janus-Pro-7B utilizes “a single, unified transformer architecture for processing.” Emphasizing “simplicity, high flexibility and effectiveness,” DeepSeek says Janus Pro is positioned to be a frontrunner among next-generation unified multimodal models. Continue reading DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro

Facebook, Instagram, WhatsApp Get Meta AI Memory Boost

Meta is rolling out personalization updates to its Meta AI personal assistant. At the end of last year, the company introduced a feature that lets Meta AI remember what you’ve shared with it in one-on-one chats on WhatsApp and Messenger so it can produce more relevant responses. That feature will now be available to Meta AI on Facebook, Messenger and WhatsApp for iOS and Android in the U.S. and Canada. “Meta AI will only remember certain things you tell it in 1:1 conversations (not group chats), and you can delete its memories at any time,” explains the company. Continue reading Facebook, Instagram, WhatsApp Get Meta AI Memory Boost