Nvidia’s Impressive AI Model Could Compete with Top Brands

Nvidia has debuted a new AI model, Llama-3.1-Nemotron-70B-Instruct, that it claims outperforms OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet. The impressive showing has prompted speculation about an AI shakeup and a significant shift in Nvidia’s AI strategy, which has thus far focused primarily on chipmaking. The model was quietly released on Hugging Face, and Nvidia says that as of October 1 it ranked first on three top automatic alignment benchmarks, “edging out strong frontier models” and vaulting Nvidia to the forefront of the LLM field in areas like comprehension, context and generation.

‘EU AI Act Checker’ Holds Big AI Accountable for Compliance

A new LLM framework evaluates how well generative AI models are meeting the challenge of compliance with the legal parameters of the European Union’s AI Act. The free and open-source software is the product of a collaboration between ETH Zurich; Bulgaria’s Institute for Computer Science, Artificial Intelligence and Technology (INSAIT); and Swiss startup LatticeFlow AI. It is being billed as “the first evaluation framework of the EU AI Act for Generative AI models.” Already, it has found that some of the top AI foundation models are falling short of European regulatory goals in areas including cybersecurity resilience and discriminatory output.

Nvidia Releases Open-Source Frontier-Class Multimodal LLMs

Nvidia has unveiled the NVLM 1.0 family of multimodal LLMs, a powerful open-source AI that the company says performs comparably to proprietary systems from OpenAI and Google. Led by NVLM-D-72B, with 72 billion parameters, Nvidia’s new entry in the AI race achieved what the company describes as “state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models.” Nvidia has made the model weights publicly available and says it will also be releasing the training code, a break from the closed approach of OpenAI, Anthropic and Google.

Allen Institute Announces Vision-Optimized Molmo AI Models

The Allen Institute for AI (also known as Ai2, founded by Paul Allen and led by Ali Farhadi) has launched Molmo, a family of four open-source multimodal models. While advanced models “can perceive the world and communicate with us, Molmo goes beyond that to enable one to act in their worlds, unlocking a whole new generation of capabilities, everything from sophisticated web agents to robotics,” according to Ai2. On some third-party benchmark tests, Molmo’s 72 billion parameter model outperforms other open AI offerings and “performs favorably” against proprietary rivals like OpenAI’s GPT-4o, Google’s Gemini 1.5 and Anthropic’s Claude 3.5 Sonnet, Ai2 says.

Meta Unveils New Open-Source Multimodal Model Llama 3.2

Meta’s Llama 3.2 release includes two new multimodal LLMs, one with 11 billion parameters and one with 90 billion — considered small- and medium-sized — and two lightweight, text-only models (1B and 3B) that fit onto edge and mobile devices. Included are pre-trained and instruction-tuned versions. In addition to text, the multimodal models can interpret images, supporting apps that require visual understanding. Meta says the models are free and open source. Alongside them, the company is releasing “the first official Llama Stack distributions,” enabling “turnkey deployment” with integrated safety.

Alibaba Cloud Ups Its AI Game with 100 Open-Source Models

Alibaba Cloud last week released more than 100 new open-source variants of its large language foundation model, Qwen 2.5, to the global open-source community. The company has also revamped its proprietary offering as a full-stack AI-computing infrastructure across cloud products, networking and data center architecture, all aimed at supporting the growing demands of AI computing. Alibaba Cloud’s significant contribution was revealed at the Apsara Conference, the annual flagship event held by the cloud division of China’s e-retail giant, often referred to as the Chinese Amazon.

Alibaba’s Latest Vision Model Has Advanced Video Capability

China’s largest cloud computing company, Alibaba Cloud, has released a new computer vision model, Qwen2-VL, which the company says improves on its predecessor in visual understanding, including video comprehension and text-to-image processing in languages including English, Japanese, French, Spanish and Chinese. The company says it can analyze videos more than 20 minutes long and respond appropriately to questions about their content. Third-party benchmark tests compare Qwen2-VL favorably to leading competitors, and the company is releasing two open-source versions, with a larger private model to come.

OSI Aims for Industry Standard by Defining ‘Open Source AI’

Creating a universal definition of “open source AI” has generated a fair amount of debate and confusion, with many outfits stretching the parameters to claim a fit. Now the Open Source Initiative (OSI) — “the authority that defines Open Source” — has issued what it hopes will become the baseline definition. That definition, which includes the ability to “use the system for any purpose and without having to ask for permission,” excludes many AI platforms that currently describe themselves as “open,” a number of which are freely available only for non-commercial use. OSI’s remaining three parameters involve the ability to inspect, modify and share the system.

Meta, Oxford Advance 3D Object Generation with VFusion3D

VFusion3D is the latest AI model unveiled by Meta Platforms, which developed it in conjunction with the University of Oxford. The powerful model, which uses single-perspective images or text prompts to generate high-quality 3D objects, is being hailed as a breakthrough in scalable 3D AI that can potentially transform sectors including VR, gaming and digital design. The platform tackles the challenge of scarce 3D training data in a world teeming with 2D images and text descriptions. The VFusion3D approach leverages what the developers call “a novel method for building scalable 3D generative models utilizing pre-trained video diffusion models.”

Black Forest Labs Announces Suite of Text-to-Image Models

A new generative AI startup called Black Forest Labs has hit the scene, debuting with a suite of text-to-image models branded FLUX.1. Based in Germany, Black Forest was founded by some of the researchers involved in developing Stable Diffusion and has raised $31 million in funding from principal investor Andreessen Horowitz and angels including CAA founder and former talent agent Michael Ovitz. The FLUX.1 suite focuses on “image detail, prompt adherence, style diversity and scene complexity,” the company says of its three initial variants: FLUX.1 [pro], FLUX.1 [dev] and FLUX.1 [schnell].

Nvidia Debuts New Products to Accelerate Adoption of GenAI

After 50 years of SIGGRAPH, the conference has come full circle, from high-tech for PhDs to AI for everyone. That was Nvidia founder and CEO Jensen Huang’s message in back-to-back keynote sessions, including a Q&A with Meta CEO Mark Zuckerberg. Huang touted Universal Scene Description (OpenUSD), discussing developments aiming to speed adoption of the universal 3D data interchange framework for use in everything from robotics to the creation of “highly accurate virtual worlds for the next evolution of AI.” As Zuckerberg’s interlocutor, he prompted the Facebook founder to share a vision of AI’s personalization of social media.

Stable Video 4D Adds Time Dimension to Generative Imagery

Stability AI has unveiled an experimental new model, Stable Video 4D, which generates photorealistic 3D video. Building on what it created with Stable Video Diffusion, released in November, this latest model can take moving image data of an object and iterate it from multiple angles — generating up to eight different perspectives. Stable Video 4D can generate five frames across eight views in about 40 seconds using a single inference, according to the company, which says the model has “future applications in game development, video editing, and virtual reality.” Users begin by uploading a single video and specifying desired 3D camera poses.

Mistral, Nvidia Bring Enterprise AI to Desktop with NeMo 12B

Nvidia and French startup Mistral AI are jointly releasing a new language model called Mistral NeMo 12B that brings enterprise AI capabilities to the desktop without the need for major cloud resources. Developers can easily customize and deploy the new LLM for applications supporting chatbots, multilingual tasks, coding and summarization, according to Nvidia. “NeMo 12B offers a large context window of up to 128k tokens,” explains Mistral, adding that “its reasoning, world knowledge, and coding accuracy are state-of-the-art in its size category.” Available under the Apache 2.0 license, it is easy to implement as a drop-in replacement for Mistral 7B.

Apple Launches Public Demo of Its Multimodal 4M AI Model

Apple has released a public demo of the 4M AI model it developed in collaboration with the Swiss Federal Institute of Technology Lausanne (EPFL). The technology debuts seven months after the model was first open-sourced, allowing informed observers the opportunity to interact with it and assess its capabilities. Apple says 4M was built by applying masked modeling to a single unified Transformer encoder-decoder “across a wide range of input/output modalities — including text, images, geometric and semantic modalities, as well as neural network feature maps.”

Nvidia’s Open Models to Provide Free Training Data for LLMs

Nvidia is expanding its already substantial influence in the AI sphere with Nemotron-4 340B, a family of open models designed to generate synthetic LLM training data for commercial applications across numerous fields. Through what Nvidia is calling a “uniquely permissive” free open model license, Nemotron-4 340B provides a scalable way for developers to build LLMs. Synthetic data is artificially generated data designed to mimic the characteristics and structure of data found in the real world. The offering is being called “groundbreaking” and an important step toward the democratization of artificial intelligence.