By Paula Parisi, February 5, 2025
Most people know Hugging Face as a resource-sharing community, but it also builds open-source applications and tools for machine learning. Its recent release of vision-language models small enough to run on smartphones while outperforming competitors that rely on massive data centers is being hailed as “a remarkable breakthrough in AI.” The new models — SmolVLM-256M and SmolVLM-500M — are optimized for “constrained devices” with less than about 1GB of RAM, making them well suited to laptops and other mobile devices, and a practical option for anyone who wants to process large amounts of data cheaply and with a low energy footprint. Continue reading Hugging Face Has Developed Tiny Yet Powerful Vision Models
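For developers who want to try one of the new models, the release follows the standard Hugging Face Transformers vision-to-sequence workflow. Below is a minimal sketch, assuming the HuggingFaceTB/SmolVLM-256M-Instruct repo id and a local test image; exact identifiers and chat-template details may differ from the shipped release.

```python
# Minimal SmolVLM inference sketch (repo id is an assumption)
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("photo.jpg")  # any local test image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image in one sentence."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```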
By Paula Parisi, January 24, 2025
Nvidia is hoping interest in artificial intelligence will translate to consumer sales of a relatively low-priced computer optimized for basic AI functionality. Last month, the company upgraded its Jetson line with a $249 “compact AI supercomputer,” the Jetson Orin Nano Super Developer Kit. At half the price of the original, the model aims to attract students, developers, hobbyists, small- and medium-sized businesses, and anyone who is AI-curious. “As the AI world is moving from task-specific models into foundation models, it provides an accessible platform to transform ideas into reality,” according to Nvidia. Continue reading Nvidia Targets Consumers with $249 Compact Supercomputer
By Paula Parisi, December 18, 2024
Meta has added new features to its Ray-Ban Meta smart glasses in time for the holidays via a firmware update that makes them “the gift that keeps on giving,” per Meta marketing. “Live AI” adds computer vision, letting Meta AI see and record what you see “and converse with you more naturally than ever before.” Along with Live AI, Live Translation is available to Meta Early Access members. Spanish, French or Italian speech is translated into English (or vice versa) in real time as audio through the glasses’ open-ear speakers. In addition, Shazam support has been added for users who want to easily identify songs. Continue reading Ray-Ban Meta Gets Live AI, RT Language Translation, Shazam
By Paula Parisi, December 9, 2024
OpenAI has launched ChatGPT Pro, a $200 per month subscription plan that provides unlimited access to the full version of o1, its new large reasoning model, and all other OpenAI models. The toolkit includes o1-mini, GPT-4o and Advanced Voice. It also includes the new o1 pro mode, “a version of o1 that uses more compute to think harder and provide even better answers to the hardest problems,” OpenAI explains, describing the high-end subscription plan as a path to “research-grade intelligence” for scientists, engineers, academics, enterprises and others who use AI to accelerate productivity. Continue reading OpenAI Announces $200 Monthly Subscription for ChatGPT Pro
By Paula Parisi, December 9, 2024
Microsoft has launched a new AI-powered feature for its Edge browser. Copilot Vision is now in preview for a limited number of U.S. Copilot Pro subscribers who opt in through Copilot Labs. With user permission, Copilot Vision “sees” what is onscreen and can respond to questions about text and images, explains the company. Calling Copilot Vision “the first AI experience of its kind,” Microsoft suggests the experience is “almost like having a second set of eyes as you browse,” adding that when users turn on Copilot Vision it will “instantly scan, analyze, and offer insights based on what it sees.” Continue reading Microsoft Previews AI-Powered Copilot Vision for Edge Browser
By Paula Parisi, November 6, 2024
Nvidia’s growing AI arsenal now includes an AI Blueprint for video search and summarization, which helps developers build visual AI agents that analyze video and image content. The agents can answer user questions, generate summaries and even enable alerts for specific scenarios. The new feature is part of Metropolis, Nvidia’s developer toolkit for building computer vision applications using generative AI. Globally, enterprises and public organizations increasingly rely on visual information. Cameras, IoT sensors and autonomous vehicles are generating visual data at high rates, and visual agents can help monitor and make sense of that flow. Continue reading Nvidia’s AI Blueprint Develops Agents to Analyze Visual Data
By Paula Parisi, October 29, 2024
In its first week of public beta, Anthropic’s “Computer Use” feature is gaining immediate traction, helping people do research and complete coding tasks. Claude works autonomously in Computer Use mode, suggesting broad implications for future productivity and workforce goals. Coming on the heels of OpenAI’s Swarm framework, these early forays into independent AI assistants seem to indicate that implementing such systems will be an area of focus for businesses in 2025. Claude can “see” what’s onscreen and use its “judgment” to adapt to different tasks, segueing across workflows and software. Continue reading Anthropic’s AI Agents for Claude Sonnet Increase Productivity
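Computer Use is exposed through Anthropic’s Messages API as a beta tool type. A minimal request sketch follows, assuming the computer_20241022 tool type and beta flag documented for the public beta; the calling application remains responsible for actually executing the screenshots, clicks and keystrokes that Claude requests.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask Claude to plan actions against a virtual display; the tool schema and
# beta flag here follow the public beta docs and may change as the API evolves.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open a browser and look up today's AI headlines."}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)  # includes tool_use blocks describing clicks, keystrokes, etc.
```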
By Paula Parisi, October 4, 2024
Nvidia has unveiled the NVLM 1.0 family of multimodal LLMs, a powerful open-source AI that the company says performs comparably to proprietary systems from OpenAI and Google. Led by NVLM-D-72B, with 72 billion parameters, Nvidia’s new entry in the AI race achieved what the company describes as “state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models.” Nvidia has made the model weights publicly available and says it will also be releasing the training code, a break from the closed approach of OpenAI, Anthropic and Google. Continue reading Nvidia Releases Open-Source Frontier-Class Multimodal LLMs
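Since the weights are public, the model can in principle be pulled straight from Hugging Face. A loading sketch follows, assuming the nvidia/NVLM-D-72B repo id and its bundled custom modeling code; at 72 billion parameters, it realistically requires multiple GPUs or aggressive offloading.

```python
from transformers import AutoModel, AutoTokenizer

model_id = "nvidia/NVLM-D-72B"  # assumed repo id for the released weights
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,  # NVLM ships its own modeling code with the weights
    torch_dtype="auto",
    device_map="auto",       # shard across available GPUs
)
# The model card documents a chat-style interface for vision-language prompts.
```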
By Paula Parisi, September 25, 2024
Alibaba Cloud last week released more than 100 new open-source variants of its large language foundation model, Qwen 2.5, to the global open-source community. The company has also revamped its proprietary offering as a full-stack AI-computing infrastructure spanning cloud products, networking and data center architecture, all aimed at supporting the growing demands of AI computing. Alibaba Cloud’s contribution was revealed at the Apsara Conference, the annual flagship event held by the cloud division of China’s e-retail giant, often referred to as the Chinese Amazon. Continue reading Alibaba Cloud Ups Its AI Game with 100 Open-Source Models
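The open-source variants load like any other Hugging Face causal language model. A minimal sketch follows, assuming one of the instruction-tuned checkpoints (the Qwen/Qwen2.5-7B-Instruct size is an assumption; many sizes were released).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed variant among the released checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize Qwen 2.5 in one sentence."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```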
By Paula Parisi, September 5, 2024
China’s largest cloud computing company, Alibaba Cloud, has released a new vision-language model, Qwen2-VL, which the company says improves on its predecessor in visual understanding, including video comprehension and the processing of text within images in languages including English, Japanese, French, Spanish, Chinese and others. The company says the model can analyze videos more than 20 minutes long and respond appropriately to questions about their content. Third-party benchmark tests compare Qwen2-VL favorably to leading competitors, and the company is releasing two open-source versions with a larger private model to come. Continue reading Alibaba’s Latest Vision Model Has Advanced Video Capability
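The open-source checkpoints have a dedicated class in recent versions of transformers. A single-image question-answering sketch follows, assuming the Qwen/Qwen2-VL-7B-Instruct repo id; long-video input additionally requires sampling frames and passing them as video inputs.

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed id for one of the open-source sizes
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

image = Image.open("frame.jpg")  # e.g., a frame sampled from a video
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is happening in this frame?"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```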
By Paula Parisi, August 9, 2024
Robotics startup Figure AI — with investors including OpenAI, Nvidia and Microsoft — has released its next-gen humanoid, Figure 02. Its predecessor made a splash earlier this year with a demo that captured it conversing with an interlocutor as it organized household items and prepared a snack. Compared to the Figure 01 prototype, with exposed wiring and limited range of motion, Figure 02 is more polished. The latest iteration boasts skeletal improvements for heavier lifting as well as enhanced visual reasoning to assist with machine learning. The result is characterized as “a major leap” in AI-powered robotics, a category in which players include Tesla and 1X Technologies. Continue reading Humanoid Robot Figure 02 Touts Better Strength, Reasoning
By Paula Parisi, August 5, 2024
OpenAI has released its new Advanced Voice Mode in a limited alpha rollout for select ChatGPT Plus users. The feature, which is being implemented for the ChatGPT mobile app on Android and iOS, aims for more natural dialogue with the AI chatbot. Powered by GPT-4o, which is multimodal, Advanced Voice Mode is said to be able to sense emotional inflections, including excitement, sadness or singing. According to an OpenAI post on X, the company plans to “continue to add more people on a rolling basis” so that everyone using ChatGPT Plus will have access to the new feature in the fall. Continue reading OpenAI Brings Advanced Voice Mode Feature to ChatGPT Plus
By Paula Parisi, July 9, 2024
San Francisco-based optics company Solos has debuted its latest smart glasses, the Solos AirGo Vision, which feature a camera that takes photos and enables computer vision, and integrate OpenAI’s GPT-4o. The AirGo Vision can provide real-time information using visual input, recognizing people, objects and places, and providing information such as directions or instructions. Both the camera and AI functionality are hands-free, making the AirGo Vision “especially convenient for visual progress and next steps on activities like cooking, home improvement projects, education and studies, and even shopping,” the company explains. Continue reading Solos AirGo Vision Smart Glasses Tout a Camera and GPT-4o
By Paula Parisi, June 21, 2024
Anthropic has launched a powerful new AI model, Claude 3.5 Sonnet, that can analyze text and images and generate text. That its release comes a mere three months after Anthropic debuted Claude 3 indicates just how quickly the field is developing. The Google-backed company says Claude 3.5 Sonnet has set “new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval).” Sonnet is Anthropic’s mid-tier model, sitting between Haiku on the low end and Opus on the high end. Anthropic says 3.5 Sonnet is twice as fast as 3 Opus, offering “frontier intelligence at 2x the speed.” Continue reading Anthropic’s Claude 3.5: ‘Frontier Intelligence at 2x the Speed’
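Claude 3.5 Sonnet is also available through Anthropic’s Messages API. A minimal text-only call follows, assuming the June 2024 model id claude-3-5-sonnet-20240620.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed id for the June 2024 release
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain the GPQA benchmark in two sentences."}],
)
print(message.content[0].text)
```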
By Paula Parisi, June 14, 2024
Northern California startup Luma AI has released Dream Machine, a model that generates realistic videos from text prompts and images. Built on a scalable and multimodal transformer architecture and “trained directly on videos,” Dream Machine can create “action-packed scenes” that are physically accurate and consistent, says Luma, which has a free version of the model in public beta. Dream Machine is what Luma calls the first step toward “a universal imagination engine,” while others are calling it “powerful” and “slammed with traffic.” Though Luma has shared scant details, each posted sequence looks to be about 5 seconds long. Continue reading Luma AI Dream Machine Video Generator in Free Public Beta