By Paula Parisi, February 5, 2025
Most people know Hugging Face as a resource-sharing community, but it also builds open-source applications and tools for machine learning. Its recent release of vision-language models small enough to run on smartphones while outperforming competitors that rely on massive data centers is being hailed as “a remarkable breakthrough in AI.” The new models — SmolVLM-256M and SmolVLM-500M — are optimized for “constrained devices” with less than about 1GB of RAM, making them well suited to laptops and other mobile devices, and a practical option for anyone who wants to process large amounts of data cheaply and with a low energy footprint. Continue reading Hugging Face Has Developed Tiny Yet Powerful Vision Models
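For developers who want to try one of the new models, the release follows the standard Hugging Face Transformers vision-to-sequence workflow. Below is a minimal sketch, assuming the HuggingFaceTB/SmolVLM-256M-Instruct repo id and a local test image; exact identifiers and chat-template details may differ from the shipped release.

```python
# Minimal SmolVLM inference sketch (repo id is an assumption)
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("photo.jpg")  # any local test image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image in one sentence."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=100)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```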
By Paula Parisi, January 24, 2025
Nvidia is hoping interest in artificial intelligence will translate to consumer sales of a relatively low-priced computer optimized for basic AI functionality. Last month, the company upgraded its Jetson line with a $249 “compact AI supercomputer,” the Jetson Orin Nano Super Developer Kit. At half the price of the original, the model aims to attract students, developers, hobbyists, small- and medium-sized businesses, and anyone who is AI-curious. “As the AI world is moving from task-specific models into foundation models, it provides an accessible platform to transform ideas into reality,” according to Nvidia. Continue reading Nvidia Targets Consumers with $249 Compact Supercomputer
By Paula Parisi, December 18, 2024
Meta has added new features to its Ray-Ban Meta smart glasses in time for the holidays via a firmware update that makes them “the gift that keeps on giving,” per Meta marketing. “Live AI” adds computer vision, letting Meta AI see and record what you see “and converse with you more naturally than ever before.” Along with Live AI, Live Translation is available to Meta Early Access members. Spanish, French or Italian speech is translated into English (or vice versa) in real time as audio through the glasses’ open-ear speakers. In addition, Shazam support has been added for users who want to easily identify songs. Continue reading Ray-Ban Meta Gets Live AI, RT Language Translation, Shazam
By Paula Parisi, December 9, 2024
OpenAI has launched ChatGPT Pro, a $200 per month subscription plan that provides unlimited access to the full version of o1, its new large reasoning model, and all other OpenAI models. The toolkit includes o1-mini, GPT-4o and Advanced Voice. It also includes the new o1 pro mode, “a version of o1 that uses more compute to think harder and provide even better answers to the hardest problems,” OpenAI explains, describing the high-end subscription plan as a path to “research-grade intelligence” for scientists, engineers, academics, enterprises and others who use AI to accelerate productivity. Continue reading OpenAI Announces $200 Monthly Subscription for ChatGPT Pro
By Paula Parisi, December 9, 2024
Microsoft has launched a new AI-powered feature for its Edge browser. Copilot Vision is now in preview for a limited number of U.S. Copilot Pro subscribers who opt in through Copilot Labs. With user permission, Copilot Vision “sees” what is onscreen and can respond to questions about text and images, explains the company. Calling Copilot Vision “the first AI experience of its kind,” Microsoft suggests the experience is “almost like having a second set of eyes as you browse,” adding that when users turn on Copilot Vision it will “instantly scan, analyze, and offer insights based on what it sees.” Continue reading Microsoft Previews AI-Powered Copilot Vision for Edge Browser
By Paula Parisi, November 6, 2024
Nvidia’s growing AI arsenal now includes an AI Blueprint for video search and summarization, which helps developers build visual AI agents that analyze video and image content. The agents can answer user questions, generate summaries and even enable alerts for specific scenarios. The new feature is part of Metropolis, Nvidia’s developer toolkit for building computer vision applications using generative AI. Globally, enterprises and public organizations increasingly rely on visual information. Cameras, IoT sensors and autonomous vehicles are generating visual data at high rates, and visual agents can help monitor and make sense of that flow. Continue reading Nvidia’s AI Blueprint Develops Agents to Analyze Visual Data
By Paula Parisi, October 29, 2024
In its first week of public beta, Anthropic’s “Computer Use” feature is gaining immediate traction, helping people do research and complete coding tasks. Claude works autonomously in Computer Use mode, suggesting broad implications for future productivity and workforce goals. Coming on the heels of OpenAI’s Swarm framework, these early forays into independent AI assistants seem to indicate that implementing such systems will be an area of focus for businesses in 2025. Claude can “see” what’s onscreen and use its “judgment” to adapt to different tasks, segueing across workflows and software. Continue reading Anthropic’s AI Agents for Claude Sonnet Increase Productivity
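Computer Use is exposed through Anthropic’s Messages API as a beta tool type. A minimal request sketch follows, assuming the computer_20241022 tool type and beta flag documented for the public beta; the calling application remains responsible for actually executing the screenshots, clicks and keystrokes that Claude requests.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask Claude to plan actions against a virtual display; the tool schema and
# beta flag here follow the public beta docs and may change as the API evolves.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open a browser and look up today's AI headlines."}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)  # includes tool_use blocks describing clicks, keystrokes, etc.
```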
By Paula Parisi, October 4, 2024
Nvidia has unveiled the NVLM 1.0 family of multimodal LLMs, a powerful open-source AI that the company says performs comparably to proprietary systems from OpenAI and Google. Led by NVLM-D-72B, with 72 billion parameters, Nvidia’s new entry in the AI race achieved what the company describes as “state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models.” Nvidia has made the model weights publicly available and says it will also be releasing the training code, a break from the closed approach of OpenAI, Anthropic and Google. Continue reading Nvidia Releases Open-Source Frontier-Class Multimodal LLMs
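Since the weights are public, the model can in principle be pulled straight from Hugging Face. A loading sketch follows, assuming the nvidia/NVLM-D-72B repo id and its bundled custom modeling code; at 72 billion parameters, it realistically requires multiple GPUs or aggressive offloading.

```python
from transformers import AutoModel, AutoTokenizer

model_id = "nvidia/NVLM-D-72B"  # assumed repo id for the released weights
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,  # NVLM ships its own modeling code with the weights
    torch_dtype="auto",
    device_map="auto",       # shard across available GPUs
)
# The model card documents a chat-style interface for vision-language prompts.
```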
By Paula Parisi, September 25, 2024
Alibaba Cloud last week released more than 100 new open-source variants of its large language foundation model, Qwen 2.5, to the global open-source community. The company has also revamped its proprietary offering as a full-stack AI-computing infrastructure spanning cloud products, networking and data center architecture, all aimed at supporting the growing demands of AI computing. Alibaba Cloud’s contribution was revealed at the Apsara Conference, the annual flagship event held by the cloud division of China’s e-retail giant, often referred to as the Chinese Amazon. Continue reading Alibaba Cloud Ups Its AI Game with 100 Open-Source Models
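The open-source variants load like any other Hugging Face causal language model. A minimal sketch follows, assuming one of the instruction-tuned checkpoints (the Qwen/Qwen2.5-7B-Instruct size is an assumption; many sizes were released).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed variant among the released checkpoints
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize Qwen 2.5 in one sentence."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```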
By Paula Parisi, September 5, 2024
China’s largest cloud computing company, Alibaba Cloud, has released a new vision-language model, Qwen2-VL, which the company says improves on its predecessor in visual understanding, including video comprehension and the processing of text within images in languages including English, Japanese, French, Spanish, Chinese and others. The company says the model can analyze videos more than 20 minutes long and respond appropriately to questions about their content. Third-party benchmark tests compare Qwen2-VL favorably to leading competitors, and the company is releasing two open-source versions with a larger private model to come. Continue reading Alibaba’s Latest Vision Model Has Advanced Video Capability
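The open-source checkpoints have a dedicated class in recent versions of transformers. A single-image question-answering sketch follows, assuming the Qwen/Qwen2-VL-7B-Instruct repo id; long-video input additionally requires sampling frames and passing them as video inputs.

```python
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed id for one of the open-source sizes
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

image = Image.open("frame.jpg")  # e.g., a frame sampled from a video
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "What is happening in this frame?"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```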
By Paula Parisi, August 9, 2024
Robotics startup Figure AI — with investors including OpenAI, Nvidia and Microsoft — has released its next-gen humanoid, Figure 02. Its predecessor made a splash earlier this year with a demo that captured it conversing with an interlocutor as it organized household items and prepared a snack. Compared to the Figure 01 prototype, with exposed wiring and limited range of motion, Figure 02 is more polished. The latest iteration boasts skeletal improvements for heavier lifting as well as enhanced visual reasoning to assist with machine learning. The result is characterized as “a major leap” in AI-powered robotics, a category in which players include Tesla and 1X Technologies. Continue reading Humanoid Robot Figure 02 Touts Better Strength, Reasoning
By Paula Parisi, August 5, 2024
OpenAI has released its new Advanced Voice Mode in a limited alpha rollout for select ChatGPT Plus users. The feature, which is being implemented for the ChatGPT mobile app on Android and iOS, aims for more natural dialogue with the AI chatbot. Powered by GPT-4o, which is multimodal, Advanced Voice Mode is said to be able to sense emotional inflections, including excitement, sadness or singing. According to an OpenAI post on X, the company plans to “continue to add more people on a rolling basis” so that everyone using ChatGPT Plus will have access to the new feature in the fall. Continue reading OpenAI Brings Advanced Voice Mode Feature to ChatGPT Plus
By Paula Parisi, July 9, 2024
San Francisco-based optics company Solos has debuted its latest smart glasses, the Solos AirGo Vision, which feature a camera that takes photos and enables computer vision, and integrate OpenAI’s GPT-4o. The AirGo Vision can provide real-time information using visual input, recognizing people, objects and places, and providing information such as directions or instructions. Both the camera and AI functionality are hands-free, making the AirGo Vision “especially convenient for visual progress and next steps on activities like cooking, home improvement projects, education and studies, and even shopping,” the company explains. Continue reading Solos AirGo Vision Smart Glasses Tout a Camera and GPT-4o
By Paula Parisi, June 21, 2024
Anthropic has launched a powerful new AI model, Claude 3.5 Sonnet, that can analyze text and images and generate text. That its release comes a mere three months after Anthropic debuted Claude 3 indicates just how quickly the field is developing. The Google-backed company says Claude 3.5 Sonnet has set “new industry benchmarks for graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval).” Sonnet is Anthropic’s mid-tier model, sitting between Haiku on the low end and Opus on the high end. Anthropic says 3.5 Sonnet is twice as fast as 3 Opus, offering “frontier intelligence at 2x the speed.” Continue reading Anthropic’s Claude 3.5: ‘Frontier Intelligence at 2x the Speed’
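Claude 3.5 Sonnet is also available through Anthropic’s Messages API. A minimal text-only call follows, assuming the June 2024 model id claude-3-5-sonnet-20240620.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed id for the June 2024 release
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain the GPQA benchmark in two sentences."}],
)
print(message.content[0].text)
```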
By Paula Parisi, June 14, 2024
Northern California startup Luma AI has released Dream Machine, a model that generates realistic videos from text prompts and images. Built on a scalable and multimodal transformer architecture and “trained directly on videos,” Dream Machine can create “action-packed scenes” that are physically accurate and consistent, says Luma, which has a free version of the model in public beta. Dream Machine is what Luma calls the first step toward “a universal imagination engine,” while others are calling it “powerful” and “slammed with traffic.” Though Luma has shared scant details, each posted sequence looks to be about 5 seconds long. Continue reading Luma AI Dream Machine Video Generator in Free Public Beta