Google Adds Gemini Flash Thinking to Search, Maps and More

Google has initiated a flurry of AI activity following the recent wave of Chinese AI releases. The Alphabet company has launched an experimental version of a new flagship AI model, Gemini 2.0 Pro. Its premier model for coding and complex prompts is now available in Google AI Studio, Vertex AI and the Gemini Advanced app. The company has also made its general-purpose “workhorse” model, Gemini 2.0 Flash, generally available via the Gemini API in AI Studio and Vertex AI. This follows last week’s announcement that Gemini 2.0 Flash is powering the Gemini app for desktop and mobile.

Snap Develops a Lightweight Text-to-Image AI Model In-House

Snap has created a lightweight AI text-to-image model that runs on-device and is expected to power some Snapchat mobile features in the months ahead. Running on an iPhone 16 Pro Max, the model can produce high-resolution images in approximately 1.4 seconds, which reduces computational costs. Snap says the research model “is the continuation of our long-term investment in cutting edge AI and ML technologies that enable some of today’s most advanced interactive developer and consumer experiences.” Among the Snapchat AI features the new model will enhance are AI Snaps and AI Bitmoji Backgrounds.

ChatGPT ‘Deep Research’ Agent Can Create Detailed Reports

ChatGPT has a new “deep research” agent that OpenAI says uses reasoning to synthesize large amounts of online information and complete multi-step research tasks. “It accomplishes in tens of minutes what would take a human many hours,” OpenAI suggests, claiming it will “synthesize hundreds of online sources to create a comprehensive report at the level of a research analyst.” Powered by a version of the upcoming OpenAI o3 model optimized for web browsing and data analysis, the company says the deep research agent will typically take 5 to 30 minutes to complete its work. The agent is described as an ideal research tool for areas such as finance, science and engineering.

Alibaba Plans to Take On AI Competitors with Qwen2.5-Max

An internecine AI battle has erupted between Alibaba and DeepSeek. Days after DeepSeek dominated several news cycles with its affordable DeepSeek-R1 reasoning model and the multimodal Janus-Pro-7B, Alibaba released its latest LLM, Qwen2.5-Max, available via API from Alibaba Cloud. As with DeepSeek, Alibaba is looking beyond its domestic borders, but the fact that a public-facing AI battle is heating up between Chinese companies indicates the People’s Republic isn’t going to quietly cede the AI race to the U.S. Alibaba claims Qwen2.5-Max outperforms models from DeepSeek, Meta and OpenAI.

Codename Goose: Block Unveils Open-Source AI Agent Builder

Jack Dorsey’s financial tech and media firm Block (formerly Square) has released a platform for building AI agents: Codename Goose. Previously available in beta, Goose is primarily designed to build agents for coding and software development, but Block built in many basic features that could be applied to general-purpose pursuits. Because it is open source and offered under Apache License 2.0, the hope is that developers will apply it to varied use cases. A leading feature of Codename Goose is its flexibility. It can integrate with a wide range of large language models, letting developers use it with their preferred model.

DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro

Less than a week after sending tremors through Silicon Valley and across the media landscape with an affordable large language model called DeepSeek-R1, the Chinese AI startup behind that technology has debuted another new product — the multimodal Janus-Pro-7B with an aptitude for image generation. Further mining the vein of efficiency that made R1 impressive to many, Janus-Pro-7B utilizes “a single, unified transformer architecture for processing.” Emphasizing “simplicity, high flexibility and effectiveness,” DeepSeek says Janus-Pro is positioned to be a frontrunner among next-generation unified multimodal models.

Perplexity Bows Real-Time AI Search Tool, Android Assistant

Perplexity joins the list of AI companies launching agents, debuting the Perplexity Assistant for Android. The tool uses reasoning, search, browsers and apps to help mobile users with daily tasks. Concurrently, Perplexity — independently founded in 2022 as a conversational AI search engine — has launched an API called Sonar intended for enterprises and developers who want real-time intelligent search, taking on heavyweights like Google, OpenAI and Anthropic. While to date AI search has largely been limited to answers informed by training data, which freezes a model’s knowledge in time, next-gen tools can pull from the Internet in real time.

OpenAI Operator Agent Available to ChatGPT Pro Subscribers

OpenAI has launched Operator, a semi-autonomous AI agent that uses a proprietary web browser to execute tasks like planning a vacation using Tripadvisor or booking restaurant reservations through OpenTable. “It can look at a webpage and interact with it by typing, clicking and scrolling,” explains OpenAI. Operator is powered by a new model called Computer-Using Agent (CUA), and is available in research preview to ChatGPT Pro subscribers in the U.S. Combining GPT-4o’s computer vision capabilities with advanced reasoning, CUA is trained to interact with graphical user interfaces (GUIs) — parsing menus, clicking buttons and reading screen text.

Nvidia Targets Consumers with $249 Compact Supercomputer

Nvidia is hoping interest in artificial intelligence will translate to consumer sales of a relatively low-priced computer optimized for basic AI functionality. Last month, the company upgraded its Jetson line with a $249 “compact AI supercomputer,” the Jetson Orin Nano Super Developer Kit. At half the price of the original, the model aims to attract students, developers, hobbyists, small- and medium-sized businesses, and anyone who is AI-curious. “As the AI world is moving from task-specific models into foundation models, it provides an accessible platform to transform ideas into reality,” according to Nvidia.

OpenAI Previews Two New Reasoning Models: o3 and o3-Mini

OpenAI has unveiled a new frontier model, OpenAI o3, which it claims can “reason” through challenges involving math, science and computer programming. Available to safety and research testers, it is expected to be available to individuals and businesses this year. OpenAI o3 is said to be over 20 percent more efficient at common programming tasks than its predecessor OpenAI o1 and beat a company scientist on a programming test. Model o3 is part of a broader effort to create AI systems that can reason through complex problems. In late December Google debuted a similar platform, the experimental Gemini 2.0 Flash Thinking Mode.

CES: Google TV Integrates Gemini AI for a Conversational Feel

Google TV is incorporating Gemini AI to make it easier to converse with a voice assistant as well as to generate helpful onscreen information. These new Google TV devices will also feature an upgraded, Gemini-powered voice experience capable of handling more complex voice commands. “You and your family will be able to gather together and have a natural conversation with your TV,” Google announced at CES 2025, where it shared a preview of the new capabilities. The Gemini model also lets Google TV users create customized artwork, control smart home devices and get an overview of the day’s news.

CES: AI Pioneer Yann LeCun on AI Agents, Human Intelligence

During CES 2025 in Las Vegas this week, Meta Vice President and Chief AI Scientist Yann LeCun had a compelling conversation with Wing Venture Capital Head of Research Rajeev Chand on the latest hot-button topics in the rapidly evolving field of artificial intelligence. Among the conclusions: AI agents will become ubiquitous, but not for 10 to 15 years; human intelligence means different things to different AI experts; and nuclear power remains the best and safest source for powering AI. And, for those looking for more of LeCun’s posts, he said he no longer publishes on X.

Microsoft AI Forecast Includes $80B in Data Center Spending

Microsoft anticipates spending $80 billion to construct AI data centers in fiscal 2025, which ends in June. More than half of that investment will fund U.S. infrastructure, according to company Vice Chair and President Brad Smith. The move aims to keep Microsoft, which owns a stake in OpenAI, a leader in artificial intelligence, and bolster the nation’s position in the global AI race, which Smith says it currently leads, “thanks to the investment of private capital and innovations by American companies of all sizes, from dynamic startups to well-established enterprises.”

Veo 2 Is Unveiled Weeks After Google Debuted Veo in Preview

Attempting to stay ahead of OpenAI in the generative video race, Google announced Veo 2, which it says can output 4K clips of two minutes or more at 4096 x 2160 pixels. Competitor Sora can generate video of up to 20 seconds at 1080p. However, TechCrunch says Veo 2’s supremacy is “theoretical” since it is currently available only through Google Labs’ experimental VideoFX platform, which is limited to videos of up to eight seconds at 720p. VideoFX is also waitlisted, but Google says it will expand access this week (with no comment on expanding the cap).

Ray-Ban Meta Gets Live AI, RT Language Translation, Shazam

Meta has added new features to Ray-Ban Metas in time for the holidays via a firmware update that makes the smart glasses “the gift that keeps on giving,” per Meta marketing. “Live AI” adds computer vision, letting Meta AI see and record what you see “and converse with you more naturally than ever before.” Along with Live AI, Live Translation is available for Meta Early Access members. Translation of Spanish, French or Italian will pipe through as English (or vice versa) in real time as audio in the glasses’ open-ear speakers. In addition, Shazam support is added for users interested in easily identifying songs.