Google Releases Gemini 2.0 in Shift Toward Agentic Era of AI

Google has introduced Gemini 2.0, the latest version of its multimodal AI model, signaling a shift toward what the company is calling “the agentic era.” The upgraded model promises not only to outperform previous iterations on standard benchmarks but also introduces more proactive, or agentic, functions. The company announced that “Project Astra,” its experimental assistant, would receive updates that allow it to use Google Search, Lens, and Maps, and that “Project Mariner,” a Chrome extension, would enable Gemini 2.0 to navigate a user’s web browser to complete tasks autonomously. Continue reading Google Releases Gemini 2.0 in Shift Toward Agentic Era of AI

World Labs AI Lets Users Create 3D Worlds from Single Photo

World Labs, the AI startup co-founded by Stanford AI pioneer Fei-Fei Li, has debuted a “spatial intelligence” system that can generate 3D worlds from a single image. Although the output is not photorealistic, the tech could be a breakthrough for animation companies and video game developers. Deploying what it calls Large World Models (LWMs), World Labs is focused on transforming 2D images into turnkey 3D environments with which users can interact. Observers say that reciprocity is what sets World Labs’ technology apart from offerings by other AI companies that transform 2D to 3D. Continue reading World Labs AI Lets Users Create 3D Worlds from Single Photo

DeepMind Genie 2 Creates Worlds That Emulate Video Games

Google DeepMind’s new Genie 2 is a large foundation world model that generates interactive 3D worlds that are being likened to video games. “Games play a key role in the world of artificial intelligence research,” says Google DeepMind, noting “their engaging nature, challenges and measurable progress make them ideal environments to safely test and advance AI capabilities.” Based on a simple prompt image, Genie 2 is capable of producing “an endless variety of action-controllable, playable 3D environments” — suitable for training and evaluating embodied agents — that can be played by a human or AI agent using keyboard and mouse inputs. Continue reading DeepMind Genie 2 Creates Worlds That Emulate Video Games

Qwen with Questions: Alibaba Previews New Reasoning Model

Alibaba Cloud has released the latest entry in its growing Qwen family of large language models. The new Qwen with Questions (QwQ) is an open-source competitor to OpenAI’s o1 reasoning model. As with competing large reasoning models (LRMs), QwQ can correct its own mistakes, relying on extra compute cycles during inference to assess its responses, making it well suited for reasoning tasks like math and coding. Described as an “experimental research model,” this preview version of QwQ has 32-billion-parameters and a 32,000-token context, leading to speculation that a more powerful iteration is in the offing. Continue reading Qwen with Questions: Alibaba Previews New Reasoning Model

AI Boom Boosts Nvidia Sales by 94 Percent as Profits Double

Nvidia sales were up 94 percent to $35 billion in the most recent quarter when profits more than doubled, to $19.3 billion, telegraphing the strength of the artificial intelligence boom that took the company from the top supplier of graphics boards for gaming PCs to the world’s most valuable public company with a market cap of $3.59 trillion. Nvidia founder and CEO Jensen Huang told analysts that demand for the company’s latest AI chip, Blackwell, has been “incredible,” driving projections of $3.59 trillion in revenue for the current quarter as customers begin to take shipments. Continue reading AI Boom Boosts Nvidia Sales by 94 Percent as Profits Double

Google Offers New AI-Powered Vids App to Workspace Users

Google announced it is rolling out its Gemini AI-powered video presentation app that enables users to easily create video presentations. Vids is a productivity app featured in the company’s suite of Google Workspace products. The new app uses AI model Gemini to automatically insert royalty-free stock video footage, create storyboards and scripts, and generate music and voiceovers. It allows users to add documents, slides, visuals, audio and transitions to the presentation’s timeline. “Personalize your content with Vids recording studio to deliver employee training, share company-wide announcements, meeting updates, and more,” suggests Google. Continue reading Google Offers New AI-Powered Vids App to Workspace Users

Amazon Prime Video Offers AI-Powered Recaps of TV Shows

Amazon Prime Video has begun offering X-Ray Recaps, summaries of favorite TV shows that catch you up without risk of spoilers. The generative AI-powered feature can create snapshots of any requested view — episodes, pieces of episodes or full seasons of TV shows. “Whether you’re a few minutes into a new episode, halfway through a season” or took a break to get popcorn and need a quick refresher, X-Ray Recaps will catch you up “personalized down to the exact minute of where you are watching,” according to Amazon, which assures “guardrails are applied” to ensure the generation of spoiler-free summaries. Continue reading Amazon Prime Video Offers AI-Powered Recaps of TV Shows

MIT Intros LLM-Inspired Teacher for General Purpose Robots

The Massachusetts Institute of Technology has come up what it thinks is a better way to teach robots general purpose skills. Derived from LLM techniques, the method provides robot intelligence access to an enormous amount of data at once, rather than exposing it to individual programs for specific tasks. Faster and more cost efficient, the approach has been referred to as a “brute force” approach to problem-solving, and machine learners have taken to it in lieu of individualized, task-specific “imitation learning.” Early tests show it outperforming traditional training by more than 20 percent under simulation and real-world conditions. Continue reading MIT Intros LLM-Inspired Teacher for General Purpose Robots

OpenAI: sCM Generates Media 50x Faster Than Other Models

OpenAI is taking a new approach to generating media that it says is 50 times faster than the models commonly used today. Called sCM, the approach is a “consistency model,” a variation on the diffusion method used by many leading systems. OpenAI claims its new model is ideal for training for large scale datasets and generating video, audio and images that are of “comparable sample quality to leading diffusion models.” Such models often require hundreds of steps, creating challenges when it comes to real-time applications. OpenAI aims to change this with a faster system that requires less power. Continue reading OpenAI: sCM Generates Media 50x Faster Than Other Models

Penguin Random House Warns All Against AI Model Training

Penguin Random House, the world’s largest commercial book publisher, has updated the copyright disclaimer that appears in every book to say “no part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems.” The warning will roll out globally on all new releases as well as backlist titles that are reprinted. Tom Weldon, CEO of Penguin Random House UK, has told staff the company will at its discretion “use generative AI tools selectively and responsibly, where we see a clear case that they can advance our goals.” Continue reading Penguin Random House Warns All Against AI Model Training

Nvidia Releases Open-Source Frontier-Class Multimodal LLMs

Nvidia has unveiled the NVLM 1.0 family of multimodal LLMs, a powerful open-source AI that the company says performs comparably to proprietary systems from OpenAI and Google. Led by NVLM-D-72B, with 72 billion parameters, Nvidia’s new entry in the AI race achieved what the company describes as “state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models.” Nvidia has made the model weights publicly available and says it will also be releasing the training code, a break from the closed approach of OpenAI, Anthropic and Google. Continue reading Nvidia Releases Open-Source Frontier-Class Multimodal LLMs

Cloudflare Tool Can Prevent AI Bots from Scraping Websites

Cloudflare has released AI Audit, a free set of new tools designed to help websites analyze and control how their content is used by artificial intelligence models. Described as “one-click blocking” to prevent unauthorized AI scraping, Cloudflare says it will also make it easier to identify the content bots scan most, so they can wall it off and negotiate payment in exchange for access. Helping its clients toward a sustainable future, Cloudflare is also creating a marketplace for sites to negotiate fees based on AI audits that trace cyber footprints on server files. Continue reading Cloudflare Tool Can Prevent AI Bots from Scraping Websites

White House Launches Effort to Fill 500,000 Technology Jobs

The White House has implemented a program to help fill roughly 500,000 open tech positions across the United States. The program, Service for America, was developed by the White House Office of the National Cyber Director (ONCD) in partnership with the Office of Management and Budget (OMB) and Office of Personnel Management (OPM) to help connect Americans with available jobs in cybersecurity, technology and artificial intelligence. “Our nation has a critical need for cyber talent,” explains ONCD Director Harry Coker, Jr., who notes many of the open cyber positions do not require a computer science degree or deeply technical background. Continue reading White House Launches Effort to Fill 500,000 Technology Jobs

Anthropic Publishes Claude Prompts, Sharing How AI ‘Thinks’

In a move toward increased transparency, San Francisco-based AI startup Anthropic has published the system prompts for three of its most recent large language models: Claude 3 Opus, Claude 3.5 Sonnet and Claude 3 Haiku. The information is now available on the web and in the Claude iOS and Android apps. The prompts are instruction sets that reveal what the models can and cannot do. Anthropic says it will regularly update the information, emphasizing that evolving system prompts do not affect the API. Examples of Claude’s prompts include “Claude cannot open URLs, links, or videos” and, when dealing with images, “avoid identifying or naming any humans.” Continue reading Anthropic Publishes Claude Prompts, Sharing How AI ‘Thinks’

OpenAI Pushes GPT-4o Customization with Free Token Offer

OpenAI announced its newest model, GPT-4o, can now be customized. The company said that the ability to fine-tune the multimodal GPT-4o has been “one of the most requested features from developers.” Customization can move the model toward more specific structure and tone of responses or allow it to follow specific instruction sets geared toward individual use cases. Developers can now implement custom datasets, aiming for better performance at a lower cost. The ChatGPT maker is rolling out the welcome mat by offering 1 million training tokens per day “for free for every organization” through September 23. Continue reading OpenAI Pushes GPT-4o Customization with Free Token Offer