By Paula Parisi, March 24, 2025
OpenAI has debuted three new models for transcription and voice generation: gpt-4o-transcribe, gpt-4o-mini-transcribe and gpt-4o-mini-tts. The text-to-speech and speech-to-text AI models are designed to help developers create AI agents with highly customizable voices. OpenAI claims these models will power natural and responsive voice agents, moving AI out of the text-based communications stage and into intuitive spoken conversations. The suite outperforms existing solutions in accuracy and reliability, OpenAI says, especially with “accents, noisy environments, and varying speech speeds,” making it well-suited for customer call centers and meeting notes. Continue reading OpenAI Pushes Conversational Agents with Three New Models
By Paula Parisi, March 21, 2025
A new Discord Social SDK allows developers to integrate the platform in-app for games. Discord is massively popular with gamers; the company estimates PC players alone spend more than 1.5 billion hours each month on the platform. This free SDK can extend the user experience beyond the third-party content in which it becomes embedded to reach the platform’s community of over 200 million monthly active users. “Developers can power friends lists, cross-platform messaging, voice and more for all players — with or without a Discord account,” the company announced. Continue reading New Discord Social SDK Integrates Platform In-App for Games
By Paula Parisi, March 19, 2025
San Mateo, California-based game developer Roblox has released a 3D object generator called Cube 3D, the first of several models the company plans to make available. Cube currently generates 3D models and environments from text, and in the future the company plans to add image inputs. Roblox says it is open-sourcing the tool, making it available to users on and off the platform. Cube will serve as the core generative AI system for Roblox’s 3D and 4D plans, the latter referring to interactive responsiveness. The launch coincides with the Game Developers Conference, running through Friday in San Francisco. Continue reading Roblox Reveals Its Generative AI System Cube for 3D and 4D
By Paula Parisi, March 18, 2025
Baidu has launched two new AI systems, the native multimodal foundation model Ernie 4.5 and deep-thinking reasoning model Ernie X1. The latter supports features like generative imaging, advanced search and webpage content comprehension. Baidu is touting Ernie X1 as comparable in performance to another Chinese model, DeepSeek-R1, but says it costs half as much. Both Baidu models are available to the public, including individual users, through the Ernie website. Baidu, the dominant search engine in China, says its new models mark a milestone in both reasoning and multimodal AI, “offering advanced capabilities at a more accessible price point.” Continue reading Baidu Releases New LLMs that Undercut Competition’s Price
By Paula Parisi, March 13, 2025
Feeling the pressure from the “open agent” movement and specifically Chinese startup Butterfly Effect and its new product Manus, OpenAI has expanded the capabilities of its own AI technology, launching new tools to help businesses and developers build their own agents. The company’s new Responses API has the functionality of two earlier tools, the Chat Completions API (facilitating ChatGPT queries and responses) and the Assistants API (for multi-step reasoning and file access). The company is also issuing an Agents SDK, a suite of tools for creating and deploying agents that bundles the Responses API. Continue reading OpenAI Ramps Up Its Agent Functions as Competition Surges
By Paula Parisi, March 10, 2025
Google has added Gemini Embedding to its Gemini developer API. This new experimental model for text translates words, phrases and other text inputs into numerical representations, otherwise known as embeddings, which capture their semantic meaning. Embeddings are used in a wide range of applications including document retrieval and classification, potentially reducing costs and improving latency. Google is also testing an expansion of its AI Overviews search feature as part of a Gemini 2.0 update. Called AI Mode, it helps explain complex topics by generating search results that use advanced reasoning and thinking capabilities. Continue reading Google Updates AI Search and Intros Gemini Text Embedding
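To make the idea of embeddings capturing semantic meaning concrete, here is a minimal sketch of embedding-based document retrieval. It does not call Gemini Embedding or any Google API: the three-dimensional vectors, the document names and the helper functions `cosine_similarity` and `rank_documents` are all hypothetical, written by hand for illustration. Real embedding models return vectors with far more dimensions, but the ranking step would look broadly similar.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-written embeddings (assumption: NOT model output).
DOCUMENTS = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "gpu benchmarks": [0.0, 0.1, 0.9],
}


def rank_documents(query_vector, documents):
    """Return document names ordered from most to least similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in documents.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical embedding of a query such as "how do I get my money back".
query = [0.8, 0.2, 0.1]
ranking = rank_documents(query, DOCUMENTS)
print(ranking)
```

With these toy vectors the query lands closest to "refund policy" because the two vectors point in nearly the same direction, which is the sense in which embeddings let a retrieval system match on meaning rather than on shared keywords.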
By Paula Parisi, March 4, 2025
OpenAI is releasing a research preview of what it calls its “largest and best” chat model to date, GPT‑4.5, which scales unsupervised learning in pre-training and post-training. As a result, the new chat model can recognize patterns, draw connections, and generate creative insights without having to draw on time- and energy-consuming “reasoning.” GPT‑4.5 is currently available to ChatGPT Pro subscribers ($200 per month) and developers subscribing to OpenAI’s API tier. ChatGPT Plus and ChatGPT Team customers are expected to gain access this week. Continue reading OpenAI’s GPT-4.5 Model Sees Patterns and Thinks Creatively
By Paula Parisi, January 30, 2025
Jack Dorsey’s financial tech and media firm Block (formerly Square) has released a platform for building AI agents: Codename Goose. Previously available in beta, Goose is primarily designed to build agents for coding and software development, but Block built in many basic features that could be applied to general purpose pursuits. Because it is open source and offered under Apache License 2.0, the hope is that developers will apply it to varied use cases. A leading feature of Codename Goose is its flexibility. It can integrate a wide range of large language models, letting developers use it with their preferred model. Continue reading Codename Goose: Block Unveils Open-Source AI Agent Builder
By Paula Parisi, January 27, 2025
Perplexity joins the list of AI companies launching agents, debuting the Perplexity Assistant for Android. The tool uses reasoning, search, browsers and apps to help mobile users with daily tasks. Concurrently, Perplexity — independently founded in 2022 as a conversational AI search engine — has launched an API called Sonar intended for enterprise and developers who want real-time intelligent search, taking on heavyweights like Google, OpenAI and Anthropic. While to date AI search has largely been limited to answers informed by training data, which freezes their knowledge in time, next-gen tools can pull from the Internet in real time. Continue reading Perplexity Bows Real-Time AI Search Tool, Android Assistant
By Rob Scott, January 24, 2025
Just weeks after Nvidia announced the availability of its $249 “compact AI supercomputer,” the Jetson Orin Nano Super Developer Kit for startups and hobbyists, CEO Jensen Huang revealed the company is planning to launch a personal AI supercomputer called Project Digits with a starting price of $3,000. The desktop-sized system features the GB10 Grace Blackwell Superchip, which enables it to handle AI models with up to 200 billion parameters. Nvidia claims there is enough processing power to run high-end AI models (performing up to one quadrillion AI calculations per second) while the compact system can run from a standard power outlet. Continue reading CES: Nvidia Will Launch a $3,000 Personal AI Supercomputer
By Paula Parisi, January 14, 2025
Nvidia Cosmos, a platform of generative world foundation models (WFMs) and related tools to advance the development of physical AI systems like autonomous vehicles and robots, was introduced at CES 2025. Cosmos WFMs are designed to provide developers a way to generate massive amounts of photo-real, physics-based synthetic data to train and evaluate their existing models. The goal is to reduce costs by streamlining real-world testing with a ready data pipeline. Developers can also build custom models by fine-tuning Cosmos WFMs. Cosmos integrates Nvidia Omniverse, a physics simulation tool used for entertainment world-building. Continue reading CES: Nvidia’s Cosmos Models Teach AI About Physical World
By Douglas Chan, January 9, 2025
At the “Speed, Customization, Innovation: AI in Gaming” panel at CES this week, game publishers and developers shared their latest insights on how they use generative AI tools. A prevailing question involved the impact of AI’s ability to generate pixels and video frames efficiently, especially in light of Nvidia’s keynote the prior evening announcing its new Blackwell RTX 50 Series GPUs and their enormous capacity to do so. Panelists also weighed in on whether AI is overhyped for gaming and offered wish lists for fixing the limitations of AI tools. Continue reading CES: Thoughts on the Benefits and Limitations of AI in Gaming
By Debra Kaufman, January 8, 2025
In an era of tremendous innovation and an explosion of new lines of products, the creation of standards has never been so important. UL Standards & Engagement (ULSE) created its first standard in 1903 and now boasts a portfolio of 1,700 standards; other standards-setting bodies include the Consumer Technology Association (CTA) and the Connectivity Standards Alliance (CSA). Moderated by ULSE Director of Insights Sayon Deb, a CES panel of experts underscored the critical importance of such standards for developing and marketing innovative products. According to Deb, 60 percent of consumers express greater confidence in certified products. Continue reading CES: Standards Are Increasingly Vital for Fostering Innovation
By Paula Parisi, December 12, 2024
World Labs, the AI startup co-founded by Stanford AI pioneer Fei-Fei Li, has debuted a “spatial intelligence” system that can generate 3D worlds from a single image. Although the output is not photorealistic, the tech could be a breakthrough for animation companies and video game developers. Deploying what it calls Large World Models (LWMs), World Labs is focused on transforming 2D images into turnkey 3D environments with which users can interact. Observers say that reciprocity is what sets World Labs’ technology apart from offerings by other AI companies that transform 2D to 3D. Continue reading World Labs AI Lets Users Create 3D Worlds from Single Photo
By Paula Parisi, December 6, 2024
Google DeepMind’s new Genie 2 is a large foundation world model that generates interactive 3D worlds that are being likened to video games. “Games play a key role in the world of artificial intelligence research,” says Google DeepMind, noting “their engaging nature, challenges and measurable progress make them ideal environments to safely test and advance AI capabilities.” Based on a simple prompt image, Genie 2 is capable of producing “an endless variety of action-controllable, playable 3D environments” — suitable for training and evaluating embodied agents — that can be played by a human or AI agent using keyboard and mouse inputs. Continue reading DeepMind Genie 2 Creates Worlds That Emulate Video Games