By Paula Parisi, March 18, 2025
Baidu has launched two new AI systems, the native multimodal foundation model Ernie 4.5 and the deep-thinking reasoning model Ernie X1. The latter supports features like generative imaging, advanced search and webpage content comprehension. Baidu is touting Ernie X1 as comparable in performance to another Chinese model, DeepSeek-R1, at half the price. Both Baidu models are available to the public, including individual users, through the Ernie website. Baidu, the dominant search engine in China, says its new models mark a milestone in both reasoning and multimodal AI, “offering advanced capabilities at a more accessible price point.” Continue reading Baidu Releases New LLMs that Undercut Competition’s Price
By Paula Parisi, February 19, 2025
YouTube Shorts has upgraded its Dream Screen AI background generator to incorporate Google DeepMind’s latest video model, Veo 2, which will also generate standalone video clips that users can post to Shorts. “Need a specific scene but don’t have the right footage? Want to turn your imagination into reality and tell a unique story? Simply use a text prompt to generate a video clip that fits perfectly into your narrative, or create a whole new world,” coaxes YouTube, which seems to be trying out “Dream Screen” branding as an umbrella for its genAI efforts. Continue reading YouTube Shorts Updates Dream Screen with Google Veo 2 AI
By Paula Parisi, January 30, 2025
Less than a week after sending tremors through Silicon Valley and across the media landscape with an affordable large language model called DeepSeek-R1, the Chinese AI startup behind that technology has debuted another new product — the multimodal Janus-Pro-7B with an aptitude for image generation. Further mining the vein of efficiency that made R1 impressive to many, Janus-Pro-7B utilizes “a single, unified transformer architecture for processing.” Emphasizing “simplicity, high flexibility and effectiveness,” DeepSeek says Janus Pro is positioned to be a frontrunner among next-generation unified multimodal models. Continue reading DeepSeek Follows Its R1 LLM Debut with Multimodal Janus-Pro
By Paula Parisi, January 27, 2025
Perplexity joins the list of AI companies launching agents, debuting the Perplexity Assistant for Android. The tool uses reasoning, search, browsers and apps to help mobile users with daily tasks. Concurrently, Perplexity — independently founded in 2022 as a conversational AI search engine — has launched an API called Sonar, intended for enterprises and developers who want real-time intelligent search, taking on heavyweights like Google, OpenAI and Anthropic. While AI search has to date largely been limited to answers informed by training data, which freezes a model’s knowledge in time, next-generation tools can pull from the Internet in real time. Continue reading Perplexity Bows Real-Time AI Search Tool, Android Assistant
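For context on what a real-time search API call looks like in practice, here is a minimal sketch against Perplexity’s OpenAI-compatible chat-completions endpoint. The model name and parameters are assumptions and should be checked against Perplexity’s Sonar documentation.

```python
# Minimal sketch of a Sonar API request (model name and fields are assumptions; see Perplexity's docs).
import os
import requests

API_URL = "https://api.perplexity.ai/chat/completions"  # OpenAI-compatible chat endpoint

payload = {
    "model": "sonar",  # assumed model id; Perplexity's docs list the available Sonar variants
    "messages": [
        {"role": "system", "content": "Answer concisely and cite sources."},
        {"role": "user", "content": "What did Perplexity announce in January 2025?"},
    ],
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```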
By Paula Parisi, January 13, 2025
The Xreal One Pro AR glasses have raised the stakes for those competing in the wearable augmented reality space, according to some CES 2025 attendees. The eyewear, which debuted at the show, updates the Xreal One, released in the U.S. last month. The Pro’s cinematic virtual display (of up to 447 inches) comes with a 57-degree field of view, an improvement over the Xreal One’s 50-degree FOV. Xreal says the Pro model offers “professional-grade color accuracy.” An optional detachable 12MP camera, Xreal Eye, captures photos and video. The new model will sell for $500 starting in March. Continue reading CES: Xreal One Pro AR Glasses Are Thinner, with Greater FOV
By Paula Parisi, December 18, 2024
Twelve Labs has raised $30 million in funding for its efforts to train video-analyzing models. The San Francisco-based company has received strategic investments from notable enterprise infrastructure providers Databricks and SK Telecom as well as Snowflake Ventures and HubSpot Ventures. Twelve Labs targets customers using video across a variety of fields including media and entertainment, professional sports leagues, content creators and business users. The funding coincides with the release of Twelve Labs’ new video foundation model, Marengo 2.7, which applies a multi-vector approach to video understanding. Continue reading Twelve Labs Creating AI That Can Search and Analyze Video
By Paula Parisi, November 22, 2024
Microsoft’s expansion of AI agents within the Copilot Studio ecosystem was a central focus of the company’s Ignite conference. Since the launch of Copilot Studio, more than 100,000 enterprise organizations have created or edited AI agents using the platform. Copilot Studio is getting new features to increase productivity, including multimodal capabilities that take agents beyond text and Retrieval Augmented Generation (RAG) enhancements that give agents real-time knowledge from multiple third-party sources, such as Salesforce, ServiceNow and Zendesk. Azure integration is also expanding, with 1,800 large language models from the Azure catalog made available. Continue reading Microsoft Pushes Copilot Studio Agents, Adds Azure Models
By Paula Parisi, October 4, 2024
Nvidia has unveiled the NVLM 1.0 family of multimodal LLMs, a powerful open-source AI that the company says performs comparably to proprietary systems from OpenAI and Google. Led by NVLM-D-72B, with 72 billion parameters, Nvidia’s new entry in the AI race achieved what the company describes as “state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models.” Nvidia has made the model weights publicly available and says it will also be releasing the training code, a break from the closed approach of OpenAI, Anthropic and Google. Continue reading Nvidia Releases Open-Source Frontier-Class Multimodal LLMs
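Because Nvidia has published the model weights, they can in principle be pulled down like any other open-access checkpoint. The sketch below assumes the weights are hosted on Hugging Face under the repo id nvidia/NVLM-D-72B and uses the standard transformers loading path; a 72-billion-parameter model requires multiple high-memory GPUs in practice.

```python
# Minimal sketch of loading Nvidia's published NVLM weights (repo id assumed; confirm on Hugging Face).
import torch
from transformers import AutoModel, AutoTokenizer

repo_id = "nvidia/NVLM-D-72B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory footprint
    device_map="auto",           # shard across available GPUs (requires accelerate)
    trust_remote_code=True,      # NVLM ships custom modeling code with the checkpoint
)
```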
By Paula Parisi, October 2, 2024
Snap Inc. is leveraging its relationship with Google Cloud to use Gemini for powering generative AI experiences within Snapchat’s My AI chatbot. The multimodal capabilities of Gemini on Vertex AI will greatly increase the My AI chatbot’s ability to understand and operate across different types of information such as text, audio, image, video and code. Snapchatters can use My AI to take advantage of Google Lens-like features, including asking the chatbot “to translate a photo of a street sign while traveling abroad, or take a video of different snack offerings to ask which one is the healthiest option.” Continue reading Snapchat: My AI Goes Multimodal with Google Cloud, Gemini
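As an illustration of the kind of multimodal request Gemini on Vertex AI supports (not Snap’s production integration, which is internal), a sketch of an image-plus-text query might look like the following; the project, bucket path and model name are placeholders.

```python
# Hypothetical Vertex AI Gemini call mixing an image and a text prompt; identifiers are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.5-flash")  # assumed model id; Snap's actual configuration isn't public

sign_photo = Part.from_uri("gs://my-bucket/street_sign.jpg", mime_type="image/jpeg")  # placeholder image
response = model.generate_content([sign_photo, "Translate the text on this street sign into English."])
print(response.text)
```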
By Paula Parisi, August 27, 2024
OpenAI announced that its newest model, GPT-4o, can now be customized. The company said the ability to fine-tune the multimodal GPT-4o has been “one of the most requested features from developers.” Customization can steer the model toward a more specific structure and tone of response, or allow it to follow instruction sets geared toward individual use cases. Developers can now fine-tune with custom datasets, aiming for better performance at a lower cost. The ChatGPT maker is rolling out the welcome mat by offering 1 million training tokens per day “for free for every organization” through September 23. Continue reading OpenAI Pushes GPT-4o Customization with Free Token Offer
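For developers, the workflow follows OpenAI’s standard fine-tuning API: upload a JSONL dataset of chat-formatted examples, then create a fine-tuning job against a GPT-4o snapshot. The sketch below uses the openai Python SDK; the file name is hypothetical and the snapshot id should be checked against OpenAI’s fine-tuning docs.

```python
# Minimal sketch of the GPT-4o fine-tuning flow; file name is hypothetical, snapshot id is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload a JSONL dataset of chat-formatted training examples.
training_file = client.files.create(
    file=open("support_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job against a GPT-4o snapshot.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumed snapshot; check OpenAI's docs for supported models
)

# 3. Poll the job; the resulting model id can then be used in chat completions.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```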
By Paula Parisi, August 20, 2024
Google has released its AI assistant, Gemini Live, and is positioning it to replace Google Assistant on mobile. Gemini Live is rolling out on Android to subscribers of Gemini Advanced, which is part of the $20 monthly Google One AI Premium plan. Those consumers who purchase the new Pixel 9 Pro — which begins shipping this week — will get the assistant as part of a year of free access to Gemini Advanced, a $240 value, according to the company. Google claims that Gemini Live technology enables natural, flowing conversations with the AI assistant, putting “a sidekick in your pocket.” Continue reading Google Rolls Out Its Gemini Live, Challenging ChatGPT Voice
By Paula Parisi, August 5, 2024
OpenAI has released its new Advanced Voice Mode in a limited alpha rollout for select ChatGPT Plus users. The feature, which is being implemented for the ChatGPT mobile app on Android and iOS, aims for more natural dialogue with the AI chatbot. Powered by GPT-4o, which is multimodal, Advanced Voice Mode is said to be able to sense emotional inflections, including excitement, sadness or singing. According to an OpenAI post on X, the company plans to “continue to add more people on a rolling basis” so that everyone using ChatGPT Plus will have access to the new feature in the fall. Continue reading OpenAI Brings Advanced Voice Mode Feature to ChatGPT Plus
By Paula Parisi, July 19, 2024
U.S. tech companies are fighting back against what they consider overly restrictive European Union regulations by withholding products from that market. Meta Platforms will not release its next multimodal Llama AI model there, along with future products. Apple last month said certain Apple Intelligence AI features will not be released in the EU. Previously, tech companies would accommodate regional laws by adapting global strategies so they could do business everywhere with the same products. Given the restrictions of the Digital Markets Act and other EU rules, Big Tech is signaling that may no longer be possible. Continue reading Tough EU Laws Prompt Meta, Apple to Withhold New Products
By Paula Parisi, July 3, 2024
Apple has released a public demo of the 4M AI model it developed in collaboration with the Swiss Federal Institute of Technology Lausanne (EPFL). The technology debuts seven months after the model was first open-sourced, allowing informed observers the opportunity to interact with it and assess its capabilities. Apple says 4M was built by applying masked modeling to a single unified Transformer encoder-decoder “across a wide range of input/output modalities — including text, images, geometric and semantic modalities, as well as neural network feature maps.” Continue reading Apple Launches Public Demo of Its Multimodal 4M AI Model
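The masked-modeling idea Apple describes (hide part of a token sequence drawn from several modalities, then train a single encoder-decoder Transformer to reconstruct the hidden tokens) can be illustrated with a toy sketch. The code below is a conceptual illustration only, not Apple’s 4M implementation; the shared vocabulary, masking ratio and dimensions are arbitrary.

```python
# Conceptual illustration of masked multimodal modeling, NOT Apple's 4M code.
# Tokens from different modalities are assumed to share one discrete vocabulary;
# a random subset of positions is hidden and a single encoder-decoder Transformer
# reconstructs the hidden tokens from the visible ones.
import torch
import torch.nn as nn

vocab_size, d_model, seq_len = 1000, 256, 32

embed = nn.Embedding(vocab_size, d_model)          # shared token embedding
pos_embed = nn.Embedding(seq_len, d_model)         # simple learned positions
mask_token = nn.Parameter(torch.zeros(1, 1, d_model))
transformer = nn.Transformer(d_model=d_model, nhead=8, batch_first=True)
to_logits = nn.Linear(d_model, vocab_size)

# Toy sequence standing in for mixed text/image/semantic tokens.
tokens = torch.randint(0, vocab_size, (1, seq_len))
positions = torch.arange(seq_len).unsqueeze(0)
x = embed(tokens) + pos_embed(positions)

keep = torch.rand(1, seq_len) < 0.5                               # which positions stay visible
visible = x[keep].unsqueeze(0)                                    # encoder input: visible tokens only
queries = mask_token + pos_embed(positions)[~keep].unsqueeze(0)   # decoder queries for hidden slots
targets = tokens[~keep].unsqueeze(0)                              # tokens the decoder must predict

decoded = transformer(src=visible, tgt=queries)
loss = nn.functional.cross_entropy(to_logits(decoded).transpose(1, 2), targets)
print(f"reconstruction loss: {loss.item():.3f}")
```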
By Paula Parisi, June 19, 2024
Runway ML has introduced a new foundation model, Gen-3 Alpha, which the company says can generate high-quality, realistic scenes up to 10 seconds long from text prompts, still images or a video sample. Offering a variety of camera movements, Gen-3 Alpha will initially roll out to Runway’s paid subscribers, but the company plans to add a free version in the future. Runway says Gen-3 Alpha is the first of a new series of models trained on the company’s new large-scale multimodal infrastructure, which offers improvements “in fidelity, consistency, and motion over Gen-2,” released last year. Continue reading Runway’s Gen-3 Alpha Creates AI Videos Up to 10-Seconds