Luma AI Dream Machine Video Generator in Free Public Beta

Northern California startup Luma AI has released Dream Machine, a model that generates realistic videos from text prompts and images. Built on a scalable and multimodal transformer architecture and “trained directly on videos,” Dream Machine can create “action-packed scenes” that are physically accurate and consistent, says Luma, which has a free version of the model in public beta. Dream Machine is what Luma calls the first step toward “a universal imagination engine,” while early coverage describes it as “powerful” and reports the service is already “slammed with traffic.” Though Luma has shared scant details, each posted sequence looks to be about 5 seconds long. Continue reading Luma AI Dream Machine Video Generator in Free Public Beta

Musk Said to Envision Supercomputer as xAI Raises $6 Billion

Elon Musk’s xAI has secured $6 billion in Series B funding. While the company says the funds will be “used to take xAI’s first products to market, build advanced infrastructure, and accelerate the research and development,” some outlets are reporting a significant portion is earmarked to build an AI supercomputer to power the next generation of its foundation model Grok. The company launched Grok as a chatbot on X in November, open-sourced the Grok-1 model weights in March, and recently debuted Grok-1.5 and 1.5V iterations with long-context capability and image understanding. Continue reading Musk Said to Envision Supercomputer as xAI Raises $6 Billion

Meta Advances Multimodal Model Architecture with Chameleon

Meta Platforms has unveiled its first natively multimodal model, Chameleon, which observers say could make it competitive with frontier model firms. Although Chameleon is not yet released, Meta says internal research indicates it outperforms the company’s own Llama 2 in text-only tasks and “matches or exceeds the performance of much larger models” including Google’s Gemini Pro and OpenAI’s GPT-4V in a mixed-modal generation evaluation “where either the prompt or outputs contain mixed sequences of both images and text.” In addition, Meta calls Chameleon’s image generation “non-trivial,” noting it’s “all in a single model.” Continue reading Meta Advances Multimodal Model Architecture with Chameleon

Google Ups AI Quotient with Search-Optimized Gemini Model

Google has infused search with more Gemini AI, adding expanded AI Overviews and more planning and research capabilities. “Ask whatever’s on your mind or whatever you need to get done — from researching to planning to brainstorming — and Google will take care of the legwork” culling from “a knowledge base of billions of facts about people, places and things,” explained Google and Alphabet CEO Sundar Pichai at the Google I/O developer conference. AI Overviews will roll out to all U.S. users this week. Coming soon are customizable AI Overview options that can simplify language or add more detail. Continue reading Google Ups AI Quotient with Search-Optimized Gemini Model

OpenAI Unveils Faster AI Model, Desktop Version of ChatGPT

OpenAI CTO Mira Murati announced during a live-streamed event today that the company is launching an updated version of its GPT-4 model that powers OpenAI’s popular chatbot. The new flagship AI model, GPT-4o, is reportedly “much faster” and offers improved text, voice and vision capabilities. Murati said GPT-4o will be free to all users, while Plus users will enjoy “up to five times the capacity limits” available to free users. According to OpenAI, the new AI model “can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.” Continue reading OpenAI Unveils Faster AI Model, Desktop Version of ChatGPT

Meta Launches Enhanced Generative AI Tools for Advertisers

Meta Platforms announced an expanded collection of generative AI features, tools and services for advertisers and businesses. The enhanced AI features include full image and text generation, text overlay capabilities, and image expansion for Reels and the Feed in Facebook and Instagram. The updated tools will be available via Meta Ads Manager through Advantage+ creative. According to Meta: “Our goal is to help you at every step of your journey, whether that’s improving ad performance by helping you develop creative variations, automating certain parts of the ad creation process, or increasing your credibility and engagement through Meta Verified.” Continue reading Meta Launches Enhanced Generative AI Tools for Advertisers

Adobe Considers Sora, Pika and Runway AI for Premiere Pro

Adobe plans to add generative AI capabilities to its Premiere Pro editing platform and is exploring the update with third-party AI technologies, including OpenAI’s Sora and models from Runway and Pika Labs, making it easier “to draw on the strengths of different models” within everyday workflows, according to Adobe. Editors will gain the ability to generate and add objects into scenes or shots, remove unwanted elements with a click, and even extend frames and footage length. The company is also developing its own Firefly video model for video and audio work in Premiere Pro. Continue reading Adobe Considers Sora, Pika and Runway AI for Premiere Pro

Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

Microsoft has developed VASA, a framework for generating lifelike virtual characters with vocal capabilities including speaking and singing. The premier model, VASA-1, can perform the feat in real time from a single static image and a vocalization clip. The research demo showcases realistic audio-enhanced faces that can be fine-tuned to look in different directions or change expression in video clips of up to one minute at 512 x 512 pixels and up to 40fps “with negligible starting latency,” according to Microsoft, which says “it paves the way for real-time engagements with lifelike avatars that emulate human conversational behaviors.” Continue reading Microsoft’s VASA-1 Can Generate Talking Faces in Real Time

Meta Tests Image-Generating Social Chatbot on Its Platforms

Meta is testing a new large language model chatbot, Meta AI, on social platforms in parts of India and Africa. The chatbot was introduced in late 2023 and began testing with U.S. WhatsApp users in March. The test is expanding to include more territories and the addition of Instagram and Facebook Messenger. India is reported to be Meta’s largest social market, with more than 500 million Facebook and WhatsApp users, which has big implications as the company scales up its AI plans to compete against OpenAI and others. The Meta AI chatbot answers questions and generates photorealistic images. Continue reading Meta Tests Image-Generating Social Chatbot on Its Platforms

Google Adding Free AI Photo Editing Tools to Google Photos

Beginning May 15, Google Photos users can start accessing a suite of free AI-powered Magic Editor tools like Magic Eraser and Portrait Light. The features will also be accessible on more devices, including Pixel tablets. Last year, Google launched Magic Editor on Pixel 8 and Pixel 8 Pro phones. In addition to making the features available on all Pixel devices, all Google Photos users on Android and iOS will get baseline access to 10 Magic Editor saves per month. Additionally, those with a Pixel device or Premium Google One plan of at least 2TB will have unlimited use. Continue reading Google Adding Free AI Photo Editing Tools to Google Photos

Google Introduces Faster, More Efficient JPEG Coding Library

Google is attacking slow-loading web pages with Jpegli, a new JPEG image encoder/decoder that offers a 35 percent compression ratio improvement at high quality compression settings, the Alphabet company says. The Jpegli JPEG coding library offers backward compatibility via “a fully interoperable encoder and decoder complying with the original JPEG standard and its most conventional 8-bit formalism, and API/ABI compatibility with libjpeg-turbo and MozJPEG,” Google says. Images compressed using Jpegli are “more precise and psychovisually effective,” the result of computations that make images “look clearer” with “fewer observable artifacts.” Continue reading Google Introduces Faster, More Efficient JPEG Coding Library
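That API/ABI compatibility means existing applications should not need code changes to pick up Jpegli. As a minimal sketch (assuming a standard libjpeg development setup; error handling omitted for brevity), the classic compression loop below, written against the familiar jpeglib.h interface, should behave the same whether it is linked against libjpeg-turbo or against Jpegli:

    /* Minimal sketch: compress a raw RGB buffer with the classic libjpeg API.
     * Because Jpegli is API/ABI-compatible with libjpeg-turbo, this code
     * should run unchanged when linked against Jpegli instead. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <jpeglib.h>

    int main(void) {
        const int width = 256, height = 256;
        unsigned char *rgb = calloc((size_t)width * height * 3, 1); /* placeholder image data */

        struct jpeg_compress_struct cinfo;
        struct jpeg_error_mgr jerr;
        cinfo.err = jpeg_std_error(&jerr);
        jpeg_create_compress(&cinfo);

        FILE *fp = fopen("out.jpg", "wb");
        jpeg_stdio_dest(&cinfo, fp);

        cinfo.image_width = width;
        cinfo.image_height = height;
        cinfo.input_components = 3;
        cinfo.in_color_space = JCS_RGB;
        jpeg_set_defaults(&cinfo);
        jpeg_set_quality(&cinfo, 90, TRUE); /* high-quality range, where Google reports the biggest gains */

        jpeg_start_compress(&cinfo, TRUE);
        while (cinfo.next_scanline < cinfo.image_height) {
            JSAMPROW row = &rgb[(size_t)cinfo.next_scanline * width * 3];
            jpeg_write_scanlines(&cinfo, &row, 1);
        }
        jpeg_finish_compress(&cinfo);
        jpeg_destroy_compress(&cinfo);
        fclose(fp);
        free(rgb);
        return 0;
    }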

OpenAI Integrates New Image Editor for DALL-E into ChatGPT

OpenAI has updated the editor for DALL-E, the artificial intelligence image generator that is part of the ChatGPT premium tiers. The update, based on the DALL-E 3 model, makes it easier for users to adjust their generated images. Shortly after DALL-E 3’s September debut, OpenAI integrated it into ChatGPT, enabling paid subscribers to generate images from text or image prompts. The new DALL-E editor interface lets users edit images “by selecting an area of the image to edit and describing your changes in chat.” Desired changes can also be prompted “in the conversation panel” without using the selection tool, according to OpenAI. Continue reading OpenAI Integrates New Image Editor for DALL-E into ChatGPT

New Tech from MIT, Adobe Advances Generative AI Imaging

Researchers from the Massachusetts Institute of Technology and Adobe have unveiled a new AI acceleration tool that makes generative apps like DALL-E 3 and Stable Diffusion up to 30x faster by reducing the process to a single step. The new approach, called distribution matching distillation, or DMD, maintains or enhances image quality while greatly streamlining the process. Theoretically, the technique “marries the principles of generative adversarial networks (GANs) with those of diffusion models,” consolidating “the hundred steps of iterative refinement required by current diffusion models” into one step, MIT PhD student and project lead Tianwei Yin says. Continue reading New Tech from MIT, Adobe Advances Generative AI Imaging
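In outline (a schematic restatement of the published approach, not the authors’ exact notation), the distilled generator G_θ maps a noise sample z straight to an image in one step, and DMD trains it by driving down an approximate KL divergence between the generator’s output distribution and the teacher diffusion model’s, with a gradient estimated from two learned score functions:

    \nabla_\theta \, D_{\mathrm{KL}} \;\approx\; \mathbb{E}_{z,\,t}\Big[\, w_t \big( s_{\mathrm{fake}}(x_t, t) - s_{\mathrm{real}}(x_t, t) \big) \, \frac{\partial G_\theta(z)}{\partial \theta} \,\Big], \qquad x_t = \text{a noised copy of } G_\theta(z)

Here s_real is the score estimated by the pretrained teacher diffusion model, s_fake is the score of an auxiliary diffusion model trained on the generator’s own outputs, and w_t is a time-dependent weight; the paper’s additional regression loss on paired teacher samples is omitted from this sketch.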

Stable Video 3D Generates Orbital Animation from One Image

Stability AI has released Stable Video 3D, a generative video model based on the company’s foundation model Stable Video Diffusion. SV3D, as it’s called, comes in two versions, both of which can generate and animate multi-view 3D meshes from a single image. The more advanced version also lets users set “specified camera paths” for a “filmed” look to the video generation. “By adapting our Stable Video Diffusion image-to-video diffusion model with the addition of camera path conditioning, Stable Video 3D is able to generate multi-view videos of an object,” the company explains. Continue reading Stable Video 3D Generates Orbital Animation from One Image

Apple Unveils Progress in Multimodal Large Language Models

Apple researchers have gone public with new multimodal methods for training large language models using both text and images. The results are said to enable AI systems that are more powerful and flexible, which could have significant ramifications for future Apple products. These new models, which Apple calls MM1, scale to as many as 30 billion parameters. The researchers identify multimodal large language models (MLLMs) as “the next frontier in foundation models,” which exceed the performance of text-only LLMs and “excel at tasks like image captioning, visual question answering and natural language inference.” Continue reading Apple Unveils Progress in Multimodal Large Language Models