By ETCentric Staff, March 15, 2024
Artificial intelligence imaging service Midjourney has been embraced by storytellers, who have long clamored for a feature that keeps characters consistent across new requests. Now Midjourney is delivering that functionality with the addition of the new “--cref” tag (short for Character Reference), available to those using Midjourney v6 on the Discord server. Users achieve the effect by appending the tag to the end of a text prompt, followed by the URL of the master image that subsequent generations should match. Midjourney will then attempt to reproduce the particulars of the character’s face, body and clothing. Continue reading Midjourney Creates a Feature to Advance Image Consistency
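The syntax described above can be sketched as a Discord prompt. The URL is a placeholder for an image the user has already generated or uploaded, and the optional “--cw” (character weight) parameter, ranging 0–100, controls how strictly the reference is followed:

```
/imagine prompt: the same heroine exploring a night market, cinematic lighting --v 6 --cref https://example.com/heroine.png --cw 90
```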
By ETCentric Staff, March 14, 2024
Having fended off challenges in the short-form video sphere since its late 2016 launch, TikTok now appears to be playing offense, laying the groundwork for a photo-sharing app that has drawn comparisons to Instagram and Pinterest. Avid TikTok users are probably familiar with a feature that lets them post still images as moving images that can be examined frame-by-frame. TikTok now seems to want to improve on that approach by building a separate TikTok Photos app for Android and iOS, to which users of the primary platform can export and showcase their still images. Continue reading TikTok Updates Its Code to Sync to Separate ‘TikTok Photos’
By ETCentric Staff, March 11, 2024
Alibaba is touting a new artificial intelligence system that can animate portraits, making people sing and talk in realistic fashion. Researchers at the Alibaba Group’s Institute for Intelligent Computing developed the generative video framework, calling it EMO, short for Emote Portrait Alive. Input a single reference image along with “vocal audio,” as in talking or singing, and “our method can generate vocal avatar videos with expressive facial expressions and various head poses,” the researchers say, adding that EMO can generate videos of any duration, “depending on the length of video input.” Continue reading Alibaba’s EMO Can Generate Performance Video from Images
By ETCentric Staff, March 8, 2024
London-based AI video startup Haiper has emerged from stealth mode with $13.8 million in seed funding and a platform that generates up to two seconds of HD video from text prompts or images. Founded by alumni from Google DeepMind, TikTok and various academic research labs, Haiper is built around a bespoke foundation model that aims to serve the needs of the creative community while the company pursues a path to artificial general intelligence (AGI). Haiper is offering a free trial of what is currently a web-based user interface similar to offerings from Runway and Pika. Continue reading AI Video Startup Haiper Announces Funding and Plans for AGI
By ETCentric Staff, February 16, 2024
Apple has taken a novel approach to animation with Keyframer, using large language models to add motion to static images through natural language prompts. “The application of LLMs to animation is underexplored,” Apple researchers say in a paper that describes Keyframer as an “animation prototyping tool.” Based on input from animators and engineers, Keyframer lets users refine their work through “a combination of prompting and direct editing,” the paper explains. The LLM can generate CSS animation code. Users can also use natural language to request design variations. Continue reading Apple’s Keyframer AI Tool Uses LLMs to Prototype Animation
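As an illustration of the kind of output the paper describes, an LLM asked to “make the sun rise” over a static scene might emit CSS animation code along these lines. The “#sun” selector and timing values here are hypothetical, not actual Keyframer output:

```css
/* Hypothetical example of LLM-generated CSS animation,
   not actual Keyframer output */
#sun {
  animation: rise 3s ease-in-out forwards;
}
@keyframes rise {
  from { transform: translateY(120px); opacity: 0.3; }
  to   { transform: translateY(0); opacity: 1; }
}
```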
By ETCentric Staff, February 16, 2024
Stability AI, purveyor of the popular Stable Diffusion image generator, has introduced a completely new model called Stable Cascade. Now in preview, Stable Cascade uses a different architecture from Stable Diffusion’s SDXL, one the UK company’s researchers say is more efficient. Cascade builds on a compression architecture called Würstchen (German for “sausage”) that Stability began sharing in research papers early last year. Würstchen is a three-stage process that includes two-step encoding. It uses fewer parameters, which means less data to train on, greater speed and reduced costs. Continue reading Stability AI Advances Image Generation with Stable Cascade
By ETCentric Staff, February 9, 2024
Apple has released MGIE, an open-source AI model that edits images using natural language instructions. MGIE, short for MLLM-Guided Image Editing, can also modify and optimize images. Developed in conjunction with the University of California, Santa Barbara, MGIE is Apple’s first AI model. The multimodal MGIE, which understands both text and image input, crops, resizes, flips and adds filters based on text instructions, using what Apple says is an easier instruction set than other AI editing programs, one that is simpler and faster than learning a traditional application such as Apple’s own Final Cut Pro. Continue reading Apple Launches Open-Source Language-Based Image Editor
By Paula Parisi, February 7, 2024
Yelp is introducing more than 20 new updates to improve the experience for community members and business owners. Included are AI-powered summaries that make it easier to find businesses, an updated Yelp Elite badge for reviewers who are passionate about specific subjects, and a new visual home feed and search experience geared toward discovery. For those seeking services, the new “Request a Quote” and “Projects” features are available. Artificial intelligence will also power market and competitive insights for business owners, while AI-powered smart budgets provide recommendations to optimize ad spend, “helping local businesses grow.” Continue reading Yelp Adds 20 Features Plus AI to Help Users and Businesses
By Paula Parisi, January 31, 2024
Nightshade, a tool designed to combat AI copyright infringement, generated 250,000 downloads shortly after its January release, exceeding the expectations of its creators in the computer science department at the University of Chicago. Nightshade lets artists prevent AI models from usefully scraping and training on their work without consent. The Bureau of Labor Statistics counts more than 2.67 million working artists in the U.S., but social media feedback indicates the downloads have come from around the world. One of the coders says cloud mirror links had to be added to avoid overwhelming the University of Chicago’s web servers. Continue reading AI Poison Pill App Nightshade Has 250K Downloads in 5 Days
By Phil Lelyveld, January 10, 2024
Dr. Fei-Fei Li, Stanford professor and co-director of Stanford HAI (Human-Centered AI), and Andrew Ng, venture capitalist and managing general partner at Palo Alto-based AI Fund, discussed the current state and expected near-term developments in artificial intelligence. As a general-purpose technology, AI development will both deepen, as private-sector LLMs are developed for industry-specific needs, and broaden, as open-source public-sector LLMs emerge to address broad societal problems. Expect exciting advances in image models — what Li calls “pixel space.” When implementing AI, think about teams rather than individuals, and about tasks rather than jobs. Continue reading CES: Session Details the Impact and Future of AI Technology
By Paula Parisi, December 21, 2023
As the pressure ratchets up for AI companies to go beyond the wow factor and make money, Stability AI has formalized three subscription tiers as it seeks to expand commercial use of its open-source, multimodal core models. The Stability AI Membership offerings include a free tier for personal and research (i.e., non-commercial) use, a professional tier that costs $20 a month, and a custom-priced enterprise tier for large outfits. The company says that with the three tiers it is “striking a balance between fostering competitiveness and maintaining openness in AI technologies.” Continue reading Stability AI Is Offering Paid Membership for Commercial Users
By Paula Parisi, December 18, 2023
Snapchat+ is rolling out new artificial intelligence features that let subscribers use text prompts to create generative AI images to share with friends. In addition, the Dreams feature, which creates generative AI selfies, is now able to add your friends to those photos. Snapchat+ subscribers get one pack of 8 Dreams per month as part of their $3.99 monthly fee. An onscreen button labeled “AI” lets subscribers access the AI image generator to choose from a menu of prompts (including “sunny day at the beach” and “planet made of cheese”) or they can enter their own descriptions. Continue reading GenAI Lets Snapchat+ Subscribers Create and Share Images
By Paul Bennun, December 4, 2023
Stability AI, developer of Stable Diffusion (one of the leading visual content generators, alongside Midjourney and DALL-E), has introduced SDXL Turbo, a new AI model that demonstrates more of the latent possibilities of the common diffusion generation approach: images that update in real time as the user’s prompt updates. This was always possible in principle with previous diffusion models, since a generation can simply be re-run each time the text changes, but more efficient generation algorithms and the steady accumulation of GPUs and TPUs in developers’ data centers now make the experience feel magical. Continue reading Stability AI Intros Real-Time Text-to-Image Generation Model
By Paula Parisi, December 1, 2023
Amazon is debuting its Titan Image Generator in preview for Amazon Bedrock customers. The new Titan generative AI model can create new images from a text prompt or existing image, and automatically adds watermarking to protect intellectual property. The move into generative imaging puts Amazon in competition with a growing field that includes large firms like Adobe and Google. Unlike those companies and others, the e-retail giant is at present focusing exclusively on enterprise customers. Amazon Bedrock is a managed service giving developers access to a range of foundation models from companies including Meta Platforms, Anthropic and Amazon itself. Continue reading Amazon Previews Titan Image Generator for Bedrock Clients
By Paula Parisi, November 20, 2023
Having made the leap from image generation to video generation over the course of a few months in 2022, Meta Platforms has introduced Emu, its first visual foundational model, along with Emu Video and Emu Edit, positioned as milestones on the trek to AI moviemaking. Emu Video uses just two diffusion models to generate four-second, 512×512 videos at 16 frames per second, Meta said, compared with 2022’s Make-A-Video, which requires a “cascade” of five models. Internal research found Emu Video generations were “strongly preferred” over the Make-A-Video model on quality (96 percent) and prompt fidelity (85 percent). Continue reading Meta Touts Its Emu Foundational Model for Video and Editing