Meta’s FAIR Team Announces a New Collection of AI Models

Meta Platforms is publicly releasing five new AI models from its Fundamental AI Research (FAIR) team, which has been experimenting with artificial intelligence since 2013. The models include image-to-text, text-to-music generation, and multi-token prediction tools. Meta is introducing AudioSeal, an audio watermarking technique designed for the localized detection of AI-generated speech. “AudioSeal makes it possible to pinpoint AI-generated segments within a longer audio snippet,” according to Meta. The feature is timely in light of concern about potential misinformation surrounding the fall presidential election.

MIT Technology Review suggests AudioSeal “could eventually help tackle the growing use of voice cloning tools for scams” as well as deepfakes.

Based on an interview with Meta Research Scientist Hady Elsahar, Technology Review writes that in tests AudioSeal has achieved “between 90 percent and 100 percent accuracy in detecting the watermarks, much better results than in previous attempts at watermarking audio.”

There are, however, “some big caveats,” since “audio watermarks are not yet adopted widely, and there is no single agreed industry standard for them. And watermarks for AI-generated content tend to be easy to tamper with — for example, by removing or forging them.”

Meta is releasing an audio creation model called JASCO (Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation). “JASCO can take different audio inputs, such as a chord or a beat, to improve the final AI-generated sound,” VentureBeat reports, noting that a FAIR research paper explains how “JASCO lets users adjust features of a generated sound like chords, drums, and melodies to hone in on the final sound they want all through text.”

Meta is also releasing key components of its Chameleon “mixed modal” models under a research-only license. Chameleon can understand and generate both images and text, taking “any combination of text and images as input” and outputting any combination of the two as well.

Possibilities include feeding Chameleon images and asking it to generate captions, “or using a mix of text prompts and images to create an entirely new scene,” Meta notes in a news post.

VentureBeat says FAIR will release Chameleon in two sizes, Chameleon 7B and 34B, pointing out that while they “understand” images, Meta will not be releasing the Chameleon image generation model “at this time,” only the text-related ones.
