Mark Zuckerberg Unveils SAM 2 AI Tech for Segmenting Video

Meta Platforms CEO Mark Zuckerberg unveiled the latest version of computer vision platform SAM 2, an update on the company’s Segment Anything Model that automates for video what the original SAM did for still images — identifying the edges of an object and isolating it in the frame. Zuckerberg demonstrated SAM 2 as part of a SIGGRAPH 2024 keynote session in which he was interviewed by Nvidia CEO Jensen Huang. “Being able to do this in video and have it be zero shot and tell it what you want, it’s pretty cool,” Zuckerberg said. Meta is sharing the code and model weights for SAM 2 with a permissive Apache 2.0 license.

Like the original SAM, which debuted last year, SAM 2 is open source and free to use, and it works with both video and still images. The source code can be downloaded from GitHub.
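For still images, usage resembles the original SAM’s promptable predictor. The sketch below is based on the quickstart in Meta’s SAM 2 GitHub repository; the checkpoint and config filenames are illustrative, and the exact names may differ from the released code.

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Build the model from a config and a downloaded checkpoint
# (filenames here are illustrative).
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("photo.jpg").convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # Prompt with a single foreground click at (x, y); label 1 = foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```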

“SAM 2 can segment any object and consistently follow it across all frames of a video in real time — unlocking new possibilities for video editing and new experiences in mixed reality,” Meta explains in a news post with video examples. Meta has also posted a free demo.
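As a rough sketch of what “segment and follow across frames” looks like in code, the repository’s quickstart uses a video predictor that takes a click prompt on one frame and propagates the resulting masklet through the rest. Names below follow that quickstart and may differ from the released code; the frame directory and click coordinates are placeholders.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml",
                                       "./checkpoints/sam2_hiera_large.pt")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # init_state indexes the frames of the video (here, a directory of JPEGs).
    state = predictor.init_state("./video_frames")

    # Click once on the object in frame 0; SAM 2 returns a mask immediately.
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),  # 1 = foreground click
    )

    # Propagate that prompt forward to get the masklet on every frame.
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        pass  # e.g., save or composite masks[i] for each tracked object
```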

Considering the computational demands of processing video, TechCrunch calls it a testament to efficiency that SAM 2 “can run without melting the data center,” adding that “it’s still a huge model that needs serious hardware to work, but fast, flexible segmentation was practically impossible even a year ago.”

Segmentation is a term of art for applying vision models to identifying different parts of a picture. “‘This is a dog, this is a tree behind the dog’ hopefully, and not ‘this is a tree growing out of a dog,’” is the example used by TechCrunch, which says the technique has been applied “for decades,” but has recently gotten better and faster, with Segment Anything serving as “a major step forward.”
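Concretely, a segmentation model’s output can be thought of as a per-pixel label map, with a binary mask isolating each object. The toy example below (plain NumPy, not SAM 2) illustrates the idea.

```python
import numpy as np

# A toy 4x6 "image" segmented into three regions:
# 0 = background, 1 = dog, 2 = tree.
labels = np.array([
    [0, 0, 2, 2, 0, 0],
    [0, 1, 2, 2, 0, 0],
    [1, 1, 1, 2, 0, 0],
    [0, 1, 1, 0, 0, 0],
])

# A binary mask isolating one object -- the kind of output SAM-style
# models return for the object you point at.
dog_mask = (labels == 1)
print(dog_mask.sum(), "pixels belong to the dog")
```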

A Meta blog post says the company is also “sharing the SA-V dataset, which includes approximately 51,000 real-world videos and more than 600,000 masklets (spatio-temporal masks).”
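A “masklet” extends that idea through time: one binary mask per frame for a single object. A minimal hypothetical representation might look like the following; this is for illustration only and is not the actual SA-V annotation format, which the blog post does not detail here.

```python
import numpy as np

# Hypothetical masklet: one boolean mask per video frame for one object.
# Shape: (num_frames, height, width). Concept sketch only -- not the
# real SA-V file format.
num_frames, height, width = 120, 480, 854
masklet = np.zeros((num_frames, height, width), dtype=bool)

# Mark the object's pixels in a couple of frames (toy values).
masklet[0, 200:240, 300:360] = True   # object's region in frame 0
masklet[1, 202:242, 304:364] = True   # the object has moved slightly

# Per-frame object area, e.g., for tracking-consistency checks.
areas = masklet.reshape(num_frames, -1).sum(axis=1)
```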

TechCrunch reports that “another database of over 100,000 ‘internally available’ videos was also used for training, and this one is not being made public,” speculating that the unreleased material is “sourced from public Instagram and Facebook profiles.”

Speaking at SIGGRAPH, Zuckerberg said “I kind of dream of one day like you can almost imagine all of Facebook or Instagram being like a single AI model that has unified all these different content types and systems together,” as quoted on an Nvidia blog. The session is available for replay on Nvidia’s YouTube channel.

SiliconANGLE writes that SAM 2 “has big implications in video editing, as it can cut down on the time it takes to edit.” Zuckerberg also “talked about his vision of a future where Facebook and Instagram might be able to generate AI doubles of social media influencers and content creators that act like ‘an agent or assistant that their community can interact with.’”
