By Paula Parisi, December 12, 2024
Hundreds of thousands more YouTube channels are gaining access to the platform’s AI-powered auto-dubbing feature, which generates translated audio tracks for YouTube videos, helping to make content more accessible to viewers around the world. The expanded rollout targets informational channels in the Partner Program, such as tutorials on cooking, sewing, tourism and home improvement. Availability “will expand to other types of content soon,” according to the video streamer, which began testing the feature with select creators last year. Based on technology developed by Aloud, YouTube’s auto-dubbing emerged from Google’s Area 120 internal incubator program. Continue reading YouTube Expands Access to Improved AI-Powered Dubbing
By Paula Parisi, December 3, 2024
Amazon Web Services has opened AWS Data Transfer Terminals in Los Angeles and New York. These secure physical locations allow customers to bring their storage devices for fast uploads to the AWS Cloud. The enterprise service can significantly reduce data ingestion time for use cases including uploads of “large datasets from fleets of vehicles collecting data in metro areas for training machine learning models” and “digital audio and video files from content creators for media processing workloads,” as well as geographical and other smart city data compiled by local government organizations. Continue reading AWS Opens Physical Locations for Fast, Secure Data Uploads
By Paula Parisi, November 27, 2024
Nvidia has unveiled an AI sound model research project called Fugatto that “can create any combination of music, voices and sounds” based on text and audio inputs. Nvidia describes it as “the world’s most flexible sound machine,” and many appear to agree that the new model represents an audio breakthrough, with the potential to generate a wide array of sounds that have not previously existed. While popular sound models from companies including Suno and ElevenLabs “can compose a song or modify a voice, none have the dexterity of the new offering,” Nvidia claims. Continue reading Nvidia AI Model Fugatto a Breakthrough in Generative Sound
By Paula Parisi, November 22, 2024
Microsoft’s expansion of AI agents within the Copilot Studio ecosystem was a central focus of the company’s Ignite conference. Since the launch of Copilot Studio, more than 100,000 enterprise organizations have created or edited AI agents using the platform. Copilot Studio is getting new features to increase productivity, including multimodal capabilities that take agents beyond text, and Retrieval Augmented Generation (RAG) enhancements that give agents real-time knowledge from multiple third-party sources, such as Salesforce, ServiceNow, and Zendesk. Integration with Azure is also expanding, with 1,800 large language models in the Azure catalog being made available. Continue reading Microsoft Pushes Copilot Studio Agents, Adds Azure Models
By Paula Parisi, November 18, 2024
YouTube has added a new feature to its Dream Track toolset, which lets select U.S. creators use AI to generate songs using the vocals of artists including John Legend, Demi Lovato, Charli XCX, Charlie Puth and others. Now users can remix Dream Track songs using natural language to describe the changes they would like, stylistic and otherwise. Selecting the “restyle a track” option will steer users to creating a 30-second generative snippet for use in YouTube Shorts. The remixed snippets will credit the original song with “clear attribution” through the Short itself and the Shorts audio pivot page. It will also clearly indicate that the track was restyled with AI, according to Google. Continue reading YouTube Dream Track Toolset Introduces an AI Remix Feature
By Rob Scott, November 8, 2024
In a move that may appeal to music fans as well as marketing professionals, ByteDance-owned video platform TikTok just announced the availability of a new promotional feature called “Share to TikTok” that enables users to share music, podcasts and audiobooks from Apple Music and Spotify to TikTok, directly from the share menus of the streaming services. Content shared to TikTok will feature links directing users to the original sources to foster discovery and engagement on the partnering services. The new feature follows the launch of “Add to Music App,” a tool for users to save songs they discover on the TikTok app to their streaming service of choice. Continue reading TikTok Introduces New Feature to Share and Promote Music
By Paula Parisi, October 31, 2024
Yahoo News has signed up to use San Jose-based cybersecurity company McAfee’s deepfake image detection technology. The scalable McAfee system can “quickly identify images that may have been produced or modified using AI, including deepfake images,” flagging them for the Yahoo News editorial standards team for human review. The standards team then “determines whether the flagged images meet the platform’s editorial guidelines.” The partnership provides news aggregator Yahoo with an extra layer of protection as it deals with a large network of global publishers in addition to policing its original content. Continue reading Yahoo Using McAfee’s Modified Image Detector to Flag Fakes
By Paula Parisi, October 28, 2024
OpenAI is taking a new approach to generating media that it says is 50 times faster than the models commonly used today. Called sCM, the approach is a “consistency model,” a variation on the diffusion method used by many leading systems. OpenAI claims its new model is ideal for training on large-scale datasets and generating video, audio and images that are of “comparable sample quality to leading diffusion models.” Diffusion models often require hundreds of sampling steps, creating challenges when it comes to real-time applications. OpenAI aims to change this with a faster system that requires less power. Continue reading OpenAI: sCM Generates Media 50x Faster Than Other Models
By Paula Parisi, October 25, 2024
Runway is launching Act-One, a motion capture system that uses video and voice recordings to map human facial expressions onto characters using the company’s latest model, Gen-3 Alpha. Runway calls it “a significant step forward in using generative models for expressive live action and animated content.” Compared to past facial capture techniques, which typically require complex rigging, Act-One is driven directly and only by the performance of an actor, requiring “no extra equipment,” making it more likely to capture and preserve an authentic, nuanced performance, according to the company. Continue reading Runway’s Act-One Facial Capture Could Be a ‘Game Changer’
By Paula Parisi, October 22, 2024
Amazon has launched a new Fire TV Stick HD, supplanting the Fire TV Stick and Fire TV Stick Lite as its entry-level television device. Priced at $34.99, the black stick plugs into the HDMI port at the back or side of most TVs. A micro USB cable and power plug are included. The Fire TV Stick HD streams at up to 1080p HD and also supports HDR, HDR10, HDR10+ and HLG. While the new device does not feature support for Dolby Vision or Dolby Atmos, its HDMI port will support Dolby-encoded audio. The platform streamlines access to all major streaming services, which of course require independent subscriptions. Continue reading Amazon’s Entry Level Fire Stick HD Adds Alexa Voice Control
By Paula Parisi, October 15, 2024
Qobuz, an audio platform in operation since 2007 that has been growing in popularity worldwide, recently added the Direct Stream Digital (DSD) and Digital eXtreme Definition (DXD) audio formats as download options in its online store. Described as the ultimate in hi-res digital audio capture and playback, the two are not commonly offered. And while there is not yet a large catalog of songs, audiophiles welcome the news as a harbinger of things to come. Qobuz has already added about 22,500 mainly DSD tracks to its catalog of more than 100 million songs, and says there are more on tap. Continue reading Qobuz Queues Up Hi-Resolution DSD and DXD Music Tracks
By Paula Parisi, October 8, 2024
Meta Platforms has unveiled Movie Gen, a new family of AI models that generates video and audio content. Coming to Instagram next year, Movie Gen also allows a high degree of editing and effects customization using text prompts. Meta CEO Mark Zuckerberg demonstrated its abilities last week in an example shared on his Instagram account, in which a leg press machine at the gym is transformed into a steampunk machine and then into one made of molten gold. The models have been trained on a combination of licensed and publicly available datasets. Continue reading Meta’s Movie Gen Model is a Powerful Content Creation Tool
By Paula Parisi, October 7, 2024
Beginning October 15, YouTube Shorts will extend its maximum length to 3 minutes. The move competitively positions the Google unit against TikTok, which allows for videos of up to 10 minutes when recording, or an hour when uploading. Regular YouTube accommodates videos of up to 12 hours for verified accounts and 15 minutes for unverified accounts, whether live or uploaded. But in terms of marketing focus, the current attention is on short-form video. YouTube is also updating the Shorts player, adding templates, and introducing a Shorts trends page for mobile. Continue reading YouTube Updates Shorts Player, Extends Length to 3 Minutes
By Paula Parisi, October 2, 2024
Snap Inc. is leveraging its relationship with Google Cloud to use Gemini for powering generative AI experiences within Snapchat’s My AI chatbot. The multimodal capabilities of Gemini on Vertex AI will greatly increase the My AI chatbot’s ability to understand and operate across different types of information such as text, audio, image, video and code. Snapchatters can use My AI to take advantage of Google Lens-like features, including asking the chatbot “to translate a photo of a street sign while traveling abroad, or take a video of different snack offerings to ask which one is the healthiest option.” Continue reading Snapchat: My AI Goes Multimodal with Google Cloud, Gemini
By Paula Parisi, October 1, 2024
Google has updated its AI assistant, NotebookLM, allowing the AI note-taking and research tool to summarize audio files and YouTube videos. First released at the Google I/O developer conference in 2023, NotebookLM even creates shareable AI-generated audio discussions and podcasts. It allows users to upload sources including PDFs, Google Docs, Google Slides and websites. The items, including text, can be stored in shareable “notebooks,” organizing material in a central location, and users can ask Google’s Gemini AI questions about the notebook material. Initially embraced by students and educators, it has become equally popular among business users. Continue reading Google Unveils New Updates to Its AI-Powered NotebookLM