By ETCentric Staff, February 21, 2024
Researchers at Amazon have trained what they are calling the largest text-to-speech model ever created, which they claim exhibits “emergent” qualities: the ability to speak complex sentences naturally without having been explicitly trained to do so. Called BASE TTS, for Big Adaptive Streamable TTS with Emergent abilities, the new model could pave the way for more human-like interactions with AI, reports suggest. Trained on 100,000 hours of public domain speech data, BASE TTS offers “state-of-the-art naturalness” in English, as well as some German, Dutch and Spanish. Text-to-speech models are used to develop voice assistants for smart devices and apps, and to power accessibility tools. Continue reading Amazon Claims ‘Emergent Abilities’ for Text-to-Speech Model
By ETCentric Staff, February 16, 2024
Apple has taken a novel approach to animation with Keyframer, using large language models to add motion to static images through natural language prompts. “The application of LLMs to animation is underexplored,” Apple researchers say in a paper that describes Keyframer as an “animation prototyping tool.” Based on input from animators and engineers, Keyframer lets users refine their work through “a combination of prompting and direct editing,” the paper explains. The LLM can generate CSS animation code, and users can request design variations in natural language as well. Continue reading Apple’s Keyframer AI Tool Uses LLMs to Prototype Animation
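Apple has not released Keyframer’s code, but the general pattern it describes, prompting an LLM to turn a motion description into CSS animation for an existing graphic, can be sketched. In this minimal example the OpenAI Python SDK stands in as a generic LLM backend; the model choice, prompt wording and the #sun SVG element are all assumptions for illustration, not Apple’s implementation.

```python
# A sketch of the Keyframer-style pattern (not Apple's code): ask an LLM
# to translate a natural-language motion prompt into CSS animation code
# for a given SVG element.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

svg = '<circle id="sun" cx="50" cy="150" r="20" fill="gold"/>'
prompt = (
    "Given this SVG element:\n" + svg + "\n"
    "Write CSS (a @keyframes rule plus a rule for #sun) that makes the "
    "sun rise slowly and brighten. Return only the CSS."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # the generated CSS animation code
```

The returned CSS could then be applied to the graphic and refined through follow-up prompts or direct edits, which is the iterative loop the paper describes.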
By ETCentric Staff, February 9, 2024
Apple has released MGIE, an open-source AI model that edits images using natural language instructions. MGIE, short for MLLM-Guided Image Editing, can also modify and optimize images. Developed in conjunction with the University of California, Santa Barbara, MGIE is among Apple’s first openly released AI models. The multimodal MGIE, which understands both text and image input, crops, resizes, flips and adds filters based on text instructions, using what Apple says is an easier instruction set than other AI editing programs. It is also simpler and faster to learn than a traditional application like Apple’s own Final Cut Pro. Continue reading Apple Launches Open-Source Language-Based Image Editor
By Paula Parisi, January 12, 2024
Santa Monica-based AI startup Rabbit Inc. is offering a virtual assistant in the form of a pocket device that the company says can improve upon mobile phones by learning to use your apps and running them for you. The device was heavily publicized at CES 2024 in Las Vegas this week, and the initial run of the company’s r1 units had sold out at $199 each as of Tuesday. Preorders are still being taken for the retro-looking device, which features a 2.88-inch touchscreen; shipments are scheduled to begin in late March. The company says its proprietary Rabbit OS is the first operating system built on a Large Action Model (LAM) foundation. LAMs are LLMs trained on datasets of actions and their consequences. Continue reading CES: Rabbit Launches AI-Powered Pocket Controller for Apps
By Paula Parisi, December 12, 2023
Google’s personalized AI assistant NotebookLM is an experimental product that has been in early access since July. Now the company is integrating its new Gemini Pro LLM with NotebookLM and making the service available to U.S. residents 18 and older. NotebookLM is engineered “to help you do your best thinking,” Google says, with documents uploaded to the service making it “an instant expert in the information you need,” allowing it to answer questions about your data. Unlike generic chatbots, NotebookLM draws its responses from the documents you feed it, keeping them hyper-focused, like a lite version of a custom-trained model. Continue reading Google’s NotebookLM is a Personalized Lite Language Model
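Google hasn’t published how NotebookLM works under the hood, but the behavior described resembles retrieval-grounded generation: find the user document most relevant to a question, then have the model answer only from that source. Here is a toy, dependency-free sketch of that pattern; the function names, sample documents and word-overlap scoring are assumptions for illustration, not Google’s implementation.

```python
# Toy sketch of retrieval-grounded answering in the NotebookLM spirit.
# Documents are scored by word overlap with the question, and only the
# best match is handed to the LLM, keeping answers grounded in the
# user's own material.
import re

def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumerics so 'launch?' matches 'launch'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: dict[str, str]) -> tuple[str, str]:
    """Return the (title, text) pair sharing the most words with the question."""
    q_words = tokenize(question)
    return max(documents.items(), key=lambda item: len(q_words & tokenize(item[1])))

docs = {
    "meeting-notes.txt": "The Q3 launch moved to October pending the privacy review.",
    "budget.txt": "Marketing budget for Q3 capped at $40,000.",
}

title, source = retrieve("When is the Q3 launch?", docs)
grounded_prompt = f"Answer using only this source ({title}): {source}"
print(grounded_prompt)  # this prompt would then go to the LLM (e.g., Gemini Pro)
```

A production system would use embeddings rather than word overlap, but the grounding principle, answering from the user’s documents instead of the model’s general training, is the same.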
By Paula Parisi, November 10, 2023
Startup Flip AI has built a custom LLM to run its observability platform. Observability is the practice of monitoring corporate IT systems to ferret out issues, or to identify potential problems before they occur. It’s a 24/7 process aimed at problems that can slow down sites or apps, sometimes causing crashes. Flip AI (not to be confused with the PDF reader app of the same name) has trained its LLM specifically to monitor new and emerging challenges. Concurrently, the company announced $6.5 million in seed funding led by Factory, with participation from Morgan Stanley Next Level Fund and GTM Capital. Continue reading Startup Flip AI Creates Custom LLM to Address Observability
By Paula Parisi, November 8, 2023
Now anyone can make their own GPT chatbot, for fun or productivity, with no coding skills necessary, and soon will be able to list it on a marketplace called the GPT Store. This was among the announcements to come out of OpenAI’s first developer conference, DevDay in San Francisco, where the company unveiled a new, lower-priced model called GPT-4 Turbo with a 128K context window, along with a new Assistants API, GPT-4 Turbo with Vision and the DALL-E 3 API. Now in preview, GPT-4 Turbo “is more capable and has knowledge of world events up to April 2023,” according to OpenAI. Continue reading OpenAI Intros GPT-4 Turbo, Creator Chatbots at Dev Confab
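For developers, the new models are reachable through OpenAI’s Python SDK. A minimal sketch, assuming the model identifiers OpenAI documented at launch and an API key in the environment; the prompts themselves are illustrative.

```python
# Minimal sketch of calling the DevDay announcements via the OpenAI
# Python SDK (v1.x). Model names are those OpenAI documented at launch.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# GPT-4 Turbo preview: 128K-token context, knowledge through April 2023.
chat = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Summarize the DevDay announcements."}],
)
print(chat.choices[0].message.content)

# DALL-E 3, also newly exposed through the images API.
image = client.images.generate(model="dall-e-3", prompt="a robot giving a keynote")
print(image.data[0].url)
```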
By Paula Parisi, October 27, 2023
The University of Science and Technology of China (USTC) and Tencent YouTu Lab have released a research paper on a new framework called Woodpecker, designed to correct hallucinations in multimodal large language models (MLLMs). “Hallucination is a big shadow hanging over the rapidly evolving MLLMs,” writes the group, describing the phenomenon as occurring when MLLMs “output descriptions that are inconsistent with the input image.” Solutions to date have focused mainly on “instruction-tuning,” a form of retraining that is data- and computation-intensive. Woodpecker instead takes a training-free approach, correcting hallucinations after the fact by working from the text the model has already generated. Continue reading Woodpecker: Chinese Researchers Combat AI Hallucinations
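The paper’s pipeline is more elaborate, but the training-free idea can be sketched schematically: extract the factual claims from the generated description, check each one against the image, and rewrite whatever fails the check. Everything below is a hypothetical stub for illustration, not the authors’ code.

```python
# Schematic of a training-free, post-hoc correction loop in Woodpecker's
# spirit: no retraining, just validating claims in generated text against
# the image. Each helper is a stand-in stub for a real component.

def extract_claims(caption: str) -> list[str]:
    # Stub: treat each sentence as one claim. The real framework extracts
    # key concepts and formulates questions about them.
    return [s.strip() for s in caption.split(".") if s.strip()]

def supported_by_image(claim: str, image) -> bool:
    # Stub: always flags the dog count. A real system would query a
    # VQA model or object detector against the actual image.
    return "two dogs" not in claim

def rewrite(caption: str, bad_claims: list[str]) -> str:
    # Stub: delete unsupported claims. A real system would have an LLM
    # rewrite them using the validated visual evidence.
    for claim in bad_claims:
        caption = caption.replace(claim + ".", "")
    return caption.strip()

caption = "A park scene. There are two dogs on the grass."
bad = [c for c in extract_claims(caption) if not supported_by_image(c, image=None)]
print(rewrite(caption, bad))  # -> "A park scene."
```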
By Paula Parisi, October 11, 2023
OpenAI began previewing vision capabilities for GPT-4 in March, and the company is now starting to roll out image input and output to users of its popular ChatGPT. The multimodal expansion also includes audio functionality, with OpenAI proclaiming late last month that “ChatGPT can now see, hear and speak.” The upgrade vaults GPT-4 into the multimodal category with what OpenAI is apparently calling GPT-4V (for “Vision,” though equally applicable to “Voice”). “We’re rolling out voice and images in ChatGPT to Plus and Enterprise users,” OpenAI announced. Continue reading ChatGPT Goes Multimodal: OpenAI Adds Vision, Voice Ability
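For developers, image input has also surfaced through OpenAI’s API. A minimal sketch, assuming the vision-preview model identifier OpenAI documented for the API and its Python SDK (v1.x); the image URL is a placeholder.

```python
# Minimal sketch of GPT-4V-style image input via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this photo?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
        ],
    }],
    max_tokens=300,  # the vision preview defaults to a low output cap
)
print(response.choices[0].message.content)
```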
By Paula Parisi, September 26, 2023
Amazon has entered into a strategic investment in San Francisco-based Anthropic, an AI startup founded by former members of OpenAI. Anthropic will use AWS Trainium and Inferentia chips to train and deploy its future foundation models, with AWS as its primary cloud provider. In turn, Amazon says it will invest up to $4 billion in Anthropic as it strives to compete with other technology firms in the race to develop generative AI, seeding growth in what is shaping up to be an entirely new economic and social landscape. Continue reading Amazon Plans to Invest Up to $4 Billion in AI Startup Anthropic
By Paula Parisi, September 11, 2023
Financial software giant Intuit is adding a customer-facing AI assistant to work with individuals and small businesses. Intuit Assist is being embedded across Intuit’s products via a common user interface, starting with TurboTax and expanding to QuickBooks, Credit Karma and Mailchimp, allowing customers to get personalized recommendations drawn from contextual datasets. The generative AI assistant was built using Intuit’s Generative AI Operating System (GenOS), a proprietary corporate platform launched in June. Intuit is working with OpenAI to accelerate generative AI app development on GenOS. Continue reading Intuit’s GenOS Spawns Its First Customer AI Product: ‘Assist’