Meta Announces New AI Tools For Building Fully Independent AI

Meta CEO Mark Zuckerberg has announced a series of new AI models through the company's research arm, Fundamental AI Research (FAIR). The latest models include the "Self-Taught Evaluator," designed to minimize human intervention in building AI, and another that blends text and speech fluidly.
The announcement follows a report from Meta last August on how such AI models apply what has been termed a "chain of thought" approach, similar to the one used by OpenAI in its recent models, which weigh context carefully before answering. Google and Anthropic have also reportedly experimented with this application of Reinforcement Learning from AI Feedback (RLAIF), though their findings have not yet been made public.
These new models fit the FAIR team's vision of advancing machine intelligence through open research and full transparency. The releases include the revised and updated Segment Anything Model 2 for images and videos, Meta Spirit LM, Layer Skip, SALSA, Meta Lingua, OMat24, MEXMA, and the Self-Taught Evaluator.
Self-Taught Evaluator
Meta's Self-Taught Evaluator is a new way to assess the outputs of other AI models. Meta describes it as "a strong generative reward model with synthetic data." The method produces training data for reward models synthetically, eliminating the need for human labeling: the model generates contrasting responses, an AI judge reasons about them and picks a winner, and the judge is then retrained on its own verdicts in an iterative cycle. Meta claims the resulting evaluator surpasses models trained on human-labeled data, including GPT-4.
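The loop is easier to see in code. The sketch below is a minimal Python illustration of this kind of self-training cycle, not Meta's implementation: generate, judge_with_reasoning, and finetune are hypothetical stand-ins that a real system would back with an actual LLM and a gradient-update step.

"""Illustrative sketch of a Self-Taught Evaluator-style training loop.
All functions here are hypothetical stand-ins, not Meta's code or API."""

import random

def generate(judge, prompt, n=2):
    # Hypothetical: sample n contrasting candidate responses from a model.
    return [f"candidate answer #{i} to: {prompt}" for i in range(n)]

def judge_with_reasoning(judge, prompt, a, b):
    # Hypothetical: the judge writes a chain of thought and picks a winner.
    reasoning = f"Comparing the two responses to '{prompt}' step by step..."
    winner = a if random.random() < 0.5 else b  # stand-in for a real verdict
    return reasoning, winner

def finetune(judge, preference_pairs):
    # Hypothetical: retrain the judge on its own (reasoning, verdict) data.
    judge["rounds_trained"] += 1
    return judge

def self_taught_evaluator(judge, prompts, iterations=3):
    """Each iteration: generate contrasting outputs, have the current judge
    label them with reasoning, then retrain the judge on those labels.
    No human annotation appears anywhere in the loop."""
    for _ in range(iterations):
        pairs = []
        for prompt in prompts:
            a, b = generate(judge, prompt)
            reasoning, winner = judge_with_reasoning(judge, prompt, a, b)
            pairs.append({"prompt": prompt,
                          "chosen": winner,
                          "rejected": b if winner is a else a,
                          "reasoning": reasoning})
        judge = finetune(judge, pairs)
    return judge

judge = self_taught_evaluator({"name": "judge-v0", "rounds_trained": 0},
                              ["Summarize this article."])
print(judge)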
Meta Spirit LM
Meta has also released Spirit LM, an open-source language model built to move between speech and text fluidly. Traditional language models usually convert speech into text before responding, losing the natural expressiveness of the spoken word. Spirit LM addresses this by processing text and spoken words together, preserving the quality of natural speech.
While most AI voice systems depend on automatic speech recognition before generating a response, Spirit LM works with phonetic, pitch, and tone tokens directly. This lets it produce more natural-sounding output and be trained for speech recognition, text-to-speech, and speech classification tasks. Meta says Spirit LM switches easily between speech and text, giving users a far more authentic experience.
Meta has created two variants of the model: Spirit LM Base, which focuses on the essential phonetic elements of speech, and Spirit LM Expressive, which also captures emotional tones such as excitement or anger to make the output sound closer to real speech.
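To make the idea concrete, here is a minimal sketch of what interleaved speech/text modeling looks like at the token level. Everything in it is an assumption for illustration: the Token class, the [TEXT]/[SPEECH] markers, and the discrete speech units are placeholders, not Spirit LM's actual tokenizer, which uses learned phonetic units (plus pitch and style units in the Expressive variant).

"""Illustrative sketch of interleaving speech and text in one token stream,
the core idea behind Spirit LM. All names here are hypothetical."""

from dataclasses import dataclass

@dataclass
class Token:
    modality: str  # "text" or "speech"
    value: str     # a word piece, or a discrete speech/pitch unit

def interleave(text_spans, speech_spans):
    """Build one sequence that alternates text and speech spans, marking each
    modality switch so a single language model can train on both at once."""
    sequence = []
    for text, speech in zip(text_spans, speech_spans):
        sequence.append(Token("text", "[TEXT]"))
        sequence.extend(Token("text", w) for w in text.split())
        sequence.append(Token("speech", "[SPEECH]"))
        sequence.extend(Token("speech", u) for u in speech)
    return sequence

# Hypothetical discrete speech units: phonetic codes plus a pitch marker,
# loosely analogous to the units the Expressive variant adds.
seq = interleave(
    text_spans=["the weather is"],
    speech_spans=[["hu42", "hu7", "pitch_rise", "hu13"]],
)
print([t.value for t in seq])

Because the model sees both modalities in a single stream, it can continue a text prompt with speech units or transcribe speech units back into text, which is what lets Spirit LM flip between the two without an intermediate ASR step.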
These new AI tools are meant to push the limits of independent AI development while keeping Meta's research open and collaborative.