Saturday, May 17, 2025

Meta has launched an AI model capable of evaluating other AI models’ work, along with Spirit LM, which seamlessly integrates text and speech.

Mark Zuckerberg’s company Meta announced on Friday that its research division, Fundamental AI Research (FAIR), is releasing several new AI models. These include a ‘Self-Taught Evaluator’ that points to less human involvement in the AI development process, and another model that freely mixes text and speech.

The latest announcements follow a Meta paper from August, which said these models would be based on the ‘chain of thought’ mechanism, also used in OpenAI’s recent o1 models, in which a model reasons before responding. Google and Anthropic have also published research on Reinforcement Learning from AI Feedback, but their models are not yet available for public use.

Meta’s FAIR researchers say these new releases support the company’s goal of achieving advanced machine intelligence while promoting open science and reproducibility. The newly released models include the updated Segment Anything Model 2 for images and videos, Meta Spirit LM, Layer Skip, SALSA, Meta Lingua, OMat24, MEXMA, and the Self-Taught Evaluator.

Self-Taught Evaluator

Meta describes its new model as a “strong generative reward model with synthetic data.” The company claims that this is a new technique that generates preference data to train reward models without relying on human annotations. “This approach generates contrasting model outputs and trains LLM-as-a-Judge for evaluation and final judgments, with an iterative self-improvement scheme,” Meta explained in its official blog post.

Essentially, the Self-Taught Evaluator generates its own training data for reward models without requiring human labelling. According to Meta, the model produces contrasting outputs from different AI models and then uses another AI to assess and improve them, repeating the process iteratively. Meta says the resulting evaluator outperforms reward models that rely on human-labeled data, such as GPT-4-based judges.
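The iterative loop described above can be illustrated with a toy sketch. This is not Meta's implementation: the "generator", "judge", and "fine-tuning" steps here are deliberately simplified stand-ins (a real pipeline samples an LLM for candidates and fine-tunes an LLM-as-a-Judge), but the control flow — generate contrasting outputs, let the current judge label a preference pair, retrain the judge on its own synthetic labels, repeat — mirrors the scheme Meta describes.

```python
import random

random.seed(0)


def generate_contrasting_outputs(prompt):
    """Toy stand-in: produce a 'good' and a 'bad' candidate response.
    In the real pipeline these come from sampling an LLM (e.g. at
    different temperatures or from perturbed prompts)."""
    return [f"{prompt} :: careful answer", f"{prompt} :: sloppy answer"]


def judge(model, candidates):
    """Toy LLM-as-a-Judge: score each candidate with learned token
    weights and return (chosen, rejected)."""
    def score(text):
        return sum(model.get(tok, 0.0) for tok in text.split())
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[0], ranked[1]


def finetune(model, preference_pairs):
    """Toy 'training' step: nudge weights toward tokens that appear in
    chosen answers and away from tokens in rejected ones."""
    new_model = dict(model)
    for chosen, rejected in preference_pairs:
        for tok in chosen.split():
            new_model[tok] = new_model.get(tok, 0.0) + 1.0
        for tok in rejected.split():
            new_model[tok] = new_model.get(tok, 0.0) - 1.0
    return new_model


def self_taught_evaluator(prompts, iterations=3):
    model = {"careful": 0.1}  # weak seed judge, no human labels
    for _ in range(iterations):
        pairs = []
        for p in prompts:
            candidates = generate_contrasting_outputs(p)
            chosen, rejected = judge(model, candidates)
            pairs.append((chosen, rejected))  # synthetic preference data
        model = finetune(model, pairs)        # retrain judge on its own labels
    return model


judge_model = self_taught_evaluator(["Summarize the report", "Explain recursion"])
```

After a few iterations the toy judge's preference for "careful" answers has been reinforced entirely by its own synthetic labels, which is the essence of the self-improvement scheme: no human annotation enters the loop.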

Meta Spirit LM

Spirit LM is an open-source language model that seamlessly integrates speech and text. Large language models are often used to convert speech to text or text to speech, but this can strip away the natural expressiveness of real speech. Meta has developed Spirit LM, its first open-source model that can work with text and speech more naturally.

“Today’s AI voice experiences often use ASR techniques to process speech, and then synthesize with LLM to generate text – but these approaches compromise the expressive aspects of speech. Spirit LM models use phonetic, pitch, and tone tokens to overcome these limitations for both inputs and outputs and generate more natural-sounding speech, while learning new tasks across ASR, TTS and speech classification,” Meta said in its tweet.

Spirit LM is trained on both speech and text data, which allows it to switch effortlessly between the two. Meta has created two versions: Spirit LM Base, which focuses on speech sounds, and Spirit LM Expressive, which also captures tone and emotion in speech, such as anger or excitement, so that the output sounds more realistic. Meta says the model can generate more natural-sounding speech and also learns tasks such as speech recognition, text-to-speech conversion, and speech classification.
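The "effortless switching" described above rests on training one language model over a single stream that interleaves text tokens with discrete speech tokens. The sketch below is a hypothetical illustration of such a stream, not Meta's format: real speech tokens come from acoustic encoders (phonetic, pitch, and style units), whereas here they are just labeled integers with sentinel markers at each modality switch.

```python
# Toy illustration of an interleaved text/speech token stream of the
# kind a model like Spirit LM trains on. Token names and sentinels here
# are invented for illustration only.

TEXT, SPEECH = "T", "S"


def interleave(segments):
    """Flatten (modality, tokens) segments into one stream, inserting a
    sentinel at each modality switch so a single LM can model both."""
    stream = []
    prev = None
    for modality, tokens in segments:
        if modality != prev:
            stream.append(f"<{modality}>")  # modality-switch sentinel
            prev = modality
        stream.extend(f"{modality}:{t}" for t in tokens)
    return stream


mixed = interleave([
    (TEXT,   ["hello", "world"]),
    (SPEECH, [101, 102, 103]),  # e.g. phonetic units from a speech tokenizer
    (TEXT,   ["goodbye"]),
])
```

Because text and speech share one vocabulary and one sequence, the model can continue a text prompt in speech or vice versa without a separate ASR or TTS stage, which is what lets expressive cues survive end to end.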

