Blending: The Remarkable New Technique That Outperforms ChatGPT

Blending LLMs

Summary

"Blending" is a technique that combines multiple smaller language models to outperform larger ones like ChatGPT, while using less compute.

Small Models, Big Results: The Power of Blending

Recent research has uncovered a groundbreaking technique called “Blending” that’s challenging the notion that bigger is always better in the world of AI language models. This innovative approach combines multiple smaller language models to create a system that can outperform Generative AI giants like ChatGPT, all while using fewer computational resources.

The study compared the performance of several language models, including:

  • Pygmalion 6B
  • Chai Model 6B
  • Vicuna 13B
  • OpenAI’s GPT-3.5 (175B+ parameters)
  • A Blended model combining Pygmalion, Chai Model, and Vicuna

The researchers used two key metrics to evaluate performance:

  1. User Retention: The fraction of users returning to the platform k days after joining.
  2. User Engagement: The average time spent per visiting user.
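Both metrics are straightforward to compute from visit logs. Below is a minimal sketch in Python; the event records, their field layout, and the numbers are illustrative stand-ins, not data from the study.

```python
from datetime import date

# Hypothetical event records: (user_id, visit_date, minutes_spent).
events = [
    ("u1", date(2024, 1, 1), 12.0),
    ("u1", date(2024, 1, 31), 8.0),
    ("u2", date(2024, 1, 1), 5.0),
    ("u3", date(2024, 1, 2), 20.0),
    ("u3", date(2024, 2, 1), 15.0),
]

def retention(events, k):
    """Fraction of users who return at least k days after their first visit."""
    first = {}
    for user, day, _ in events:
        if user not in first or day < first[user]:
            first[user] = day
    returned = {
        user for user, day, _ in events
        if (day - first[user]).days >= k
    }
    return len(returned) / len(first)

def engagement(events):
    """Average total time spent per visiting user."""
    totals = {}
    for user, _, minutes in events:
        totals[user] = totals.get(user, 0.0) + minutes
    return sum(totals.values()) / len(totals)

print(retention(events, 30))  # u1 and u3 return 30+ days later -> 2/3
print(engagement(events))     # (20 + 5 + 35) / 3 = 20.0
```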

Surprisingly, the Blended model, with a total of just 25B parameters, outperformed OpenAI’s GPT-3.5 (175B+ parameters) in both retention and engagement metrics.

The Science Behind Blending: How It Works

Blending isn’t about creating one massive neural network. Instead, it’s a method of integrating multiple chat AIs by randomly selecting which model generates each response in a conversation. Here’s how it works:

  1. Start with multiple moderately-sized, specialized language models.
  2. For each response in a conversation, randomly select one model to generate the answer.
  3. Repeat this process throughout the conversation.

This approach allows for diverse, dynamic dialogues that benefit from each model's strengths. The researchers found that this simple method led to significantly higher user engagement and retention compared to using individual models or even much larger models like GPT-4.
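The three steps above can be sketched in a few lines. The model functions here are placeholders standing in for real model calls; their names and return strings are illustrative assumptions, not the study's actual implementation.

```python
import random

# Placeholder generators for each component model; in practice each would
# call the corresponding model's inference endpoint.
def pygmalion_6b(history): return "response from Pygmalion 6B"
def chai_6b(history):      return "response from Chai Model 6B"
def vicuna_13b(history):   return "response from Vicuna 13B"

MODELS = [pygmalion_6b, chai_6b, vicuna_13b]

def blended_reply(history):
    """Core of Blending: pick one component model uniformly at random
    for this turn, and let it alone generate the reply."""
    model = random.choice(MODELS)
    return history + [model(history)]

# A short conversation: each turn may come from a different model.
history = ["Hi there!"]
for _ in range(3):
    history = blended_reply(history)
print(history)
```

Because only one component model runs per turn, the per-response inference cost is that of a single small model, which is why Blending stays cheap relative to a 175B-parameter model.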

 

 


Data Speaks: Blending vs. The Giants

The results of the large-scale A/B tests on the Blended model are striking:

  1. Engagement Improvement:
    • Blended (13B, 6B, 6B): 120%
    • GPT-3.5 (175B): 80%
    • Vicuna+ (13B): 20%
    • ChaiLLM (6B): 40%
  2. Retention Improvement:
    • Blended (13B, 6B, 6B): 40%
    • GPT-3.5 (175B): 20%
    • Vicuna+ (13B): 10%
    • ChaiLLM (6B): 20%

These percentages represent improvements over the control model (Pygmalion 6B) after 30 days.

The researchers also developed metrics to summarize a chat AI’s performance:

  • ∆α and ∆γ for engagement ratio
  • ∆ζ and ∆β for retention ratio

The Blended model showed the highest relative initial engagement (∆α) and the best engagement ratio decay rate (∆γ). While Vicuna had a better retention ratio decay rate (∆β), its significantly lower initial retention ratio (∆ζ) meant it would take an extended period (estimated around one year) to reach Blended’s retention score.
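The article does not spell out the functional forms behind these summary metrics. As an illustration only, one common way to extract an "initial value" and a "decay rate" from a ratio curve is to fit a power law in log-log space; the data points below and the mapping to ζ and β are assumptions, not the paper's actual parameterization.

```python
import math

# Hypothetical retention-ratio series over days 1..5 (made-up numbers).
days = [1, 2, 3, 4, 5]
ratio = [1.50, 1.35, 1.26, 1.20, 1.16]

# Fit ratio(t) ~ zeta * t**beta by least squares in log-log space:
# log(ratio) = log(zeta) + beta * log(t).
xs = [math.log(t) for t in days]
ys = [math.log(r) for r in ratio]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
       sum((x - mean_x) ** 2 for x in xs)
zeta = math.exp(mean_y - beta * mean_x)

print(zeta)  # fitted initial ratio (analogue of the article's ζ)
print(beta)  # fitted decay rate (analogue of the article's β; negative = decaying)
```

Under this framing, a model can have a gentler decay (larger β) yet still lag for a long time if its initial value (ζ) starts far lower, which matches the article's point about Vicuna needing roughly a year to catch up.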

Perhaps most importantly, the Blended model achieved these results with an inference speed similar to that of the smaller models. This means it offers significant performance gains without increasing computational costs, making it a game-changer for companies and researchers working on AI applications.

In conclusion, the Blending technique presents a promising alternative to the trend of developing ever-larger language models. By cleverly combining smaller, specialized models, it’s possible to create AI systems that are not only more engaging and effective but also more efficient and accessible. This breakthrough could democratize advanced AI technology, making it available to a wider range of businesses and researchers who previously couldn’t afford the computational costs of running large models.
