Virtualization Technology News and Information
Cleanlab Announces Billion-Dollar Breakthrough in Detecting AI Hallucinations

A startup born in a quantum computing lab has unveiled the solution to one of generative AI's biggest problems.

Cleanlab launched the Trustworthy Language Model (TLM), a fundamental advance in generative AI that detects when large language models (LLMs) are hallucinating.

Steven Gawthorpe, PhD, Associate Director and Senior Data Scientist at Berkeley Research Group, called the Trustworthy Language Model "the first viable answer to LLM hallucinations that I've seen."

Generative AI is poised to transform every industry and profession, but it faces a major challenge in "hallucinations," when LLMs generate incorrect or misleading results. A given LLM response might sound convincing. But is it correct? Is it based in reality? LLMs offer no way to be sure. This makes automating sensitive tasks with generative AI all but impossible.

The lack of trust is the major obstacle to business adoption of LLMs. Billions of dollars of productivity gains are locked up behind this dilemma. Cleanlab is the first to crack it.

Cleanlab's TLM combines world-class uncertainty estimation, AutoML ensembling, and quantum information algorithms repurposed for general computing to add trust to generative AI. Its API wraps around any LLM, producing a reliable trustworthiness score for every response.
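One way to picture that wrapper pattern is a thin layer that attaches a score to every LLM call. The sketch below is illustrative only: `call_llm` and `score_response` are hypothetical stand-ins, not Cleanlab's actual API or scoring method.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for any LLM backend (an API call, a local model, etc.)."""
    return "The refund was processed on March 3."

def score_response(prompt: str, response: str) -> float:
    """Stand-in for uncertainty estimation: returns a 0-1 trust score.
    A real scorer would aggregate signals such as agreement across
    resampled generations and model-reported probabilities."""
    return 0.87  # fixed placeholder value for illustration

def trustworthy_prompt(prompt: str) -> dict:
    """Wrap any LLM call so every response carries a trustworthiness score."""
    response = call_llm(prompt)
    return {
        "response": response,
        "trustworthiness_score": score_response(prompt, response),
    }

result = trustworthy_prompt("When was order #1234 refunded?")
print(result["trustworthiness_score"])
```

Because the wrapper only needs the prompt and the response text, the same pattern applies to any underlying model.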

In industry-standard benchmarks for LLM reliability, the TLM beats other methods across the board. It delivers performance that's not just superior, but consistently superior, giving businesses the confidence to rely on generative AI for important jobs.

For example, businesses can use the TLM to automate customer refunds, bringing a human reviewer into the loop whenever an LLM's response falls below a predetermined level of trustworthiness.
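That escalation rule can be stated in a few lines. This is a hypothetical sketch: it assumes each response arrives with a 0-1 trustworthiness score, and the threshold value is illustrative, not a Cleanlab-documented default.

```python
TRUST_THRESHOLD = 0.8  # tuned per use case; illustrative value only

def route_refund_request(trust_score: float) -> str:
    """Auto-approve high-trust responses; escalate the rest to a human reviewer."""
    return "auto_approved" if trust_score >= TRUST_THRESHOLD else "human_review"

print(route_refund_request(0.95))  # auto_approved
print(route_refund_request(0.42))  # human_review
```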

"Cleanlab's TLM gives us the power of thousands of data scientists to enrich data and strengthen LLM outputs, providing 10x to 100x ROI for many of our clients. Compared to what Cleanlab is doing, other tools aren't even on the same playing field," Gawthorpe said.

"Cleanlab's TLM is a truly pioneering solution for effectively addressing hallucinations," added Akshay Pachaar, AI Engineer. "The integration of Cleanlab's trustworthiness scores transforms human-in-the-loop workflows, enabling up to 90% automation. It not only conserves hundreds of manpower hours weekly but augments our efficiency in processing substantial datasets for data enrichment, document and chat-log analysis and other large-scale tasks. It has the potential to revolutionize how we manage and derive value from data."

In addition to making LLMs more trustworthy, the TLM makes them more accurate. It functions as a sort of super-LLM, checking LLMs' output to deliver better results than they produce on their own. In benchmarks comparing the accuracy of GPT-4 alone with GPT-4 + TLM, the combination outperforms GPT-4 by itself every time. This makes the TLM ideal for scenarios such as:

  • RAG (Retrieval Augmented Generation): Providing LLMs with more-reliable context
  • Business chatbots: Accurately answering questions from customers and employees
  • Data extraction: Extracting complex information from PDFs
  • Securities analysis: Scanning stock reviews to find the strongest buy signal
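As a rough intuition for how a trust score can lift raw accuracy, one classic uncertainty signal is self-consistency: sample several candidate answers and keep the one most of the samples agree on. The sketch below illustrates that generic idea; it is not Cleanlab's actual scoring method.

```python
from collections import Counter

def self_consistency_score(candidates: list[str], answer: str) -> float:
    """Fraction of sampled candidates that agree with this answer:
    a simple, generic trust signal (not Cleanlab's implementation)."""
    return Counter(candidates)[answer] / len(candidates)

def best_of_k(candidates: list[str]) -> tuple[str, float]:
    """Return the candidate answer with the highest agreement score."""
    return max(
        ((c, self_consistency_score(candidates, c)) for c in set(candidates)),
        key=lambda pair: pair[1],
    )

samples = ["42", "42", "41", "42"]  # four sampled LLM answers
answer, trust = best_of_k(samples)
print(answer, trust)  # 42 0.75
```

Selecting the highest-scoring answer in this way is one reason a scored model-plus-checker can beat the same model sampled once.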

Like other Cleanlab products, the TLM has its roots in the founders' groundbreaking research on uncertainty in AI datasets. CEO Curtis Northcutt spent eight years working with the inventor of the quantum computer to understand how to extract reliable computation from arbitrary data. Chief Scientist Jonas Mueller led the development of AutoGluon, AWS's open-source, industry-standard AutoML platform. CTO Anish Athalye is one of the world's most renowned ML developers, with more than 30,000 GitHub stars for his personal projects.

AWS, Google, JPMorgan Chase, Tesla and Walmart are a few of the Fortune 500 companies using Cleanlab's technology to improve their data inputs. Now Cleanlab is applying that same expertise to the output of LLMs, with economic implications that are, if anything, even greater.

"This is a pivot point for generative AI in the enterprise," said Cleanlab CEO Curtis Northcutt. "Adding trust to LLMs will change the calculus around their use. We'll always have some version of hallucinations. The difference is that now we have a powerful solution to detect and manage them. That means businesses can deploy generative AI for use cases that were previously undreamt of, and unlock a significant new source of productivity and revenue."

Published Friday, April 26, 2024 8:00 AM by David Marshall