Groq: A Game Changing AI Chip Company You Need to Know


In the world of artificial intelligence, there's a seismic shift underway. We are moving from the era of deep learning to the age of generative AI. Large language models (LLMs) such as ChatGPT are capturing the public's imagination and will soon power a wide range of AI applications that drive productivity gains across industries.

And during the recent 53rd IT Press Tour, we came across a Silicon Valley startup that says it's poised to enable this AI revolution in a way we haven't seen before: that company is Groq, an AI solutions company and the inventor of the Language Processing Unit (LPU), an accelerator that is purpose-built and software-driven to power LLMs. During the event, Jonathan Ross, Groq's Founder and CEO, laid out how the company's LPU and software architecture will simplify AI adoption going forward.

Founded in 2016 by former Google engineers, Groq has invented a new type of processor tailor-made for the unique demands of LLM inference. Inference refers to the process of taking a trained AI model and running queries against it to generate text, code, images or other outputs. Unlike model training, which happens periodically, inference needs to happen continuously at low latency to power real-time apps. And it's this killer combination of extremely low latency and low cost at scale that should set Groq apart.
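
To make that distinction concrete, here is a minimal sketch of inference using the Hugging Face transformers library (not Groq's stack) with a small stand-in model: the model is already trained, and serving it means answering a stream of queries where per-query latency is what the user actually feels.

    import time
    from transformers import pipeline

    # A small, freely available model stands in for a production LLM.
    generator = pipeline("text-generation", model="gpt2")

    prompts = [
        "The key benefit of low-latency inference is",
        "Generative AI will change enterprise software by",
    ]
    for prompt in prompts:
        start = time.perf_counter()  # per-query latency is the metric that matters
        text = generator(prompt, max_new_tokens=30)[0]["generated_text"]
        print(f"{time.perf_counter() - start:.2f}s  {text!r}")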

Ross said that the company's purpose is to preserve human agency while building the AI economy, and its mission is to drive the cost of compute to zero.

The Secret Sauce Behind Groq

At the heart of the Groq difference is its LPU Inference Engine, with LPU standing for Language Processing Unit: a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications with a sequential component to them, such as LLMs.


While GPUs excel at the math intensity of model training, architectural limitations leave them ill-suited for low-latency inference. Workloads bottleneck on memory bandwidth and latency as data shuttles on and off chip. Groq flips the script through a host of custom innovations.

Starting from scratch, they designed the GroqChip, which houses thousands of multi-threaded processors chugging in parallel to crunch inference queries. Surrounding each chip is a unique, deterministic dataflow architecture that maximizes throughput while minimizing latency and power consumption.

Groq's Tensor Streaming Processors (TSPs) bypass the caches and control logic that create timing unpredictability. Instead, results flow directly from one execution unit to the next in a software-defined sequence, taking mere microseconds from input to output.
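
A toy model helps illustrate the idea (this is a conceptual sketch, not Groq's actual design): if a compiler assigns every operation a fixed, data-independent cycle budget up front, end-to-end latency is known before anything executes, with no caches or dynamic arbitration to introduce variance.

    from dataclasses import dataclass

    @dataclass
    class Op:
        name: str
        cycles: int  # fixed, data-independent cost (assumed for illustration)

    def schedule(ops):
        """Statically assign each op a start cycle in program order."""
        plan, t = [], 0
        for op in ops:
            plan.append((t, op))
            t += op.cycles
        return plan, t  # total latency is known at "compile time"

    program = [Op("load_weights", 4), Op("matmul", 16),
               Op("activation", 2), Op("emit_token", 1)]
    plan, total = schedule(program)
    for start, op in plan:
        print(f"cycle {start:3d}: {op.name}")
    print(f"deterministic end-to-end latency: {total} cycles")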

For large scale deployments, GroqNode servers provide a rack-ready scalable compute system. GroqNode, a set of eight GroqCard accelerators, features integrated chip-to-chip connections alongside dual server-class CPUs and up to 1 TB of DRAM in a 4U server chassis. GroqNode is built to enable high performance, low latency deployment of large deep learning models.

Finally, for data center deployments, GroqRack provides an extensible accelerator network. Combining the power of an eight-GroqNode set, GroqRack features up to 64 interconnected chips. The result is a deterministic network with an end-to-end latency of only 1.6µs for a single rack, ideal for massive workloads and designed to scale out to an entire data center.

In total, Groq's LPU provides a complete inference solution that outperforms GPUs, and roadmap advancements should widen that gap further in the coming years, giving the company an opportunity to take the lead. The special sauce is thinking inference-first from the ground up.

Game Changing Performance Where it Matters

In head-to-head benchmarks, Groq systems demonstrate up to 100x better latency at 1/5th the cost compared to GPU-based systems for large language model inference. Where GPU performance suffers from batch processing requirements and trips over its memory hierarchy, Groq's architecture was built from the ground up to minimize latency for individual queries.
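
The batching tradeoff is easy to see with back-of-envelope numbers (the figures below are assumed for illustration, not measured Groq or GPU results): batching amortizes fixed per-batch overhead and raises throughput, but every query in a batch waits for the whole batch to finish.

    # Assumed, illustrative costs -- not vendor benchmarks.
    fixed_overhead_ms = 50.0  # per-batch overhead (kernel launch, data movement)
    per_query_ms = 2.0        # marginal compute per query

    for batch in (1, 8, 32):
        latency_ms = fixed_overhead_ms + batch * per_query_ms  # time until batch returns
        throughput = batch / latency_ms * 1000                 # queries per second
        print(f"batch={batch:2d}  latency={latency_ms:6.1f} ms  "
              f"throughput={throughput:6.1f} q/s")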

And by eliminating costly data movement, GroqChips draw mere watts of power rather than the hundreds that GPUs consume. This translates to roughly 10x better energy efficiency, which is critical for controlling exploding AI compute costs.
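
Some rough arithmetic shows why that matters at fleet scale (all numbers below are assumptions for illustration, not published figures):

    # Illustrative arithmetic only: assumed wattages and electricity price.
    accelerators = 1000
    hours_per_year = 24 * 365
    price_per_kwh = 0.10  # assumed $/kWh

    for name, watts in (("GPU-class card", 400), ("LPU-class card", 40)):
        kwh = accelerators * watts / 1000 * hours_per_year
        print(f"{name}: {kwh:,.0f} kWh/yr = ${kwh * price_per_kwh:,.0f}/yr")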

Partners Extending the Platform

In addition to their own capabilities, Groq is fostering an ecosystem of partners to extend the platform.


Together with aiXplain, they offer multi-language explanatory interfaces allowing users to ask "why" to understand model reasoning. "What aiXplain is doing is nothing short of creating magic for their customers," said Ross. "At Groq, we aim to create a sense of awe by accelerating generative AI applications to the point that they become immersive experiences. Thanks to the partnership between aiXplain and Groq, truly interactive engagement with AI is here, today."

Embodied uses Groq hardware and platform services for their Moxie companion robot. The low latency ensures responsiveness critical for lifelike conversational interactions. During our meeting, Ross demonstrated the interactivity enabled by their solution through a quick chat conversation with Moxie. "We're taking all of these models that already exist, and we're making them interactive. This is going to be the year that AI becomes real," said Ross.

On the commercial side, partnerships simplify go-to-market motions. Groq is working with virtual digital assistants and AI solution-building platforms, and is speaking with medical imaging companies, banks, video game developers, FinTech/trading technology providers, and vector database vendors, among others. National labs such as Argonne are early adopters, using Groq systems for research and science applications with their proprietary models for drug discovery and cybersecurity, as well as language applications.

Investor Confidence Fuels Rapid Growth

Having closed around $362 million in funding to date from top VCs, Groq has invested heavily in its technology and team. The results produced by internal simulations led to several high-profile customers installing pre-production systems last year.

Now with finalized backing and ramping of production, Groq expects breakneck shipment growth over 2023 and 2024. Installed capacity is projected to grow 100x this year alone as systems deploy for production usage.

To support these volumes, operations are scaling globally. The company's lab and offices in San Jose occupy 17,500 sq. ft., and it has opened a new 5,150 sq. ft. corporate headquarters in Mountain View. Additional satellite offices provide access to specialized talent pools domestically and abroad.

Ross also highlighted his company's commitment to US-based manufacturing, saying "Our chip is fabricated in the US, by Global Foundries. Packaged in Canada, and assembled in California."

With technology supply chains under geopolitical pressure, Groq's local supply chain provides supply assurance as well as another differentiator, since many of its competitors manufacture overseas.

Poised to Capitalize on Exploding LLM Demand

LLMs like ChatGPT hint at what becomes possible by infusing enterprise apps with AI. Groq is ready to satisfy the burgeoning appetite for AI with high-performance hardware and a mature software stack. They make deploying LLMs as straightforward as deploying a traditional microservice.
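
From the caller's side, that claim translates into something like the sketch below: a plain HTTP request to an inference endpoint, with no GPU cluster wrangling in sight. The URL, payload shape, and model name here are hypothetical placeholders, not a documented Groq API.

    import requests

    # Hypothetical endpoint and payload -- placeholders, not a real API.
    resp = requests.post(
        "https://inference.example.com/v1/generate",
        json={"model": "example-llm",
              "prompt": "Summarize our Q3 results in two sentences.",
              "max_tokens": 64},
        timeout=10,
    )
    resp.raise_for_status()
    print(resp.json().get("text", ""))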

While AI training will remain important, the greatest productivity gains this decade will come from inference. It will ultimately become the dominant workload as AI permeates business processes.

Groq is uniquely positioned to drive this shift as the first company singularly focused on making AI inference fast, efficient and affordable at scale. They flip the economics from scarce GPU cycles costing thousands of dollars an hour to inferences seamlessly generated for pennies. This unlocks a rich palette for developers to paint from as they conjure the next generation of AI applications.

The Bottom Line

For CIOs and IT leaders tasked with enabling organizational AI plans, Groq deserves your attention. Their technology overcomes critical barriers that previously restricted enterprises to narrow, siloed use cases.

With Groq, you can finally deploy LLMs pervasively to amplify human capabilities. The company makes it feasible both technically and economically at scale. If you're not closely tracking Groq already, start now: they have the pedigree, product and partners to become a pillar of enterprise AI infrastructure seemingly overnight. The time to evaluate is now, before demand makes availability scarce. Because in the world of generative AI, Groq is poised to give GPU dominance a run for its money.

##

Published Wednesday, February 07, 2024 7:34 AM by David Marshall
Comments
@VMblog - March 1, 2024 2:26 PM

Groq, a generative AI solutions company, has acquired Definitive Intelligence, a company redefining how businesses utilize data and empowering organizations to unlock actionable insights - all powered by AI.
