
GROQ

Netfigo Verdict on Groq

Groq didn't set out to build a better AI model. They built a better engine to run everyone else's models — faster than anything on the planet. Their Language Processing Unit, the LPU, can spit out tokens so fast it feels like the model is thinking in real time. While OpenAI and Anthropic are locked in a race to train smarter models, Groq is betting the next war is about speed and cost per token. That's either a genius flanking move or the most expensive infrastructure bet in AI history. Early signs suggest the former.

Founded: 2016
HQ: Mountain View, USA
Total Raised: $1.09 billion
Founder: Jonathan Ross
Status: Private
Website: groq.com

THE ORIGIN STORY

Jonathan Ross was one of the engineers behind Google's first Tensor Processing Unit — the TPU, the custom chip Google built to run its own AI workloads faster and cheaper than GPUs could. He left Google in 2016 convinced that the rest of the world would eventually face the same bottleneck Google had solved internally: general-purpose GPUs are incredibly powerful but wildly inefficient at the specific math that AI inference requires.

He founded Groq — yes, a respelling of "grok", the verb Robert A. Heinlein coined in Stranger in a Strange Land, meaning to understand something so thoroughly you become one with it — with the idea of building a chip purpose-built for inference, not training. Most of the AI chip hype at the time was focused on training: making models smarter.

Ross was thinking one step ahead. Once the models exist, someone has to run them billions of times a day, cheaply and fast.

That someone could be Groq.

The company spent years in near-stealth building the LPU and quietly assembling a team of chip architects who'd worked at Google, AMD, Intel, and NVIDIA. They weren't trying to beat NVIDIA at training. They were trying to make NVIDIA irrelevant at inference.

In 2024, when they opened GroqCloud to developers and demos started circulating showing Llama 3 running at 800 tokens per second — roughly 10x what most GPU-based services could deliver — the internet noticed.

Groq went from a chip company most people had never heard of to the fastest AI inference platform on the planet, seemingly overnight.

WHAT THEY ACTUALLY DO

Groq makes money by running AI models for other people — faster and cheaper than the competition. That's the entire pitch.

Developers and companies come to GroqCloud when they need to run open-source models like Meta's Llama, Mistral, or Google's Gemma at scale. Instead of paying for GPU time on AWS or Azure, they pay Groq per million tokens processed.

The LPU architecture means Groq can deliver those tokens faster and at a lower cost than GPU-based alternatives, so the economics work for both sides: customers get better performance, Groq gets the volume.
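To make the per-token billing model concrete, here is a back-of-envelope sketch in Python. The price used is a hypothetical placeholder for illustration, not Groq's published rate:

```python
# Back-of-envelope sketch of per-token billing.
# The price below is a hypothetical placeholder, NOT Groq's actual rate.

def monthly_cost(tokens_per_month: int, price_per_million_usd: float) -> float:
    """Cost of inference billed per million tokens processed."""
    return tokens_per_month / 1_000_000 * price_per_million_usd

# Example: an app that processes 2 billion tokens a month
# at an assumed $0.50 per million tokens.
print(f"${monthly_cost(2_000_000_000, 0.50):,.2f}")  # -> $1,000.00
```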

For individuals and small developers, GroqCloud has a generous free tier — deliberately so. Groq wants developers to build on their infrastructure, write integrations, and get hooked on the speed before enterprise pricing kicks in.

The enterprise side is where the real money is: large companies running AI assistants, coding tools, customer service bots, or data pipelines that need fast, reliable inference at scale. Those customers pay for API access at volume pricing.

Groq also sells LPU-based hardware directly to hyperscalers and large enterprises who want to run inference on-premises. This is a smaller but high-margin part of the business, particularly relevant for government and defense customers where data sovereignty means cloud inference isn't an option.

The key insight in the business model is that Groq doesn't need to win the model race. They need everyone who wins the model race to need fast inference.

If open-source models keep getting better — and they are — Groq's market grows automatically.

THE PRODUCTS

GroqCloud is the main product — an API platform that lets developers run open-source AI models at Groq's signature speed. You call it the same way you'd call OpenAI's API; the difference is that the response comes back so fast it feels synchronous. It supports Llama 3, Mistral, Mixtral, Gemma, and a growing list of open models, with a free tier and paid tiers priced per million tokens.
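For a sense of what "the same way you'd call OpenAI's API" means in practice, here is a minimal sketch using the standard openai Python SDK pointed at Groq's OpenAI-compatible endpoint. The base URL and model id are illustrative assumptions; check GroqCloud's docs for the current values:

```python
# Minimal sketch of calling GroqCloud through its OpenAI-compatible endpoint,
# using the standard openai Python SDK. Base URL and model id are assumptions
# for illustration; confirm against GroqCloud's documentation.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],          # key from the GroqCloud console
    base_url="https://api.groq.com/openai/v1",   # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instant",                # illustrative model id
    messages=[{"role": "user", "content": "Explain what an LPU is in one sentence."}],
)
print(response.choices[0].message.content)
```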

Groq's Language Processing Unit (LPU) is the hardware underneath everything. It's a custom chip designed specifically for inference workloads — the kind of repetitive, sequential matrix math that running a language model requires.

Unlike GPUs, which have to juggle memory in complex ways to handle AI inference, the LPU has a deterministic execution model, which means it always runs at predictable, maximum speed. No variance. No cold starts. Just fast.

Groq also offers on-premises LPU deployments for enterprises that can't or won't use cloud infrastructure. This is particularly relevant for financial services and government clients.

You get the speed of GroqCloud without your data leaving your environment.

GroqChat is a consumer-facing chat interface — basically a ChatGPT-style UI sitting on top of GroqCloud. It's mostly a demo tool and developer playground rather than a serious consumer product, but it's been genuinely useful for showing non-technical stakeholders what inference speed actually feels like in practice.

The contrast with ChatGPT is visceral the first time you use it.

HOW THEY GREW

Groq's growth came from a single, viral moment that money couldn't have bought: speed demos.

In early 2024, videos started circulating on X and Reddit of Groq running Llama 2 and Mistral at speeds that looked fake. Tokens streaming in so fast the text was hard to read.

Developers who'd spent months dealing with GPT-4's latency watched the demos and immediately signed up for GroqCloud. The waiting list exploded.

The product went viral on pure technical merit, which almost never happens in enterprise infrastructure.

Groq leaned into this hard. They published benchmarks and made it easy to compare their token-per-second speed against OpenAI and Anthropic directly. They built a dead-simple API that any developer who'd used OpenAI's API could switch to in about ten minutes.

That low switching cost was intentional — the faster a developer could try it, the faster they'd get addicted to the speed.
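To see why the speed is addictive, here is a rough sketch of the kind of timing check developers ran: stream a response and count chunks as a crude proxy for tokens per second. It reuses the assumed endpoint and illustrative model id from the sketch above, and the number it prints is an approximation, not an official benchmark:

```python
# Rough timing sketch in the spirit of the viral speed demos.
# Counts streamed chunks as a proxy for tokens, so the figure is approximate.
# Endpoint and model id are assumptions, as in the earlier sketch.
import os
import time
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",
)

start = time.perf_counter()
chunks = 0
stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",   # illustrative model id
    messages=[{"role": "user", "content": "Write a 200-word history of the GPU."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
        chunks += 1

elapsed = time.perf_counter() - start
print(f"\n~{chunks / elapsed:.0f} chunks/sec (rough proxy for tokens/sec)")
```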

The other counterintuitive move was leaning into open-source models rather than competing with them. Groq doesn't build foundation models.

They run Meta's, Mistral's, and Google's open models — which means every time Meta improves Llama, Groq's product gets better for free. They turned the open-source ecosystem into their R&D department.

Enterprise growth followed developer love. When developers inside large companies started using GroqCloud on personal projects and reporting the speed back to their teams, procurement conversations started from a position of enthusiasm rather than skepticism.

Bottom-up developer adoption converting to top-down enterprise contracts is the exact playbook Stripe and Twilio used. Groq is running the same play, one token at a time.

THE HARD PART

Groq's biggest challenge is one that every chip startup eventually faces: NVIDIA doesn't stay still.

NVIDIA's H100 and Blackwell chips are designed for training, but they're also formidably capable at inference — and NVIDIA keeps optimizing its software stack (CUDA) to close the gap on specialized inference chips. Every quarter that passes gives NVIDIA more time to match what Groq's hardware does natively.

If inference speed becomes 'good enough' on standard GPUs, Groq's core differentiation shrinks.

There's also the capital intensity problem. Building and manufacturing custom silicon is extraordinarily expensive.

Groq uses TSMC for fabrication, which means they're competing with Apple, AMD, NVIDIA, and every other fabless chip company for capacity. Scaling up to meet enterprise demand requires enormous upfront hardware investment before the revenue comes in.

The $640 million Series D in 2024 was partly to fund exactly this — buying more LPU capacity to match growing demand.

And then there's the model consolidation risk. Groq's business depends on open-source models being widely adopted.

If the market consolidates around proprietary models from OpenAI or Anthropic — models Groq doesn't have permission to run — the addressable market shrinks. Meta keeping Llama open is critical to Groq's long-term story.

That's a dependency they can't fully control.

Finally, Groq is not alone in the inference infrastructure race. Cerebras, SambaNova, and a half-dozen stealth startups are all chasing the same opportunity.

The window to establish infrastructure lock-in is real but it's also closing.

MONEY TRAIL

Series A · 2017 · Led by Social Capital · $10M raised
Series B · 2019 · Led by Tiger Global Management · $52M raised
Series C · 2021 · Led by Tiger Global Management · $300M raised · $1.0B valuation
Series D · 2024 · Led by BlackRock · $640M raised · $2.8B valuation

WHO BACKED THEM

Groq's investor list reads like a who's who of people who understand what infrastructure bets look like before everyone else does.

The Series D in August 2024 — $640 million at a $2.8 billion valuation — was led by BlackRock, which is notable because BlackRock doesn't usually write checks into AI infrastructure startups. When the world's largest asset manager starts making direct bets on AI chip companies, it signals something about how seriously the institutional world is taking the inference compute shortage.

Saudi Arabia's sovereign wealth vehicles, via NEOM and related entities, have also participated in Groq's fundraising — reflecting the Gulf states' aggressive push to secure AI infrastructure positions before the market matures. Tiger Global, Neuberger Berman, and a mix of strategic and financial investors round out the cap table.

Earlier rounds included backing from Social Capital (Chamath Palihapitiya's fund), which has a track record of early bets on infrastructure plays that look obvious in hindsight. The fact that Groq raised over a billion dollars without a consumer product, without proprietary models, and without the brand recognition of its competitors says something about how compelling the underlying technical story is to people who've seen enough chip cycles to know what a real architectural advantage looks like.