What to know
- Nvidia posted $215.9 billion in revenue last fiscal year — up 65% — mostly from AI workloads built on its CUDA software.
- The real moat isn't hardware: universities teach CUDA, researchers code in CUDA, and rewriting that code costs millions.
- AMD and Intel's AI chip revenue forecasts may be structurally overestimated — and cloud-giant capex isn't the threat to Nvidia it looks like, because hyperscalers' own enterprise customers demand CUDA compatibility.
Every time you hear about a new AI model — ChatGPT, Gemini, Claude, whatever comes next — there's a piece of software running beneath it. Almost nobody outside Silicon Valley talks about it. It's called CUDA.
CUDA isn't a chip. It's a programming language — or more precisely, a software platform — that tells Nvidia's chips what to do. And it's the reason Nvidia doesn't just sell the best GPUs. It sells the only GPUs most AI developers know how to use.
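If you've never seen CUDA, here's a minimal sketch of what it looks like in practice: a toy kernel that adds two arrays across a million GPU threads, written against the standard CUDA runtime. (The kernel itself is illustrative, not anyone's production code; real training workloads are vastly more complex.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread computes one element of c = a + b.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million floats
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);          // memory visible to both CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements in parallel.
    const int threads = 256;
    vector_add<<<(n + threads - 1) / threads, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();               // wait for the GPU to finish

    printf("c[0] = %.1f\n", c[0]);         // prints 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Frameworks like PyTorch ultimately bottom out in kernels like this, at far greater scale and complexity. That accumulated kernel code is the "French cookbook" in the analogy that follows.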
Imagine if every recipe in the world were written in French. You could build the most beautiful kitchen on Earth, but if your chefs only read French, they're going to keep cooking with French cookbooks. That's CUDA. And it might be the most important competitive advantage in all of tech right now.
Let's trace the dominoes.
What just happened
Nvidia just closed its fiscal year 2026 (which ended January 25, 2026) with staggering numbers. Total revenue hit $215.9 billion, up 65% from the prior year. The Compute & Networking segment — the part most directly tied to AI — generated $193.5 billion of that, up 67%.
Operating income reached $130.4 billion, up 60%. Even after a one-time $4.5 billion charge related to excess H20 chip inventory, the company's gross margin (the percentage of revenue left after direct costs) was 71.1%. For context, most hardware companies would celebrate a 40% gross margin.
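For scale, here's the back-of-envelope arithmetic that definition implies, using only the figures above:

$$
\text{direct costs} \approx \$215.9\text{B} \times (1 - 0.711) \approx \$62.4\text{B},
\qquad
\text{gross profit} \approx \$215.9\text{B} \times 0.711 \approx \$153.5\text{B}
$$

Roughly $153 billion of gross profit in a single year, from a company that sells physical objects.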
These numbers are impressive on their own. But they only make sense when you understand the invisible layer underneath: CUDA, the software platform that makes Nvidia's hardware irreplaceable for most AI workloads.
First domino: The iOS of AI chips — technical lock-in
Over the past decade, the entire AI research community has built its tools, libraries, and workflows on CUDA. When a company trains a large language model, the code that orchestrates thousands of GPUs working in parallel is written for CUDA.
Rewriting that code for a rival platform — like AMD's ROCm or Intel's oneAPI — means re-testing every step. It means fixing new bugs and risking delays on projects worth hundreds of millions of dollars.
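To see why that rewrite is tedious rather than trivial, here's a toy sketch of the boilerplate involved; the `scale` kernel is a hypothetical stand-in for real model code. The ROCm renames in the comments are genuine HIP calls (AMD even ships hipify tools to automate them), but the renames are the easy part: the GPU libraries underneath (cuBLAS, cuDNN, NCCL) map to different AMD libraries (hipBLAS, MIOpen, RCCL), and every swapped call has to be re-tested and re-tuned.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for real model code: scale every element of x in place.
__global__ void scale(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float h[1024];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float* d;
    cudaMalloc(&d, bytes);                            // ROCm: hipMalloc
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // ROCm: hipMemcpy + hipMemcpyHostToDevice
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);      // ROCm: same launch syntax, compiled with hipcc
    cudaDeviceSynchronize();                          // ROCm: hipDeviceSynchronize
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // ROCm: hipMemcpyDeviceToHost
    cudaFree(d);                                      // ROCm: hipFree

    printf("h[0] = %.1f\n", h[0]);                    // prints 2.0
    return 0;
}
```

The point isn't that any single rename is hard. It's that a large training codebase contains thousands of them, plus hand-tuned kernels with no automatic translation, and every change has to be revalidated on jobs worth hundreds of millions of dollars.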
This lock-in is why Nvidia can charge premium prices and still see customers line up. The company spent $18.5 billion on R&D in fiscal 2026, up 43% from the prior year. A big chunk of that goes into making CUDA better — adding new features, supporting new model architectures, optimizing performance.
More profit funds more R&D, which improves the platform, which attracts more developers, which in turn sells more chips. The cycle accelerates every year.
The result shows up in the financials. Even with gross margins dipping from 75% to 71.1% — partly because of that $4.5 billion H20 inventory write-down — Nvidia is still printing money at a rate that would make most software companies jealous, let alone a hardware maker.

Second domino: AMD's real path isn't replacing CUDA — it's flanking it
The non-obvious insight is that AMD's best shot isn't replacing CUDA head-on. It's winning inference workloads at the edge: the part of AI where an already-trained model answers questions, generates images, or makes predictions in real time, rather than being trained. Inference doesn't require the same deep CUDA integration that training does. It's more standardized, more cost-sensitive, and increasingly runs on lean software stacks where CUDA's extra baggage is a liability, not an advantage.
For a company spending $100 million to train an AI model, the risk calculus is brutal. Even a 5% chance of delay or performance regression isn't worth the savings on cheaper chips. But for deploying that model to millions of users? Cost per query matters enormously, and that's where AMD could carve out real revenue.
The investment implication: Wall Street's AI chip revenue forecasts for AMD are heavily weighted toward training workloads, where CUDA lock-in caps the addressable market. If AMD's growth instead comes from inference and edge deployment (running AI on devices near the user, not in big data centers) — lower-margin, higher-volume work — the revenue mix and profit margins will look very different from what the stock price assumes today.
What would change the picture entirely? A top-10 AI lab publicly running a flagship training job on ROCm without performance penalties. Until that happens, AMD's AI story is an inference story — and investors should price it accordingly.
GPU Software Ecosystems: Maturity vs. Traction
| Metric | CUDA | AMD ROCm | Intel oneAPI |
|---|---|---|---|
| Training workloads | Dominant (97%+ of LLMs) | Emerging, niche | Minimal adoption |
| Inference workloads | Strong, but vulnerable | Competitive at edge | Cost-focused advantage |
| Developer base | Universities, enterprises, startups | AMD ecosystem only | Limited to Intel partners |
| Lock-in strength | Very high (code rewrite cost) | Low to moderate | Low |
Third domino: Hyperscalers are Nvidia's distribution channel — and they can't quit
In fiscal 2026, Nvidia's largest direct customer accounted for 22% of total revenue. The second-largest accounted for 14%. These are almost certainly hyperscale cloud providers — the exact companies building their own chips.
So why do they keep buying? Because their business customers demand cloud servers that run CUDA. When a bank or a pharma company rents cloud GPUs to train an AI model, their engineers write CUDA code. If a cloud provider only offered custom chips with a different software stack, those enterprise customers would go to a competitor that offers Nvidia.
Custom chips (called ASICs) trade flexibility for efficiency. They're great at narrow tasks — like running a specific model in production. But they can't replace general-purpose GPUs for the messy, experimental work of training new models. Cloud providers will use their custom chips for internal workloads, but they'll keep buying Nvidia for the instances they sell to everyone else.
This setup means Nvidia profits from the big cloud companies' reach even as those same companies build their own rival chips. The hyperscalers aren't just Nvidia's biggest customers; they're its distribution channel to every enterprise that writes CUDA code.

Fourth domino: The university pipeline locks in the next generation
Computer science programs around the world teach GPU programming using CUDA. When a PhD student spends four years writing thesis code in CUDA, she carries that expertise into her new employer — which then builds on CUDA because it's what the team already knows.
This creates a self-reinforcing cycle through the labor market. Companies hire CUDA-trained engineers, which means their projects run on Nvidia hardware, which means the next generation of students learns CUDA to be employable.
This human-capital lock-in may be even harder to break than the technical lock-in described in domino one. You can rewrite code. You can't easily retrain an entire generation of AI researchers.
Nvidia sits on $62.6 billion in cash and investments (as of fiscal year end 2026). Last fiscal year alone, the company poured $17.5 billion into private companies and infrastructure investments. Some of that money flows into university partnerships, research grants, and developer programs. That deepens the talent pipeline, which feeds right back into more CUDA adoption.
Fifth domino: Export controls turn CUDA into a geopolitical chokepoint
When the U.S. restricts chip exports to China, the conversation focuses on hardware. But CUDA is the software those chips need to be useful. Controlling it means controlling the programming tools, the performance tuning, and the entire developer ecosystem.
Nvidia's H20 chip — designed as a lower-performance product for the Chinese market — generated only about $60 million in revenue in fiscal 2026, after the company took a $4.5 billion inventory charge on unsold units and purchase commitments. International revenue overall was 31% of Nvidia's total.
Tightening export controls or foreign governments pushing for AI sovereignty could create headwinds for Nvidia's international business. China and the EU both have incentives to reduce dependency on a single U.S. company's proprietary software stack.
But building a CUDA alternative isn't just a technical challenge — it's an ecosystem challenge. And ecosystems take decades to build. This is a real long-term risk, but not one that changes the story in the next two to three years.
The last time this happened
The closest historical parallel isn't another chip company — it's Microsoft in the 1990s.
Windows didn't become the dominant operating system by being the best technology out there. It won because developers wrote software for Windows. More developers attracted more users. More users attracted more developers. By the time Linux offered a credible alternative, the ecosystem was too deep to escape.
Microsoft's Windows moat eventually eroded — but it took a full computing paradigm shift (the move from desktops to mobile and cloud) to do it. Direct competitors like OS/2 and BeOS never made a dent. The moat didn't break from the front. It was flanked.
The critical question is: what's the browser equivalent for CUDA? In the Windows era, the browser became a cross-platform runtime that made the underlying OS irrelevant for more and more tasks. Today, projects like OpenAI's Triton compiler, Apache TVM, and JAX are trying to do the same thing. They're building translation layers that let AI code run on any chip — no CUDA-specific rewrites needed. If one of these projects reaches the point where a major AI lab can train a frontier model without touching CUDA, that's the browser moment. It hasn't happened yet. But the analogy tells us where to look: the threat won't come from a better GPU. It'll come from a layer that makes the GPU's software stack irrelevant.
What could go wrong
Gross margin compression accelerates. Nvidia's gross margin already fell from 75% to 71.1% in one year. Competition, pricing pressure from hyperscalers, or inventory charges could push margins into the low 60s. If gross margins fall below 65% within the next two fiscal years, the valuation premium embedded in CUDA lock-in becomes hard to justify at current multiples — even if revenue keeps growing.
Open-source abstraction layers gain traction. Projects like OpenAI's Triton compiler, Apache TVM, and Google's JAX are all designed to let developers write AI code that runs on any hardware — not just Nvidia's. Here's the trigger to watch: if a top-10 AI lab publicly moves a major training run off CUDA using one of these translation layers, that's the signal to rethink Nvidia's lock-in premium — the extra valuation investors pay because customers can't easily switch. These efforts are years away from mattering at scale, but worth monitoring closely.
A computing paradigm shift makes GPUs obsolete. If AI training moves to a fundamentally different architecture — neuromorphic chips, optical computing, or something not yet invented — CUDA's relevance evaporates along with the GPU. This is unlikely in the next five years, but it's the only scenario that would invalidate the entire thesis.
Geopolitical escalation fragments the market. If China develops a domestic CUDA alternative and mandates its use, it could permanently lock Nvidia out of the world's second-largest economy, putting a meaningful slice of the 31% of revenue Nvidia earns internationally at risk. Not fatal, but enough to compress the multiple.
Watchlist
| Ticker | Level | Status | Why |
|---|---|---|---|
| NVDA | 15–20% pullback from current levels | watching for pullback entry | As of April 2026, NVDA trades near $183. The CUDA moat justifies a premium, but gross margin compression and export risks mean timing matters. A meaningful pullback would offer a better risk/reward entry point. |
| AMD | 15–20% pullback from current levels | watching for inference-revenue catalysts | AMD's ROCm needs a breakout customer win to prove it can compete with CUDA at scale. The more realistic near-term catalyst is inference workload wins — watch for edge deployment contracts and revenue mix disclosures. |
| AVGO | 10–15% pullback from current levels | watching as indirect beneficiary | Broadcom designs custom AI chips for hyperscalers. If cloud giants accelerate their 'build our own' strategy, Broadcom benefits even if Nvidia stays dominant in training. |
| INTC | significant pullback from current levels | watching for turnaround signals | Intel's oneAPI is a long shot against CUDA, but Intel's foundry business could benefit from custom chip demand regardless of the software war. |
| SMH | 10–15% pullback from current levels | watching as broad semiconductor exposure | The VanEck Semiconductor ETF (SMH) gives broad exposure to the AI chip buildout without betting on a single winner in the CUDA vs. alternatives fight. |