Snapshot
AI inference chips run trained models for end users — every chatbot reply, image generation, code completion, and agent action. Edge AI chips (NPUs — neural processing units — small accelerators embedded in phones, laptops, cars, and robots) run smaller models locally without a data-center round trip. NVIDIA dominates data-center inference with the same GPUs it sells for training; AMD sells MI-series GPUs as the #2 merchant alternative; Qualcomm (QCOM) leads mobile/edge AI silicon with Snapdragon NPUs; Intel (INTC) ships Xeon CPUs with built-in AI acceleration plus Gaudi accelerators for data centers. Unlike training — a one-time capital cost per model — inference is a recurring cost that scales with every query, every user, every autonomous agent.
- ~$250B+ est.: rough 2026 AI inference chip spend (data center + edge), not live-verified
- >50% est.: share of total AI compute going to inference vs training, growing each year
- 4 names: NVDA · AMD · QCOM · INTC, from data center to pocket
- 5nm–3nm / 4nm–7nm: DC inference uses leading-edge nodes; edge uses mature nodes with better availability
Data-center inference GPUs sell for $20,000–$40,000+ each est., and demand for them scales with the total number of AI queries served. Edge NPUs ship inside phone/PC/automotive SoCs at $30–$150 per chip est. in billion-unit volumes. DC inference carries 60–75% gross margins; edge SoCs carry 50–55% margins but far higher unit counts. NVIDIA reported $75.2B of data-center revenue in Q1 FY27 alone (quarter ended April 2026), a growing fraction of which is inference; Qualcomm ships roughly 700M+ Snapdragon units annually est.
Some market-size and growth figures are directional estimates, not live-verified. Company financials are from the most recent public filings. For SEC-verified deep dives, see Stock Reports.
The product & how money is made
Data-center inference chips
The same GPU/accelerator hardware used for training — NVIDIA H100, H200, B200, B300; AMD MI300X, MI350 — also runs inference. The buyer is a cloud provider or enterprise. The chip is identical to a training chip; what differs is the workload: inference optimizes for latency and tokens per second per watt, whereas training optimizes for raw aggregate throughput. The operator buys the chip once as capex, then earns recurring revenue selling compute via API pricing (e.g., $X per million tokens). The chip maker captures the one-time sale; the inference revenue accrues to whoever operates the chip.
Edge / on-device inference chips (NPUs)
Small, power-efficient AI accelerators embedded inside a larger SoC alongside the CPU, GPU, and (for phones) cellular modem. Qualcomm's Hexagon NPU inside Snapdragon is the highest-volume example. The buyer is a device OEM: Qualcomm designs the SoC, TSMC manufactures it, and Qualcomm sells the chip to Samsung, Xiaomi, or another OEM at $30–$150 per unit est. The NPU enables on-device features (real-time translation, photo enhancement, local AI assistants) that let OEMs command higher device prices. Qualcomm also collects patent licensing royalties on every 3G/4G/5G handset worldwide, worth ~$5B/yr at ~70% pre-tax margins est.
Cash-generation shape: NVDA, AMD, and QCOM are fabless — no factories, modest capital needs, high FCF conversion. Intel owns and operates fabs (Intel Foundry), spending $25.8B on capex TTM, producing negative FCF of –$8.3B.
Demand
Contracted / observable
- NVIDIA data-center revenue — $75.2B in Q1 FY2027 (quarter ended April 2026), up 92% YoY; Q2 guidance $91B. NVIDIA has stated inference is now a larger workload than training on its installed base. contracted
Source: NVIDIA Q1 FY27 press release, May 28 2026
- Hyperscaler capex — Amazon, Microsoft, Alphabet, Meta combined ~$410B in 2025, guiding ~$715B in 2026 (+74% YoY). contracted
Source: OfficeChai, citing company guidance
- AMD Data Center segment — $14.1B TTM revenue; MI300X/MI350 the fastest-growing piece. AMD guided $7B+ AI GPU revenue for CY2025. contracted
- Qualcomm QCT — $38.6B TTM chip revenue. ~700M+ Snapdragon units/yr est.; premium tiers increasingly require on-device NPU. contracted
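As a rough cross-check on the Qualcomm figures above, dividing QCT revenue by unit volume gives an implied blended revenue per unit that can be compared with the $30–$150 ASP estimate. This is only a sketch: it assumes all QCT revenue maps to the ~700M Snapdragon units, which overstates per-unit revenue because QCT also includes automotive and IoT sales. The function name is illustrative, not from any source.

```python
def implied_revenue_per_unit(segment_revenue_usd: float, units: float) -> float:
    """Blended revenue per unit: segment revenue divided by units shipped."""
    if units <= 0:
        raise ValueError("units must be positive")
    return segment_revenue_usd / units


# Figures quoted in the text: QCT $38.6B TTM, ~700M units/yr (est.).
qct_revenue_usd = 38.6e9
snapdragon_units = 700e6

blended = implied_revenue_per_unit(qct_revenue_usd, snapdragon_units)
in_asp_range = 30 <= blended <= 150  # compare with the $30-$150 ASP estimate

print(f"Implied blended revenue per unit: ${blended:.0f}")
print(f"Inside the $30-$150 ASP estimate: {in_asp_range}")
```

The result lands in the mid-$50s per unit, toward the lower half of the estimated ASP range, which fits a mix weighted toward mid-tier rather than premium chips.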
Forecast
- Inference scales with usage, not model releases. Every new user, agent, and API integration multiplies inference demand. If AI query volumes grow 5–10x over 2–3 years est., inference chip demand grows proportionally, offset partially by hardware efficiency (~2x inference throughput per watt per chip generation est.).
- Edge AI penetration. ~80% of 2026 smartphones expected to contain a dedicated NPU est., up from ~50% in 2024. The "AI PC" refresh cycle (Windows requiring NPU for Copilot+ features) adds volume for Qualcomm (Snapdragon X) and Intel (Core Ultra).
- Autonomous vehicles. Self-driving runs inference continuously — NVIDIA DRIVE Thor targets 2,000 TOPS per vehicle; Qualcomm Snapdragon Ride serves ADAS. Early-stage but high-unit-volume if mass adoption occurs.
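The first forecast bullet describes a simple ratio: growth in chip demand is roughly query growth divided by per-chip efficiency gains. A minimal sketch of that arithmetic follows. It assumes the quoted ~2x throughput-per-watt gain translates one-for-one into per-chip throughput and that the whole fleet is upgraded; both are simplifications, and the function name is hypothetical.

```python
def net_chip_demand_multiple(query_growth: float,
                             efficiency_gain_per_gen: float = 2.0,
                             generations: int = 1) -> float:
    """Growth in chip-equivalents needed after hardware efficiency gains.

    query_growth: multiple by which query volume grows (e.g. 5.0 for 5x).
    efficiency_gain_per_gen: throughput multiple per chip generation.
    generations: number of chip generations deployed over the period.
    """
    if query_growth <= 0 or efficiency_gain_per_gen <= 0:
        raise ValueError("growth and efficiency multiples must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return query_growth / (efficiency_gain_per_gen ** generations)


# Scenario grid from the text: 5-10x query growth, ~2x gain per generation.
for query_growth in (5.0, 10.0):
    for generations in (1, 2):
        multiple = net_chip_demand_multiple(query_growth, 2.0, generations)
        print(f"{query_growth:.0f}x queries, {generations} generation(s): "
              f"{multiple:.2f}x chip demand")
```

Under these assumptions, even the low case (5x queries, two chip generations) still implies growing chip demand, but only modestly; the outcome is sensitive to how many generations land inside the forecast window.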
Supply
Capacity
- DC inference GPUs share the training GPU supply chain: TSMC 3nm/5nm wafers + CoWoS advanced packaging + HBM. CoWoS is the #1 supply constraint — TSMC is tripling capacity but still cannot keep up. Every B200/B300 and MI300X/MI350 competes for the same CoWoS slots.
- Edge NPUs are less constrained. Snapdragon and Core Ultra use 4nm–7nm nodes with broader fab availability across TSMC, Samsung Foundry, and Intel.
- Intel Foundry manufactures Intel's own chips and is attempting to win external customers with Intel 18A (roughly TSMC 2nm equivalent). Capex: $25.8B TTM. contracted
Bottlenecks
- CoWoS + HBM gate DC inference GPU production identically to training GPUs. Lead times 6–12 months. HBM sold out through 2027.
- Edge: no hard bottleneck. The constraint is device sell-through (consumer demand for new phones/PCs), not fab capacity. The combined smartphone + PC market is ~1.4B units/yr est.
The gap
| Signal | Data-center inference | Edge / on-device |
| Lead times | 6–12 months (same as training GPUs) | Normal (weeks); fab capacity available |
| Forward bookings | Blackwell / Rubin sold out years ahead | No shortages; OEMs order seasonally |
| Bottleneck | CoWoS + HBM (physical, multi-year to expand) | Consumer device demand, not silicon |
| Pricing power | Strong — chips sell at list or above est. | Competitive — QCOM, MediaTek, Apple compete |
| Read | Short (demand > supply) | Balanced |
Pricing direction: DC inference GPU ASPs have risen each generation (H100 ~$25K–$30K, B200 ~$30K–$40K est.). Cloud rental rates for inference ($1.50–$3.50/hr per H100 on neoclouds, $6–$12/hr on hyperscalers) are falling as supply expands. Edge SoC ASPs are flat-to-declining: NPUs add die area, but OEMs resist consumer price increases.
Cloud pricing: Spheron, May 2026; chip ASPs: general knowledge est.
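To relate the chip ASPs to the rental rates quoted above, one can compute how long an H100 must be rented out for gross rental revenue to equal its purchase price. The sketch below uses only the figures in the text. It ignores power, hosting, networking, the rest of the server, and financing costs, so real payback is longer; `payback_years` and the utilization parameter are illustrative assumptions.

```python
HOURS_PER_YEAR = 24 * 365


def payback_years(chip_price_usd: float,
                  rental_usd_per_hour: float,
                  utilization: float = 1.0) -> float:
    """Years of gross rental revenue needed to equal the chip's price."""
    if rental_usd_per_hour <= 0:
        raise ValueError("rental rate must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    billable_hours_per_year = HOURS_PER_YEAR * utilization
    return chip_price_usd / (rental_usd_per_hour * billable_hours_per_year)


# H100 at ~$25K-$30K (est.), neocloud rental at $1.50-$3.50/hr.
best_case = payback_years(25_000, 3.50)   # cheapest chip, highest rate
worst_case = payback_years(30_000, 1.50)  # priciest chip, lowest rate

print(f"Best case:  {best_case:.2f} years at full utilization")
print(f"Worst case: {worst_case:.2f} years at full utilization")
```

At full utilization the range runs from under one year to more than two, which shows why falling rental rates matter to operators: at the low end of the quoted rates, gross payback already stretches past two years before any operating costs.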
Oversupply risk: CoWoS/HBM expansions could land just as hyperscalers pause capex. There is also an inference-specific factor: software efficiency gains from quantization, speculative decoding, and distillation can reduce compute per query by 2–10x est. If efficiency gains outrun query growth, demand for chips falls even as total usage rises.
The players
| Company | Inference products | Revenue (TTM) | Mkt cap | EV/Rev | Gross margin | FCF (TTM) | Position |
| NVDA | H100/H200/B200/B300 DC GPUs; DRIVE Thor (auto) | $253.5B | $5.40T | 21.1x | 74.1% | $46.3B | ~80–90% DC share est.; CUDA lock-in |
| AMD | MI300X/MI350 DC GPUs; Ryzen AI NPU (PCs); Xilinx FPGAs | $37.5B | $850B | 22.5x | 53.1% | $7.2B | #2 DC GPU; main NVDA alternative |
| QCOM | Snapdragon NPUs (phone/PC/auto); Cloud AI 100 (DC) | $44.5B | $254B | 5.8x | 54.8% | $9.6B | Edge leader; ~700M+ units/yr est.; patent royalty kicker |
| INTC | Core Ultra NPU (AI PCs); Gaudi (DC); Xeon AMX | $53.8B | $542B | 10.6x | 37.2% | –$8.3B | Only US fab owner; AI share small; turnaround-dependent |
Source: yfinance, June 2, 2026 contracted
NVDA, AMD, and QCOM are fabless: high margins, modest capex, strong FCF. QCOM adds a ~$5B/yr patent licensing annuity est. INTC owns fabs, spent $25.8B on capex TTM, and is FCF-negative.
The price of exposure
| Metric | NVDA | AMD | QCOM | INTC |
| Price | $222.82 | $521.54 | $240.84 | $107.93 |
| Mkt cap | $5.40T | $850B | $254B | $542B |
| EV / Revenue | 21.1x | 22.5x | 5.8x | 10.6x |
| Trailing P/E | 34.1x | 172.7x | 25.9x | N/A (loss) |
| Forward P/E | 17.6x | 39.9x | 22.6x | 69.9x |
| Price / Book | 34.4x | 13.2x | 9.4x | 4.9x |
| FCF yield | 0.9% | 0.8% | 3.8% | –1.5% |
| Net cash (debt) | +$40.4B | +$8.5B | –$5.5B | –$12.2B |
| Rev growth (YoY) | +85% | +38% | –3.5% | +7% |
| Dividend yield | ~0% | 0% | 1.5% | 0% |
Source: yfinance, June 2, 2026 contracted
- NVDA — 21x revenue, 17.6x forward earnings. 74% gross / 66% operating margins. Forward P/E implies consensus expects ~2x earnings growth (forward EPS $12.63 vs trailing $6.53). Revenue +85% YoY. Net cash +$40.4B.
- AMD — 22.5x revenue on lower margins (53% gross, 14% operating). Trailing P/E of 173x reflects small current earnings; forward P/E of 40x implies a large ramp (forward EPS $13.08 vs trailing $3.02). Revenue +38% YoY. Net cash +$8.5B.
- QCOM — 5.8x revenue. Revenue flat-to-down (–3.5% YoY). FCF yield of 3.8% is highest in group. Pays 1.5% dividend. Net debt $5.5B is manageable against $9.6B annual FCF.
- INTC — 10.6x revenue, 37% gross margins, negative FCF (–$8.3B). Forward P/E of 70x implies a turnaround to profitability (forward EPS $1.55 vs trailing –$0.60). Net debt $12.2B; capex consuming cash faster than operations generate it.
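The multiples in the table and bullets above follow from a few ratios, which can be recomputed from the quoted inputs. The sketch below does this for NVDA and AMD, the two names with both EPS figures given in the text. It approximates enterprise value as market cap minus net cash, which ignores minority interests and other adjustments, so small differences from the table are expected. The helper names are illustrative.

```python
from typing import Optional


def pe(price: float, eps: float) -> Optional[float]:
    """Price/earnings ratio; None when EPS is zero or negative (a loss)."""
    return price / eps if eps > 0 else None


def fcf_yield(fcf: float, market_cap: float) -> float:
    """Free cash flow as a fraction of market cap."""
    return fcf / market_cap


def ev_to_revenue(market_cap: float, net_cash: float, revenue: float) -> float:
    """EV/Revenue with EV approximated as market cap minus net cash."""
    return (market_cap - net_cash) / revenue


# Inputs quoted in the text; cap, net cash, revenue, and FCF are in $B.
companies = {
    "NVDA": dict(price=222.82, eps_ttm=6.53, eps_fwd=12.63,
                 cap=5400.0, net_cash=40.4, rev=253.5, fcf=46.3),
    "AMD": dict(price=521.54, eps_ttm=3.02, eps_fwd=13.08,
                cap=850.0, net_cash=8.5, rev=37.5, fcf=7.2),
}

for ticker, c in companies.items():
    implied_growth = c["eps_fwd"] / c["eps_ttm"]  # forward vs trailing EPS
    print(f"{ticker}: trailing P/E {pe(c['price'], c['eps_ttm']):.1f}x, "
          f"forward P/E {pe(c['price'], c['eps_fwd']):.1f}x, "
          f"implied EPS growth {implied_growth:.2f}x, "
          f"EV/Rev {ev_to_revenue(c['cap'], c['net_cash'], c['rev']):.1f}x, "
          f"FCF yield {fcf_yield(c['fcf'], c['cap']):.1%}")
```

Returning `None` for a non-positive EPS mirrors the "N/A (loss)" entry for INTC's trailing P/E: dividing price by negative earnings would give a number with no useful meaning.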
What to deep-dive next
- NVDA: its earnings are the single best real-time signal of total AI compute demand. Key question: what fraction of DC revenue is inference vs training, and how fast is the mix shifting?
- QCOM: trades at 5.8x EV/Rev with a 3.8% FCF yield, the lowest multiple and highest yield of the four. Key question: does on-device AI (phones, PCs, autos) grow fast enough to re-rate QCOM, or does it remain a mature handset chip company with a patent kicker?
- INTC: negative FCF and turnaround-dependent, but the only US company that both designs and manufactures advanced chips. Leopold Aschenbrenner holds 20.2M INTC call options (a pure options bet on US chip sovereignty). Key question: does Intel 18A achieve competitive yields and win external foundry customers?
- AMD: the key question is whether MI350/MI400 can take DC inference share from NVIDIA given CUDA lock-in, and whether Xilinx FPGAs become inference accelerators.
Sources & confidence
- Primary grounding: 500-stocks semiconductor scan, "AI Inference & Edge Chips" sub-section (/Users/ravf/projects/work/.claude/worktrees/sector-b2/research/investments/500-stocks/02-semiconductors.html).
- Live-verified contracted: all market caps, revenue, margins, FCF, EPS, P/E, P/B, EV/Rev, net cash/debt — yfinance, June 2, 2026. NVIDIA Q1 FY27 from May 28 2026 press release. Hyperscaler capex from company guidance via OfficeChai.
- Not live-verified est.: total inference market size (~$250B+), inference share of compute (>50%), GPU ASPs ($25K–$40K), edge SoC ASPs ($30–$150), NPU penetration (~80%), Snapdragon volumes (~700M+), efficiency multipliers (2–10x), QCOM licensing revenue (~$5B/yr), NVDA DC share (~80–90%).
- Hard vs approximate: HARD = yfinance financials, NVIDIA press release, CoWoS/HBM mechanics, fabless-vs-fab distinction. APPROXIMATE = every market size, growth rate, unit volume, ASP, and share figure not from a filing.