AI Inference & Edge Chips
Chips  Demand vs supply & the price of exposure · unit of demand: inference chips / tokens per second
NVDA · AMD · QCOM · INTC
V2 · facts · Jun 2026
Sector scan: Semiconductors Group-level demand/supply Updated Jun 2, 2026 Facts only · no recommendation

Snapshot

AI inference chips run trained models for end users — every chatbot reply, image generation, code completion, and agent action. Edge AI chips (NPUs — neural processing units — small accelerators embedded in phones, laptops, cars, and robots) run smaller models locally without a data-center round trip. NVIDIA dominates data-center inference with the same GPUs it sells for training; AMD sells MI-series GPUs as the #2 merchant alternative; Qualcomm (QCOM) leads mobile/edge AI silicon with Snapdragon NPUs; Intel (INTC) ships Xeon CPUs with built-in AI acceleration plus Gaudi accelerators for data centers. Unlike training — a one-time capital cost per model — inference is a recurring cost that scales with every query, every user, every autonomous agent.

~$250B+ est.: rough 2026 AI inference chip spend (data center + edge), not live-verified
>50% est.: share of total AI compute going to inference vs training, growing each year
4 names: NVDA · AMD · QCOM · INTC, from data center to pocket
5nm–3nm / 4nm–7nm: DC inference uses leading-edge nodes; edge uses mature nodes with better availability
Data-center inference GPUs sell for $20,000–$40,000+ each (est.) and are consumed in proportion to total AI queries served. Edge NPUs ship inside phone/PC/automotive SoCs at $30–$150 per chip (est.) in billion-unit volumes. DC inference carries 60–75% gross margins; edge SoCs carry 50–55% margins but far higher unit counts. NVIDIA reported $75.2B of data-center revenue in Q1 FY27 alone (quarter ended April 2026), a growing fraction of which is inference; Qualcomm ships roughly 700M+ Snapdragon units annually (est.).
Some market-size and growth figures are directional estimates, not live-verified. Company financials are from most recent public filings. For SEC-verified deep dives, see Stock Reports.
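As a rough illustration of the margin-versus-volume trade-off in the snapshot figures, the sketch below converts the estimated price and gross-margin ranges into gross-profit dollars per unit. Pairing the low price with the low margin (and high with high) is a simplifying assumption, and the function and variable names are illustrative only.

```python
def gross_profit_per_unit(price_usd: float, gross_margin: float) -> float:
    """Gross profit in dollars for one chip sold at price_usd."""
    return price_usd * gross_margin

# Estimated ranges quoted in the snapshot (low end, high end).
dc_low = gross_profit_per_unit(20_000, 0.60)
dc_high = gross_profit_per_unit(40_000, 0.75)
edge_low = gross_profit_per_unit(30, 0.50)
edge_high = gross_profit_per_unit(150, 0.55)

# Edge units needed to match the gross profit of one data-center GPU.
fewest_edge_units = dc_low / edge_high
most_edge_units = dc_high / edge_low

print(f"DC GPU gross profit per unit: ${dc_low:,.0f} to ${dc_high:,.0f}")
print(f"Edge SoC gross profit per unit: ${edge_low:,.2f} to ${edge_high:,.2f}")
print(f"Edge units per DC GPU: {fewest_edge_units:,.0f} to {most_edge_units:,.0f}")
```

Under these assumptions, one data-center GPU carries the gross profit of somewhere between a few hundred and a couple of thousand edge SoCs, which is why edge economics depend on unit volume.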

The product & how money is made

Data-center inference chips

The same GPU/accelerator hardware used for training — NVIDIA H100, H200, B200, B300; AMD MI300X, MI350 — also runs inference. The buyer is a cloud provider or enterprise. The chip is identical to a training chip; what differs is the workload: inference optimizes for latency and tokens per second per watt, whereas training optimizes for raw aggregate throughput. The operator buys the chip once as capex, then earns recurring revenue selling compute via API pricing (e.g., $X per million tokens). The chip maker captures the one-time sale; the inference revenue accrues to whoever operates the chip.
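The capex-then-metered-revenue shape described above can be sketched as a simple payback calculation. Every input below (throughput, utilization, token price) is a hypothetical placeholder rather than a sourced figure, and the sketch ignores power, hosting, networking, and depreciation.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, an approximation

def monthly_token_revenue(tokens_per_second: float,
                          utilization: float,
                          price_per_million_tokens: float) -> float:
    """Revenue one accelerator earns per month from metered token sales."""
    tokens_served = tokens_per_second * utilization * SECONDS_PER_MONTH
    return tokens_served / 1_000_000 * price_per_million_tokens

def payback_months(chip_price: float, monthly_revenue: float) -> float:
    """Months of revenue needed to recover the one-time chip purchase."""
    if monthly_revenue <= 0:
        raise ValueError("monthly_revenue must be positive")
    return chip_price / monthly_revenue

# Hypothetical placeholder inputs, not sourced figures.
revenue = monthly_token_revenue(tokens_per_second=5_000,
                                utilization=0.5,
                                price_per_million_tokens=1.0)
print(f"Monthly revenue: ${revenue:,.0f}")
print(f"Payback: {payback_months(30_000, revenue):.1f} months")
```

The chip maker's revenue is the `chip_price` term only; everything computed by `monthly_token_revenue` accrues to the operator.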

Edge / on-device inference chips (NPUs)

Small, power-efficient AI accelerators embedded inside a larger SoC alongside CPU, GPU, and (for phones) cellular modem. Qualcomm's Hexagon NPU inside Snapdragon is the highest-volume example. The buyer is a device OEM. Qualcomm designs the SoC, TSMC manufactures it, and Qualcomm sells the chip to OEMs such as Samsung and Xiaomi at $30–$150 per unit (est.). The NPU enables on-device features (real-time translation, photo enhancement, local AI assistants) that let OEMs command higher device prices. Qualcomm also collects patent licensing royalties on every 3G/4G/5G handset worldwide: ~$5B/yr at ~70% pre-tax margins (est.).

Cash-generation shape: NVDA, AMD, and QCOM are fabless: no factories, modest capital needs, and high free cash flow (FCF) conversion. Intel owns and operates fabs (Intel Foundry); it spent $25.8B on capex over the trailing twelve months (TTM) and generated FCF of –$8.3B.

Demand

Contracted / observable

Forecast

Supply

Capacity

Bottlenecks

The gap

Signal | Data-center inference | Edge / on-device
Lead times | 6–12 months (same as training GPUs) | Normal (weeks); fab capacity available
Forward bookings | Blackwell / Rubin sold out years ahead | No shortages; OEMs order seasonally
Bottleneck | CoWoS + HBM (physical, multi-year to expand) | Consumer device demand, not silicon
Pricing power | Strong: chips sell at list or above (est.) | Competitive: QCOM, MediaTek, Apple compete
Read | Short (demand > supply) | Balanced

Pricing direction: DC inference GPU average selling prices (ASPs) have risen each generation: H100 ~$25K–$30K, B200 ~$30K–$40K (est.). Cloud rental rates for inference ($1.50–$3.50/hr per H100 on neoclouds, $6–$12/hr on hyperscalers) are falling as supply expands. Edge SoC ASPs are flat-to-declining: NPUs add die area, but OEMs resist consumer price increases.

Cloud pricing: Spheron, May 2026; chip ASPs: general knowledge est.
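To see how falling rental rates interact with rising chip prices, the sketch below computes how many rented hours it takes for rental income to equal the purchase price, using the H100 estimates quoted above. It is a gross break-even only: power, hosting, networking, and idle time are ignored, and the function name is illustrative.

```python
def breakeven_hours(chip_price: float, hourly_rate: float) -> float:
    """Rented hours at hourly_rate needed to equal the chip's purchase price."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return chip_price / hourly_rate

HOURS_PER_YEAR = 24 * 365

# H100 estimates from the text: price $25K-$30K, neocloud rate $1.50-$3.50/hr.
best_case = breakeven_hours(25_000, 3.50)   # cheapest chip, highest rate
worst_case = breakeven_hours(30_000, 1.50)  # priciest chip, lowest rate

for label, hours in (("best", best_case), ("worst", worst_case)):
    years = hours / HOURS_PER_YEAR
    print(f"{label}: {hours:,.0f} hours, {years:.2f} years at 100% utilization")
```

Because break-even hours scale inversely with the hourly rate, each step down in rental pricing lengthens the operator's recovery period proportionally.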

Oversupply risk: CoWoS/HBM capacity expansions could land just as hyperscalers pause capex. An additional inference-specific factor: software efficiency gains from quantization, speculative decoding, and distillation can reduce compute per query by 2–10x (est.). If efficiency gains outrun query growth, total chip demand falls even as usage rises.
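The efficiency-versus-growth argument reduces to one ratio: compute demand scales with query growth divided by the efficiency gain. The sketch below uses the 2x and 10x ends of the estimated efficiency range; the 4x query growth is a hypothetical value chosen only for illustration.

```python
def relative_compute_demand(query_growth: float, efficiency_gain: float) -> float:
    """Compute needed after the change, as a multiple of today's compute.

    query_growth: multiple by which total queries grow (3.0 means 3x).
    efficiency_gain: factor by which compute per query shrinks (2.0 means half).
    """
    if query_growth <= 0 or efficiency_gain <= 0:
        raise ValueError("both inputs must be positive")
    return query_growth / efficiency_gain

# 4x query growth is hypothetical; 2x and 10x bracket the range in the text.
for gain in (2.0, 10.0):
    demand = relative_compute_demand(query_growth=4.0, efficiency_gain=gain)
    trend = "rises" if demand > 1 else "falls"
    print(f"efficiency {gain:g}x: compute demand is {demand:.1f}x today's, so it {trend}")
```

With the same query growth, the low end of the efficiency range leaves compute demand growing while the high end shrinks it.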

The players

Company | Inference products | Revenue (TTM) | Mkt cap | EV/Rev | Gross margin | FCF (TTM) | Position
NVDA | H100/H200/B200/B300 DC GPUs; DRIVE Thor (auto) | $253.5B | $5.40T | 21.1x | 74.1% | $46.3B | ~80–90% DC share est.; CUDA lock-in
AMD | MI300X/MI350 DC GPUs; Ryzen AI NPU (PCs); Xilinx FPGAs | $37.5B | $850B | 22.5x | 53.1% | $7.2B | #2 DC GPU; main NVDA alternative
QCOM | Snapdragon NPUs (phone/PC/auto); Cloud AI 100 (DC) | $44.5B | $254B | 5.8x | 54.8% | $9.6B | Edge leader; ~700M+ units/yr est.; patent royalty kicker
INTC | Core Ultra NPU (AI PCs); Gaudi (DC); Xeon AMX | $53.8B | $542B | 10.6x | 37.2% | –$8.3B | Only US fab owner; AI share small; turnaround-dependent

Source: yfinance, June 2, 2026 (contracted)

NVDA/AMD/QCOM are fabless, with high margins, modest capex, and strong FCF. QCOM adds a ~$5B/yr patent licensing annuity (est.). INTC owns fabs, spent $25.8B on capex TTM, and is FCF-negative.

The price of exposure

Metric | NVDA | AMD | QCOM | INTC
Price | $222.82 | $521.54 | $240.84 | $107.93
Mkt cap | $5.40T | $850B | $254B | $542B
EV / Revenue | 21.1x | 22.5x | 5.8x | 10.6x
Trailing P/E | 34.1x | 172.7x | 25.9x | N/A (loss)
Forward P/E | 17.6x | 39.9x | 22.6x | 69.9x
Price / Book | 34.4x | 13.2x | 9.4x | 4.9x
FCF yield | 0.9% | 0.8% | 3.8% | –1.5%
Net cash (debt) | +$40.4B | +$8.5B | –$5.5B | –$12.2B
Rev growth (YoY) | +85% | +38% | –3.5% | +7%
Dividend yield | ~0% | 0% | 1.5% | 0%

Source: yfinance, June 2, 2026 (contracted)
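Two of the ratios in the table can be approximated from the other reported figures. The sketch below assumes FCF yield is TTM free cash flow divided by market cap, and approximates enterprise value as market cap minus net cash; that EV definition is a simplification, and the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Company:
    ticker: str
    market_cap_b: float   # market capitalization, $B
    revenue_ttm_b: float  # trailing-twelve-month revenue, $B
    fcf_ttm_b: float      # trailing-twelve-month free cash flow, $B
    net_cash_b: float     # cash minus debt, $B (negative means net debt)

    def fcf_yield_pct(self) -> float:
        """Free cash flow as a percentage of market cap."""
        return 100 * self.fcf_ttm_b / self.market_cap_b

    def ev_to_revenue(self) -> float:
        """Simplified EV/Revenue, with EV approximated as market cap minus net cash."""
        return (self.market_cap_b - self.net_cash_b) / self.revenue_ttm_b

# Inputs copied from the players and price-of-exposure tables.
companies = [
    Company("NVDA", 5400.0, 253.5, 46.3, 40.4),
    Company("AMD", 850.0, 37.5, 7.2, 8.5),
    Company("QCOM", 254.0, 44.5, 9.6, -5.5),
    Company("INTC", 542.0, 53.8, -8.3, -12.2),
]

for c in companies:
    print(f"{c.ticker}: FCF yield {c.fcf_yield_pct():.1f}%, EV/Rev {c.ev_to_revenue():.1f}x")
```

The FCF yields reproduce the table's rounded values. The simplified EV/Revenue lands close to, but not always exactly on, the reported multiples, since a data vendor's enterprise value may include items this sketch omits.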

What to deep-dive next

Sources & confidence