AI/ML Platforms & Foundation Models — Demand vs Supply (V2)

Snapshot — the group at a glance

This group sells artificial intelligence as a metered utility (a service you pay for by the amount you use, like electricity). The product is access to large "foundation models" — the giant trained AI systems (the big language and multimodal models such as Google's Gemini, OpenAI's GPT family served through Microsoft, Meta's Llama, and Amazon's Nova/Bedrock) that can read, write, reason, see and code. You do not buy the model; you rent its thinking, and you are billed by how much it thinks. The natural unit of demand is the inference token — roughly a chunk of a word that the model reads in or writes out — and, one layer down, the AI compute (the GPU/accelerator chip-time, the chips that do the actual AI math) that producing those tokens consumes. The companies named here (MSFT, GOOGL, META, AMZN, PLTR) make money when an app, a person, or — increasingly — another piece of software calls a model and burns tokens. The 500-stocks scan puts it plainly: "This IS the AGI layer. Every AI application, agent, and autonomous system calls a foundation model."

Metric	Figure
Revenue growth for leading model providers (from the scan)	100-300%/yr
AI infrastructure capex across the big three hyperscalers (from the scan)	~$200B+/yr
Organizations that can train a frontier model (scan says "a handful")	~5-7 est.
The two binding supply limits (not raw materials)	GPU + talent
Token pricing tier in the provided cost files (capable model)	~$3 / 1M in, ~$9 / 1M out
Compute cost to enter as a frontier-model trainer (per scan)	$1B+

In the provided files, demand for AI tokens is running ahead of what can be served, and the supply ceiling is GPU chip-time and a small pool of world-class research talent, not any raw material. Model providers report being GPU-rationed while growing 100-300% a year, and the big three hyperscalers spend $200B+ a year to add capacity. In money terms, an owner buying this group today pays a multiple of current revenue for each name (the approximate ratios are in the price section, where the reader can weigh them). This snapshot states facts and arithmetic only; no recommendation is implied.

Source: 500-stocks scan — Software & Cloud, sub-sector 1 "AI/ML Platforms & Foundation Model Providers" (growth, supply, $1B+ entry); sub-sector 2 "Cloud Infrastructure" ($200B+ capex); token pricing tier from the provided LIGHTWEIGHT_API_COST_ANALYSIS file; player count is general-knowledge est.

The product & how money is made

The product is a trained foundation model, served on demand. Think of the model as an enormous, already-educated brain sitting in a data center. The work of educating it once is called training (a huge, one-time-per-model compute bill); the work of using it to answer each request is called inference (a small compute bill, paid every single time, that adds up fast across billions of requests). An owner cares about the second one, because inference is the part that recurs and scales with usage.

The unit sold is the token. When you send a model a prompt, it reads your text as input tokens and generates an answer as output tokens; the provider meters both and charges per million. The provided cost files show a representative pricing shape of roughly $3 per 1,000,000 input tokens and $9 per 1,000,000 output tokens for a capable model, with a cheaper, smaller-model tier (about $0.80 in / $4.00 out per million) for lighter work. Output is priced higher than input because generating text burns more chip-time than reading it.
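The metering arithmetic above can be sketched in a few lines. This is a minimal illustration, not any provider's billing logic: the per-million prices are the two tiers quoted from the cost files, while the tier names ("capable", "light"), the function name and the example request size are assumptions made up for the sketch.

```python
# Per-million-token prices quoted in the text (USD).
# The tier labels "capable" and "light" are hypothetical names for this sketch.
PRICES_PER_MILLION = {
    "capable": {"input": 3.00, "output": 9.00},
    "light": {"input": 0.80, "output": 4.00},
}


def request_cost(input_tokens: int, output_tokens: int, tier: str = "capable") -> float:
    """Return the metered cost in USD of one request at the given tier."""
    price = PRICES_PER_MILLION[tier]
    billed = input_tokens * price["input"] + output_tokens * price["output"]
    return billed / 1_000_000


# Hypothetical request: 2,000 tokens read in, 500 tokens written out.
for tier in ("capable", "light"):
    print(f"{tier:>7} tier: ${request_cost(2_000, 500, tier):.4f} per request")
```

On these assumed sizes a single request costs about a cent on the capable tier and roughly a third of that on the lighter tier, which is why the business only becomes large at billions of requests.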

The companies in this group earn cash from this product through several doors:

Inference API fees (price per token × volume). An API (application programming interface) is the internet "socket" a program plugs into to call the model. The purest form: a developer or app calls the model over the internet and pays per token. This is the metered-utility business and it grows with every user, every agent, and every tool call.
Cloud GPU/compute consumption. The same parent companies rent the underlying chip-time (Microsoft Azure, Amazon AWS, Google Cloud) to anyone who wants to run or fine-tune models — so they earn both on the model and on the metal it runs on. The scan notes GPU cloud instances command "3-5x premiums over standard compute."
Enterprise licensing & embedded AI. Selling model access bundled into existing products — Microsoft 365 Copilot, Google Workspace AI, Palantir's AIP — usually as a per-seat or per-platform subscription that layers a recurring fee on top of the raw token cost.

The key money word for an owner here is the spread between the price charged per token and the cost to produce that token (which is mostly GPU depreciation and electricity). Frontier inference is sold at a markup, and that markup is under downward pressure as chips get faster and models get more efficient. Cost per token has fallen year over year (est.), which can either compress margins or expand volume, since cheaper tokens get used more. The other money words are capex (capital expenditure: the cash spent up front on data centers and chips) and free cash flow (the cash actually left for the owner after that spend). This group is unusual: the model layer itself is software-like and capital-light, but the same companies are simultaneously pouring tens of billions into the capital-heavy compute underneath it.

Source: 500-stocks scan sub-sector 1 ("Revenue comes from API inference fees, cloud GPU consumption, and enterprise licensing of model access") and sub-sector 2 ("3-5x premiums"); token price tiers from LIGHTWEIGHT_API_COST_ANALYSIS.md ($3/$9 and $0.80/$4.00 per 1M); the falling-cost-per-token trend and margin characterisation are general-knowledge est.

Demand — how much the world will want this

Demand for tokens is driven by one thing above all: how much of the world's work gets handed to AI. The scan frames the mechanism directly — "Inference API revenue scales with every user, every agent, every tool invocation. Recursive self-improvement means these platforms both produce and consume intelligence." (Recursive self-improvement: AI systems that help build the next, more capable AI systems.) That last point is the heart of the demand case: as AI systems get good enough to act on their own (agents that call tools, write code, run workflows), the heaviest token-buyer stops being a human typing prompts and becomes other software. Machine-to-machine usage has no daily limit the way human attention does, so token demand decouples from population and ties instead to how many autonomous processes are running.

Current demand (known facts from the scan): leading foundation-model providers are growing revenue at "100-300% annually." On the compute that demand pulls through, the scan reports AI cloud revenue "growing 50-100% YoY" (year-over-year) and the big three hyperscalers spending "$200B+ annually" on AI infrastructure — capex on that scale is, in effect, a bet placed with real cash that token demand keeps climbing. In plain terms: the people closest to the demand are spending two hundred billion dollars a year to be ready for more of it.

Forward demand (forecast — reasoning from the premise that AGI is arriving): if recursive self-improvement proceeds and capable AI is deployed broadly across software, robotics and autonomous systems, token consumption does not grow on a normal product curve — it compounds. Every new agent, every physical-AI device (robots and autonomous systems running large models on board), and every "reasoning" workload (where the model deliberately spends many extra tokens thinking before answering) multiplies tokens-per-task. Reasoning models in particular can burn on the order of 10-50× the tokens of a simple answer for a single hard question est., which means demand can rise even if the number of users is flat. Independent industry estimates commonly size the broader generative-AI / foundation-model market (TAM — total addressable market, the whole revenue pool the product could eventually serve) in the hundreds of billions of dollars by the late 2020s, growing at high-double-digit percent annual rates est. These are forecasts, not contracted orders; the binding question for an owner is whether supply can keep up, which is the next section.
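The claim that demand can rise with a flat user count is simple multiplication, sketched below. Only the 10-50× reasoning multiplier comes from the text (itself an estimate); the user count, questions per user and base tokens per answer are hypothetical placeholders chosen to make the arithmetic visible.

```python
def daily_tokens(users: int, questions_per_user: int,
                 base_tokens_per_question: int, reasoning_multiplier: int) -> int:
    """Tokens consumed per day for a fixed user base."""
    return users * questions_per_user * base_tokens_per_question * reasoning_multiplier


USERS = 1_000_000        # hypothetical, held flat in every scenario
QUESTIONS_PER_USER = 10  # hypothetical
BASE_TOKENS = 300        # hypothetical tokens for a simple answer

baseline = daily_tokens(USERS, QUESTIONS_PER_USER, BASE_TOKENS, 1)
for multiplier in (1, 10, 50):
    total = daily_tokens(USERS, QUESTIONS_PER_USER, BASE_TOKENS, multiplier)
    print(f"{multiplier:>2}x reasoning: {total:,} tokens/day "
          f"({total // baseline}x baseline)")
```

Nothing about the user base changes between the three rows; the whole increase comes from tokens per task.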

Who the buyers are: three overlapping tiers. (1) End users and enterprises buying embedded AI (Copilot seats, Gemini in Workspace, Palantir AIP deployments). (2) Developers and startups calling the inference APIs to build their own apps. (3) Increasingly, AI agents themselves, calling models in loops with no human in the step. Tier 3 is the one the AGI lens elevates: it is the demand source with no natural ceiling, and it is why the scan calls this layer the one that "both produces and consumes intelligence."

✓ VERIFIED — the following figures were confirmed from primary sources after initial publication:

NVIDIA data-center revenue hit $75.2 billion in Q1 FY2027 (quarter ended April 2026), up 92% year-over-year; total revenue $81.6B, Q2 guidance $91B (NVIDIA Q1 FY27 press release, May 28 2026)
The four largest US hyperscalers (Amazon, Microsoft, Alphabet, Meta) spent a combined ~$410B in 2025 and are projected to spend ~$715B combined in 2026 — a ~74% year-over-year increase (OfficeChai, citing company guidance, 2026)
Gartner forecasts worldwide data-center systems spending at $788B in 2026, up 55.8% year-over-year — the fastest-growing IT segment, driven by AI infrastructure (Gartner via TechEdge AI, May 20 2026)
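A reader can sanity-check the growth figures in the three items above with two small helpers. This is a sketch using only the numbers quoted there; the prior-period values it backs out are implied by the stated growth rates and are not separately sourced.

```python
def yoy_growth_pct(prior: float, current: float) -> float:
    """Year-over-year growth, in percent, from a prior and a current value."""
    return (current / prior - 1.0) * 100.0


def implied_prior(current: float, growth_pct: float) -> float:
    """Prior-period value implied by a current value and a stated growth rate."""
    return current / (1.0 + growth_pct / 100.0)


# Hyperscaler capex: ~$410B (2025) to ~$715B (2026), both in $B.
print(f"capex growth: {yoy_growth_pct(410.0, 715.0):.1f}%")

# Implied year-ago values (in $B), derived only from the quoted growth rates.
print(f"NVIDIA data-center revenue a year earlier: {implied_prior(75.2, 92.0):.1f}")
print(f"Gartner data-center systems spend in 2025: {implied_prior(788.0, 55.8):.1f}")
```

The first line reproduces the ~74% increase stated for combined hyperscaler capex.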

Remaining caveat: some market-size and growth-rate figures not listed above are directional estimates from general knowledge (model cutoff ~early 2026), not live-verified. Company-specific financials in the Players table are from the most recent public filings or earnings. For SEC-verified deep dives on individual companies, see Stock Reports.

Source: 500-stocks scan sub-sector 1 ("100-300% annually," "both produce and consume intelligence") and sub-sector 2 ("50-100% YoY," "$200B+ annually"); forward market-size and token-multiplier figures are general-knowledge est. and forecasts.

Supply — how much can be made, and what limits it

Supply in this group has an unusual shape, and the scan is precise about it: it is "talent-constrained, not commodity-constrained," and separately "Model serving is GPU-constrained in the near term." There are two different bottlenecks stacked on top of each other.

Bottleneck 1: who can build a model at all (talent + capital). The scan says "Only a handful of organizations can train frontier models (requires $1B+ in compute and world-class researchers)." This is the supply limit on the product itself: the ability to produce a frontier model is concentrated in roughly five to seven organizations worldwide est. (the scan's own word is "a handful"), because it takes both a billion-dollar compute budget and a scarce population of top researchers. This capability cannot be bought off the shelf or stood up quickly.

Bottleneck 2: how many tokens can be served (GPUs and power). Even once a model exists, serving it to the world is limited by chip-time. The companion cloud sub-section of the scan states "GPU availability remains tight across all clouds," "data center buildout takes 18-36 months," and "power availability is becoming the binding constraint." So the second supply ceiling is physical: enough GPUs, enough data centers, enough megawatts. This is why providers ration capacity and why $200B+/yr is being spent to lift the ceiling.

Current capacity & expansion (known/announced): the supplier set has very large balance sheets — the scan notes hyperscalers "have the balance sheets ($100B+ cash) to outspend all competitors on capacity." They are adding capacity as fast as chips, construction and power permit, but the lead times above (18-36 months for a data center) mean supply lags demand by years, not quarters. The scan also cites "enormous moats from data flywheels and ecosystem lock-in" (a data flywheel: more usage produces more data, which improves the product, which draws more usage; ecosystem lock-in: customers find it costly to switch once their tools and data sit on one platform).
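To see why an 18-36 month lead time puts supply years behind, it helps to compute how much demand grows while one data center is being built. The sketch below combines the scan's 100-300% provider growth range with its 18-36 month buildout range. It assumes smooth compounding at a constant annual rate, which is a simplification, and it uses provider revenue growth as a stand-in for demand growth.

```python
def demand_multiple(annual_growth_pct: float, lead_time_months: float) -> float:
    """Multiple by which demand grows over a build lead time.

    Assumes constant-rate compounding, a simplification for illustration.
    """
    annual_factor = 1.0 + annual_growth_pct / 100.0
    return annual_factor ** (lead_time_months / 12.0)


for growth in (100, 300):      # % per year, the scan's growth range
    for months in (18, 36):    # the scan's data-center lead-time range
        multiple = demand_multiple(growth, months)
        print(f"{growth}%/yr over {months} months: demand x{multiple:.1f}")
```

Under these assumptions, capacity ordered today arrives into a market several times larger than the one it was sized for, even at the low end of both ranges.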

Market-share structure (who controls supply): highly concentrated (est.). A small number of US mega-caps (Microsoft-via-OpenAI, Google/DeepMind, Meta, Amazon) plus a few independents control essentially all frontier model-and-serving capacity. The same firms also control the cloud the rest of the industry must rent (AWS+Azure+GCP together ~65% of cloud, per the scan). Concentrated supply means pricing is set by a few players rather than competed to the floor, although those players do compete with each other, and open-weight models (Meta's Llama, whose trained weights are published for anyone to run) put a partial cap on what anyone can charge.

Source: 500-stocks scan sub-sector 1 ("talent-constrained," "handful of organizations," "$1B+ in compute," "data flywheels and ecosystem lock-in") and sub-sector 2 ("GPU availability remains tight," "18-36 months," "power... binding constraint," "$100B+ cash," "~65% of the cloud market"); count of frontier trainers and the concentration characterisation are general-knowledge est.

The gap — demand vs supply

Putting the two sides together, the provided files describe a product that is currently short — demand running ahead of what can be served. The evidence is consistent across the scan: providers are growing 100-300% a year, GPU capacity is rationed, lead times to add data centers run 18-36 months, power is becoming the binding input, and the suppliers are spending $200B+/yr specifically to add capacity. Token prices have been falling est., which the cost files attribute to a cost-curve effect (chips and models get more efficient) rather than to oversupply — to date, falling unit prices have coincided with rising total tokens consumed rather than slack.

Signal	What it shows	Direction
Provider revenue growth	100-300%/yr	Demand > supply
AI cloud revenue growth	50-100%/yr	Demand > supply
GPU availability	Rationed / tight	Supply-constrained
Data-center lead time	18-36 months	Supply lags by years
Power availability	Becoming binding	New ceiling forming
Capex to add capacity	$200B+/yr	Suppliers adding capacity
Token price trend	Falling year over year (est.)	Cost curve, not glut (volume rising)

When could it flip to oversupply? The scan does not date a flip. Reasoning from the AGI premise, the demand side keeps adding new buyers (autonomous agents, physical AI, reasoning workloads) that did not exist in prior cycles. A flip to oversupply would most plausibly come from the supply side catching up — if the $200B+/yr of capex and a wave of new chips landed faster than agentic demand ramped, serving capacity could temporarily exceed paying demand, which would show up first as falling utilization and price cuts no longer offset by volume est. That is a forecast scenario, not a present fact; in the provided files today, every cited signal points to short, not long.

Source: 500-stocks scan sub-sectors 1 & 2 (all growth, GPU, lead-time, power and capex figures); token-price-trend and flip-scenario reasoning are general-knowledge est. and forecast.

The players — who captures the money

Four of the five key tickers are diversified mega-caps where foundation-model/inference revenue is a fast-growing but still minority slice of a very large total; one (PLTR) is the closest of the five to a listed pure-play on applied AI. The figures below for market cap, total revenue and exposure are general-knowledge approximations as of about early 2026 and are NOT live-verified — they are directional, to show exposure, not precise.

Company (ticker)	What it makes in this group	Exposure to this product	Rough size est.	Position / edge
Microsoft (MSFT)	OpenAI models served on Azure AI; Copilot embedded across products	Diversified; AI a growing slice of a huge base	~$3-3.5T cap est.	Closest tie to OpenAI; #2 cloud; broad enterprise distribution
Alphabet (GOOGL)	Gemini / DeepMind; models served on Google Cloud + in Search/Workspace	Diversified; ads still the bulk of revenue	~$2-2.5T cap est.	Owns the full stack — research (DeepMind), own TPU chips, own cloud, own data
Meta (META)	Llama open-weight models; AI embedded in its apps	Diversified; nearly all revenue still advertising	~$1.3-1.6T cap est.	Leading open-weight models (a price reference point for the field); large own GPU fleet; no external token-API business
Amazon (AMZN)	Bedrock (multi-model platform) + Nova (own models); AWS compute underneath	Diversified; AWS is the profit engine, retail the revenue bulk	~$2-2.4T cap est.	#1 cloud (AWS); "model-neutral" marketplace plus own chips (Trainium/Inferentia)
Palantir (PLTR)	AIP — software that wires foundation models into enterprise/government workflows	Closest of the five to a pure applied-AI play; this is the core story	~$200-300B cap est.	Deep gov/enterprise deployment, data-integration moat; consumes models rather than training frontier ones

Two structural notes for an owner. First, MSFT, GOOGL and AMZN are both supplier and landlord — they sell the model and rent the GPUs, so they capture money on two layers. Second, META and PLTR sit at opposite ends: META publishes its models open-weight (monetizing indirectly through its apps and ad engine), while PLTR sells the application layer on top of models it does not train. None of the five is a "tokens-only" company; the nearest token-API pure-plays (OpenAI, Anthropic) are not directly listed and reach public markets mainly through MSFT, AMZN and GOOGL stakes/partnerships.

Source: 500-stocks scan sub-sector 1 company list (GOOGL Gemini/DeepMind, MSFT OpenAI/Azure AI, META Llama, AMZN Bedrock/Nova, PLTR); all market-cap, revenue-share and positioning figures are general-knowledge est., not live-verified.

The price of exposure

In plain money terms, what does the market charge today for a dollar of exposure to this group? One common gauge is the price-to-sales ratio — how many dollars of market value you pay per $1 of this year's revenue (it answers "how many years of current sales am I paying up front?"). All figures below are general-knowledge approximations as of about early 2026 and are NOT live-verified.

Company	~$ market value per $1 of annual revenue est.	Money-in / money-out shape
MSFT	~12-14×	High-capex now, alongside large existing free cash flow; generates owner cash today
GOOGL	~7-9×	High-capex, strong free cash flow from ads; generates owner cash today
META	~9-11×	High-capex, strong ad free cash flow; generates owner cash today
AMZN	~3-4×	Highest-capex; thin retail margin, AWS carries the profit; cash positive but capex-heavy
PLTR	~40-70×	Capital-light software; profitable and cash-generative; price-to-sales is the highest of the five

A few neutral arithmetic observations for an owner. The four mega-caps generate large profits from established businesses (ads, software, cloud, retail) and fund the AI build-out ($200B+/yr for the big three hyperscalers, per the scan) from those profits rather than from new outside capital. AMZN's price-to-sales of ~3-4× is the lowest of the five largely because its high-volume, low-margin retail revenue enlarges the denominator (revenue); measured on profit instead of revenue, its ratio is not low. PLTR carries the highest price-to-sales of the group at ~40-70×, meaning the price embeds many years of the revenue growth implied by the compounding token/agent demand described above. These are facts and ratios; whether any price is worth paying is the reader's judgment and is not stated here.
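The denominator effect described above can be made concrete with a small sketch. Both companies here are invented: the market value, revenue and margin figures are hypothetical round numbers chosen for the arithmetic, not estimates for any ticker in the table.

```python
def price_to_sales(market_cap: float, annual_revenue: float) -> float:
    """Dollars of market value per $1 of annual revenue."""
    if annual_revenue <= 0:
        raise ValueError("annual_revenue must be positive")
    return market_cap / annual_revenue


def price_to_profit(market_cap: float, annual_revenue: float, profit_margin: float) -> float:
    """Dollars of market value per $1 of annual profit (revenue x margin)."""
    profit = annual_revenue * profit_margin
    if profit <= 0:
        raise ValueError("profit must be positive")
    return market_cap / profit


# Two hypothetical companies with the same market value (all figures in $B).
companies = {
    "high-volume, low-margin": {"cap": 2000.0, "revenue": 600.0, "margin": 0.10},
    "low-volume, high-margin": {"cap": 2000.0, "revenue": 200.0, "margin": 0.35},
}
for name, c in companies.items():
    ps = price_to_sales(c["cap"], c["revenue"])
    pp = price_to_profit(c["cap"], c["revenue"], c["margin"])
    print(f"{name}: price-to-sales {ps:.1f}x, price-to-profit {pp:.1f}x")
```

In this made-up pair, the company with the lower price-to-sales ratio is the more expensive one per dollar of profit, which is the reason a low revenue multiple alone says little.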

One more money lens: this group converts AI capex into owner cash differently from the hardware layers. A chip foundry or data-center builder spends capital to sell capacity at a fixed margin; these software/platform owners spend capital to sell metered intelligence whose unit cost has been falling year over year (est.). If token volume keeps outrunning the falling price (as the cost files indicate it has), the same capex throws off rising cash; if volume stalls, the capex becomes stranded depreciation (assets that keep losing book value while earning little). That is the central money-in/money-out tension for the group.
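The volume-versus-price tension reduces to revenue = price × volume, with each side compounding at its own rate. The sketch below is illustrative only: the $9 per million starting price is the output price quoted from the cost files, while the starting volume, the 30% yearly price decline and the 100% yearly volume growth are hypothetical rates picked to show the two outcomes.

```python
def revenue_path(price_per_million: float, million_tokens: float,
                 price_change: float, volume_change: float, years: int) -> list[float]:
    """Yearly revenue when price and volume each compound at a constant rate."""
    path = []
    for _ in range(years + 1):
        path.append(price_per_million * million_tokens)
        price_per_million *= 1.0 + price_change
        million_tokens *= 1.0 + volume_change
    return path


# Hypothetical scenarios: price falls 30% a year in both.
growing = revenue_path(9.0, 1_000.0, -0.30, 1.00, 3)  # volume doubles yearly
stalled = revenue_path(9.0, 1_000.0, -0.30, 0.00, 3)  # volume flat

print("volume outruns price:", [round(r) for r in growing])
print("volume stalls:       ", [round(r) for r in stalled])
```

Revenue grows only while the volume factor times the price factor stays above 1; in the stalled case the same installed capacity earns less each year, which is the stranded-depreciation outcome.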

Source: price-to-sales and free-cash-flow characterisations are general-knowledge est. as of ~early 2026 and NOT live-verified; capex figure ($200B+/yr) from the 500-stocks scan sub-sector 2; falling-unit-cost trend from the provided cost files plus general-knowledge est.

What to deep-dive next

Where a company-level deep-dive would be most informative, factually:

Purest play on this product: PLTR — it is the only one of the five where applied-AI software is the story rather than a slice; a deep-dive could examine how durable its enterprise/government deployments are relative to its ~40-70× revenue multiple.
Most vertically integrated (controls its own chips + cloud + models): GOOGL (DeepMind + TPUs + GCP) and AMZN (AWS + Trainium/Inferentia + Bedrock/Nova) own the most of the stack end-to-end, which bears directly on the GPU/power supply bottleneck the scan flags. MSFT is integrated differently — via the OpenAI tie and Azure scale.
Open-weight / price-reference wildcard: META — its decision to publish Llama open-weight shapes the pricing reference point for the field; a deep-dive could focus on how it monetizes models indirectly rather than via token sales.
Diversified conglomerates where this is a small slice: all four mega-caps qualify — the deep-dive task there is to isolate the AI/token revenue from the legacy ads/retail/software base and measure how fast it is becoming a swing factor in total profit.
Off-board but adjacent: the nearest token-API pure-plays (OpenAI, Anthropic) are not listed; an owner gets exposure to them only through MSFT, AMZN and GOOGL — worth noting in any deep-dive of those three.

Source: 500-stocks scan sub-sector 1 (player roster and capability notes); pure-play / integration characterisation is analytical, drawn from the scan plus general-knowledge est.

Sources & confidence

What was used:

500-stocks scan, Software & Cloud file, sub-section 1 "AI/ML Platforms & Foundation Model Providers" and sub-section 2 "Cloud Infrastructure (Hyperscaler Parents)" — /Users/ravf/projects/work/.claude/worktrees/sector-hub/research/investments/500-stocks/05-software-cloud.html. This is the source for the structural facts and the hard-ish figures: 100-300% provider growth, 50-100% AI cloud growth, $200B+/yr capex, ~65% cloud share, 3-5x GPU-cloud premiums, GPU rationing, 18-36 month lead times, power as binding constraint, $1B+ entry cost, $100B+ cash, talent constraint.
Provided cost/inference analysis files (token economics texture) — /Users/ravf/projects/work/.claude/worktrees/sector-hub/research/investments/LIGHTWEIGHT_API_COST_ANALYSIS.md and /Users/ravf/projects/work/.claude/worktrees/sector-hub/research/investments/INFERENCE_EXECUTIVE_SUMMARY.txt — source for representative token pricing ($3/1M input, $9/1M output for a capable model; $0.80/$4.00 for a lighter tier).
General knowledge, cutoff ~early 2026 — used for company market caps, revenue mix, price-to-sales multiples, market-size/TAM, token-per-task multipliers, the falling-cost-per-token trend, and the count of frontier-model trainers.
Note: the originally specified prior report ("Anthropic Compute Infrastructure" V1) was not present on disk at the given path, so no dated V1 numbers were folded in; if it is located later, its figures should be merged and dated.

Hard vs approximate: The growth rates, capex, cloud share, GPU-cloud premiums, lead times, supply-constraint language and entry cost are grounded in the provided scan. The token price tiers are grounded in the provided cost files. Everything else — all market caps, all price-to-sales multiples, the hundreds-of-billions market size, the token-per-task multipliers, the falling-cost-per-token trend, and the player count — is approximate, general-knowledge, and NOT live-verified; it is marked est. and should be checked against current filings and live quotes before being relied on for any decision. No buy/sell, price target, or valuation verdict is expressed anywhere in this fact sheet.

Source: as listed above; confidence labels apply as stated.