Together AI — Deep Dive

1. One-line

Together AI sells open-source model inference, fine-tuning, and GPU cluster training as a hosted cloud — positioning itself as the "anti-OpenAI" infrastructure for teams that want to run Llama / DeepSeek / Mistral / Mamba / Qwen / Cartesia voice etc. at scale without owning GPUs. Roughly $300M ARR (Sep 2025), $3.3B post-money (Feb 2025), reportedly raising again at materially higher mark.

2. Founders & key people

Name	Role	Background
Vipul Ved Prakash	Co-founder & CEO	Previously founded Topsy (acquired Apple, $200M), founded Cloudmark. Long-time SF infra entrepreneur.
Ce Zhang	Co-founder & CTO	Was professor at ETH Zürich and Wisconsin, ML systems researcher.
Chris Ré	Co-founder	Stanford ML systems professor. MacArthur Fellow. Mentored Tri Dao, Dan Fu, many others. Spun out Snorkel, SambaNova, ColdSpark.
Tri Dao	Co-founder & Chief Scientist	FlashAttention (1/2/3/4) author. Mamba co-author. Princeton CS faculty. Ré's PhD student. Arguably the single most-cited applied ML systems researcher of the past 3 years.
Percy Liang	Co-founder	Stanford CS prof, director of Stanford HAI / CRFM. Co-led HELM, Alpaca, many open-model efforts.
Dan Fu	VP of Kernels	Ré PhD; H3, Hyena, Monarch Mixer co-author.
Max Ryabinin	VP of Model Shaping	Ex-Yandex / HSE; SWARM Parallelism, Petals (distributed training of large models on consumer GPUs).
Albert Meixner	SVP Engineering	Ex-Google (TPU / Borg-era infra).
Mahadev Konar	SVP Engineering Infrastructure	Original Apache ZooKeeper co-creator. Yahoo / Hortonworks lineage.
Charles Zedlewski	CPO	Ex-Cloudera CPO.

This is one of the most credentialed ML-systems benches in the industry. The combination of Ré + Dao + Liang + Zhang gives them an unusually strong claim that they ship state-of-the-art kernels and architectures, not just resell GPUs. This is the moat vs. CoreWeave / Lambda / Crusoe (pure GPU rental).

3. Funding history

Round	Date	Amount	Lead	Post-money
Seed	Nov 2022	$20M	Kleiner Perkins	—
Series A	Nov 2023	$102.5M	Kleiner Perkins, NVIDIA	~$1.25B
Series A extension	Mar 2024	$106M	Salesforce Ventures, NVIDIA	~$1.25B
Series B	Feb 2025	$305M	General Catalyst, Prosperity7	~$3.3B
Rumored next round	Reportedly mid-2025/26	~$1B target	Unconfirmed	Reported $7.5B target — treat as rumor

Series B participants: General Catalyst, Prosperity7, Salesforce Ventures, DAMAC Capital, NVIDIA, Kleiner Perkins, March Capital, Emergence, Lux Capital, SE Ventures, Greycroft, Coatue, Definition, Cadenza, Long Journey, Brave, Scott Banister, SK Telecom, John Chambers. Total raised: ~$1.18B.

NVIDIA is a repeat strategic investor. This matters for supply — Together gets NVIDIA GPUs (Blackwell, GB200 NVL72) at scale that smaller competitors can't access. It also creates a coupling risk: Together's strategic position depends on NVIDIA continuing to view them as a preferred channel partner.

4. Product & business model

What they sell

Inference API — OpenAI-compatible REST API for 200+ open-source models (Llama, DeepSeek-R1, Mistral, Qwen, Mamba, Stable Diffusion, voice via Cartesia). Per-token pricing.
Fine-tuning — LoRA + full fine-tuning on hosted infra.
GPU Clusters — dedicated H100 / H200 / GB200 / B200 clusters with InfiniBand, sold by GPU-hour. Includes Together Kernel Collection (Tri Dao's stack) baked in for ~24% faster training.
Together Enterprise Platform — private/VPC deployment, on Dell hardware. Targets Fortune 100 buyers who can't send data to a public inference API.
Agentic + synthetic data — code interpreter (via CodeSandbox acquisition), synthetic-data pipelines.

Customers (publicly disclosed)

Salesforce, Zoom, SK Telecom, Hedra, Cognition (Devin), Cartesia, Zomato, Krea, The Washington Post. They claim 450,000+ AI developers on the platform.

Infrastructure

200 MW of power capacity contracted.
Building out a 36,000 NVIDIA GB200 NVL72 cluster with Hypertec — one of the largest disclosed non-hyperscaler training clusters.
Dell partnership for enterprise on-prem appliances.

5. Technical contributions (the real moat)

What separates Together from CoreWeave / Lambda is they ship their own kernels and architectures, not just compute:

FlashAttention 1/2/3/4 — industry-standard attention kernel. Used by basically every frontier lab. Tri Dao.
Mamba / Mamba-2 — state-space model architecture; alternative to transformers for long sequences. Albert Gu (Cartesia) + Tri Dao.
Hyena, Monarch Mixer, H3 — sub-quadratic attention work. Dan Fu, Ré lab.
Medusa, Sequoia — speculative decoding techniques.
Mixture of Agents — agent ensembling research.
Together Kernel Collection — proprietary kernel set delivering claimed 24% training speedup.
RedPajama — open dataset / model lineage they sponsored early.

This output is materially more research-credible than other inference clouds. Fireworks AI competes on raw inference speed (FireAttention kernels) but doesn't ship novel architectures. Together's research → product flywheel is real.

6. Competitive position

Competitor	Position	Together's edge
Fireworks AI	Direct competitor on inference speed for OSS models	Together has broader model catalog (200+ vs 100+), training clusters, novel architectures (Mamba)
Anyscale	Ray-based distributed compute, more dev-tools focused	Together is more turnkey for inference; Anyscale more flexible for custom workloads
Modal / Replicate	Serverless GPU; more dev experience focus	Together targets enterprise + at-scale; Modal/Replicate target individual devs
CoreWeave / Lambda / Crusoe	Pure GPU rental	Together has the software layer; CoreWeave has more raw GPU supply
AWS Bedrock / Azure / GCP	Hyperscaler inference	Together is faster + cheaper on OSS models; hyperscalers win on enterprise integration
Hyperscaler internal teams	Existential threat. AWS/Azure could absorb this layer.	Together's research credibility + open-model focus is the defensible niche

7. Compensation

Role	Total comp	Sample size
ML Engineer, L3 (SF, 3 yrs exp)	~$324K ($205K base + $110K stock + $8.8K bonus)	1 reported
SWE median (US)	~$290K	Small sample
SWE high mark	$518K	Single high-end report

Source: levels.fyi. Sample sizes are very small — directional only. 4-year equity vest with 1-year cliff. Comp is materially below Anthropic / OpenAI / Meta GenAI at every level — Together pays startup-cash + Series-B-stage equity, not frontier-lab cash.

8. Open roles — Seattle / remote check

Pattern	Count (of 56 visible)
San Francisco	~40+
Amsterdam	~6
India / Singapore	~3
Remote (US)	1 (Sr. PM, Data Center Build)
Seattle	0 explicitly listed

Source: Together AI Greenhouse, fetched May 2026.

Honest Seattle assessment: I previously had Together listed as "remote-friendly" in your Seattle table. That was wrong, or it has changed. The current job board is overwhelmingly SF-collocated. Only one role is explicitly remote. If you want Together, expect to either (a) negotiate remote as a senior hire (possible — Tri Dao is at Princeton, so there's precedent), or (b) move to SF. I'll fix the Seattle table to drop them to Tier C.

9. Fit assessment for your situation

Axis	Score (1-5)	Why
100x equity potential	2	At ~$3.3B (or higher post next round), 100x requires $330B+ outcome. Unlikely — even a great Together exit is more like 5-15x from here. They're past the lottery-ticket stage.
Learning value	4	Best ML-systems bench outside Anthropic/DeepMind/Meta. Direct exposure to Tri Dao, Chris Ré's lineage. Kernels, distributed training, inference at scale. Strong learning.
InvAlign (AI-for-investing)	2	Indirect. Inference cloud is infrastructure, not investing. Useful if you later build an AI-investor and need cheap inference, but irrelevant to learning the methodology.
Fit / chance you land it	4	Your Meta ML eng profile is exactly what they hire. Bar is high but not Anthropic-high. Many roles available.
Seattle	1	Effectively SF-required for ML roles. Would need to negotiate as senior hire.

10. My honest take

Together is interesting but doesn't fit your stated bet. Three points:

Past lottery-ticket stage. At $3.3B+ they're priced like a mid-stage growth company. Best realistic outcome is a Snowflake-style IPO at $30-50B (10-15x for current employees). Not a 100x bet. If you want infrastructure-layer 100x, you want CoreWeave (already past it), Tenstorrent (still small), or a true seed-stage compute startup.
Business risk: hyperscaler absorption. AWS Bedrock, Azure AI Foundry, GCP Vertex are all building exactly what Together sells. Together's defense is research credibility (Mamba, FlashAttention) and open-model focus. Real defense — but if the hyperscalers cut prices aggressively, Together's enterprise GTM has to compete with bundled hyperscaler deals. The next 24 months tell the story.
Wrong on Seattle. SF-collocated. I had this wrong on the Seattle table — fixing now.

If you wanted to optimize for the "learn from the best ML systems people" axis only, Together is excellent and probably second only to Anthropic. If you're optimizing for your stated bet ($1M equity → 100x, ideally Seattle, ideally on AI-investing), Together is wrong on all three axes.

One scenario where Together is the right move: if you want to spin up your own AI-investing company in 2-3 years and need deep ML-systems credibility for the founder résumé. Working with Tri Dao / Chris Ré's lineage is a strong credential. Better than Microsoft AI for that specific outcome — Microsoft gives you the badge, Together gives you the actual systems chops.

11. Open questions / next steps

If you want to pursue Together: target Senior/Staff ML Engineer (Inference Platform) or AI Researcher, Core ML (Turbo), and ask explicitly during recruiter screen whether senior hires can negotiate Seattle remote. Tri Dao being part-time at Princeton is the precedent to cite.
I should update the Seattle ranking table to drop Together from Tier B to Tier C. Want me to do that now?
Worth confirming the rumored $1B / $7.5B round before pricing equity — if it closed, your 100x score drops further.

1. One-line

2. Founders & key people

3. Funding history

4. Product & business model

What they sell

Customers (publicly disclosed)

Infrastructure

5. Technical contributions (the real moat)

6. Competitive position

7. Compensation

8. Open roles — Seattle / remote check

9. Fit assessment for your situation

10. My honest take

11. Open questions / next steps

Sources