1. One-line
Together AI sells open-source model inference, fine-tuning, and GPU cluster training as a hosted cloud — positioning itself as the "anti-OpenAI" infrastructure for teams that want to run Llama / DeepSeek / Mistral / Mamba / Qwen / Cartesia voice etc. at scale without owning GPUs. Roughly $300M ARR (Sep 2025), $3.3B post-money (Feb 2025), reportedly raising again at materially higher mark.
2. Founders & key people
| Name | Role | Background |
| Vipul Ved Prakash | Co-founder & CEO | Previously founded Topsy (acquired Apple, $200M), founded Cloudmark. Long-time SF infra entrepreneur. |
| Ce Zhang | Co-founder & CTO | Was professor at ETH Zürich and Wisconsin, ML systems researcher. |
| Chris Ré | Co-founder | Stanford ML systems professor. MacArthur Fellow. Mentored Tri Dao, Dan Fu, many others. Spun out Snorkel, SambaNova, ColdSpark. |
| Tri Dao | Co-founder & Chief Scientist | FlashAttention (1/2/3/4) author. Mamba co-author. Princeton CS faculty. Ré's PhD student. Arguably the single most-cited applied ML systems researcher of the past 3 years. |
| Percy Liang | Co-founder | Stanford CS prof, director of Stanford HAI / CRFM. Co-led HELM, Alpaca, many open-model efforts. |
| Dan Fu | VP of Kernels | Ré PhD; H3, Hyena, Monarch Mixer co-author. |
| Max Ryabinin | VP of Model Shaping | Ex-Yandex / HSE; SWARM Parallelism, Petals (distributed training of large models on consumer GPUs). |
| Albert Meixner | SVP Engineering | Ex-Google (TPU / Borg-era infra). |
| Mahadev Konar | SVP Engineering Infrastructure | Original Apache ZooKeeper co-creator. Yahoo / Hortonworks lineage. |
| Charles Zedlewski | CPO | Ex-Cloudera CPO. |
This is one of the most credentialed ML-systems benches in the industry. The combination of Ré + Dao + Liang + Zhang gives them an unusually strong claim that they ship state-of-the-art kernels and architectures, not just resell GPUs. This is the moat vs. CoreWeave / Lambda / Crusoe (pure GPU rental).
3. Funding history
| Round | Date | Amount | Lead | Post-money |
| Seed | Nov 2022 | $20M | Kleiner Perkins | — |
| Series A | Nov 2023 | $102.5M | Kleiner Perkins, NVIDIA | ~$1.25B |
| Series A extension | Mar 2024 | $106M | Salesforce Ventures, NVIDIA | ~$1.25B |
| Series B | Feb 2025 | $305M | General Catalyst, Prosperity7 | ~$3.3B |
| Rumored next round | Reportedly mid-2025/26 | ~$1B target | Unconfirmed | Reported $7.5B target — treat as rumor |
Series B participants: General Catalyst, Prosperity7, Salesforce Ventures, DAMAC Capital, NVIDIA, Kleiner Perkins, March Capital, Emergence, Lux Capital, SE Ventures, Greycroft, Coatue, Definition, Cadenza, Long Journey, Brave, Scott Banister, SK Telecom, John Chambers. Total raised: ~$1.18B.
NVIDIA is a repeat strategic investor. This matters for supply — Together gets NVIDIA GPUs (Blackwell, GB200 NVL72) at scale that smaller competitors can't access. It also creates a coupling risk: Together's strategic position depends on NVIDIA continuing to view them as a preferred channel partner.
4. Product & business model
What they sell
- Inference API — OpenAI-compatible REST API for 200+ open-source models (Llama, DeepSeek-R1, Mistral, Qwen, Mamba, Stable Diffusion, voice via Cartesia). Per-token pricing.
- Fine-tuning — LoRA + full fine-tuning on hosted infra.
- GPU Clusters — dedicated H100 / H200 / GB200 / B200 clusters with InfiniBand, sold by GPU-hour. Includes Together Kernel Collection (Tri Dao's stack) baked in for ~24% faster training.
- Together Enterprise Platform — private/VPC deployment, on Dell hardware. Targets Fortune 100 buyers who can't send data to a public inference API.
- Agentic + synthetic data — code interpreter (via CodeSandbox acquisition), synthetic-data pipelines.
Customers (publicly disclosed)
Salesforce, Zoom, SK Telecom, Hedra, Cognition (Devin), Cartesia, Zomato, Krea, The Washington Post. They claim 450,000+ AI developers on the platform.
Infrastructure
- 200 MW of power capacity contracted.
- Building out a 36,000 NVIDIA GB200 NVL72 cluster with Hypertec — one of the largest disclosed non-hyperscaler training clusters.
- Dell partnership for enterprise on-prem appliances.
5. Technical contributions (the real moat)
What separates Together from CoreWeave / Lambda is they ship their own kernels and architectures, not just compute:
- FlashAttention 1/2/3/4 — industry-standard attention kernel. Used by basically every frontier lab. Tri Dao.
- Mamba / Mamba-2 — state-space model architecture; alternative to transformers for long sequences. Albert Gu (Cartesia) + Tri Dao.
- Hyena, Monarch Mixer, H3 — sub-quadratic attention work. Dan Fu, Ré lab.
- Medusa, Sequoia — speculative decoding techniques.
- Mixture of Agents — agent ensembling research.
- Together Kernel Collection — proprietary kernel set delivering claimed 24% training speedup.
- RedPajama — open dataset / model lineage they sponsored early.
This output is materially more research-credible than other inference clouds. Fireworks AI competes on raw inference speed (FireAttention kernels) but doesn't ship novel architectures. Together's research → product flywheel is real.
6. Competitive position
| Competitor | Position | Together's edge |
| Fireworks AI | Direct competitor on inference speed for OSS models | Together has broader model catalog (200+ vs 100+), training clusters, novel architectures (Mamba) |
| Anyscale | Ray-based distributed compute, more dev-tools focused | Together is more turnkey for inference; Anyscale more flexible for custom workloads |
| Modal / Replicate | Serverless GPU; more dev experience focus | Together targets enterprise + at-scale; Modal/Replicate target individual devs |
| CoreWeave / Lambda / Crusoe | Pure GPU rental | Together has the software layer; CoreWeave has more raw GPU supply |
| AWS Bedrock / Azure / GCP | Hyperscaler inference | Together is faster + cheaper on OSS models; hyperscalers win on enterprise integration |
| Hyperscaler internal teams | Existential threat. AWS/Azure could absorb this layer. | Together's research credibility + open-model focus is the defensible niche |
7. Compensation
| Role | Total comp | Sample size |
| ML Engineer, L3 (SF, 3 yrs exp) | ~$324K ($205K base + $110K stock + $8.8K bonus) | 1 reported |
| SWE median (US) | ~$290K | Small sample |
| SWE high mark | $518K | Single high-end report |
Source: levels.fyi. Sample sizes are very small — directional only. 4-year equity vest with 1-year cliff. Comp is materially below Anthropic / OpenAI / Meta GenAI at every level — Together pays startup-cash + Series-B-stage equity, not frontier-lab cash.
8. Open roles — Seattle / remote check
| Pattern | Count (of 56 visible) |
| San Francisco | ~40+ |
| Amsterdam | ~6 |
| India / Singapore | ~3 |
| Remote (US) | 1 (Sr. PM, Data Center Build) |
| Seattle | 0 explicitly listed |
Source: Together AI Greenhouse, fetched May 2026.
Honest Seattle assessment: I previously had Together listed as "remote-friendly" in your Seattle table. That was wrong, or it has changed. The current job board is overwhelmingly SF-collocated. Only one role is explicitly remote. If you want Together, expect to either (a) negotiate remote as a senior hire (possible — Tri Dao is at Princeton, so there's precedent), or (b) move to SF. I'll fix the Seattle table to drop them to Tier C.
9. Fit assessment for your situation
| Axis | Score (1-5) | Why |
| 100x equity potential | 2 | At ~$3.3B (or higher post next round), 100x requires $330B+ outcome. Unlikely — even a great Together exit is more like 5-15x from here. They're past the lottery-ticket stage. |
| Learning value | 4 | Best ML-systems bench outside Anthropic/DeepMind/Meta. Direct exposure to Tri Dao, Chris Ré's lineage. Kernels, distributed training, inference at scale. Strong learning. |
| InvAlign (AI-for-investing) | 2 | Indirect. Inference cloud is infrastructure, not investing. Useful if you later build an AI-investor and need cheap inference, but irrelevant to learning the methodology. |
| Fit / chance you land it | 4 | Your Meta ML eng profile is exactly what they hire. Bar is high but not Anthropic-high. Many roles available. |
| Seattle | 1 | Effectively SF-required for ML roles. Would need to negotiate as senior hire. |
10. My honest take
Together is interesting but doesn't fit your stated bet. Three points:
- Past lottery-ticket stage. At $3.3B+ they're priced like a mid-stage growth company. Best realistic outcome is a Snowflake-style IPO at $30-50B (10-15x for current employees). Not a 100x bet. If you want infrastructure-layer 100x, you want CoreWeave (already past it), Tenstorrent (still small), or a true seed-stage compute startup.
- Business risk: hyperscaler absorption. AWS Bedrock, Azure AI Foundry, GCP Vertex are all building exactly what Together sells. Together's defense is research credibility (Mamba, FlashAttention) and open-model focus. Real defense — but if the hyperscalers cut prices aggressively, Together's enterprise GTM has to compete with bundled hyperscaler deals. The next 24 months tell the story.
- Wrong on Seattle. SF-collocated. I had this wrong on the Seattle table — fixing now.
If you wanted to optimize for the "learn from the best ML systems people" axis only, Together is excellent and probably second only to Anthropic. If you're optimizing for your
stated bet ($1M equity → 100x, ideally Seattle, ideally on AI-investing), Together is wrong on all three axes.
One scenario where Together is the right move: if you want to spin up your own AI-investing company in 2-3 years and need deep ML-systems credibility for the founder résumé. Working with Tri Dao / Chris Ré's lineage is a strong credential. Better than Microsoft AI for that specific outcome — Microsoft gives you the badge, Together gives you the actual systems chops.
11. Open questions / next steps
- If you want to pursue Together: target Senior/Staff ML Engineer (Inference Platform) or AI Researcher, Core ML (Turbo), and ask explicitly during recruiter screen whether senior hires can negotiate Seattle remote. Tri Dao being part-time at Princeton is the precedent to cite.
- I should update the Seattle ranking table to drop Together from Tier B to Tier C. Want me to do that now?
- Worth confirming the rumored $1B / $7.5B round before pricing equity — if it closed, your 100x score drops further.
Sources