Together AI — Deep Dive

AI Acceleration Cloud. Inference + fine-tuning + GPU clusters for open-source models. Founders include Tri Dao (FlashAttention) and Percy Liang (Stanford HAI).

1. One-line

Together AI sells open-source model inference, fine-tuning, and GPU cluster training as a hosted cloud, positioning itself as the "anti-OpenAI" infrastructure for teams that want to run Llama, DeepSeek, Mistral, Mamba, Qwen, Cartesia voice, and similar models at scale without owning GPUs. Roughly $300M ARR (Sep 2025), $3.3B post-money (Feb 2025), reportedly raising again at a materially higher mark.

2. Founders & key people

Name | Role | Background
Vipul Ved Prakash | Co-founder & CEO | Previously founded Topsy (acquired by Apple, reportedly ~$200M) and Cloudmark. Long-time SF infra entrepreneur.
Ce Zhang | Co-founder & CTO | ML systems researcher. Former professor at ETH Zürich; PhD at Wisconsin-Madison.
Chris Ré | Co-founder | Stanford ML systems professor. MacArthur Fellow. Mentored Tri Dao, Dan Fu, many others. Spun out Snorkel, SambaNova, ColdSpark.
Tri Dao | Co-founder & Chief Scientist | FlashAttention (1/2/3/4) author. Mamba co-author. Princeton CS faculty. Ré's PhD student. Arguably the single most-cited applied ML systems researcher of the past 3 years.
Percy Liang | Co-founder | Stanford CS prof, director of Stanford CRFM (the Center for Research on Foundation Models, part of HAI). Co-led HELM, Alpaca, many open-model efforts.
Dan Fu | VP of Kernels | Ré PhD; H3, Hyena, Monarch Mixer co-author.
Max Ryabinin | VP of Model Shaping | Ex-Yandex / HSE; SWARM Parallelism, Petals (distributed training of large models on consumer GPUs).
Albert Meixner | SVP Engineering | Ex-Google (TPU / Borg-era infra).
Mahadev Konar | SVP Engineering Infrastructure | Original Apache ZooKeeper co-creator. Yahoo / Hortonworks lineage.
Charles Zedlewski | CPO | Ex-Cloudera CPO.

This is one of the most credentialed ML-systems benches in the industry. The combination of Ré + Dao + Liang + Zhang gives them an unusually strong claim that they ship state-of-the-art kernels and architectures, not just resell GPUs. This is the moat vs. CoreWeave / Lambda / Crusoe (pure GPU rental).

3. Funding history

Round | Date | Amount | Lead | Post-money
Seed | Nov 2022 | $20M | Kleiner Perkins | not stated
Series A | Nov 2023 | $102.5M | Kleiner Perkins, NVIDIA | ~$1.25B
Series A extension | Mar 2024 | $106M | Salesforce Ventures, NVIDIA | ~$1.25B
Series B | Feb 2025 | $305M | General Catalyst, Prosperity7 | ~$3.3B
Rumored next round | Reportedly mid-2025/26 | ~$1B target | Unconfirmed | Reported $7.5B target; treat as rumor

Series B participants: General Catalyst, Prosperity7, Salesforce Ventures, DAMAC Capital, NVIDIA, Kleiner Perkins, March Capital, Emergence, Lux Capital, SE Ventures, Greycroft, Coatue, Definition, Cadenza, Long Journey, Brave, Scott Banister, SK Telecom, John Chambers. Total raised across the four closed rounds above: ~$534M.
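
As a quick sanity check on the funding table, the closed rounds can be summed directly. This is a minimal sketch that uses only the amounts listed above; the rumored round is excluded, and any debt or undisclosed financing is not modeled. The names ROUNDS_USD_M and total_raised_usd_m are illustrative.

```python
# Closed rounds from the funding table, in millions of USD.
# The rumored next round is deliberately left out because it is unconfirmed.
ROUNDS_USD_M = {
    "Seed": 20.0,
    "Series A": 102.5,
    "Series A extension": 106.0,
    "Series B": 305.0,
}


def total_raised_usd_m(rounds):
    """Sum the round sizes (millions of USD)."""
    return sum(rounds.values())


if __name__ == "__main__":
    total = total_raised_usd_m(ROUNDS_USD_M)
    print(f"Closed rounds total: ${total:.1f}M")
```

The four closed rounds sum to $533.5M.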

NVIDIA is a repeat strategic investor. This matters for supply — Together gets NVIDIA GPUs (Blackwell, GB200 NVL72) at scale that smaller competitors can't access. It also creates a coupling risk: Together's strategic position depends on NVIDIA continuing to view them as a preferred channel partner.

4. Product & business model

What they sell

Customers (publicly disclosed)

Salesforce, Zoom, SK Telecom, Hedra, Cognition (Devin), Cartesia, Zomato, Krea, The Washington Post. They claim 450,000+ AI developers on the platform.

Infrastructure

5. Technical contributions (the real moat)

What separates Together from CoreWeave / Lambda is they ship their own kernels and architectures, not just compute:

This output is materially more research-credible than other inference clouds. Fireworks AI competes on raw inference speed (FireAttention kernels) but doesn't ship novel architectures. Together's research → product flywheel is real.

6. Competitive position

Competitor | Position | Together's edge
Fireworks AI | Direct competitor on inference speed for OSS models | Together has broader model catalog (200+ vs 100+), training clusters, novel architectures (Mamba)
Anyscale | Ray-based distributed compute, more dev-tools focused | Together is more turnkey for inference; Anyscale more flexible for custom workloads
Modal / Replicate | Serverless GPU; more dev experience focus | Together targets enterprise + at-scale; Modal/Replicate target individual devs
CoreWeave / Lambda / Crusoe | Pure GPU rental | Together has the software layer; CoreWeave has more raw GPU supply
AWS Bedrock / Azure / GCP | Hyperscaler inference | Together is faster + cheaper on OSS models; hyperscalers win on enterprise integration
Hyperscaler internal teams | Existential threat. AWS/Azure could absorb this layer. | Together's research credibility + open-model focus is the defensible niche

7. Compensation

Role | Total comp | Sample size
ML Engineer, L3 (SF, 3 yrs exp) | ~$324K ($205K base + $110K stock + $8.8K bonus) | 1 reported
SWE median (US) | ~$290K | Small sample
SWE high mark | $518K | Single high-end report

Source: levels.fyi. Sample sizes are very small — directional only. 4-year equity vest with 1-year cliff. Comp is materially below Anthropic / OpenAI / Meta GenAI at every level — Together pays startup-cash + Series-B-stage equity, not frontier-lab cash.
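
To make the comp breakdown and the vesting terms concrete, here is a rough sketch. It adds up the single L3 data point from the table and models a 4-year vest with a 1-year cliff. The source only states the vest length and the cliff, so the monthly vesting after the cliff is an assumption, and the function names are illustrative.

```python
def total_comp_usd_k(base, stock_per_year, bonus):
    """Annual total comp in thousands of USD."""
    return base + stock_per_year + bonus


def vested_fraction(months_elapsed, vest_months=48, cliff_months=12):
    """Fraction of an equity grant vested after `months_elapsed` months.

    Assumption (not from the source): nothing vests before the cliff,
    the first year vests all at once at the cliff, and vesting is
    monthly and linear afterwards.
    """
    if months_elapsed < cliff_months:
        return 0.0
    return min(months_elapsed, vest_months) / vest_months


if __name__ == "__main__":
    # The one reported ML Engineer L3 data point from the table.
    l3_total = total_comp_usd_k(base=205.0, stock_per_year=110.0, bonus=8.8)
    print(f"L3 total comp: ~${l3_total:.1f}K")
    for month in (6, 12, 24, 48):
        print(f"month {month:2d}: {vested_fraction(month):.0%} vested")
```

Under these assumptions, leaving before month 12 forfeits the entire grant, which matters when the equity is the main reason to join.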

8. Open roles — Seattle / remote check

Location | Count (of 56 visible)
San Francisco | ~40+
Amsterdam | ~6
India / Singapore | ~3
Remote (US) | 1 (Sr. PM, Data Center Build)
Seattle | 0 explicitly listed

Source: Together AI Greenhouse, fetched May 2026.

Honest Seattle assessment: I previously had Together listed as "remote-friendly" in your Seattle table. That was wrong, or it has changed. The current job board is overwhelmingly SF-colocated. Only one role is explicitly remote. If you want Together, expect to either (a) negotiate remote as a senior hire (possible: Tri Dao is at Princeton, so there's precedent), or (b) move to SF. I'll fix the Seattle table to drop them to Tier C.

9. Fit assessment for your situation

Axis | Score (1-5) | Why
100x equity potential | 2 | At ~$3.3B (or higher post next round), 100x requires $330B+ outcome. Unlikely: even a great Together exit is more like 5-15x from here. They're past the lottery-ticket stage.
Learning value | 4 | Best ML-systems bench outside Anthropic/DeepMind/Meta. Direct exposure to Tri Dao, Chris Ré's lineage. Kernels, distributed training, inference at scale. Strong learning.
InvAlign (AI-for-investing) | 2 | Indirect. Inference cloud is infrastructure, not investing. Useful if you later build an AI-investor and need cheap inference, but irrelevant to learning the methodology.
Fit / chance you land it | 4 | Your Meta ML eng profile is exactly what they hire. Bar is high but not Anthropic-high. Many roles available.
Seattle | 1 | Effectively SF-required for ML roles. Would need to negotiate as senior hire.
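
The 100x arithmetic behind the equity score can be sketched in a few lines. This assumes the $3.3B post-money mark as the entry valuation and treats dilution from later rounds as an optional, hypothetical input; real employee returns also depend on strike price and grant timing, which are not modeled here.

```python
ENTRY_VALUATION_USD_B = 3.3  # Feb 2025 post-money, in billions of USD


def required_exit_usd_b(entry_usd_b, target_multiple):
    """Exit valuation needed to hit a target multiple, ignoring dilution."""
    return entry_usd_b * target_multiple


def implied_multiple(entry_usd_b, exit_usd_b, dilution=0.0):
    """Return multiple at a given exit valuation.

    `dilution` is the fraction of ownership lost to later rounds
    (a hypothetical input; 0.0 means no dilution).
    """
    return exit_usd_b / entry_usd_b * (1.0 - dilution)


if __name__ == "__main__":
    needed = required_exit_usd_b(ENTRY_VALUATION_USD_B, 100)
    print(f"100x needs a ~${needed:.0f}B exit")
    for exit_b in (30.0, 50.0):
        multiple = implied_multiple(ENTRY_VALUATION_USD_B, exit_b)
        print(f"${exit_b:.0f}B exit -> {multiple:.1f}x before dilution")
```

At a $3.3B entry, a $30-50B exit works out to roughly 9x to 15x before dilution, which is why the 100x score stays low even in a good outcome.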

10. My honest take

Together is interesting but doesn't fit your stated bet. Three points:
  1. Past lottery-ticket stage. At $3.3B+ they're priced like a mid-stage growth company. Best realistic outcome is a Snowflake-style IPO at $30-50B (roughly 9-15x on the $3.3B mark, before dilution). Not a 100x bet. If you want infrastructure-layer 100x, you want CoreWeave (already past it), Tenstorrent (still small), or a true seed-stage compute startup.
  2. Business risk: hyperscaler absorption. AWS Bedrock, Azure AI Foundry, GCP Vertex are all building exactly what Together sells. Together's defense is research credibility (Mamba, FlashAttention) and open-model focus. Real defense — but if the hyperscalers cut prices aggressively, Together's enterprise GTM has to compete with bundled hyperscaler deals. The next 24 months tell the story.
  3. Wrong on Seattle. SF-colocated. I had this wrong on the Seattle table and am fixing it now.
If you wanted to optimize for the "learn from the best ML systems people" axis only, Together is excellent and probably second only to Anthropic. If you're optimizing for your stated bet ($1M equity → 100x, ideally Seattle, ideally on AI-investing), Together is wrong on all three axes.
One scenario where Together is the right move: if you want to spin up your own AI-investing company in 2-3 years and need deep ML-systems credibility for the founder résumé. Working with Tri Dao / Chris Ré's lineage is a strong credential. Better than Microsoft AI for that specific outcome — Microsoft gives you the badge, Together gives you the actual systems chops.

11. Open questions / next steps

Sources