How switches work, Ethernet vs InfiniBand, NVIDIA's vertical integration threat, and what AGI means for this industry. | 2026-04-16
When thousands of GPUs train an AI model, they need to constantly exchange data with each other. The network connecting them is often the bottleneck — GPUs sit idle waiting for data to arrive. This makes networking one of the most critical (and expensive) components of AI infrastructure. The central investment question: as AI scales, how fast does the networking market grow, and who captures the value? Arista (ANET), Broadcom (AVGO), and NVIDIA (NVDA) are all competing for this market with very different strategies.
Key stocks: ANET (Arista — Ethernet switch vendor), AVGO (Broadcom — switch ASICs), NVDA (InfiniBand + Spectrum-X), LITE/COHR (optical transceivers that plug into switches), MRVL (custom networking ASICs).
A network switch is a physical box that connects computers together so they can send data to each other. Think of it like a postal sorting facility: data packets arrive at the switch, the switch reads the destination address on each packet, and forwards it out the correct port to reach the right computer.
Without switches, you would need a separate cable from every computer to every other computer. With 1,000 computers, that's ~500,000 cables. A switch lets you plug all 1,000 computers into a relatively small number of switches, and the switches figure out how to route the data.
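As a sanity check on the cable count, the pairwise-connection formula in a throwaway sketch:

```python
def full_mesh_cables(n: int) -> int:
    """Direct cables needed to connect every pair of n computers: n*(n-1)/2."""
    return n * (n - 1) // 2

print(full_mesh_cables(1_000))  # 499500 — roughly the ~500,000 quoted above
```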
The physical object:
A data center switch looks like a flat metal box, about the size of a pizza box, that sits in a standard server rack. The front has rows of ports — typically 32 to 64 — where fiber optic cables plug in. Each port can handle 100, 400, or 800 gigabits per second. A single high-end switch can move over 50 terabits per second of total throughput — roughly the equivalent of streaming two million 4K movies simultaneously (at ~25 Mbps each).
Packet forwarding itself happens entirely in hardware, on the ASIC. Software handles the control plane (building the forwarding tables, managing the switch), but the per-packet data path is pure hardware — that's why it's so fast.
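To make the hardware/software split concrete, here is a toy Python model of the forwarding logic the ASIC implements in silicon. The class and MAC addresses are illustrative; a real ASIC performs this lookup in nanoseconds per packet, in hardware:

```python
class ToySwitch:
    """Software model of a learning Ethernet switch.
    The real data path runs in ASIC hardware at line rate;
    this just shows the forwarding logic."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.mac_table: dict[str, int] = {}  # MAC address -> port (the forwarding table)

    def receive(self, src_mac: str, dst_mac: str, in_port: int) -> list[int]:
        # Learning step: remember which port the source MAC lives on.
        self.mac_table[src_mac] = in_port
        # Forwarding step: known destination -> send out exactly one port;
        # unknown destination -> flood out every other port.
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return [p for p in range(self.num_ports) if p != in_port]

sw = ToySwitch(num_ports=4)
print(sw.receive("aa:aa", "bb:bb", in_port=0))  # [1, 2, 3] — bb:bb unknown, flood
print(sw.receive("bb:bb", "aa:aa", in_port=2))  # [0] — aa:aa was learned on port 0
```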
A modern data center might have 100,000+ servers. You can't plug them all into one switch (the biggest switches have ~64 ports). So data centers use a layered architecture called leaf-spine: every server plugs into a leaf switch (typically at the top of its rack), and every leaf connects to every spine switch, so any two servers are at most two switch hops apart.
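A back-of-envelope sizing sketch for a two-tier leaf-spine fabric, assuming a simplified non-blocking design where each leaf splits its ports half down to servers and half up to spines (the function and numbers are illustrative, not a vendor formula):

```python
def leaf_spine_sizing(servers: int, ports_per_switch: int = 64):
    """Rough two-tier leaf-spine sizing at 1:1 (non-blocking) oversubscription:
    each leaf uses half its ports for servers and half for spine uplinks."""
    down = up = ports_per_switch // 2      # 32 down, 32 up on a 64-port leaf
    leaves = -(-servers // down)           # ceiling division
    spines = up                            # one uplink from each leaf to each spine
    max_servers = down * ports_per_switch  # (p/2) downlinks * p leaves = p^2/2 hosts
    return leaves, spines, max_servers

leaves, spines, cap = leaf_spine_sizing(2_000)
print(leaves, spines, cap)  # 63 32 2048 — one pod of 64-port switches tops out at 2,048 servers
```

This is why very large data centers add a third tier (a "super-spine") — a two-tier fabric of 64-port switches caps out around two thousand servers per pod.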
North-South Traffic
Data entering or leaving the data center — user requests coming in from the internet, responses going back out. This is what traditional web services generate.
East-West Traffic
Data moving between servers inside the data center. This is what AI training generates — thousands of GPUs constantly exchanging gradient updates. In AI clusters, east-west traffic dominates: 80-90%+ of all data stays inside the cluster.
This distinction matters because AI is dramatically shifting the traffic pattern. Traditional web workloads (serving search results, loading Instagram) are mostly north-south. AI training is almost entirely east-west. The more east-west traffic there is, the more spine switches you need, the higher-bandwidth your links need to be, and the more money gets spent on networking.
A natural question: is a switch a hardware product or a software product? The answer: it's a hardware box, but the value increasingly lives in the software.
| Component | What It Does | Who Makes It | Cost Share |
|---|---|---|---|
| Switch ASIC | The brain. Reads packet headers, does forwarding lookups, makes switching decisions. All in hardware at line rate. | Broadcom (dominant), NVIDIA (Spectrum), Marvell, Intel | 30-40% |
| TCAM memory | Ternary Content-Addressable Memory. Stores forwarding/routing tables for ultra-fast lookups. Specialized memory that can search all entries simultaneously. | Various memory vendors | 10-15% |
| Packet buffer memory | Temporarily holds packets when output ports are congested. Important for handling traffic bursts. | HBM or SRAM vendors | 5-10% |
| Optical transceiver ports | The physical ports where fiber optic cables plug in. Convert between electrical signals (inside the switch) and light (on the fiber cable). | Lumentum (LITE), Coherent (COHR), InnoLight, etc. | 25-35% |
| PCB, chassis, fans, PSU | Circuit board, metal enclosure, cooling, power supply. Standard electronics manufacturing. | Contract manufacturers | 10-15% |
The switch ASIC handles the fast path (forwarding packets). But you also need software to build and maintain the forwarding tables, run routing protocols, configure the switch, monitor traffic, and expose APIs for automation.
This software is called the Network Operating System (NOS). It's where Arista's competitive advantage lives.
Arista EOS
Why hyperscalers prefer it: programmable, reliable, modern architecture.
Cisco IOS/NX-OS (legacy)
Why cloud customers left: complexity, bugs, vendor lock-in.
So is it hardware or software?
The switch is physically a hardware product — you buy a box. But the ASIC inside is a commodity that anyone can buy from Broadcom. What differentiates Arista from a generic "white box" switch is EOS — the software. This is why Arista's gross margin is ~64%, which is remarkably high for a hardware company and looks more like a software margin. The software is where the value and the moat live.
That said, you still need to design the hardware, manage the supply chain, qualify the optics, test the whole system. It's not pure software. The business model is: sell hardware boxes at healthy margins, provide ongoing software support and subscriptions. Increasingly, Arista is also selling software-only licenses (CloudVision, DANZ) that recur annually.
Ethernet is a networking protocol — a set of rules for how computers package and send data to each other. It was invented at Xerox PARC in 1973 and has been the dominant local networking standard for over 50 years.
Ethernet is defined by the IEEE 802.3 standard. It's an open standard — anyone can build Ethernet equipment, and products from different vendors work together. This is in stark contrast to InfiniBand (discussed below), which is effectively controlled by NVIDIA.
| Year | Speed | Name | Context |
|---|---|---|---|
| 1980 | 10 Mbps | Ethernet | Original Xerox/Intel/DEC specification |
| 1995 | 100 Mbps | Fast Ethernet | Enough for basic web browsing |
| 1999 | 1 Gbps | Gigabit Ethernet | Standard for home/office networks today |
| 2006 | 10 Gbps | 10 GigE | First generation of data center Ethernet |
| 2010 | 40 Gbps | 40 GigE | Data center spine links |
| 2015 | 25/100 Gbps | 25/100 GigE | Server-to-switch (25G), spine (100G) |
| 2018 | 200/400 Gbps | 200/400 GigE | AI-era data center networking |
| 2024 | 800 Gbps | 800 GigE | AI backend networks, now shipping |
| ~2026 | 1.6 Tbps | 1.6T Ethernet | Next generation, in development |
The key takeaway: Ethernet keeps doubling in speed every 3-4 years. It started as a 10 Mbps office protocol and is now running at 800 Gbps in AI data centers — an 80,000x increase in speed. This relentless pace of improvement is part of why Ethernet keeps winning: it's a moving target that's hard for alternatives to outrun.
Why Ethernet's ubiquity matters:
Every network engineer in the world knows Ethernet. Every switch, router, NIC, server, and operating system supports it. There are thousands of vendors, mature tooling, abundant talent, and decades of operational experience. This installed base and ecosystem is an enormous moat. For any competing technology (like InfiniBand), "being slightly better technically" is not enough — you have to be dramatically better to overcome the switching costs and ecosystem advantage of Ethernet.
InfiniBand is a different networking protocol, designed specifically for high-performance computing (HPC). It was created in the early 2000s by a consortium including Intel, IBM, Sun, and others. The original goal was to replace the PCI bus inside computers, but it evolved into a network interconnect for supercomputers.
The key company: Mellanox Technologies, an Israeli company that became the dominant InfiniBand vendor. NVIDIA acquired Mellanox in 2020 for $6.9 billion. This acquisition gave NVIDIA control over InfiniBand — the leading high-performance networking technology.
| Property | Ethernet | InfiniBand |
|---|---|---|
| Governance | Open standard (IEEE) | NVIDIA-controlled (IBTA) |
| Ecosystem | Thousands of vendors | Essentially NVIDIA only |
| Latency | ~1-2 microseconds | ~0.5-0.6 microseconds |
| RDMA support | RoCE v2 (add-on, not native) | Native, built-in from day one |
| Congestion management | ECN-based (reactive) | Credit-based (proactive) |
| Adaptive routing | Limited (ECMP hashing) | Built-in, dynamic load balancing |
| Price | Competitive (many vendors) | Premium (monopoly pricing) |
| Scalability | Proven at massive scale (100K+ nodes) | Historically limited to ~10K nodes |
| Interoperability | Multi-vendor | NVIDIA hardware only |
| Cost per port | Lower (commodity) | Higher (premium) |
RDMA (Remote Direct Memory Access) is the single most important technical concept in this whole debate. Here's what it means:
In normal networking, when Computer A wants to send data to Computer B, the application copies the data into an operating-system buffer, the kernel's TCP/IP stack processes it and hands it to the network card, and the receiving side runs the same steps in reverse — NIC to kernel buffer to application memory.
That's multiple memory copies and multiple context switches between application and operating system. Each one adds latency and burns CPU cycles.
With RDMA, the sending network card writes data directly into a registered region of the receiving application's memory.
Zero CPU involvement. Zero memory copies. Zero OS overhead. The network cards talk directly to application memory, bypassing the entire operating system. This dramatically reduces latency and frees the CPU to do other work.
Why RDMA matters for AI training:
During AI training, GPUs need to exchange gradient updates thousands of times per second. Each exchange involves reading a chunk of GPU memory, sending it across the network, and writing it into another GPU's memory. With RDMA, this happens directly — GPU memory to network to GPU memory — without bothering the CPU or operating system. This can reduce communication time by 30-50% compared to traditional TCP/IP networking. When you have 10,000 GPUs and communication time is the bottleneck, that difference is enormous.
InfiniBand has had RDMA built in since day one. Ethernet added it later as an extension called RoCE v2 (RDMA over Converged Ethernet). RoCE v2 works, but it requires careful network configuration (lossless Ethernet with Priority Flow Control) that traditional Ethernet doesn't need. It's an afterthought bolted onto Ethernet, whereas in InfiniBand, RDMA is the native mode of operation.
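A toy latency model of the difference — every per-operation cost below is an illustrative assumption, not a measurement:

```python
# Illustrative per-operation costs for a small message (assumptions, not benchmarks).
COPY_US = 0.4      # one memory copy, microseconds
SYSCALL_US = 1.0   # one user/kernel context switch, microseconds
WIRE_US = 0.5      # time on the wire

def tcp_path_us() -> float:
    # App->kernel copy and kernel->NIC copy on send, the reverse on receive,
    # plus a syscall on each side.
    return 4 * COPY_US + 2 * SYSCALL_US + WIRE_US

def rdma_path_us() -> float:
    # NICs read/write application memory directly: no copies, no syscalls.
    return WIRE_US

print(f"TCP-ish: {tcp_path_us():.1f} us, RDMA-ish: {rdma_path_us():.1f} us")
```

Under these made-up numbers the software overhead dwarfs the wire time — which is the whole point of RDMA.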
To understand why networking is so important for AI, you need to understand one operation: all-reduce.
The all-reduce step is a collective communication operation that combines (sums) gradient tensors across all GPUs, so that every GPU ends up with the same averaged result. In a cluster of N GPUs, the total data exchanged scales with the model size multiplied by the number of GPUs. For a model like GPT-4 with on the order of a trillion parameters, training on 25,000 GPUs, the all-reduce traffic might reach hundreds of terabytes per training iteration. And there are millions of iterations.
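The traffic claim can be sketched with the standard ring all-reduce factor of 2(N-1)/N; the model size and GPU count below are illustrative:

```python
def ring_allreduce_bytes(grad_bytes: float, n_gpus: int) -> tuple[float, float]:
    """Per-GPU and cluster-wide traffic for one ring all-reduce.
    Each GPU sends and receives ~2x the gradient size
    (the 2*(N-1)/N ring factor), regardless of cluster size."""
    per_gpu = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    total = per_gpu * n_gpus
    return per_gpu, total

# Illustrative case: 70B parameters with fp16 gradients (~140 GB), 1,024 GPUs.
per_gpu, total = ring_allreduce_bytes(1.4e11, 1024)
print(f"per-GPU: {per_gpu/1e9:.0f} GB, cluster-wide: {total/1e12:.0f} TB per all-reduce")
```

Even this modest example moves hundreds of terabytes across the fabric per all-reduce; real frontier runs shard the work with hybrid parallelism, but the order of magnitude is the point.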
The network is the bottleneck
GPUs are incredibly fast at computation — a single H100 delivers on the order of 1,000-2,000 teraflops at the low precisions used for training. But that computation is useless if the GPU is sitting idle, waiting for gradient data to arrive from other GPUs. In large training runs, 30-50% of total training time can be spent waiting for network communication. Every microsecond of extra network latency, multiplied across millions of iterations and thousands of simultaneous transfers, compounds into days or weeks of extra training time and millions of dollars in electricity and GPU rental costs.
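A rough cost-of-idle calculation under assumed rental rates (all numbers illustrative):

```python
GPUS = 10_000
RATE_PER_GPU_HOUR = 2.00  # illustrative GPU rental rate, dollars
RUN_DAYS = 30
COMM_STALL = 0.35         # fraction of time spent waiting on the network (mid-range of 30-50%)

total_cost = GPUS * 24 * RUN_DAYS * RATE_PER_GPU_HOUR
wasted = total_cost * COMM_STALL
print(f"run cost ${total_cost/1e6:.1f}M, of which ${wasted/1e6:.1f}M is network wait")
```

At these assumed rates, a third of a $14M training run is spent renting GPUs that are waiting for packets — which is why shaving network latency is worth real money.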
This is why NVIDIA pushes InfiniBand and Spectrum-X so hard, and why hyperscalers spend billions on networking. The network isn't a commodity utility — it's a core performance lever for AI training.
| Cluster Size | Approximate Network Bandwidth Needed | Number of Switches | Network Cost |
|---|---|---|---|
| 1,000 GPUs | ~400 Tbps aggregate | ~50-80 | ~$20-40M |
| 10,000 GPUs | ~4 Pbps aggregate | ~500-800 | ~$200-400M |
| 100,000 GPUs | ~40 Pbps aggregate | ~5,000-8,000 | ~$2-4B |
Networking is typically 10-15% of the total cost of an AI cluster. For a $10B GPU cluster, that's $1-1.5B spent on switches, cables, and optics. This is a big market.
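The table's rows follow a simple linear rule of thumb — roughly $20-40K of networking per GPU — which a few lines reproduce:

```python
def network_cost_range(n_gpus: int, low_per_gpu: int = 20_000, high_per_gpu: int = 40_000):
    """Linear rule of thumb implied by the table: ~$20-40K of network per GPU."""
    return n_gpus * low_per_gpu, n_gpus * high_per_gpu

for n in (1_000, 10_000, 100_000):
    lo, hi = network_cost_range(n)
    print(f"{n:>7,} GPUs: ${lo/1e6:,.0f}M - ${hi/1e6:,.0f}M")
```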
This is the central competitive question in data center networking today. Two networking technologies are fighting to connect AI clusters:
Team InfiniBand (NVIDIA)
Upside: lowest latency, native RDMA, proven performance on NVIDIA GPU clusters, turnkey integration with the rest of NVIDIA's stack.
Downside: vendor lock-in to NVIDIA. Premium pricing. Limited scale.
Team Ethernet (Arista, Broadcom, the world)
Upside: open standard, multi-vendor competition, lower cost per port, proven at 100K+ node scale, vast tooling and talent ecosystem.
Downside: higher latency. RDMA is bolt-on (RoCE v2), not native. Congestion management is harder.
| Customer | AI Network Choice | Why |
|---|---|---|
| NVIDIA (DGX systems) | InfiniBand | They own it. Optimized for their GPUs. Maximum control. |
| Meta | Ethernet | Builds own network. Uses Arista switches + Broadcom ASICs. Open ecosystem. |
| Microsoft Azure | Both | InfiniBand for biggest AI clusters (NVIDIA DGX), Ethernet for everything else. |
| Google | Custom (Ethernet-based) | Builds own switches and TPU interconnect. Doesn't buy from NVIDIA or Arista. |
| Amazon AWS | Custom (EFA) | Elastic Fabric Adapter — proprietary, Ethernet-based. Nitro networking. |
| Oracle Cloud | InfiniBand | Differentiated on InfiniBand for HPC customers (RDMA). |
| xAI (Elon Musk) | Spectrum-X Ethernet | 100K GPU Colossus cluster runs on NVIDIA-supplied Spectrum-X Ethernet. |
| CoreWeave | InfiniBand | NVIDIA partner, uses NVIDIA's full stack including networking. |
The pattern: NVIDIA's own customers and partners buy NVIDIA networking (InfiniBand or Spectrum-X). The largest independent hyperscalers build on open Ethernet. The question is which camp grows faster.
The trend is shifting toward Ethernet for AI
Three forces are pushing the market toward Ethernet: (1) hyperscalers' refusal to hand NVIDIA control of the network on top of the GPU, (2) the Ultra Ethernet Consortium — Broadcom, Arista, AMD, Microsoft, Meta, and others — upgrading Ethernet's congestion control and RDMA support specifically for AI workloads, and (3) Ethernet's cost and ecosystem advantages at very large scale.
Bottom line: InfiniBand is technically superior for small-to-medium AI clusters. But Ethernet is "good enough" and getting better fast, and the world doesn't want NVIDIA to own the network too. The structural forces favor Ethernet long-term.
Here's where it gets interesting — and where the threat to Arista gets real.
NVIDIA saw the Ethernet trend coming. Their response: if the world is going to use Ethernet for AI instead of InfiniBand, NVIDIA will make its own Ethernet networking product. That product is Spectrum-X.
Spectrum-X is NVIDIA's complete Ethernet networking platform for AI, consisting of three components:
| Component | What It Is | What It Replaces |
|---|---|---|
| Spectrum-4 switch ASIC | NVIDIA's own 51.2 Tbps Ethernet switch chip. Competitive with Broadcom's Tomahawk 5. | Broadcom Tomahawk (used by Arista) |
| BlueField-3 DPU | Data Processing Unit — a smart NIC that offloads networking, security, and storage from the CPU. Sits in each server. | Standard NICs (Mellanox ConnectX) |
| NVIDIA networking software | AI-optimized congestion control, adaptive routing, and telemetry. Designed specifically for GPU-to-GPU communication patterns. | Arista EOS / standard Ethernet software |
NVIDIA's pitch: "Spectrum-X delivers 1.6x the effective AI performance of traditional Ethernet at the same cost." They claim this by optimizing the entire stack — switch ASIC, NIC, and software — specifically for AI traffic patterns (many-to-many GPU communication, bursty traffic, large messages).
Why this threatens Arista:
If you're a company building an AI cluster and you're already buying NVIDIA GPUs, NVIDIA now says: "Buy our switches too. They work better with our GPUs because we optimize the entire stack end-to-end." This is the same vertical integration playbook that Apple uses (we make the chip AND the software AND the hardware, so they all work together perfectly).
If Spectrum-X succeeds, Arista loses the most valuable part of the networking market — AI backend networks. Arista would still sell switches for non-AI workloads (enterprise, cloud, campus), but the AI premium growth driver would belong to NVIDIA.
Early signs are mixed: NVIDIA has said Spectrum-X is becoming a multibillion-dollar product line, with wins among its cloud partners, while the largest independent hyperscalers have so far kept building on open Ethernet from Arista and Broadcom-based white boxes.
Behind Arista's switches lies another company with enormous market power: Broadcom.
Broadcom designs the switch ASICs — the custom chips that do the actual packet forwarding inside the switch. Their two main product lines:
| Product Line | Use Case | Latest Generation | Throughput | Key Feature |
|---|---|---|---|---|
| Tomahawk | High-bandwidth, low-latency switching for data center fabrics | Tomahawk 5 (2022) | 51.2 Tbps | Maximum bandwidth per chip. Used in spine switches and AI backend networks. |
| Jericho | Deep-buffer routing for WAN and peering | Jericho3-AI (2023) | 38.4 Tbps | Large packet buffers + routing. Jericho3-AI adds AI traffic optimization features. |
| Trident | Feature-rich switching for enterprise/campus | Trident 5 | 12.8 Tbps | Rich feature set (ACLs, QoS, monitoring). Not used in AI clusters. |
Broadcom's switch ASIC market share in data centers is estimated at 70-80%+. Arista, Cisco, and most other switch vendors all buy Broadcom ASICs. The main alternatives are NVIDIA's Spectrum (used only in NVIDIA switches) and Marvell's Teralynx (smaller market share).
The supply chain:
Broadcom designs the ASIC → TSMC manufactures it → Broadcom sells it to Arista → Arista combines it with their EOS software, ports, optics, and chassis → Arista sells the complete switch to Microsoft, Meta, etc. At each step, value is added. Broadcom captures 30-40% of the switch BOM. Arista captures the rest through system integration and software. This is why Arista and Broadcom are symbiotic — Arista needs Broadcom's chips, Broadcom needs Arista's market access.
| Company | Role | AI Networking Strategy | Market Cap | Networking Revenue |
|---|---|---|---|---|
| Arista (ANET) | Ethernet switch vendor | Partner with Broadcom. Best NOS software. AI-optimized features in EOS. Targets $3.25B AI revenue by 2026. | ~$120B | $9B total |
| NVIDIA (NVDA) | GPU + networking | Vertical integration. InfiniBand for loyal customers. Spectrum-X Ethernet to capture the Ethernet shift. Own the full stack. | ~$4.4T | ~$15B networking |
| Broadcom (AVGO) | Switch ASIC + NIC vendor | Sell ASICs to everyone. Tomahawk 5 and Jericho3-AI for AI. Also building custom AI chips (XPUs) for hyperscalers. | ~$1.1T | ~$15B networking |
| Cisco (CSCO) | Legacy network vendor | Silicon One chip. Trying to stay relevant. Acquired for AI/ML networking startups. Still dominant in enterprise but losing cloud/DC share to Arista. | ~$250B | ~$14B switching |
| Juniper (HPE) | Enterprise/SP networking | Acquired by HPE for $14B (2024). Focused on enterprise and service providers. Not a major AI networking player. | (now HPE) | ~$5B |
| Marvell (MRVL) | ASIC + NIC vendor | Teralynx switch ASIC. Custom ASICs for hyperscalers. Smaller but growing networking business. | ~$80B | ~$2B networking |
Let's tie this back to the AGI thesis that frames all our investment analysis.
If you believe AGI is coming (and we do), here's what it means for networking: training clusters keep scaling from tens of thousands toward hundreds of thousands (and eventually millions) of GPUs, per-GPU bandwidth rises with every generation, and east-west traffic grows even faster than compute itself. Networking spend should compound at least as fast as GPU spend, keeping networking at 10-15%+ of total cluster cost.
The offsetting risk: NVIDIA is trying to vertically integrate from GPU all the way through the network switch. If they succeed, Arista loses the AI networking market to NVIDIA's Spectrum-X. The key question is whether hyperscalers buy NVIDIA's full stack or insist on open Ethernet with vendor choice.
History suggests that open standards eventually win over proprietary alternatives, especially when the biggest customers (hyperscalers) have a strong incentive to avoid vendor lock-in. The pattern repeats across tech: Ethernet itself beat Token Ring and FDDI, TCP/IP beat proprietary protocols like SNA and DECnet, Linux displaced proprietary Unix, and x86 displaced proprietary RISC in the data center.
Most likely scenario: Ethernet wins the volume, NVIDIA keeps InfiniBand/Spectrum-X for its premium DGX customers, and Arista captures a large share of the Ethernet AI networking market. The total networking TAM grows fast enough that both Arista and NVIDIA do well, but Arista's growth rate depends on how quickly Ethernet displaces InfiniBand in AI clusters.
| Metric | Value | Source / Context |
|---|---|---|
| Total data center switching market (2025) | ~$15-18B | Includes Ethernet + InfiniBand, switches only (not optics or cables) |
| AI backend networking TAM (2025) | ~$5-8B | Switches + NICs + optics specifically for AI GPU clusters |
| AI networking TAM (2028E) | ~$15-25B | Growing 30-40% CAGR as AI clusters scale |
| Arista AI-specific revenue (FY2026 target) | $3.25B | ~30% of total projected revenue |
| NVIDIA networking revenue (FY2025) | ~$15B | InfiniBand + Spectrum-X + ConnectX NICs |
| Broadcom networking revenue (FY2025) | ~$15B | Switch ASICs + custom XPUs + NICs |
| InfiniBand share of AI backend networking | ~50-60% | Declining as Ethernet gains share |
| Ethernet share of AI backend networking | ~40-50% | Growing, especially among hyperscalers |
| Average switch ASP (high-end 400G/800G) | $50-150K | Per switch, depending on port count and speed |
| Network cost as % of AI cluster | 10-15% | Switches + optics + cables + NICs |
Every port on a switch needs an optical transceiver — a small module that converts electrical signals to light for transmission over fiber optic cables. These are the components made by Lumentum (LITE) and Coherent (COHR).
As switch speeds increase (400G → 800G → 1.6T), optical transceivers get more complex and expensive. A single 800G transceiver costs $500-1,500. A large AI cluster might need 50,000-100,000 transceivers. At $1,000 each, that's $50-100M just in optics — a significant cost.
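The arithmetic, as a one-liner (counts and unit price are the article's ranges, not vendor quotes):

```python
# Rough optics bill for a large AI cluster, using the article's mid-range figures.
transceivers = 100_000  # ports needing an optical module
unit_cost = 1_000       # dollars per 800G module, mid-range assumption
print(f"optics bill: ${transceivers * unit_cost / 1e6:.0f}M")  # optics bill: $100M
```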
The next frontier is co-packaged optics (CPO), where the optical transceiver is integrated directly onto the switch ASIC package rather than being a separate pluggable module. CPO promises lower power consumption and higher density, but it's still in early development. NVIDIA has invested $2B in Lumentum specifically for CPO development. If CPO succeeds, it could change the economics of optical transceivers (fewer discrete modules, but higher ASIC cost).
ANET (Arista): The leading merchant Ethernet switch vendor. Benefits enormously if Ethernet wins over InfiniBand for AI. The threat is NVIDIA Spectrum-X taking the AI segment. Our fair value estimate of $400 assumes Arista captures its share of AI networking but doesn't dominate it (NVIDIA takes some). At ~$380, the stock is roughly fairly valued — not a screaming buy, but a high-quality business with a real AI tailwind. Would be more interesting at $250-300.
AVGO (Broadcom): The hidden monopoly. Makes the switch ASICs that go inside Arista's (and most other) switches. Also designing custom AI chips (XPUs) for Google and Meta. Broadcom benefits regardless of who wins the switch vendor battle — they supply the silicon to almost everyone. This is the most picks-and-shovels play in networking.
NVDA (NVIDIA): Owns both InfiniBand and Spectrum-X. Networking is ~6-7% of NVIDIA's revenue but growing fast. NVIDIA's vertical integration strategy (GPU + NIC + switch + software) is the existential threat to Arista and Broadcom in the AI segment. But networking is a small tail on a very large GPU dog — NVIDIA's stock price is driven by GPU demand, not networking.
LITE/COHR (Lumentum, Coherent): Make the optical transceivers that plug into every switch port. They benefit from the raw growth in port count regardless of who makes the switch. The CPO transition could reshape their business model. Both stocks have run up substantially on AI optics demand.
The big picture: AI is creating a massive structural increase in networking demand. The total networking TAM for AI could reach $15-25B by 2028. The question isn't whether the market grows — it's who captures it. Broadcom is probably the safest networking bet (they supply everyone). Arista is a great business but faces the NVIDIA threat. NVIDIA's networking revenue is growing fast but is a small part of their overall story.
Sources: Company filings (ANET, NVDA, AVGO 10-Ks), IEEE 802.3 standards, Ultra Ethernet Consortium specifications, Dell'Oro Group market data, industry analyst reports. Market sizes are approximate. Report date: April 2026.