Episode 102: AI Changed the Network: Inside the Ethernet Fabric Powering AI Infrastructure

ELI5/TLDR

When you train a giant AI model, you don’t use one chip; you use thousands, and they have to talk to each other constantly, swapping numbers so often that the communication, not the math, sets the pace. The wires between the chips become the bottleneck. This is a conversation with a Broadcom networking executive about the cables, chips, and light-pipes that move data inside an AI data center, and how that humble plumbing has quietly become the hardest part of the whole machine. The punchline: the network is no longer a pipe between computers, it’s basically an extension of the computers’ own memory.

The Full Story

Why a chat about wiring is actually a chat about AI

Here is the thing nobody tells you about AI: most of the difficulty isn’t the math. A graphics chip (a GPU) can do staggering amounts of arithmetic. But to train a model, you need thousands of them, and they spend a huge fraction of their time not computing but waiting for each other — passing partial results back and forth so the whole cluster stays in sync. If the wiring between them is slow, the expensive chips sit idle. So the cables become the bottleneck, and the people who make the cables and the switches that route the traffic suddenly matter as much as the people who make the chips.

That’s the frame for this whole interview. The guest, Hassan (from Broadcom — the host keeps the surname light), runs the networking-silicon side. The host, Nick, opens with a number that sets the tone: Ethernet, the standard plumbing of computer networks, started in 1983 at 10 megabits per second. It took until 2010 to reach 100 gigabits — roughly ten thousand times faster, over 27 years. Then, in the next 14 years, it jumped another ten-thousand-fold, into the terabit range. The acceleration itself accelerated, and AI is the reason.

“But with the introduction of AI training, distributed inference, it’s just the need for bandwidth has skyrocketed.”

To feel the gap: an ordinary corporate office data center today still runs on 25- or 50-gigabit connections. The AI machines Hassan is describing run at 800 gigabits per port, with 1.6-terabit and 3.2-terabit on the roadmap. Same word — “Ethernet” — wildly different animal.
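To put the acceleration in more familiar terms, the multipliers can be turned into doubling times. This is a minimal sketch that takes the episode's figures at face value and assumes smooth exponential growth across each span; the function name is made up for illustration.

```python
import math

def doubling_time_years(growth_factor: float, years: float) -> float:
    """Years per doubling, assuming smooth exponential growth over the span."""
    return years * math.log(2) / math.log(growth_factor)

# Figures quoted in the episode: 10 Mb/s (1983) to 100 Gb/s (2010).
first_factor = 100e9 / 10e6
first_era = doubling_time_years(first_factor, 2010 - 1983)

# Multiplier as quoted by the host for the following 14 years;
# not independently checked here.
second_era = doubling_time_years(10_000, 14)

print(f"1983-2010: {first_factor:,.0f}x, doubling every {first_era:.2f} years")
print(f"next 14 years: doubling every {second_era:.2f} years")

# Per-port gap between an enterprise link and an AI fabric port.
for enterprise_gbps in (25, 50):
    print(f"800G vs {enterprise_gbps}G: {800 / enterprise_gbps:.0f}x")
```

On those numbers the doubling time roughly halves between the two eras, which is one way to read "the acceleration itself accelerated."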

Three different distances, three different problems

The cleanest idea in the interview is that “connecting GPUs” isn’t one problem, it’s three, and they’re sorted by distance. Think of it like a company: people at the same desk cluster, people in the same building, and people across different offices all need to communicate, but you’d never use the same method for all three.

Scale-up — the same rack, chips reaching into each other’s memory. This is the tightest circle. A group of GPUs sitting in one cabinet are wired so they can read each other’s memory directly, as if it were one giant shared brain. The numbers here are almost absurd: 40 terabytes per second of memory bandwidth today, heading to 100. To not choke that, you need around 10 terabits of networking bandwidth per chip, going to 25.
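The memory and network figures are quoted in different units (terabytes versus terabits), so it helps to put them side by side. A quick sketch, assuming both numbers describe the same scope, which the episode doesn't spell out:

```python
BITS_PER_BYTE = 8

def network_to_memory_ratio(memory_tbytes_per_s: float,
                            network_tbits_per_s: float) -> float:
    """Fraction of the memory bandwidth that the network link can carry."""
    memory_tbits_per_s = memory_tbytes_per_s * BITS_PER_BYTE
    return network_tbits_per_s / memory_tbits_per_s

today = network_to_memory_ratio(40, 10)     # 40 TB/s memory, 10 Tb/s network
roadmap = network_to_memory_ratio(100, 25)  # 100 TB/s memory, 25 Tb/s network

print(f"today: 1/{1 / today:.0f}, roadmap: 1/{1 / roadmap:.0f}")
```

Under that assumption the network stays at the same fixed fraction of memory bandwidth on both the current and roadmap figures, so the roadmap is about keeping pace rather than closing the gap.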

Counterintuitively, this hardest-bandwidth job needs the simplest network. Why? Because everything is one hop away — desk to neighboring desk. When the destinations are all close and few, you don’t need elaborate addressing.

“You do not need very, very large Ethernet headers. So this is where this concept of optimized headers has come into play.”

An “Ethernet header” is the addressing label stapled to every chunk of data — like the to/from block on an envelope. If you’re only ever mailing to the same small room, you can shrink the envelope and stop wasting space on a giant address. That’s the “optimized headers” work happening in an industry group called Ultra Ethernet. The other non-negotiable at this range is reliability: these aren’t web pages where a dropped packet is shrugged off, these are memory transactions. A lost one is a corrupted thought.
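To see why envelope size matters for memory-sized messages, here is a rough sketch of wire efficiency. The standard sizes are the usual Ethernet, IPv4, and UDP figures; the 10-byte compact header is purely hypothetical, chosen for illustration and not taken from any Ultra Ethernet document.

```python
# Standard per-frame costs in bytes.
PREAMBLE_SFD = 8
ETH_HEADER = 14       # destination MAC + source MAC + EtherType
IPV4_HEADER = 20      # without options
UDP_HEADER = 8
FCS = 4               # frame check sequence
INTERPACKET_GAP = 12

STANDARD_OVERHEAD = (PREAMBLE_SFD + ETH_HEADER + IPV4_HEADER
                     + UDP_HEADER + FCS + INTERPACKET_GAP)

# HYPOTHETICAL: one compact header standing in for Ethernet + IP + UDP.
COMPACT_HEADER = 10
COMPACT_OVERHEAD = PREAMBLE_SFD + COMPACT_HEADER + FCS + INTERPACKET_GAP

def efficiency(payload_bytes: int, overhead_bytes: int) -> float:
    """Share of bytes on the wire that are actual payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)

for payload in (64, 256, 4096):
    std = efficiency(payload, STANDARD_OVERHEAD)
    compact = efficiency(payload, COMPACT_OVERHEAD)
    print(f"{payload:>5} B payload: standard {std:.1%}, compact {compact:.1%}")
```

The gain is largest for small payloads, which is the scale-up case: memory reads and writes tend to be short, so the address block is a big share of every envelope.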

Scale-out — across the building, the elephant problem. Now zoom out to thousands of GPUs across a data center hall. Here the traffic pattern breaks the old rules. The regular internet (TCP/IP) carries millions of small, varied flows, and it spreads them across available routes smoothly, like a city’s worth of cars naturally filling every lane. AI traffic is the opposite: a handful of enormous flows, nicknamed elephant flows.

“Think of this as an eight lane highway. If everybody is driving in one lane you’re going to have congestion.”

You have eight lanes and three trucks the size of buildings, all insisting on lane one. So the network’s job becomes (a) load balancing — shove the elephants into different lanes, (b) congestion control — keep the lanes from jamming, and (c) recovering instantly from a broken link. That last one is where Hassan keeps returning, because of a specific AI cruelty: if a single connection hiccups mid-training (a “link flap”), the job can’t just route around it — it has to roll back to its last saved checkpoint and redo the work. One flaky cable, and a thousand-chip cluster loses hours.
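The highway picture can be simulated in a few lines. This is a toy model, not any vendor's algorithm: a random lane choice stands in for hashing a flow onto a path, and "spraying" is an idealized version of splitting each flow across every lane. All names and sizes are made up for illustration.

```python
import random

LANES = 8

def hashed_loads(flow_sizes, rng):
    """Pin each flow to one lane, the way per-flow hashing does."""
    loads = [0.0] * LANES
    for size in flow_sizes:
        loads[rng.randrange(LANES)] += size
    return loads

def sprayed_loads(flow_sizes):
    """Split every flow evenly over all lanes (idealized packet spraying)."""
    per_lane = sum(flow_sizes) / LANES
    return [per_lane] * LANES

def imbalance(loads):
    """Busiest lane relative to the average lane; 1.0 is perfectly even."""
    return max(loads) / (sum(loads) / len(loads))

rng = random.Random(0)
mice = [1.0] * 10_000           # many small flows
elephants = [10_000 / 3] * 3    # the same total traffic in three giant flows

print(f"mice, hashed:       {imbalance(hashed_loads(mice, rng)):.2f}")
print(f"elephants, hashed:  {imbalance(hashed_loads(elephants, rng)):.2f}")
print(f"elephants, sprayed: {imbalance(sprayed_loads(elephants)):.2f}")
```

With thousands of small flows the lanes even out on their own. With three elephants, at most three of the eight lanes carry anything, so the busiest lane is at least 8/3 of the average no matter how the hash falls.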

Scale-across — between buildings, fighting physics. Eventually a single data center runs out of power — not space, power. So clusters start spanning multiple buildings, miles apart. Now you’re fighting the speed of light (signals take real time to travel), you have to keep the network “lossless” over distance, and you’ve left your secure building, so the data needs encryption on the way. Different problem again.
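The speed-of-light cost is easy to put numbers on. A rough sketch, assuming light in fiber covers about 200,000 km per second (roughly two-thirds of its vacuum speed); the distances are hypothetical, and the 800 Gb/s port speed is the one quoted earlier.

```python
FIBER_KM_PER_S = 200_000  # approximate speed of light in glass fiber

def one_way_delay_us(distance_km: float) -> float:
    """Propagation delay in microseconds over a fiber of this length."""
    return distance_km / FIBER_KM_PER_S * 1e6

def bytes_in_flight(distance_km: float, link_gbps: float) -> float:
    """Data a sender pushes out during one round trip, before any
    acknowledgement or pause signal can get back to it."""
    round_trip_s = 2 * distance_km / FIBER_KM_PER_S
    return link_gbps * 1e9 / 8 * round_trip_s

for km in (0.1, 10, 100):  # HYPOTHETICAL: in-hall, campus, metro
    delay = one_way_delay_us(km)
    in_flight_mb = bytes_in_flight(km, 800) / 1e6
    print(f"{km:>5} km: {delay:7.1f} us one way, "
          f"{in_flight_mb:8.2f} MB in flight at 800 Gb/s")
```

The in-flight figure is why "lossless over distance" is hard: by the time a far-end switch can say "slow down," that much data is already on the fiber and needs somewhere to land.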

Hassan’s pitch is that all three can run on Ethernet, one technology family stretched across all three distances, rather than needing a separate exotic system. The rival in the background is InfiniBand, the specialized interconnect now sold mainly by Nvidia and long prized for low latency. The host gives it a polite nod; Hassan’s whole argument is that Ethernet’s larger, more open ecosystem can now match it and evolve faster.

Light, copper, and the lasers that keep dying

The second big theme: as speeds climb, you can no longer talk about networking without talking about optics — converting electrical signals into pulses of light and back, because past a certain speed and distance, copper wire gives up.

“You really can’t talk about networking without optics today. They are now just tied at the hip.”

Inside a single rack (scale-up), copper still wins — cheap, low-power, reliable. But people want bigger clusters of directly-connected chips (256, 512, eventually a thousand), and copper simply can’t reach that far at these speeds. The faster the signal, the shorter the distance copper can carry it. The fix is an open optical standard Broadcom built with Meta, Microsoft, OpenAI, Nvidia and AMD, called OCI, which pushes the reach to 2 kilometers while keeping copper’s virtues of low power and cost.

On the scale-out side, the frontier is co-packaged optics (CPO) — moving the light-conversion engines from a plug-in module right onto the same package as the switching chip. The payoff is a 70% power saving, which at data-center scale is enormous. The catch is reliability: if you weld the optics onto the silicon and one part fails, do you throw out the whole switch?

The most charming fact in the episode is the answer to what fails. It’s the lasers. So the lasers are deliberately kept outside, on the front panel, while the rest of the optics integrate onto the chip.

“The lasers actually have a higher probability of failure than the silicon.” / “Correct. Correct.” / “Wow.”

How much does this reliability work matter? Broadcom and Meta tested co-packaged optics to 50 million link-flap-free device-hours — roughly ten times more reliable than plug-in optics. Given that every link flap can blow up a training run, that 10x is the whole ballgame.
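A back-of-envelope model shows why flap rate translates so directly into wasted compute. Everything here is hypothetical except the 10x ratio quoted in the episode: the cluster size, link count, checkpoint interval, restart time, and baseline flap rate are made-up inputs. It also assumes every flap forces a rollback and that a flap lands at a random point between checkpoints.

```python
def lost_gpu_hours_per_flap(gpus: int, checkpoint_interval_h: float,
                            restart_h: float) -> float:
    """Expected GPU-hours redone after one rollback: on average half a
    checkpoint interval of work, plus the time to restart the job."""
    return gpus * (checkpoint_interval_h / 2 + restart_h)

def lost_gpu_hours(gpus: int, links: int, flaps_per_link_hour: float,
                   run_hours: float, checkpoint_interval_h: float,
                   restart_h: float) -> float:
    """Expected GPU-hours lost to link flaps over a whole run."""
    expected_flaps = links * flaps_per_link_hour * run_hours
    return expected_flaps * lost_gpu_hours_per_flap(
        gpus, checkpoint_interval_h, restart_h)

# HYPOTHETICAL cluster and training run.
CLUSTER = dict(gpus=1_000, links=2_000, run_hours=720,
               checkpoint_interval_h=1.0, restart_h=0.25)

PLUGGABLE_RATE = 1e-5           # HYPOTHETICAL flaps per link-hour
CPO_RATE = PLUGGABLE_RATE / 10  # the roughly 10x improvement quoted above

pluggable = lost_gpu_hours(flaps_per_link_hour=PLUGGABLE_RATE, **CLUSTER)
cpo = lost_gpu_hours(flaps_per_link_hour=CPO_RATE, **CLUSTER)
print(f"pluggable: {pluggable:,.0f} GPU-hours lost, CPO: {cpo:,.0f}")
```

Because the loss is linear in the flap rate, a 10x better link means 10x fewer wasted GPU-hours, whatever the other inputs turn out to be.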

Show and tell, and the enterprise twist

The middle of the interview is hardware show-and-tell — Hassan brought the actual chips. The headliners: the Tomahawk 6, billed as the only 100-terabit switch in production, which fits 128 ports of 800-gig (or 64 of 1.6-terabit) and can anchor a 128,000-GPU cluster. The Jericho 4 / “Qumran 4D” router carries its own stacked memory (HBM) as deep buffers — surge tanks for when you’re sending data across long distances and need somewhere to park it. And the Thor Ultra, billed as the world’s first 800-gig network card, which modernizes the old memory-sharing protocol (RDMA) for AI traffic patterns.
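The Tomahawk 6 port counts can be sanity-checked with simple arithmetic. The second half of this sketch is an assumption, not something stated in the episode: it supposes the same capacity is broken out as 512 ports of 200G and uses the textbook two-tier leaf-spine sizing, where each leaf switch points half its ports down at endpoints and half up at spines.

```python
def switch_capacity_tbps(ports: int, port_gbps: int) -> float:
    """Aggregate switch bandwidth in terabits per second."""
    return ports * port_gbps / 1000

def two_tier_max_endpoints(radix: int) -> int:
    """Largest nonblocking two-tier leaf-spine fabric for a given port count:
    up to `radix` leaves, each with radix/2 endpoint-facing ports."""
    return radix * (radix // 2)

for ports, gbps in ((128, 800), (64, 1600)):
    print(f"{ports} x {gbps}G = {switch_capacity_tbps(ports, gbps)} Tb/s")

# ASSUMPTION: a 512 x 200G breakout of the same silicon.
print(f"512 x 200G = {switch_capacity_tbps(512, 200)} Tb/s")
print(f"two-tier endpoints at radix 512: {two_tier_max_endpoints(512):,}")
```

Both quoted port configurations come to the same 102.4 Tb/s, and under the 512-port assumption a two-tier fabric tops out at 131,072 endpoints, in line with the 128,000-GPU figure.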

The closing turn is the one most relevant to anyone not running a hyperscaler. So far AI infrastructure has been a story about giants — Meta, Microsoft, OpenAI. But ordinary large companies don’t need 100,000-chip monsters; they need a few hundred chips, maybe fewer. And the same silicon that builds the monsters can, at small scale, collapse an entire AI network fabric into one switch. Hassan’s headline comparison: gear that used to be a 14-to-16-unit rack drawing 20,000 watts and costing a million dollars now fits in a 2-unit box — roughly one-seventh the size, one-tenth the power, one-tenth the cost, eight times the performance. That compression is what drags AI down from the hyperscalers into normal enterprise reach.
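Spelling out that comparison makes the combined effect clearer. This sketch only rearranges the figures quoted above, which are the guest's own and not independently verified; the variable names are made up.

```python
OLD_RACK_UNITS = (14, 16)   # quoted range for the older gear
OLD_WATTS = 20_000
OLD_COST_USD = 1_000_000

NEW_RACK_UNITS = 2
POWER_DIVISOR = 10          # "one-tenth the power"
COST_DIVISOR = 10           # "one-tenth the cost"
PERF_MULTIPLE = 8           # "eight times the performance"

size_reduction = [old / NEW_RACK_UNITS for old in OLD_RACK_UNITS]
new_watts = OLD_WATTS / POWER_DIVISOR
new_cost_usd = OLD_COST_USD / COST_DIVISOR

# Performance gains compound with the power and cost reductions.
perf_per_watt_gain = PERF_MULTIPLE * POWER_DIVISOR
perf_per_dollar_gain = PERF_MULTIPLE * COST_DIVISOR

print(f"size: {size_reduction[0]:.0f}x to {size_reduction[1]:.0f}x smaller")
print(f"power: {new_watts:,.0f} W, cost: ${new_cost_usd:,.0f}")
print(f"performance per watt: {perf_per_watt_gain}x, "
      f"per dollar: {perf_per_dollar_gain}x")
```

Taken together, the quoted ratios imply an 80x improvement in performance per watt and per dollar, which is the number that matters for a company sizing a small cluster.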

Key Takeaways

Ethernet went from 10 Mbps (1983) to 100 Gbps (2010) in 27 years, then into the terabit range in the ~14 years after that; AI demand drove the second acceleration.
AI clusters are bottlenecked by the network, not the chips — GPUs idle while waiting to exchange data, so faster links directly mean cheaper training.
“Connecting GPUs” splits into three problems by distance: scale-up (one rack, shared memory), scale-out (across a hall), scale-across (between buildings).
Scale-up needs huge bandwidth but a simple network — everything one hop away, so addressing labels (“headers”) can be shrunk to save overhead (the Ultra Ethernet effort).
Memory transactions over scale-up demand near-perfect reliability; a dropped packet is corrupted data, not a reloadable web page.
AI traffic is elephant flows — a few giant flows instead of many small ones — which defeats normal load balancing and causes congestion (the “eight-lane highway, everyone in lane one” problem).
A single link flap (momentary connection drop) forces a training job to roll back to its last checkpoint, so fast link-failure recovery is critical.
Data centers hit a power ceiling before a space ceiling, which is why clusters now span multiple buildings — invoking speed-of-light latency, lossless-over-distance, and encryption needs.
Copper still wins inside a rack (cheap, low-power, reliable) but runs out of reach as speeds rise; the OCI optical standard extends scale-up reach to 2 km while keeping copper’s low power and cost.
Co-packaged optics (CPO) fuses light-conversion onto the switch chip for ~70% power savings; the reliability risk is managed by keeping the failure-prone lasers external on the front panel.
Lasers fail more often than the silicon — the reason they’re deliberately kept removable.
Broadcom/Meta validated CPO at 50 million link-flap-free device-hours, ~10x more reliable than pluggable optics.
A modern switch (Tomahawk 6) can collapse what was a 14–16U, 20kW, $1M rack into a 2U box at ~1/10th the power and cost with 8x the performance — bringing AI networking within enterprise reach.

Claude’s Take

This is a conference-floor interview, and it shows. The host is friendly and the guest is a Broadcom executive holding up Broadcom chips, so the framing (“Ethernet beats InfiniBand, and Broadcom builds the best Ethernet”) is exactly what you’d expect. Every number is a Broadcom win; InfiniBand and Nvidia get one polite sentence and no rebuttal. Treat the comparative claims (the “10x reliability,” the “1/10th cost”) as vendor figures, not independent benchmarks.

That said, the conceptual content is genuinely good and not really disputable. The three-distances framework (scale-up / scale-out / scale-across) is the real mental model the whole industry uses, and the elephant-flow and link-flap explanations are clean ways to understand why AI broke the old networking assumptions. If you strip out the product names, you’re left with a solid primer on why the network became the hard part of AI. The laser-failure detail is the kind of specific, non-obvious fact that signals the guest actually knows the plumbing.

Score 6: high-quality explanations wrapped in an unmistakable sales pitch. Useful for the concepts, skeptical on the scoreboard. The space-data-center bit at the end is throwaway fun and the host knows it.