
YouTube

The AI Bandwidth Wall & Co-Packaged Optics

Asianometry · published 2025-08-10 · added 2026-06-14 · score 8/10
semiconductors ai-hardware photonics co-packaged-optics data-centers tsmc nvidia networking

ELI5/TLDR

Modern AI chips can do math at a staggering pace, but they keep choking on a more boring problem: getting data in and out of the chip fast enough. The wires that carry data between chips are running out of headroom and burning a lot of electricity in the process. The fix the industry has landed on is to stop sending data as electricity over copper for the last leg and instead convert it to light much closer to the chip — and eventually inside the chip’s own package. This is called co-packaged optics, and AI’s enormous power bills are finally what made everyone take it seriously.

The Full Story

The wall nobody talks about

Start with the good news. A 2025 Nvidia B200 GPU can do roughly 178,000 times more calculations per second than a top Intel CPU from the late 1990s. That is not a typo. Raw compute has gone vertical.

The bad news is that you rarely get to use all of it, because the chip keeps waiting around for data. There are several “walls” that hold compute back. One is the memory wall — memory chips that store data have not gotten faster nearly as quickly as the math units have, so the math units sit idle waiting for numbers to arrive. Asianometry has covered that one before.

This video is about a different, newer bottleneck: the bandwidth wall. Think of it as the on-ramps and off-ramps. It’s not about how fast the chip thinks, or even how fast its memory is — it’s about how fast you can shovel data between one server and another across the data center.

Two halves of getting data out

When data leaves a server rack, two things do the work. First, a switch chip — picture an airport terminal, with tens of thousands of passengers (bits) flowing in and out, each headed somewhere specific. Its whole job is to route traffic at high speed.

Second, the optical IO: the actual doors the data walks through to leave the building. Inside a rack, data travels as electricity through thin copper wires. That’s fine for short hops. But push a signal more than about two meters over copper and it starts to smear and degrade. Beyond that distance, light through glass fiber is simply better: it carries more, travels further, and arrives cleaner.

So somewhere there has to be a translator that turns electricity into light and back. That translator is a small plug-in module called a pluggable optical transceiver — you literally plug it into the faceplate on the front of the rack. Inside it sits an optical engine (a tiny laser and modulator to send light, a photodiode to catch incoming light) and an electrical engine (the parts that talk to the switch chip in electricity).

If the switch silicon is like the airport terminal, then you can think of the optical IO as like that terminal’s gates and roads. If the terminal is too small, or if its roads or gates are too easily congested, then we get traffic jams.

One half is racing, the other is jogging

Here’s where the wall comes from. Switch chips are made of ordinary silicon, so they ride Moore’s Law — every couple of years TSMC ships a smaller, denser process and switch capacity jumps. The optical transceivers do not keep that pace.

The numbers tell the story. Switch chips in 2023 could handle 51.2 terabits per second, and Broadcom is already seeding a 102.4-terabit chip — with another doubling likely in two more years. Transceivers, meanwhile, went from 10 gigabits per second twenty years ago to 800 gigabits today, with 1.6-terabit versions only now rolling out. The terminal keeps getting bigger; the doors are barely widening. And you can only bolt so many transceivers onto a faceplate before you run out of room.
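The gap in those numbers can be made concrete with a little arithmetic. The sketch below uses only the figures quoted above; the function names are made up for illustration, and it assumes smooth exponential growth, which real product cadences only approximate.

```python
import math

def doubling_period_years(start_gbps: float, end_gbps: float, years: float) -> float:
    """Years per doubling implied by growing from start to end over `years`."""
    return years / math.log2(end_gbps / start_gbps)

def modules_needed(switch_gbps: int, module_gbps: int) -> int:
    """Pluggable transceivers needed to carry a switch's full capacity."""
    return math.ceil(switch_gbps / module_gbps)

# Transceivers: 10 Gbps -> 800 Gbps over roughly 20 years.
period = doubling_period_years(10, 800, 20)
print(f"transceiver speed doubles about every {period:.1f} years")

# Switch capacities of 51.2 and 102.4 Tbps, expressed in Gbps.
for switch_gbps in (51_200, 102_400):
    for module_gbps in (800, 1_600):
        count = modules_needed(switch_gbps, module_gbps)
        print(f"{switch_gbps / 1000:.1f} Tbps switch, "
              f"{module_gbps} Gbps modules: {count} needed")
```

On these figures, transceiver speed doubles roughly every three years against about two for switch chips, so each switch generation needs more modules on the same faceplate unless module speed catches up.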

Why the doors are so stubborn

A big part of the holdup is a component called the Serdes — serializer/deserializer. The reason it exists: data leaves the switch in parallel, many slower streams side by side, but a fiber can only carry one stream at a time, in series. So you need something to merge the many lanes into one, and split it back out at the other end.

Imagine multiple streams of people approaching an airport gate from different directions. The serializer is like the gate agent… arranging all these passengers into an orderly line.
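As a toy model of the gate agent, the sketch below interleaves several parallel lanes into one serial stream and splits it back out. It illustrates the multiplexing idea only: a real SerDes also handles clock recovery, line coding, and equalization, none of which is modeled here, and the round-robin ordering is an assumption made for simplicity.

```python
def serialize(lanes: list[list[int]]) -> list[int]:
    """Interleave equal-length parallel lanes into one serial stream, round-robin."""
    if len({len(lane) for lane in lanes}) > 1:
        raise ValueError("all lanes must carry the same number of bits")
    # zip(*lanes) takes one bit from every lane per step, in lane order.
    return [bit for group in zip(*lanes) for bit in group]

def deserialize(stream: list[int], lane_count: int) -> list[list[int]]:
    """Split a serial stream back into `lane_count` parallel lanes."""
    if lane_count <= 0 or len(stream) % lane_count:
        raise ValueError("stream length must be a multiple of lane_count")
    # Every lane_count-th bit belongs to the same lane.
    return [stream[i::lane_count] for i in range(lane_count)]

lanes = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
stream = serialize(lanes)
print("serial stream:", stream)
print("round trip ok:", deserialize(stream, len(lanes)) == lanes)
```

The serial side has to run at the lane rate multiplied by the number of lanes, which is why the serial link is the part that strains as speeds climb.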

The trouble is that at 800 gigabits per second, the electrical signal traveling over copper to reach the transceiver is barely holding together. The Serdes has to fight that — packing bits more cleverly, or boosting the weak signal back up (called equalization). Both burn power. A lot of it.

In 2013–14 Huawei reckoned the Serdes ate 10–15% of a switch’s area and power. A decade later it’s more than twice that. Nvidia’s VP of hyperscale, Ian Buck, told IEEE Spectrum in 2025 that pluggable optics now consume 10% of a GPU’s total compute power. That matters because a data center has a fixed power budget — every watt spent shoving data around is a watt you can’t spend on actual computing.
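The fixed-budget point is simple arithmetic, sketched below. The 100 MW facility size and the 5% comparison case are hypothetical values chosen for illustration, and treating the quoted 10% as a share of the whole power budget is a simplification.

```python
def compute_power_w(total_budget_w: float, networking_share: float) -> float:
    """Watts left for compute after networking takes its share of a fixed budget."""
    if not 0.0 <= networking_share <= 1.0:
        raise ValueError("networking_share must be between 0 and 1")
    return total_budget_w * (1.0 - networking_share)

BUDGET_W = 100e6  # hypothetical 100 MW facility, not a figure from the video

# 0.10 is the quoted share; 0.05 is an illustrative lower share for comparison.
for share in (0.10, 0.05):
    left_w = compute_power_w(BUDGET_W, share)
    print(f"networking at {share:.0%}: {left_w / 1e6:.0f} MW left for compute")
```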

The fix: move the translator closer

The instinct is simple — stop making electricity travel so far before it becomes light. The dream version is a laser built right into the chip itself. But silicon is bad at making and steering laser light, so that remains a dream. The practical version is to move the optical module as close to the switch chip as possible, eventually inside the same package, sitting on the same substrate. Hence the name: co-packaged optics, or CPO.

The less copper wiring all that data has to travel through, the better.

You also get bonus space (the optics no longer hog the faceplate) and a chance to delete some redundant parts and save cost.

Why it’s hard: heat and threading a needle with light

Two problems make CPO genuinely tough.

Heat. Packing things tightly always traps heat, and heat warps materials. But with optics there’s an extra cruelty. Light moves around the photonic chip through hair-thin channels called waveguides, and these are tuned to a precise color (wavelength) of light. Heat physically shifts that tuning, so the waveguide stops matching the laser and the whole thing quietly stops working. Cooling isn’t just about protecting the package — it’s about keeping the optics in tune.

Coupling. This is the art of getting light from the fiber into the chip. The fiber is about 10 micrometers across; the waveguide is a few hundred nanometers wide, which makes it tens of times narrower.

This is an obvious mismatch, and is like channeling water from a firehose into a bundle of straws.

Misalign it even slightly and you lose light. Doing this precisely, at scale, is genuinely fiddly.
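To get a feel for the size mismatch, the sketch below applies the standard overlap formula for two Gaussian beams of different radius, with an optional sideways offset. Treating both the fiber mode and the waveguide mode as Gaussians is a rough assumption (a high-contrast waveguide mode is not really Gaussian), and the radii are illustrative values picked to match the sizes quoted above, not measurements.

```python
import math

def gaussian_coupling(w_fiber_um: float, w_guide_um: float,
                      offset_um: float = 0.0) -> float:
    """Fraction of power coupled between two parallel Gaussian modes.

    w_fiber_um, w_guide_um: mode radii in micrometers.
    offset_um: lateral misalignment between the two mode centers.
    """
    s = w_fiber_um ** 2 + w_guide_um ** 2
    size_term = (2.0 * w_fiber_um * w_guide_um / s) ** 2  # mode-size mismatch
    offset_term = math.exp(-2.0 * offset_um ** 2 / s)     # misalignment penalty
    return size_term * offset_term

# Assumed radii: ~5 um for a fiber ~10 um across, ~0.25 um for the waveguide.
direct = gaussian_coupling(5.0, 0.25)
print(f"direct fiber-to-waveguide coupling: {direct:.2%}")

# If the two mode sizes were somehow matched, alignment alone sets the loss.
for offset_um in (0.0, 1.0, 2.0):
    matched = gaussian_coupling(5.0, 5.0, offset_um)
    print(f"matched modes, {offset_um:.0f} um offset: {matched:.1%}")
```

Under these assumptions, pointing the fiber straight at the waveguide couples only about 1% of the light, and even with matched mode sizes a micrometer or two of misalignment costs a noticeable fraction.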

The slow march, in steps

The bigger obstacle, though, was never the physics — it was inertia. Pluggable transceivers are cheap, easy to swap when they break, and come from many vendors thanks to mature standards. Nobody wants to abandon that for something new and finicky. So the industry crept toward CPO in cautious half-steps.

  • 2018 — On-Board Optics (OBO). A Microsoft-led group moved the optical engine off the faceplate and onto the circuit board next to the switch chip. Shorter distance, a little less power.
  • 2020 — Near-Package Optics (NPO). Slip a fast silicon bridge between the chip’s package and the optical engine. Closer still — a halfway house to full CPO. (The same way the packaging industry used “2.5D” as a stepping stone toward true 3D stacking.)

TSMC’s COUPE, and the part where AI changes everything

In 2021 TSMC unveiled its CPO play: COUPE (Compact Universal Photonic Engine). The clever bit is that it 3D-stacks an electrical chip on top of a photonic chip. That lets you build the electrical layer on an expensive cutting-edge process while building the photonic layer on a cheaper old one (like 65nm), and stacking them shortens the wires between them, which cuts signal loss. TSMC claimed 40% better power efficiency and 170% more speed than the alternatives.

And almost nobody cared. When COUPE launched, barely anyone wrote about it. TSMC’s 2022 and 2023 annual reports each gave it a single paragraph — a few test chips, results “met expectations.” Politely mid.

Then ChatGPT happened.

That changed when ChatGPT and generative AI became the hottest thing since Mala and boba. Which you should not mix together.

As hyperscalers poured billions into AI data centers that drink unprecedented amounts of power, that 10% power tax on networking suddenly became unbearable, and CPO had a real economic reason to exist. TSMC scrambled to flesh out a three-stage roadmap: first put COUPE into ordinary pluggables (a comfortable, familiar step); then fold it into CoWoS, its flagship advanced-packaging platform, so vendors can drop a COUPE chiplet right beside the switch silicon; and eventually place it beside CPUs and GPUs. The staged approach lets a nervous customer dip a toe in without feeling like they’ve crossed a bridge of no return.

Nvidia has been the tip of the spear. At GTC 2025 it announced its first co-packaged-optics switches — the Quantum-X (InfiniBand) and Spectrum-X (Ethernet) platforms — claiming 3.5× better power efficiency, 10× better network resilience, and 1.3× faster deployment versus pluggables.

Not a one-horse race

Nvidia and TSMC aren’t alone. Broadcom was actually first to market with CPO, on its 51.2-terabit Tomahawk 5, and it keeps pushing switch speeds. On the foundry side, TSMC is the one playing catch-up: GlobalFoundries (its GF Fotonix platform, now third-gen) and Tower Semiconductor have a head start in silicon photonics, with a strong roster of customers like Ayar Labs, Ranovus, and Lightmatter. Ayar Labs in particular already does much of what COUPE is only planning — its new optical interconnect chiplet promises up to 8 terabits per second and plugs into the UCIe chiplet standard. It’s funded by both AMD and Intel, which must make for an interesting boardroom.

So does light eat everything?

The recurring fantasy is a fully photonic AI chip — light doing the computing, not just the commuting. Asianometry is skeptical, for two reasons. Economically, we’ve built everything around silicon and electrons for 70 years; that’s a hard turn to make. Technically, the heat and coupling problems get worse for all-optical chips, there’s still no scalable on-chip light source, and some basic logic operations are genuinely hard to do with light. Meanwhile the electrical side keeps improving — for instance, swapping some copper wiring for ruthenium to cut resistance.

So the all-photonic dream stays a dream. But co-packaged optics is real, arriving now, and a vivid example of how AI’s appetite is quietly reshaping the entire semiconductor supply chain.

Key Takeaways

  • A 2025 Nvidia B200 GPU delivers ~178,000× the FLOPS of a late-1990s top Intel CPU, but compute outran the ability to feed it data.
  • The bandwidth wall is distinct from the memory wall: it’s about IO (getting data between servers), not memory access speed.
  • Copper is fine under ~2 meters; beyond that, optical fiber carries more data, further, with cleaner signals.
  • A pluggable optical transceiver converts electricity ↔ light; it contains an optical engine (laser, modulator, photodiode) and an electrical engine.
  • Switch chips ride Moore’s Law (51.2 Tbps in 2023; Broadcom seeding 102.4 Tbps; likely doubling again in ~2 years). Transceivers lag badly (10 Gbps → 800 Gbps over 20 years; 1.6 Tbps only now arriving).
  • Serdes (serializer/deserializer) converts parallel data lanes ↔ a single serial fiber stream. At 800 Gbps it must fight copper signal degradation, which costs power.
  • Serdes power/area grew from ~10–15% of a 28nm switch (Huawei, 2013–14) to over twice that a decade later. Nvidia’s Ian Buck: pluggable optics consume ~10% of a GPU’s total compute power.
  • Power is a fixed budget in a data center — watts spent on networking can’t go to GPUs.
  • Co-packaged optics (CPO) moves the optical translation as close to (and eventually inside) the chip package as possible to cut copper distance.
  • Two core CPO challenges: heat (warps materials AND detunes waveguide wavelengths) and coupling (matching a ~10µm fiber to a few-hundred-nm waveguide — “firehose into a bundle of straws”).
  • Stepping stones: On-Board Optics (Microsoft-led, 2018) → Near-Package Optics (2020) → full CPO.
  • TSMC’s COUPE 3D-stacks an electrical IC on a photonic IC; lets the photonic layer use a cheap node (e.g., 65nm). Claimed +40% power efficiency, +170% speed. Ignored until the AI boom.
  • TSMC roadmap: COUPE in pluggables → COUPE chiplets inside CoWoS next to switch silicon → eventually next to CPU/GPU.
  • Nvidia’s first CPO switches (GTC 2025): Quantum-X (InfiniBand), Spectrum-X (Ethernet) — claimed 3.5× power efficiency, 10× network resilience, 1.3× faster deployment.
  • Broadcom shipped CPO first (Tomahawk 5, 51.2 Tbps). GlobalFoundries (GF Fotonix) and Tower lead TSMC in silicon photonics; customers include Ayar Labs, Ranovus, Lightmatter.
  • Ayar Labs’ optical interconnect chiplet: up to 8 Tbps, UCIe-compatible, funded by both AMD and Intel.
  • A fully photonic chip remains unlikely: 70 years of silicon inertia, worse heat/coupling at all-optical scale, no scalable on-chip light source, hard all-optical logic, plus electrical improvements like ruthenium interconnects.

Claude’s Take

This is Asianometry doing what it does best: taking a real, underappreciated bottleneck and walking you through it without hand-waving. The framing is honest — the headline isn’t “light replaces electronics,” it’s “the doors got too small and AI’s power bill finally forced the renovation.” That restraint is the tell of a good explainer; the breathless version of this story would promise photonic computers any day now, and the video explicitly knocks that down with concrete reasons.

The mechanics check out. The Serdes-as-power-hog point is the genuinely useful insight here — most coverage of AI infrastructure stops at “GPUs need power,” and this drills into the unglamorous translation layer that quietly eats 10% of it. The airport analogy is reused maybe one beat too often, and a couple of figures are stated loosely (the “twice that proportion” Serdes growth is a paraphrase of a paraphrase), but nothing is misleading.

Where I’d add a pinch of salt: the vendor claims — Nvidia’s 3.5×/10×/1.3× and TSMC’s 40%/170% — are marketing numbers repeated as-is, best read as “directionally yes, real-world less.” And the competitive-landscape section is a fair snapshot but moves fast; foundry positioning in silicon photonics is shifting quarter to quarter. Still, as a map of why CPO matters and what’s actually hard about it, this is close to ideal. An 8: substantive, well-structured, intellectually honest, slightly let down only by leaning on the same analogy and parroting a few spec-sheet figures.

Further Reading

  • IEEE Spectrum — 2025 interview with Nvidia’s Ian Buck (source of the “10% of GPU compute power” figure on pluggable optics).
  • TSMC’s North American Technology Symposium materials — for the COUPE / CoWoS co-packaged-optics roadmap.
  • Broadcom Tomahawk 5 — the first commercial 51.2-Tbps switch with co-packaged optics.
  • Ayar Labs — optical interconnect chiplets and the UCIe (Universal Chiplet Interconnect Express) standard.