Transcript: Amazon’s Durability Stratechery

---TRANSCRIPT--- [music]

Amazon’s Durability was published on Tuesday, May 5th, 2026.
When it comes to the AI soap opera, there is news every day, and the companies on top and on the bottom seem to shift by the quarter, if not the month. The news that I find most intriguing and instructive this week is about physical goods and logistics. From Bloomberg:
Amazon unveiled a suite of logistics services that will let businesses buy its existing freight and distribution offerings as a package, sending shares of rival delivery companies such as FedEx and United Parcel Service lower. The world’s largest online retailer on Monday announced Amazon Supply Chain Services, ASCS, offering other companies access to its “full” suite of supply chain and distribution offerings. The service largely consolidates a package of existing products, air and ocean freight, trucking, and last-mile delivery, into a new suite it says companies like Procter & Gamble and 3M are already using.
This is a very satisfying announcement for us at Stratechery, given it’s the culmination of a prediction I made a decade ago in The Amazon Tax. Amazon at that point had two primary businesses, amazon.com and AWS, and I made the case in that article that they were actually very similar. In both cases, Amazon built primitives that had Amazon itself as their first best customer, justifying and driving initial development. But in both cases, the ultimate play was to sell those primitives to other companies. It was already clear at the time that logistics would follow the same path. Quote, “It seems increasingly clear that Amazon intends to repeat the model when it comes to logistics. After experimenting with six planes last year, the company recently leased 20 more to flesh out its private logistics network. This is on top of registering its China subsidiary as an ocean freight forwarder. So, how might this play out?
Well, start with the fact that Amazon itself would be this logistics network’s first and best customer, just as was the case with AWS. This justifies the massive expenditure necessary to build out a logistics network that competes with UPS, FedEx, et al. And most outlets are framing these moves as a way for Amazon to rein in shipping costs and improve reliability, especially around the holidays. However, I think it is a mistake to think that Amazon will stop there. Just as they have with AWS and e-commerce distribution, I expect the company to offer its logistics network to third parties, which will increase the return to scale and by extension deepen Amazon’s eventual moat.” End quote.
Now, 10 years later, we are here with the official unveiling of Amazon Supply Chain Services. And I think the time frame is an important one. Amazon, more than any other company, actually operates with decade-long time frames, consistently making real-world investments at massive scale that, one, convert their marginal costs into capital costs, and two, gain leverage on those capital costs by selling them to other businesses. This is, by the way, still a story about AI.
A brief history of AWS.
Three years ago, SemiAnalysis wrote an article entitled, “Amazon’s Cloud Crisis: How AWS Will Lose the Future of Computing.” And I found it very compelling. First though, some history, much of which is covered in SemiAnalysis’s article. Amazon not only invented cloud computing, but realized it would be a commodity market. While most people in tech think about building sustainable differentiation that allows you to charge higher prices, thus producing profit, commodity markets work differently. There, sustainable profits come from having structurally lower costs. Amazon developed exactly that: first, through having the largest scale, giving the company both buying power and also the most leverage on their development costs, and second, through genuine innovation.
AWS built a specialized system called Nitro, built on their own chips, that offloaded server management, including network management, storage management, hypervisor management, et cetera, from the expensive Intel and AMD servers that the company sold access to. This let Amazon run that many more virtual machines on a single server, significantly increasing utilization, i.e., delivering a structural cost advantage.
Amazon doubled down on their custom chip efforts with Graviton, their Arm processors. Graviton chips, particularly the first few generations, were inferior to Intel or AMD chips, but that didn’t mean they were useless. By that time AWS had expanded from simply being an infrastructure-as-a-service (IaaS) provider to being a platform-as-a-service (PaaS) provider as well. Infrastructure as a service means you provide raw compute, storage, etc. on which customers can run things like operating systems or databases. Platform as a service means you provide that basic functionality as a service. Amazon Relational Database Service (RDS), for example, is a fully managed database that customers can access via a set of APIs without having to worry about actually managing the full database themselves, worrying about scaling, duplication, etc. This, by extension, means that customers don’t need to know and don’t need to care about the compute infrastructure that undergirds services like RDS, which has long been Graviton.
Platform as a service lets Amazon double dip in terms of profitability. First, AWS can sell platform-as-a-service products at a higher margin than infrastructure-as-a-service products. And second, the company can leverage its own cheaper silicon to serve those products, reducing their costs. Over time, Graviton has become more competitive in performance, while still being cheaper, giving Amazon a lower-cost compute instance to sell to end users. But even without third-party take-up, the investment in building its own silicon has paid off over time.
[music] Training versus inference.
Fast forward to AI, and SemiAnalysis’ concern was that all these optimizations left AWS ill-prepared for AI. One big problem was networking:
Rather than implement the best networking from Nvidia and/or Broadcom, Amazon is using its own Nitro and Elastic Fabric Adapter (EFA) networking. This works well for many workloads, plus it delivers a cost, performance, and security advantage. There are business, cultural, and security reasons why Amazon will not implement other networking. The cultural one is important. Nitro and networking SoCs generally have been Amazon’s biggest cost advantage for years. It’s ingrained into their DNA. Even EFA delivers on this, too, but they don’t see how new workloads are evolving and that a new tier is needed, due to the lack of foresight in their internal workload and infrastructure teams. Amazon is making a deliberate choice of not adopting that we believe will bite them in the future.
Another was Amazon’s insistence on building its own chips, which are not only inferior to the best Nvidia chips in terms of performance, but might also lead to them getting fewer Nvidia chips going forward:
At least some other clouds will implement out-of-node NVLink. That’s where the discussion of prioritization now comes in. AI GPUs face tremendous shortages for at least a full year. This is one of the most pivotal times for AI, and it may mark the haves and the have-nots. Nvidia is a complete monopoly right now. Why would Nvidia prioritize Amazon for these GPUs when they know Amazon will move to their in-house chips as quickly as they can for as many compute workloads as they can? Why would Nvidia ship tons of GPUs to the cloud that is not using any of their networking, thereby reducing their share of wallet? Instead, Nvidia prioritizes the me-too clouds. Amazon does get meaningful volume, but nowhere close to where demand is.
Amazon’s H100 GPU shipments relative to public cloud shipments are significantly lower than their share of the public cloud. Those other clouds also can’t satisfy demand, but they get a bigger percentage of the GPUs they ask Nvidia for. And as such, firms looking for GPUs for training or inference will move to those clouds. Nvidia is the kingmaker right now, and they are capitalizing on it. They have to spread the balance of power out to prevent compute share from clustering towards Amazon.
These concerns were well-founded in the 2023 time period when that article was written. That was a time when AI, thanks to ChatGPT, had hit the mainstream, but the largest share of compute still went into training. Training required all the things that Amazon lacked, particularly the ability to network large numbers of Nvidia GPUs together into one coherent system. In such a system, the most important capability was horizontal networking between chips, so that you could update weights during training, a step that needed to happen serially. It was absolutely the case that cloud providers like Microsoft or Oracle or the neoclouds, which implemented full Nvidia solutions instead of the stand-alone HGX racks that AWS favored, were much better suited to training large language models. That is still the case, by the way.
What has changed is that training is no longer the biggest AI compute market. Inference is, thanks not only to increased AI adoption, but also to fundamental changes in how AI works. From an update about Nvidia, quote, “The first inflection point was the emergence of LLMs. Call this the ChatGPT moment. In this first paradigm, tokens were generated by GPUs and presented as the answer to a question. The second inflection point was the emergence of reasoning models. Call this the o1 moment. In this paradigm, there are a very large number of tokens that are generated to figure out the answer before the answer is actually generated.
This was an exponential increase in the addressable market for tokens. The third inflection point was the emergence of functional agents. Call this the Opus 4.5 moment. In this paradigm, those reasoning models are not triggered by humans asking a question, but by an agent solving a problem. This increases the market in two directions. First, humans can run multiple agents, and secondly, agents can leverage reasoning models multiple times to accomplish a task. This isn’t just an exponential increase in the addressable market for tokens. It’s two exponential increases squared.” End quote.
Both this shift to inference and this shift in the nature of inference have been positive for AWS’s approach. First, while inference still requires significant memory, the requirement is significantly less than that required for training. It’s actually viable to store a model’s parameters in a single server. You don’t need to network together thousands of chips. Second, while reasoning and agentic workloads will require significantly more tokens, and thus a massively larger KV cache, the increase is actually so large that even the most optimized Nvidia inference systems are being built with dedicated memory servers. This sort of architecture is much more compatible with Amazon’s networking approach than the thousands-of-chips-networked-together approach is. Third, agents are heavily CPU dependent, which has two important implications. First, fully utilizing accelerators is a function of having sufficient general compute. Second, achieving maximum utilization of heterogeneous compute means unbundling CPUs and GPUs and routing workloads between resources, which is exactly the sort of disaggregated resource abstraction that Amazon has been building with Nitro.
The utilization point is an important one. Nvidia’s CEO Jensen Huang made his case for Nvidia chips over custom ASICs at length at GTC 2025.
Huang’s argument was that AI factories, to use his term, were ultimately constrained by power. That meant that the most important metric for profitability was not the cost of chips, but rather tokens per watt. In other words, if you can’t increase watts, it’s worth spending more on chips to increase the tokens you get from those watts.
There are, however, three reasons why this argument may not hold, particularly for a company like Amazon. First, if you have the money to buy that many Nvidia chips, you also have the money to spend on getting more power, which is exactly what AWS has been focused on. This very much fits AWS’s modus operandi, which is to invest more upstream, in this case in power, with the goal of spending less downstream, in this case on paying Nvidia huge margins for their chips. Second, in the long term, electricity is more of a commodity than logic is. That means it is a market where innovation and competition are more likely to break a bottleneck, which is another way to say that investing in one’s own silicon is the area most likely to deliver a return on investment. Third, the nature of inference workloads, particularly agentic ones, is such that perfect accelerator utilization is going to be a much harder problem to solve than it is for training.
These points are moot, however, if you don’t have your own logic chip that is at least competitive. And here Amazon’s long-term outlook is paying off. Amazon bought Annapurna Labs, which makes their chips, in 2015, and launched their first AI-focused chip in 2019. No, it wasn’t very good, but critically, that was 7 years ago. Now, Trainium 3 is decent, and their trajectory is even better. AWS is poised to have a sustainable cost advantage for inference going forward.
[music] AWS’s neutrality.
Moreover, they’re already replaying the Graviton playbook. Trainium chips help undergird Bedrock, their AI platform, which is to say that users are using Trainium chips even if they didn’t explicitly choose to do so.
AWS CEO Matt Garman made this point explicitly in a The Sequence interview.
Um I I think just with GPUs, by the way, um you’re going to interact with a lot of these accelerator chips through abstractions. The vast majority of customers don’t interact with GPUs, either, except through maybe like in their laptop or something like that, but, you know, for graphics. But, when you’re talking to OpenAI, even if they’re running on GPUs, you’re not talking to the GPUs. If you’re talking to Claude, you’re through GPUs or Trainium or TPUs, you’re not talking to any of those chips. You’re talking to the interface. And the vast majority of inference out there is being done on one of a handful of models, right? And so, whether it’s 5, 10, 20, 100, it’s not millions of people that are programming to those things directly. And and I think that’s going to be true going forward, just because these systems are so complex. They’re very large. If you’re going to go train a model, not that many people have enough money to go train a model, not that many people have the expertise to actually manage it. They’re very complicated systems, and um and the OpenAI team is is incredible in in their ability to squeeze value out of a very large compute cluster. But, not that many people have the team that can do that, um independent of what the chip happens to be. And so, I think that that’s going to be true for all accelerator chips. Honestly.
The frontier models are an important factor in this, and this is an angle I didn’t see coming. Nvidia CEO Jensen Huang explained in a recent interview with Dwarkesh Patel why Nvidia didn’t invest in Anthropic early on.
I at the time I didn’t deeply internalize how difficult it would be to build a a foundation AI lab Mhm. like OpenAI and Anthropic. Uh and the the fact that they needed huge investments from the supplier themselves.
Uh we just weren’t in a position to make the multi-billion dollar investment into Anthropic so that they could use our use our compute. But Google and and AWS were, and they put in huge investments in the beginning so that Anthropic um in return use their compute. Uh we we just weren’t in a position to do so uh at the time. Nor nor did I I would say my mistake is I didn’t deeply internalize that they they really had no other options. That that that a VC would never put in 5, 10 billion dollars of investment into an AI lab with the with the hopes of it turning out to be Anthropic. And so, that was my miss. Uh but even if I understood it, I don’t think we would have been in a position to do that at the time. But um I’m not going to make that same mistake again.
And Amazon had both the money and the chips to invest into Anthropic precisely because they had built such a cash machine with AWS in the first place. That’s the thing with big investments in infrastructure. They take years to build, but the benefit of that investment compounds over time. Anthropic, meanwhile, thanks to those investments from Amazon and Google, can not only run across a variety of chips, but for a long time was the only frontier model available on all of the leading clouds, an important selling point for enterprises. Microsoft, in the end, needed to let go of Azure’s exclusive access to OpenAI’s API in part because that exclusivity was hurting the prospects of their mammoth stake in OpenAI.
You can also make the case that Amazon is the best choice for frontier model access in a world of limited compute. Microsoft’s core business is software, which is to say that the company faces massive pressure to invest in their own AI capabilities even at the cost of deprioritizing cloud customers. That’s exactly what happened at Microsoft earlier this year when the company missed Azure’s growth projections because they devoted more compute to their internal workloads.
It was an understandable decision. Cloud demand is eternal, but the risk from AI for existing software businesses is existential. This also applies to Google. The company’s core business is also digital, and while search has fended off the threat from chatbots that many expected, the fundamental challenge is still one to be managed, not extinguished. Amazon’s core businesses, meanwhile, are very much rooted in the physical world: selling and shipping physical goods and building data centers. Both are amenable to Amazon devoting the majority of its chips to customers’ workloads.
Amazon’s future.
If this week marks the resolution of one of Amazon’s long bets, you can see the outline of future resolutions in present-day announcements. One prominent example is Amazon Leo, the company’s satellite service that seems, at first glance, duplicative of SpaceX’s Starlink, which has the advantage of already existing at scale. Remember Amazon’s formula, however, which CEO Andy Jassy stated explicitly with regards to Leo on the company’s most recent earnings call.
You know, today if you ask what stops us from growing the business, we do we have to get the constellation into space. Um we have over 20 launches planned this year, we have over 30 launches planned in 2027, but I think the business has a chance to be a very large, um you know, many billion-dollar revenue business and I think it has some characteristics that are reminiscent of AWS in that it’s capital intensive up front, where you’re you’re you’re committing a lot of capital and and cash in the early years um for assets that you get to leverage over a long period of time. And so, I I like the free cash flow and return on invested capital characteristics of that business in the medium to long term.
The fact that it’s extremely capital intensive is not the only thing about Leo that makes it like AWS. A critical factor is that Amazon is the first best customer to give the service scale. And here it’s worth going back to logistics.
I noted above that Amazon delivery still has marginal costs, and that is because humans have to make the delivery. Amazon, however, already pointed to the future a full 13 years ago, when the company first started talking publicly about drone delivery. It’s been a long slog to be sure, but it’s increasingly plausible to imagine a future where delivery costs are a matter of depreciation on drone assets. And what would such a future require? How about reliable widespread satellite coverage for communicating with and guiding those drones? And if Amazon doesn’t want to be dependent on Jensen Huang for chips, do you think they want to be dependent on Elon Musk for drone connectivity? Of course other businesses like Apple will be able to pay to use Amazon’s satellite infrastructure, just like they can now pay to use Amazon’s delivery service or pay to use AWS or pay to sell on amazon.com. The world may change in increasingly drastic ways, but Amazon’s approach, by virtue of its focus on long-term investments in the physical world, appears to be as sturdy as ever.
More generally, I increasingly suspect that long-term vulnerability to AI or, to put it more positively, long-term incentives to invest in AI are very strongly correlated with the degree to which a company interacts with the physical world and, secondarily, the degree to which companies feel secure in their control of distribution. Apple and Amazon feel comfortable not having leading-edge models, just access to them, because their business is rooted in the physical. Microsoft has invested heavily in data centers, but doesn’t own their own model, perhaps because they feel their control of distribution to enterprises will protect their core business or because they had too much of a dependency on OpenAI. Google and Meta are investing at a similar scale to Amazon and are also heavily invested in their own models. Both are aggregators, which is to say they have to continually earn attention from consumers.
Given that competition is only a click away, having good AI is existential to them. This is, in the end, another advantage to making the sort of long-term bets Amazon specializes in. The threats are so distant that you have plenty of time to make new investments that address any weaknesses that develop in the meantime. Or, as is the case with AI, wait for the market to tilt in your favor.