Block CTO Dhanji Prasanna: Building the AI-First Enterprise with Goose, their Open Source Agent

ELI5/TLDR

Block’s CTO walks through how a payments company turned itself into an AI-first shop by treating every internal system (Gmail, Snowflake, Square payments, GitHub) as a “capability” sitting under an agent middleware layer. The agent is Goose: an open-source, MCP-native, model-agnostic loop that runs on your laptop, takes destructive actions only with your permission, and acts with your access controls. They use it for everything from non-engineers building dashboards to Goose writing its own pull requests. Internal target: 25% of manual hours saved by year-end. The bet for the future is swarms of small agents, not one big brain.

The Full Story

What Goose actually is

Strip away the marketing and Goose is a tool-use loop. The LLM is the brain in a jar. Goose is the arms and legs. It runs on your laptop, has both a CLI (engineer-flavored) and a GUI (everyone else), and calls out to any tool-capable LLM through a pluggable provider system. Underneath, it speaks MCP — Model Context Protocol — to wrap existing systems and expose them as tools. Block was on the original MCP announcement alongside Anthropic and helped shape the protocol.

“If you think of the LLM as a brain in a jar that’s not capable of anything except chatting with you, Goose gives it arms and legs to go out and act in the real world.”

What gets it past the toy stage is autonomy. The loop runs as far as it can. When it stumbles, it backs up and tries another approach. You can pin it to “ask before every action,” or let it run free. Either way, it interrupts itself before destructive moves and surfaces a confirmation. You can also interrupt it mid-flight and redirect.
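To make the loop concrete, here is a minimal sketch in Python. It is not Goose's code: `Tool`, `Step`, `run_agent_loop`, and the scripted stand-in for the model are all hypothetical names invented for illustration. It shows the three behaviors described above: keep going until the model says it is done, feed failures back so the model can try something else, and pause for confirmation before anything destructive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Tool:
    name: str
    run: Callable[[str], str]
    destructive: bool = False  # destructive tools always need a yes from the user


@dataclass
class Step:
    tool: Optional[str]  # None means the model considers the task finished
    arg: str = ""


def run_agent_loop(next_step, tools: Dict[str, Tool], confirm, max_turns: int = 10) -> List[str]:
    """Call tools until the model stops, asking `confirm` before destructive ones."""
    history: List[str] = []
    for _ in range(max_turns):
        step = next_step(history)
        if step.tool is None:
            break
        tool = tools[step.tool]
        if tool.destructive and not confirm(tool.name, step.arg):
            history.append(f"{tool.name}: denied by user")
            continue
        try:
            history.append(f"{tool.name}: {tool.run(step.arg)}")
        except Exception as exc:
            # Record the failure so the model can back up and try another approach.
            history.append(f"{tool.name}: error {exc}")
    return history


def read_file(path: str) -> str:
    raise FileNotFoundError(path)  # simulate a stumble


TOOLS = {
    "read_file": Tool("read_file", read_file),
    "list_dir": Tool("list_dir", lambda path: "old.log"),
    "delete_file": Tool("delete_file", lambda path: "deleted", destructive=True),
}


def scripted_model(history: List[str]) -> Step:
    """Stand-in for an LLM: a fixed plan indexed by how many turns have happened."""
    plan = [
        Step("read_file", "notes.txt"),
        Step("list_dir", "."),
        Step("delete_file", "old.log"),
        Step(None),
    ]
    return plan[len(history)]
```

Running `run_agent_loop(scripted_model, TOOLS, confirm=lambda name, arg: False)` records the failed read, the directory listing, and a denied delete. Swap in a `confirm` that returns `True` and the delete goes through.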

The capability/interface insight

The architectural call that matters most isn’t Goose itself — it’s how Block reorganized its world around it. Dhanji frames it as a Jack Dorsey insight: everything Block does is either a capability or an interface. Taking payments, moving Bitcoin, opening a PR, creating a Jira ticket — all capabilities. Goose, the Square AI bot, the Cash App — interfaces.

“Everything is a capability at Block. We treated the corporate side like that too. Creating an issue, opening a PR, all of these are just capabilities. And then we put an agent middleware layer on top.”

Once you frame it that way, AI stops being a thing you bolt onto specific products and becomes a layer that orchestrates across all of them. The agent middleware is the connective tissue. The interface is whatever the user touches — Goose UI, voice, a chat box on a merchant dashboard.
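One way to picture the split is a registry of named capabilities with thin interfaces on top. The sketch below is an illustration under assumptions, not Block's architecture: `AgentMiddleware`, the capability names, and both interface functions are made up. In a real deployment the middleware would be an agent choosing among MCP tools rather than a direct dispatch table.

```python
from typing import Callable, Dict, List


class AgentMiddleware:
    """Holds every capability behind one layer that any interface can call."""

    def __init__(self) -> None:
        self._capabilities: Dict[str, Callable[..., str]] = {}

    def capability(self, name: str):
        """Decorator that registers a function as a named capability."""
        def register(fn: Callable[..., str]) -> Callable[..., str]:
            self._capabilities[name] = fn
            return fn
        return register

    def list_capabilities(self) -> List[str]:
        return sorted(self._capabilities)

    def invoke(self, name: str, **kwargs) -> str:
        if name not in self._capabilities:
            raise KeyError(f"unknown capability: {name}")
        return self._capabilities[name](**kwargs)


middleware = AgentMiddleware()


# Product-side and corporate-side actions are registered the same way.
@middleware.capability("payments.take")
def take_payment(amount_cents: int, currency: str) -> str:
    return f"charged {amount_cents / 100:.2f} {currency}"


@middleware.capability("github.open_pr")
def open_pr(repo: str, title: str) -> str:
    return f"opened PR '{title}' on {repo}"


# Interfaces differ only in presentation; both go through the same layer.
def merchant_chat(amount_cents: int) -> str:
    return "Bot: " + middleware.invoke("payments.take", amount_cents=amount_cents, currency="USD")


def engineer_cli(repo: str, title: str) -> str:
    return "$ " + middleware.invoke("github.open_pr", repo=repo, title=title)
```

The point of the shape: adding a new interface (voice, say) means writing one more thin function, not re-wrapping every system.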

Don’t engineer the tools — let the agent figure it out

A counter-intuitive lesson from running Goose at scale: don’t try to make tools “agent-friendly.” Just expose them honestly via MCP and let the agent learn.

“We find that Goose is more capable than if you tried to figure out how to make a tool Goose-friendly. It figures things out in surprising ways that you wouldn’t think of as a human. And it does it quicker than you might do it as well.”

The reason: even if you build clever scaffolding around a tool today, the next LLM release might just blow past it. You have to stop thinking like an engineer and start thinking more like a data scientist — which, for someone who has shipped JVMs and large distributed systems for two decades, Dhanji admits is hard.

The escape hatch for repeatable patterns is recipes. If you find a Goose workflow you like, you bake it into a recipe — basically a parametrizable script — and share it with teammates. So the system learns from doing, then captures what worked.
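The "parametrizable script" idea can be sketched in a few lines. This is not Goose's actual recipe file format; `Recipe`, its fields, and the example prompt are hypothetical. The sketch only shows the core move: freeze a prompt that worked, leave holes for the parts that vary, and ship defaults so teammates can run it as-is.

```python
from dataclasses import dataclass, field
from string import Template
from typing import Dict


@dataclass
class Recipe:
    name: str
    prompt: Template
    defaults: Dict[str, object] = field(default_factory=dict)

    def render(self, **params) -> str:
        """Fill the prompt; caller-supplied params override the defaults."""
        values = {**self.defaults, **params}
        # substitute() raises KeyError if a placeholder has no value.
        return self.prompt.substitute(values)


weekly_report = Recipe(
    name="weekly-sales-report",
    prompt=Template("Query $table for the last $days days and chart revenue by $group_by."),
    defaults={"days": 7, "group_by": "region"},
)

prompt_text = weekly_report.render(table="sales.orders")
```

Here `prompt_text` comes out as "Query sales.orders for the last 7 days and chart revenue by region." A teammate who wants a monthly view passes `days=30` and changes nothing else.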

Safety: blast radius equals user permissions

The first thing enterprise people ask: how do you keep this thing from emailing the board or wiping a database? The answer is layered.

First, LLMs are inherently cautious about tool use. Second, Goose runs in three modes — full confirmation, autonomous, and somewhere in between — with the agent surfacing destructive actions for review even in autonomous mode. Third, and most important: Goose acts as you. It inherits your access controls. If you’re in sales, you can’t read finance data and neither can your Goose. Blast radius is bounded by the permissions of whoever launched it.
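The "acts as you" rule is easy to state in code. The sketch below assumes a simple scope-string model of access control; `User`, `Capability`, `AgentSession`, and the scope names are all invented for illustration and say nothing about how Block's systems actually enforce permissions. What it shows is that the agent session holds no credentials of its own, so the check is against the launching user every time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    scopes: frozenset  # hypothetical scope strings such as "crm:read"


@dataclass(frozen=True)
class Capability:
    name: str
    required_scope: str


class AgentSession:
    """An agent run that carries the launching user's identity and nothing more."""

    def __init__(self, user: User) -> None:
        self.user = user

    def invoke(self, capability: Capability) -> str:
        # The agent has no privileges of its own; it passes or fails as the user would.
        if capability.required_scope not in self.user.scopes:
            raise PermissionError(f"{self.user.name} lacks {capability.required_scope}")
        return f"{capability.name} ran as {self.user.name}"


READ_PIPELINE = Capability("crm.read_pipeline", "crm:read")
READ_LEDGER = Capability("finance.read_ledger", "finance:read")

sales_rep = User("sam", frozenset({"crm:read"}))
session = AgentSession(sales_rep)
```

With this setup `session.invoke(READ_PIPELINE)` succeeds and `session.invoke(READ_LEDGER)` raises `PermissionError`, which is the sales-versus-finance example from above.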

The headless variant is more interesting. Headless Goose runs in the CI pipeline. When InfoSec files a vulnerability ticket, Headless Goose tries to fix it automatically and opens a PR. Humans still review every line before it merges. Same audit trail, lower latency.

The 25% number

Block tracks one metric weekly: manual hours saved by Goose. Started at zero. Targeting 25% by end of the year. Engaged engineers report 8 to 10 hours saved per week. In legacy codebases, the most engaged engineers ship 30 to 40% AI-generated code. In AI-first teams — the Goose team itself — basically every PR is written by Goose. The goal is for each Goose release to rewrite itself 100% from scratch.

This is one of the few honest data points in an industry that mostly traffics in vibes. The MIT report saying few Fortune 500 companies see real AI value gets a measured response: yes, the value is real, but it’s about identifying where the LLM’s general-purpose capability fits the enterprise’s specific workflows.

“When you put people with a lot of depth in an organization together, they tend to outperform the basic LLM capability. So it’s identifying what the strengths of the LLM are and how you apply it to the enterprise that really is going to unlock the value.”

Non-engineers are the surprise

The story Block didn’t expect: salespeople and finance folks building their own dashboards. Someone took a Figma design and asked Goose to turn it into a working site. Someone on holiday in Paris had Goose build her a personal traveling-salesman walking tour app. Treasury dashboards. Reporting tools. One-click shareable internal apps.

The wildest example — an engineer who has Goose watching every Slack message, every Google Meet, every email. He’ll mention a feature idea in a meeting and a few hours later find that Goose has opened a PR for it. Goose reschedules his calendar when he’s running late and someone messages him. Dhanji concedes you have to have the stomach for it.

Vibe coding, with caveats

Dhanji writes code every day, all of it through Goose. He’ll occasionally edit or comment something out manually to see how things behave, but he doesn’t really write code by hand anymore. His claim — and it’s a strong one — is that Goose pioneered vibe coding, or at the very least was very early to it.

But he’s clear about where it breaks. Vibe coding works best for smaller, individual tools: dashboards, reports, interactive systems. It struggles in 10-million-line legacy codebases — purely a context window and reasoning-at-scale problem. Even for performant, secure code (crypto, payment systems), he argues you should still start with an LLM draft and refine, like a sculptor with a rough block.

Where humans are still mandatory: high-level architectural design, race conditions, orchestrating across multiple systems. The LLMs also struggle with proprietary APIs and internal frameworks they’ve never seen.

Models: open everything, eventually

Goose works with any tool-capable LLM. Block runs a gateway with 10 to 20 models behind it. Engineers pick their poison. The privacy-conscious crowd uses Qwen and DeepSeek locally — not a token leaves the laptop. Coding-heavy work tends toward the Claude family; GPT-5 is closing the gap.

Many open-source LLMs don’t natively support tool calling; they just generate text. Block built an adapter called tool shim that lets these models work with MCP tools. Dhanji’s preference is for everything to be open weights, the way the internet was meant to be a utility. Block doesn’t train its own LLMs but does train smaller, focused small language models (SLMs) for risk and customer service, and is working on an open-source speech-to-speech model.
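The source doesn't describe how tool shim works internally, so the following is only one plausible shape for such an adapter, not Block's implementation. The assumption: a text-only model is prompted to answer with a JSON object containing `tool` and `arguments` keys (a convention invented here), and the shim digs that object out of whatever prose surrounds it. `extract_tool_call` is a hypothetical helper name.

```python
import json
from typing import Optional


def extract_tool_call(text: str) -> Optional[dict]:
    """Return the first JSON object in `text` that looks like a tool call, else None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; keep scanning
        if isinstance(obj, dict) and isinstance(obj.get("tool"), str):
            return {"name": obj["tool"], "arguments": obj.get("arguments", {})}
    return None


model_output = (
    "Sure, I will look that up. "
    '{"tool": "query_snowflake", "arguments": {"sql": "SELECT 1"}} '
    "Let me know if you need more."
)
call = extract_tool_call(model_output)
```

Here `call` is a dict with name `query_snowflake` and the SQL argument intact, ready to hand to whatever dispatches tools. Plain chat with no valid JSON object returns `None`, which the caller can treat as an ordinary text reply.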

The swarm bet

Asked whether open models will always trail closed ones, Dhanji reframes the question.

“I really think the future of unlocking coding capability from these models is swarm intelligence. It’s how do you unlock 50 instances of the agent or 100 instances of Goose, or geese if you will, to go off and work with each other to build fairly complex applications.”

Today’s tool-call loop averages two to three minutes per turn. What if it ran for hours, with a hundred geese coordinating? Could it build something the size of Cash App? If yes, the question of whether one model is marginally better stops mattering. What matters is whether you can run thousands of cheap, small-model instances in parallel. He calls it “the question of whether infinite ants can build a spaceship.” A hierarchical swarm — big models doing planning and integration, small models biting off nano-services — is the shape he’s currently betting on.
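The hierarchical shape is basically fan-out and fan-in. Below is a toy sketch using Python's standard thread pool, with plain functions standing in for models: `plan`, `small_worker`, `integrate`, and `run_swarm` are hypothetical names, and nothing here reflects how Goose coordinates agents. It only shows the control flow: one planner splits the job, many cheap workers run in parallel, one integrator assembles the result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def plan(task: str, n_workers: int) -> List[str]:
    """Stand-in for a large planning model: split the task into small pieces."""
    return [f"{task} / service-{i}" for i in range(n_workers)]


def small_worker(subtask: str) -> str:
    """Stand-in for one small-model agent working on a single piece."""
    return f"done: {subtask}"


def integrate(results: List[str]) -> Dict[str, object]:
    """Stand-in for the large model's integration pass."""
    return {"completed": len(results), "artifacts": results}


def run_swarm(task: str, n_workers: int = 100, max_parallel: int = 16) -> Dict[str, object]:
    subtasks = plan(task, n_workers)
    # map() runs workers concurrently and returns results in subtask order.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(small_worker, subtasks))
    return integrate(results)


summary = run_swarm("build payments app", n_workers=50)
```

The interesting engineering lives in what this sketch skips: workers that depend on each other, conflicting edits, and an integrator that has to reject bad work. Those are the open questions behind the ants-and-spaceship line.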

Square AI, the customer-facing version

The same architecture is rolling out to merchants. Square AI is a Goose-like bot that understands a merchant’s financials. You can ask it to build a Q3 sales chart, or — actual customer example — “if I close my wine bar an hour early on Thursdays, how much do I lose?” Turned out the waiters made most of their tips in that last hour. Decision reversed.

Block’s design culture matters here. Dhanji keeps coming back to the original Square reader hiding enormous complexity behind a beautiful, simple object. Cash App showing a single balance while orchestrating a mess of money flows underneath. The AI agent middleware is the same instinct — push the complexity down, keep the interface honest.

Key Takeaways

Capability/interface decomposition is the architectural primitive. Every internal action becomes an MCP-wrapped tool; every product surface becomes an interface that calls into the agent middleware layer.
Goose is model-agnostic with a pluggable provider system, runs locally, has CLI + GUI, and works with any tool-capable LLM. Open-source models are bridged via a “tool shim.”
Don’t agent-engineer your tools. Expose them honestly, let the agent figure it out, capture repeatable wins as recipes.
Safety model: Goose acts as the user, inherits user permissions. Three confirmation modes. Even in autonomous mode, surfaces destructive actions before executing.
Headless Goose runs in CI, auto-fixes vulnerability tickets, opens PRs. Humans review before merge — strict audit trail preserved.
Productivity metric: 25% of manual hours saved targeted by year-end (started at zero). Engaged engineers save 8 to 10 hours per week. AI-first teams (Goose team itself) write basically all PRs through Goose.
Vibe coding boundary: Works for individual tools, dashboards, smaller apps. Breaks down on 10M-line legacy codebases. Humans still own high-level architecture, race conditions, multi-system orchestration.
The swarm thesis: Future is 50-1000 small agents working in parallel for hours, not one big agent for two minutes. Possibly hierarchical — large models plan, small models execute.
MCP origin: Block was on the initial MCP announcement with Anthropic. One of the earliest production deployments of the protocol.
Org change preceded the tech: Block dissolved its GM structure (Square / Cash App / Tidal silos) into a functional org so platform teams could build the agent layer once for the whole company.
Goose built Goose. Vast majority of Goose’s own code is written by Goose. Goal: each release rewrites itself from scratch.

Claude’s Take

This is one of the more honest enterprise-AI conversations of the year, mostly because Dhanji has the receipts. The 25% manual-hours number is specific, the rollout strategy (try every tool, see what sticks, build your own) is grown-up, and the capability/interface framing is genuinely a useful primitive — it’s the same shape as treating your internal services as a platform, which Block has clearly internalized.

A few things land harder than the standard pitch. The point about not engineering your tools to be agent-friendly is unfashionable but probably correct — every wrapper you build today depreciates the moment Sonnet 4.7 ships. The “Goose acts as you” permission model is the right answer to enterprise security concerns, and far more durable than RBAC bolted onto an agent. The swarm thesis isn’t original to Dhanji but his framing — that the open-vs-closed-model question stops mattering if you can run a thousand cheap agents — is the most clear-eyed version I’ve heard.

Where it’s a bit thin: the 25% number is a target, not yet hit, and “manual hours saved” is a famously slippery metric. The non-engineer-builds-dashboards examples are real but probably a tiny slice of actual usage. And the engineer who has Goose watching every Slack message and opening unsolicited PRs sounds less like a workflow and more like a cry for help.

Also worth noting what isn’t said: nothing about cost. Nothing about token spend at this scale. Nothing about whether the Square AI deployment is making merchants more money or just giving them prettier charts. The internal productivity story is convincing; the customer-facing story is still mostly aspiration.

Score: 8/10. Real architectural thinking, real metrics, real production deployment, told without the usual AI-bro inflation. Lower than 9 because half of it is implicit Goose marketing and the conversation skirts the hard tradeoff questions (cost, scale limits, where it actually fails).