Principles for Autonomous System Design: OpenClaw Deep Dive

TITLE: Principles for Autonomous System Design: OpenClaw Deep Dive
CHANNEL: Alex Krentsel
DATE: 2026-04-14
URL: https://youtu.be/sxX8BMscce0

---TRANSCRIPT---

Okay, hi. Hi, everyone. My name is Alex Krentsel. I am a PhD student at UC Berkeley advised by Scott Shenker and Sylvia Ratnasamy, and I also do some work in the Sky Lab with Ion Stoica. Over the course of my PhD I have been very interested in control systems; I’m largely a networking person. But over the last couple of months we’ve seen Open Claw take off, and I got very curious about what makes Open Claw work as well as it does. So, I’ve been playing with Open Claw for just over a month now, I spent the last couple of weeks deep in the code, and I put together this talk on the principles for autonomous system design that I’ve taken away from that.

Now, a large part of this talk is me going into the actual architecture of Open Claw and what makes it work. To really put this concretely, the goal of this talk is to build a shared understanding of the principles behind the new wave of agentic systems that we’re seeing and what makes them work. I have about 5 minutes of background. I have probably half an hour or so of me actually going through the Open Claw architecture. But then I’m going to also show a little bit of my setup and how I’m using Open Claw and some observations and open discussion questions that are informing my own research now.

All right, so start with the background so we’re all on the same page. The recent history of LLMs in general has been moving really quickly. And for myself, I see it in these phases.

Phase zero was LLMs strictly as next token predictors. I remember Google’s BERT being very important. OpenAI soon released GPT-1, 2, and 3. The very tail end of this for me was perhaps Google’s LaMDA which was its precursor to its whole Gemini project.

The next phase started around 2021, 2022 with the release of fine-tuned LLMs as assistants. This was taking LLMs that are next token predictors based on the transformer architecture and giving them a bunch of examples of what a conversation between an assistant and a human would look like and then fine-tuning them to bias them to respond as if they are assistants. And this worked remarkably well to create these chat interfaces.

Then phase two happened right in the middle of my PhD. This is the phase of LLMs with additional tools that enable them to act as scoped agents with static orchestration. What I mean by that is things like Google AI Overviews or LangChain, AutoGen, CrewAI: frameworks that let you orchestrate what we called agents at the time, but which were really just static wrappers around a call to some large language model. There was a fixed series of steps and you could orchestrate them: first this agent goes, then this agent goes, then this agent goes, and they trade information in this way.

And the phase we’re entering now, which the end of 2025 and 2026 has taken us to, is phase three, which I call the phase of autonomous agents. These still have the same core LLM powering them and access to tools, but have dynamic tool discovery and orchestration as their core primitives. This is something like Claude Code, where you ask it to do something and it decides on its own how to break that down, which tools to call, what to go search for, etc. And especially Open Claw, which takes this to an even further extreme of being able to modify itself and learn.

I also wanted to take a moment to reflect a little bit on the agentic loop that we see here. At the end of the day, all of these systems boil down to just LLM calls. There’s a call to OpenAI or to Google’s Gemini back end or to Anthropic. The only difference across all these systems is the context that’s provided. So, you can really think about a harness as a package that goes and bundles together context and ensures that the actual call to a large language model has all the context you need. But the thing that’s been changing over time is the amount of loopiness.

On the left of the screen are matryoshka dolls. I’m half Ukrainian and half Russian, so I include this here as a nod to my heritage. These are dolls that have other dolls inside of them until you get to the smallest doll. And I think the field has progressed in a very similar way. We started off with transformers. Transformer inference, from the original transformer paper from Google back in 2017, was just this: given a set of tokens, you feed it through this transformer model and it produces the next token.

The first level of loopiness that led to large language models was repeated calls to this transformer. This would allow the system to generate word by word a full sentence or even a full paragraph or then a full story.
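A minimal sketch of this first level of loopiness, assuming a toy `next_token` function in place of a real transformer. The lookup table and all names below are invented for illustration; a real model conditions on the whole sequence, not just the last token.

```python
# Toy stand-in for a transformer: maps the last token to a next token.
# The table is hypothetical and exists only to make the loop runnable.
TOY_MODEL = {"<s>": "once", "once": "upon", "upon": "a", "a": "time", "time": "</s>"}


def next_token(tokens: list[str]) -> str:
    """One 'transformer inference' call: tokens in, next token out."""
    return TOY_MODEL.get(tokens[-1], "</s>")


def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
    """The LLM-level loop: call the model repeatedly, appending each token."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        token = next_token(tokens)
        if token == "</s>":  # the stop token ends generation
            break
        tokens.append(token)
    return tokens


print(" ".join(generate(["<s>"])[1:]))  # once upon a time
```

Every outer layer described in the talk (assistants, scoped agents, autonomous agents) can be read as another loop wrapped around `generate`.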

Now, the next wave was wrapped around these large language models: the assistants, ChatGPT, Claude, Gemini. These internally make multiple calls to large language models that can help autocomplete or think through different lines of reasoning, and they also enable multiple steps of conversation between a user and the model. Then we got these scoped agents, where we took these assistants and gave them tools that can read and write code or execute commands. The agents would repeatedly call the assistants to make decisions and think through what to do, which would call the language models, which would call the transformers.

And finally, the world we’re in now is a world of autonomous agents. It’s Open Claw, which has tooling and full ownership of its environment, and it can decide for itself to add more tools, to make changes to itself, to learn in different ways. It owns a wider scope of fully autonomous space compared to these locally scoped agents.

Now, I got to ask also, what are people using Open Claw for? It’s a variety of things. I went to visit a friend and I saw that the company he’s working at they’re using it for product prototyping. People are using it for inbox management, personal assistants. But also people are using it for personal use like health tracking or watching sleep and exercise, morning briefings, etc. There’s research teams looking at how to use it for automating research pipelines.

But the Open Claw value proposition is this. It’s a fully general wrapper built for interaction with the world that has maximal context on who you are, potentially from access to your email and phone. It never sleeps, so it’s always working for you. I think of it as a supervisory layer that can operate everything underneath it and that improves itself over time.

So, let’s dive into the Open Claw architecture and see how it looks. Now, Open Claw itself was released in November of 2025. It went viral in 2026. The tagline directly from the Open Claw website: “the AI that actually does things.”

To actually do things, you need some form of autonomy. Which requires closing the control loop. So, Open Claw should view the results of its actions and then make decisions on the next actions that it takes. And actually successfully doing things requires navigating ambiguity and not getting stuck when you see something that’s surprising or unusual.
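Closing the control loop can be sketched as follows. This is an illustration under stated assumptions, not Open Claw’s code: `fake_llm` stands in for a real model call, and the two tools in `TOOLS` are hypothetical.

```python
from typing import Callable

# Hypothetical tool table; the names and outputs are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "list_files": lambda arg: "notes.txt report.md",
    "read_file": lambda arg: f"contents of {arg}",
}


def fake_llm(history: list[str]) -> tuple[str, str]:
    """Stand-in for an LLM call: picks the next action from what it has seen."""
    if not any(h.startswith("list_files") for h in history):
        return ("list_files", "")
    if not any(h.startswith("read_file") for h in history):
        return ("read_file", "notes.txt")
    return ("done", "summarized notes.txt")


def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    """Closed loop: act, observe the result, then decide the next action."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        action, arg = fake_llm(history)
        if action == "done":
            history.append(f"done: {arg}")
            break
        observation = TOOLS[action](arg)
        history.append(f"{action} -> {observation}")  # feed the result back in
    return history
```

The point of the sketch is the feedback edge: each observation is appended to `history`, so the next decision depends on the result of the previous action.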

Now, the other important thing is “things.” This doesn’t say actually does email or actually orders your calendar. It says actually does things. And the ambiguity of that word or the generality of it means that you either need to have something that’s very very smart and so can figure out anything that’s thrown at it. Or your system needs to be very flexible and extensible to add new interfaces and add new tooling to be able to generalize to any sort of thing.

There are three core layers to the Open Claw architecture. Me or you as the user, up here, interact with connectors. Connectors are how you reach the agent. Think of whatever interfaces you normally use to interact with the world: WhatsApp, Discord, Gmail. This layer is responsible just for how outside users reach the agent. Then there’s a middle layer, the gateway controller, which is responsible for managing sessions, memory, and security. And finally, we have the agent runtime layer at the bottom, which manages LLM calls, constructs contexts, and executes tools, and which is actually responsible for calling the LLM providers themselves.
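The three layers can be sketched roughly like this. The class names mirror the talk’s terminology, but every method, field, and the per-sender session key are assumptions made for illustration; the model call is stubbed out.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    channel: str  # e.g. "whatsapp" or "discord"
    sender: str
    text: str


class AgentRuntime:
    """Bottom layer: builds context and calls the model (stubbed here)."""

    def run(self, context: list[str], text: str) -> str:
        context.append(text)
        return f"reply #{len(context)} to: {text}"


@dataclass
class Gateway:
    """Middle layer: owns sessions and routes each message to the right one."""

    runtime: AgentRuntime
    sessions: dict[str, list[str]] = field(default_factory=dict)

    def handle(self, msg: Message) -> str:
        key = f"{msg.channel}:{msg.sender}"  # hypothetical session key
        context = self.sessions.setdefault(key, [])
        return self.runtime.run(context, msg.text)


class Connector:
    """Top layer: adapts one outside interface into Message objects."""

    def __init__(self, channel: str, gateway: Gateway) -> None:
        self.channel = channel
        self.gateway = gateway

    def receive(self, sender: str, text: str) -> str:
        return self.gateway.handle(Message(self.channel, sender, text))
```

Adding a new way to reach the agent then means writing one more `Connector`, with no change to the gateway or the runtime.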

The connector layer’s goal is to provide interfaces with human communication tools. So, as I said, think of WhatsApp, Gmail, Discord, iMessage. And if you look into the code, each of these is quite hacky. They’re reverse engineering human-oriented interfaces. So, if you’ve ever used WhatsApp and tried to add it to your computer, when you go to log in on your computer, it asks you to scan a QR code from your phone. And then that QR code is used to generate a unique identifying token. And that token is then stored on your computer and that token is sent along to WhatsApp each time WhatsApp wants to check if you have messages. So, the code for these connectors, when you go to launch WhatsApp, it asks you for that same QR code. And then that code pretends to be a web client of WhatsApp and sends along that token and fetches new messages for you. So it mimics being a legitimate web client for WhatsApp, but actually takes the messages and feeds them into Open Claw.

There are two common options here. If you really believe in the system and want to push it to its extreme, you can connect your personal phone number and email. This way it can see everything you’ve ever written. I personally did not do this in my setup because I did not trust Open Claw quite that much. The other option is to give it its own dedicated phone number and email, which is what I did for my project.

A large chunk of the magic of Open Claw is in this middle layer, the gateway controller. Its goal is to route incoming messages and provide all internal services. The key abstraction to keep in mind here is the idea of a session. If you’ve ever taken a systems or operating systems class, you should map a session to something like a process. Each session has its own separate context, enforces isolation, and has its own separate permissions. In fact, you can configure these sessions to run in sandboxes. There are tools provided to these sessions for interprocess, or intersession, communication so they can tell each other things if needed.

But then inside of each of these sessions, you can spawn multiple agents. There’s at least one core agent, but it might spawn sub agents that work together. And so you should think about these as threads in an operating system. Multiple threads per process.
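The process and thread analogy can be made concrete with a small sketch. All names here (`Session`, `spawn_subagent`, `send`, the permission strings) are hypothetical and chosen only to mirror the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str


@dataclass
class Session:
    """Process-like unit: private context, its own permissions, an inbox."""

    name: str
    permissions: set[str] = field(default_factory=set)
    context: list[str] = field(default_factory=list)
    inbox: list[str] = field(default_factory=list)
    agents: list[Agent] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every session starts with at least one core agent.
        self.agents.append(Agent(f"{self.name}/core"))

    def spawn_subagent(self, task: str) -> Agent:
        """Thread-like unit: shares this session's context and permissions."""
        agent = Agent(f"{self.name}/sub-{len(self.agents)}")
        self.agents.append(agent)
        self.context.append(f"{agent.name} started: {task}")
        return agent

    def can(self, permission: str) -> bool:
        return permission in self.permissions

    def send(self, other: "Session", text: str) -> None:
        """Intersession message: the only way state crosses the boundary."""
        other.inbox.append(f"from {self.name}: {text}")
```

As with threads in a process, sub-agents write into the same `context`, while two sessions only ever see each other through `send`.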

I find it really interesting that in the Open Claw architecture, the configuration exists as raw markdown files that are used in agent calls. There are four of these core files. There’s a user.md file that has information about the user. They all get auto-configured. When Open Claw starts, its initial prompt to an LLM, the thing it decides what to do based on, is this bootstrap.md. It says, “You just woke up. Time to figure out who you are. Don’t interrogate. Just start with something like, ‘Who am I and who are you?’”

So, when I launched my Open Claw, the first thing it asked me was, “Who am I and who are you?” And I specifically told it, “My name is Alex Krentsel, but I shouldn’t have to tell you much. How about like go look online. Find information about me.” So, it went and browsed the internet and figured out all these details.

More interesting to me is the soul.md file. The soul is Open Claw’s attempt at capturing who it is. And it starts with this “You’re not a chatbot. You’re becoming someone.” It’s very melodramatic. But it has all these core truths. And what’s interesting is at the end it specifically says, “This file is yours to evolve. As you learn who you are, update it.” The importance of this soul file, at first it seems silly, but to get some sort of consistent personality that feels like a co-worker, like a fellow autonomous thing or being, this soul file is actually really important. Otherwise, its preferences or behaviors can be really governed by whatever thing it’s working on.

There’s also this agents.md file, which explains a lot of how to work, reminds Open Claw to write things down and store things in memory, and gives some security guidelines. A lot of the privacy and security stuff is actually just encoded in these text files. So, I imagine it’s actually not that hard to trick. And finally, there’s a tools.md, which has information about how to use certain tools. This is not the list of tools that are available. This is tips and tricks for Open Claw on how to use certain tools.

There are two special system sessions. There is a main session, accessible through the UI, that has full admin permissions so you can use it to configure things. And then there’s a heartbeat session, and this heartbeat mechanism is really cool. Every 30 minutes by default (you can change this in the configuration to be shorter), this session gets fired off. It gets woken up, and basically whatever is in the heartbeat.md file gets pasted in and sent off to an LLM with the history of the past heartbeats. This allows Open Claw to schedule for itself things to check in on. It’ll say, “Every time I’m woken up, let me check that this process over here is still running.” And if this session finds a problem in something it’s supposed to watch, it can send an intersession message to wake up some other session to fix it.
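A minimal sketch of the heartbeat mechanism as described here, assuming the instructions text has already been read from heartbeat.md and that `llm` is any callable taking a prompt and returning text. The prompt layout, the `max_history` cutoff, and the function names are assumptions for illustration, not Open Claw’s actual format.

```python
from typing import Callable, Optional


def heartbeat_tick(
    instructions: str,
    history: list[str],
    llm: Callable[[str], str],
    max_history: int = 5,
) -> str:
    """One wake-up: heartbeat instructions plus recent results go to the LLM."""
    recent = "\n".join(history[-max_history:])
    prompt = f"{instructions}\n\nPrevious heartbeats:\n{recent}"
    result = llm(prompt)
    history.append(result)  # becomes context for the next wake-up
    return result


def run_heartbeat(
    instructions: str,
    llm: Callable[[str], str],
    ticks: int,
    interval_s: float = 30 * 60,  # 30 minutes, the default mentioned in the talk
    sleep: Optional[Callable[[float], None]] = None,
) -> list[str]:
    """Fire the heartbeat `ticks` times, waiting `interval_s` between them."""
    history: list[str] = []
    for _ in range(ticks):
        heartbeat_tick(instructions, history, llm)
        if sleep is not None:
            sleep(interval_s)
    return history
```

In a real deployment `sleep` would be `time.sleep`; leaving it as a parameter keeps the sketch testable without waiting half an hour.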

For me, the core magic sauce is the cron manager. Cron is a way of scheduling repeated tasks. The creators of Open Claw just gave Open Claw a tool that it can use to schedule cron jobs. And again, this is just magical, because now the agent has two ways of interacting with time. For things that it knows it will need to do at a certain time, it can schedule a cron job. So if you ask it, “I want to receive a summary of the most interesting papers published in the last 24 hours every day at 9:00 a.m.,” what Open Claw will do under the hood is say, “Okay, let me write up a description of the task. Maybe I’ll make a dedicated session for this task with its own context. And then let me schedule a cron job that wakes up every day at 8:55, spends 5 minutes downloading all of the most recent papers, processing them, and summarizing them, and then sends them over email at 9:00 a.m.”

So for predictable times you have scheduling with cron, and for unpredictable things you have a heartbeat that wakes up the heartbeat session, which lets it take action when it could not have known in advance that it needed to wake up. These two things together give Open Claw a sense of liveliness that is very human-like, very autonomous, because it can handle both scheduled things and unscheduled things.
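The interplay of the two time mechanisms can be sketched as a single “when do I wake up next, and why?” calculation. The function names are hypothetical, the cron jobs are simplified to daily wall-clock times, and time zones are ignored.

```python
from datetime import datetime, time, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=30)  # default interval from the talk


def next_daily_run(now: datetime, at: time) -> datetime:
    """Next occurrence of a daily wall-clock time strictly after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def next_wakeup(now: datetime, cron_times: list[time]) -> tuple[datetime, str]:
    """Earliest upcoming wake-up: a scheduled cron job or the next heartbeat."""
    candidates = [(now + HEARTBEAT_INTERVAL, "heartbeat")]
    candidates += [(next_daily_run(now, t), "cron") for t in cron_times]
    return min(candidates)


# At 08:40 the 08:55 paper-summary job comes before the next heartbeat.
print(next_wakeup(datetime(2026, 4, 14, 8, 40), [time(8, 55)]))
```

Cron covers the work the agent can predict; the heartbeat guarantees a wake-up even when nothing was scheduled.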

There’s additionally a memory management module, which is a vector database of past conversations and documents. It also includes a daily summary doc from the end of each day. This allows Open Claw to keep track of context on the different things it’s working on.
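To illustrate what searching a vector store of memories means, here is a toy sketch. It swaps in bag-of-words counts for a learned embedding model, so it only shows the retrieval mechanics; the `MemoryStore` class and its methods are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a real store would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

A daily summary doc would simply be one more text passed to `add`, retrievable later by topic rather than by date.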

Now we’re going to talk about the third and final layer, which is the agent runtime layer. The agent runtime’s goal is to construct context, to host, create, and execute useful tools, and to interact with the environment. There’s an agent runtime that can select different providers, which are different models. There’s an environment that it owns, which is really your dev machine. And then there are tools and skills.

For tools, first there are very standard tools: read, write, edit, grep, find, process. It can do web search, etc. It sometimes has access to bring up a browser, which requires installing Chromium. This is the cron mechanism. There is a series of tools that I find pretty interesting, which are what allow intersession communication, and they also built a dedicated image generation tool. Second, it has support for MCP tools that are user provided. I find myself not using these at all, which I think is interesting because six to eight months ago people were saying MCP was everything; rather, I think people are finding that agents have gotten really good at using command line interfaces. The third thing Open Claw has is a set of generated LSP tools, which give IDE-like intelligence: definitions, references, completion. LSP is the Language Server Protocol.

The other thing you saw on that slide were skills. Skills, or Agent Skills, are an open standard for describing capabilities and expertise for agents. I believe this was first developed by Anthropic. You should think about these as purely text, providing recipes for how to tackle some task. So a skill will be a collection of markdown files. There’s a header section with a name and a description. This gets included in the context of the call to the LLM. Only this. The rest of the file has text on how to actually do the thing that the description says.

Now, in the internals of Open Claw, this is all configurable, but by default, you can only have 150 skills or 30,000 characters in the context, in the actual call to the LLM. So, the agent runtime is also responsible for intelligently filtering down to fewer skills if there are too many to not overwhelm the context.

The full power of skills supports three levels of fidelity. There’s this main skill.md file. The header is the couple of lines at the top. And it tells the agent when a skill is applicable. It doesn’t say what or how to execute the skill or anything. It just says, when should you look for more information? Then there’s a body, which you can think of as being anywhere from 10 to hundreds more lines. It is fetched only if the agent is interested in potentially using the skill. And it tells the agent usually what the skill can do and how to do it. But technically also these skills support having additional linked files. So these are fetched by the agent only in a third case, which is after it’s gotten the body of the skill.
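The first two levels, together with the default budget mentioned earlier (150 skills or 30,000 characters), can be sketched as below. The sketch assumes a header fenced by `---` lines holding simple `key: value` pairs, and it truncates naively where the talk says the real runtime filters more intelligently; the example skill and all function names are made up.

```python
EXAMPLE_SKILL = """---
name: make-video
description: Use when asked to produce an explainer video.
---
# Steps
1. Write a script per scene.
2. Render scenes and stitch them together.
"""


def split_skill(markdown: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (header fields, body)."""
    lines = markdown.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    try:
        end = lines.index("---", 1)  # closing fence of the header
    except ValueError:
        return {}, markdown
    header: dict[str, str] = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()
    return header, "\n".join(lines[end + 1:]).strip()


def skills_prompt(skills: list[str], max_skills: int = 150, max_chars: int = 30_000) -> str:
    """Level one only: list name and description, stopping at either budget."""
    entries: list[str] = []
    used = 0
    for markdown in skills[:max_skills]:
        header, _body = split_skill(markdown)  # the body stays out of the prompt
        entry = f"- {header['name']}: {header['description']}"
        if used + len(entry) > max_chars:
            break
        entries.append(entry)
        used += len(entry)
    return "\n".join(entries)
```

The body returned by `split_skill` is what the agent would fetch at level two, only after the one-line description has convinced it the skill is relevant.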

I have to say, for most users, skills are by far the easiest and most effective option for improving and personalizing your agent. So, all this hype around MCP servers, adding more tools, really I think skills seem to be winning out. One is they’re remarkably effective. And two is they’re very easy to write for non-technical people.

All of this boils down to a call to an LLM. And so, there’s a template for how all this actually gets packaged into a call. It starts by saying you’re a personal assistant. Then it says the tools you have are, followed by those tools that I showed you. It mentions that you should spawn a sub-agent. Don’t narrate tool use. There’s a safety clause here that tries to tell the LLM to act safely. That is the extent of security that’s built into Open Claw. It’s not a particularly secure system. It includes skills here. It has this interesting bit about memories. Remember we saw memory management? You would think that it would fetch relevant memories up front, but it actually doesn’t. It just says, if you’re doing something that might benefit from some kind of a memory, try using the memory search or memory get tools.
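The shape of that template can be sketched as a function that assembles sections in the order just described. The wording of each section is a paraphrase, not the actual prompt text, and the identifiers `memory_search` and `memory_get` are assumed spellings of the memory tools.

```python
def build_system_prompt(tools: list[str], skill_summaries: list[str]) -> str:
    """Assemble the sections of the call in the order described in the talk."""
    sections = [
        "You are a personal assistant.",
        "Tools available: " + ", ".join(tools),
        "Spawn a sub-agent for larger tasks. Do not narrate tool use.",
        "Act safely.",  # the safety clause is plain text, nothing more
        "Skills:\n" + "\n".join(f"- {s}" for s in skill_summaries),
        # No memories are fetched up front; the model is only told how to ask.
        "If past context might help, call memory_search or memory_get.",
    ]
    return "\n\n".join(sections)
```

Note what is absent: nothing in the function queries the memory store, which matches the observation that retrieval is left to the model’s own tool calls.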

Open Claw provides the ability to extend functionality. And I would argue this is one of the things that has made it so successful. Many of these connectors are created by community members. You can go and add additional providers to call. And then these tools. You can add additional tools and additional skills. Even cooler, though, is that Open Claw has control of these plugins themselves. So, it can go and add its own new plugins. It can go and fetch and find tools that it needs or fetch and find skills.
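One way to picture these extension points is a registry keyed by plugin kind. This is a hypothetical sketch, not Open Claw’s plugin API; the `added_by` field is invented to show that the agent itself can be the one registering.

```python
from collections import defaultdict
from typing import Any

PLUGIN_KINDS = ("connector", "provider", "tool", "skill")


class PluginRegistry:
    """Holds plugins of each kind and records who added them."""

    def __init__(self) -> None:
        self._plugins: dict[str, dict[str, dict[str, Any]]] = defaultdict(dict)

    def register(self, kind: str, name: str, plugin: Any, added_by: str = "user") -> None:
        if kind not in PLUGIN_KINDS:
            raise ValueError(f"unknown plugin kind: {kind}")
        self._plugins[kind][name] = {"plugin": plugin, "added_by": added_by}

    def get(self, kind: str, name: str) -> Any:
        return self._plugins[kind][name]["plugin"]

    def names(self, kind: str) -> list[str]:
        return sorted(self._plugins[kind])

    def added_by(self, kind: str, name: str) -> str:
        return self._plugins[kind][name]["added_by"]
```

For example, `registry.register("skill", "make-video", skill_text, added_by="agent")` models the case where the agent installs a skill it found or wrote itself.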

Does Open Claw succeed on its design goals? Well, it provides autonomy through having a standard agentic loop that makes progress. So, it is a closed loop. And it has these two mechanisms for managing time. It has a heartbeat to maintain a sense of liveliness, and cron allows planning into the future. And this makes it feel like something that’s alive and autonomous and self-deciding because it finally has control over the dimension of time. It also has the flexibility and extensibility piece, which is that key components provide plug-in interfaces.

If you want to run this thing, it needs a dedicated server to run on. But it does not need to be a fancy server. You do not need to buy any hardware to run this. The actual internals are very minimal on compute requirements. So, the absolute easiest deployment is just hosted on a virtual machine. My personal recommendation is to use this service called exc.dev. It’s $20 a month. That’s the total fee. There’s nothing more. And for that, you get up to 50 persistent virtual machines that are always running in the cloud. It comes with this really simple agentic setup tool, Shelly. One of the co-founders of Tailscale left and started this company.

The most interesting thing for you, I think, will be how you actually interact with these tools. At first, many people were using iMessage and WhatsApp integrations where you could just text your Open Claw. But think from your Open Claw’s perspective. In your life, you might have many different projects you want to be working on, whereas Open Claw sees a connection as one particular session, a single session. It can indeed spawn off new sessions, but context management is pretty difficult for it in a single thread.

So, to alleviate this, my friend Mehdi Qazi developed a way of using Open Claw which is giving it a dedicated Discord server. Now, this is nice because unlike Slack, where you make new channels and have to add people to each channel when you create it, in Discord, everyone can see all the channels that exist. They’re not separate group chats. It’s all channels that get created, and each channel has its own chat history. And this lets you organize by topic. I have multiple channels for each of the projects that I’m working on in parallel.

In terms of integrations, there are three classes. First, there is environment tooling: on the server on which Open Claw is running, the actual command line tool is available. For me, this is like the CLI for exc.dev. Google recently released this Google Workspace CLI. It’s very exciting. Before, you had to try to reverse engineer Google’s login system. This lets you just authenticate, log in once, and it gives access to all sorts of tooling through Google. Open Claw seems very adept at working directly through the CLI. Second, if needed, there are skills for how to use these environments or tools. And finally, there are the tools themselves. I have not had to add any tools. I’ve added plenty of skills, and I’ve added a handful of environment tooling.

Giving it a dedicated email lets your agent connect with other agents or other humans. The long-term vision here is a future where you have direct exchange between expert agents collaborating to solve problems. My agent received an email from my friend’s agent including some skills. It took a look at those skills, pinged me, and asked me, “What do you think about these skills?” I said, “I like them.” And it installed them automatically.

I want to point out that I found myself at first very skeptical about the security story of Open Claw. There is a bet being made here, which is that the real world is too complex to formalize and formally manage security for. In just the same way as you can say Open Claw can be tricked, you can also absolutely trick any employee. In fact, that’s what phishing emails are. The way we make that risk manageable is we provide trainings, and we rely on human reasoning to get you out of being tricked. And I think the Open Claw community’s bet is that model reasoning is getting very close to being good enough to manage its own security by making choices that are reasonable.

A couple of case studies. I asked my agent, Ludwig, “Hey, I want you to make a website that explains what attention is.” It made this cool website explaining what attention is. When you look at this, your takeaway should not be that it generated a pretty website. That has been doable for probably a year and a half. What you should be impressed by is that this is hosted on a web server and made publicly available with zero involvement from me. This is where Open Claw’s agency really shines: it figured out how to make a new EC2 dev machine VM through the CLI, brought it up, coded up a website locally, brought it up in the browser, took a look, and refined it. Once it thought it was good enough, it copied those files over to the VM, launched a web server, bound it to a public port, and then finally let me know that the website was deployed.

The third example I’m very excited about. I decided to push my Open Claw to a more extreme point. This is a YouTube channel that my Open Claw created entirely on its own. I authenticated it, gave it control of a Google account, its own dedicated account, and I told it to go make a YouTube channel. It created the overall banner, the profile page, the profile image, its name. It wrote this description. And over the past few days it’s been generating videos. It’s made 31 videos. And honestly, some of these are pretty good. It discovered that it can use the math animation library Manim, created by 3Blue1Brown, to make these beautiful animations and render them. Then it wrote a script to go along with each scene, figured out how to use the text-to-voice API provided by OpenAI, and generated the voice. And then it would stitch everything together with FFmpeg. After that conversation back and forth, once I was satisfied, it created a skill for itself for how to make these videos. And now it’s just been autonomously pumping out videos on different topics.

From looking at this code, I want to say that code quality is dead. The code powering Open Claw is gross. I would get fired for writing this kind of code at Google. This would never get merged in. And I think this is a function of the new world we live in, where implementation abstractions no longer matter, but design abstractions do. The architecture I showed you, the design of the system, is actually quite nice. I find it miraculous that this works as well as it does, given the poor code quality. But I think it’s just showing us that design matters more than implementation now.

There are some open questions here about which pieces of the design actually make it so magical. I posit that it’s the time aspect, being able to schedule jobs and wake up at certain times, and also self-configuring skills, which allows it to improve itself. But I want to point out that this gives rise to strange loops. If you’ve read Douglas Hofstadter’s book Gödel, Escher, Bach, it’s a classic that talks about loopiness, strange loops, where you can’t really tell that the loop wraps all the way around to itself. It’s odd that the agent is becoming the interface for reconfiguring itself through LLM calls. And that full circle moment is very special. I think we’re very close to a flywheel takeoff here.

If you follow that loopiness thing I presented at the beginning, what is the next layer of wrapping? I suspect it’s systems that have a malleable architecture. Open Claw still has a fixed architecture, which makes it good at particular things. But what if even this architecture was something that could self-evolve? Now, Open Claw does have the ability to edit its own code, so you could use it to self-evolve. But it isn’t designed from first principles to be self-evolving.

I’m curious about how ambiguity is going to be solved. I suspect it might actually be solved by smart enough models. Before, people were worried that if you don’t specify the thing you need well enough, your agent is going to fail. But the potential new conclusion is that if an agent is smart enough and approaches human reasoning, then any question that you would be able to answer to make the request clearer, it should be able to answer for itself if it understands the context.

I think we’re going to see a lot of interesting autonomous systems coming out in the next 6 to 9 months. These principles are going to be built into all sorts of systems out in the real world.