How To Build a Personal Agentic Operating System

ELI5/TLDR

Every agentic tool — Claude Code, Cursor, Codex, Open Claw, Anti-Gravity — is converging on the same capabilities. So the tool you pick matters less and less. What matters is the system underneath: a set of plain text files that describe who you are, what you know, how you work, what you remember, and what you can reach. Newfar Gaspar calls this an Agent OS, organized in seven layers, and argues the work you do to build it is portable across tools forever. Build it once with a chief-of-staff agent as your testbed, and every subsequent agent is cheap because it inherits the foundation.

The Full Story

The convergence problem

The pitch starts with a structural observation. Cursor added agents and automations. Claude Code added a memory system and channels to talk to it from the outside. Codex runs in the background. Open Claw, Windsurf, Anti-Gravity, Noose’s open-source stack: they’re all racing toward the same feature set.

“Every agentic tool is becoming every agentic tool… which means that the tool you pick matters less and less and what matters much more is the system that you build underneath it.”

Underneath, every one of these tools is doing the same thing — reading text files that define who you are, what you know, what you can do, what you remember, and what you can reach. Which means the work is portable. Switch tools, point the new one at the same folder, and you’re running. The corollary is uncomfortable for vendors but liberating for users: tool choice is now the least important decision.

The framing is also explicitly not about coding. Most agentic discourse is about Cursor and Claude Code shipping software. Newfar wants to talk about knowledge work — strategy, communication, operations, decisions, research. That’s where most professionals live and where an Agent OS pays the biggest dividend.

The seven layers

The Agent OS has seven layers. Each one is a folder of human-readable text files. Build them once, maintain them, and every agent you add inherits the whole foundation.
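To make "each layer is a folder" concrete, here is a minimal sketch that scaffolds the seven folders. The numbered folder names, the README placeholder, and the function name scaffold_agent_os are illustrative assumptions, not a layout prescribed in the talk.

```python
from pathlib import Path

# Layer names come from the article; the folder naming scheme is an assumption.
LAYERS = [
    "identity",
    "context",
    "skills",
    "memory",
    "connections",
    "verification",
    "automations",
]


def scaffold_agent_os(root: Path) -> list[Path]:
    """Create one folder per layer with a placeholder README and return the folders."""
    created = []
    for index, layer in enumerate(LAYERS, start=1):
        folder = root / f"{index:02d}-{layer}"
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.md"
        # Never overwrite a README the user has already edited.
        if not readme.exists():
            readme.write_text(f"# Layer {index}: {layer.title()}\n", encoding="utf-8")
        created.append(folder)
    return created
```

Calling scaffold_agent_os(Path("agent-os")) would give any tool a single folder to be pointed at. The function is safe to re-run because existing files are left alone.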

Layer 1 — Identity. Who you are, and what rules you want enforced every time the agent talks to you. This is the file your tool reads first, before anything else. It goes by a different name in each tool: soul in Open Claw, agents.md in Cursor, claude.md in Claude Code, copilot-instructions in GitHub Copilot. Same idea. A good identity file covers communication style (direct or diplomatic, bullets or prose), values (concise vs thorough, challenge me vs execute, show reasoning vs just answer), and hard rules (“never send external email without showing me a draft”, “never flatter me”, “always tell me what I’m not seeing”).

The trick to writing it: don’t write it from scratch. You’ll hate it and quit. Brain-dump to an AI — ideally one with prior memory of you — and ask it to interview you with 15 questions. Speak the answers out loud. Let the AI draft. You edit. Ship version one at 70% right and patch over three weeks.
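Because the identity file is the same idea under different names, one way to stay portable is to keep a single canonical file and copy it to each tool's expected location. The sketch below assumes the filenames shown; the exact names and paths vary by tool and version, so treat TOOL_FILENAMES as a placeholder to check against each tool's documentation. publish_identity is a hypothetical helper.

```python
from pathlib import Path

# Assumed per-tool filenames, loosely following the names in the article.
# Verify each one against the tool's current documentation before relying on it.
TOOL_FILENAMES = {
    "claude-code": "CLAUDE.md",
    "cursor": "AGENTS.md",
    "open-claw": "SOUL.md",
    "github-copilot": ".github/copilot-instructions.md",
}


def publish_identity(canonical: Path, workspace: Path) -> list[Path]:
    """Copy the canonical identity text to every tool-specific filename."""
    text = canonical.read_text(encoding="utf-8")
    written = []
    for relative_name in TOOL_FILENAMES.values():
        target = workspace / relative_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Copying rather than symlinking is a deliberate simplification here: copies work on every filesystem, at the cost of re-running the helper after each edit to the canonical file.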

Layer 2 — Context. What you know. The single biggest predictor of whether AI gives you generic Google-search advice or something genuinely useful. Public models will never know your roadmap, your customer segments, your stakeholders, what you’re shipping next quarter — unless you tell them. Context files live in your workspace and the agent reaches for them on demand; they’re not in your prompt.

The trap here is the 40-page document written in one session that goes stale immediately.

“That’s not context. It’s just a quick-to-be-stale novel.”

What works: three to five focused files, one page each, dated and updated when things change. A team file. A product file. A customer file. A quarter file. A stakeholders file. The practice — Newfar calls it context curation — is to notice every time you re-explain your situation to AI and write that thing into a file instead. No beginning, no end. Just a habit.
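The "dated and updated" habit is easy to check mechanically. A minimal sketch, assuming each context file starts with a line like "Updated: 2025-03-01" and that 30 days counts as stale; both the header convention and the threshold are assumptions made for illustration.

```python
from datetime import date
from pathlib import Path

DATE_PREFIX = "Updated: "  # assumed first-line convention, not from the talk


def stale_context_files(folder: Path, today: date, max_age_days: int = 30) -> list[str]:
    """Return names of context files that are undated or older than max_age_days."""
    stale = []
    for path in sorted(folder.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        first_line = lines[0] if lines else ""
        if not first_line.startswith(DATE_PREFIX):
            stale.append(path.name)  # undated files are treated as stale
            continue
        try:
            updated = date.fromisoformat(first_line[len(DATE_PREFIX):].strip())
        except ValueError:
            stale.append(path.name)  # unreadable dates are treated as stale
            continue
        if (today - updated).days > max_age_days:
            stale.append(path.name)
    return stale
```

Passing today explicitly keeps the check reproducible. In daily use you would call it with date.today().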

Layer 3 — Skills. How you work. Reusable instruction sets for repeated workflows — weekly status updates, meeting prep, stakeholder emails, decision memos. Every knowledge worker has 20-30 of these patterns. Each one written as: when [trigger] do [process] using [sources] and produce [format]. Without skills, you re-explain the format every time, paste the same sources, complain the AI writes in a weird voice, and never bother to teach it your voice. Same MVP rule — ship a kind-of-wrong version, use it for a week, patch the gaps.

Layer 4 — Memory. What gets remembered between sessions. This is where every tool company is investing heavily because it’s clearly one of the biggest unlocks. Things shift weekly. Newfar’s pragmatic stance: lean on the tool’s built-in memory, but at minimum understand how it works. Ask the agent directly — “explain how your memory system works, what do you remember between sessions, what do you forget.” Know the gaps in cross-session retention and how the context window interacts with stored memory.

For more advanced users, add specialized memory files for your work — a running log, structured memory files, dedicated memory MCP servers. Be deliberate about what gets remembered. The agent does pick things up on its own but not always the right things. A major decision, a priority shift, the end of a long session — those won’t always get caught. Build a habit, or even a dedicated skill, for “remember this the way I want it remembered.”
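A running log is the simplest form of deliberate memory. The sketch below appends decisions to a JSON Lines file and searches them by keyword. The file layout, field names, and both function names are assumptions; a tool's built-in memory or a memory MCP server would replace this in practice.

```python
import json
from datetime import date
from pathlib import Path


def remember_decision(
    log_path: Path, decided: str, why: str, alternatives: list[str], on: date
) -> dict:
    """Append one decision to the log, recording what, why, and what was rejected."""
    entry = {
        "date": on.isoformat(),
        "decided": decided,
        "why": why,
        "alternatives": alternatives,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def recall_decisions(log_path: Path, keyword: str) -> list[dict]:
    """Return logged decisions whose text contains the keyword (case-insensitive)."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    return [entry for entry in entries if keyword.lower() in entry["decided"].lower()]
```

An append-only text file keeps the memory human-readable and tool-independent, which matches the portability argument made for the other layers.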

Layer 5 — Connections. How your agent reaches real systems — email, calendar, Slack, Jira, Salesforce, databases. Three options: MCP servers, CLI tools that let the agent decide how to interact with the system on its own, or direct API/scripting.

The strong recommendation: start read-only. Let the agent read your inbox and calendar before it can send. Add write access only after weeks of watching it behave. The risk scales with capability. Newfar flags a real and growing class of incident: an agent with loose permissions has access to company Slack, and someone else on your team starts chatting with it. Suddenly your private notes, your draft feedback, your opinions about colleagues are out the door.

“It’s not a hypothetical risk. Incidents like that are already happening and the agents that are gossiping while being very funny, they also pose a very big risk for employee privacy.”
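Read-only first can be expressed as a deny-by-default permission table that every connection call has to pass through. A minimal sketch: ConnectionPolicy, Access, and the system names are hypothetical, and real enforcement belongs in the MCP server, API token scopes, or the tool's own permission settings rather than in a wrapper like this.

```python
from enum import Enum


class Access(Enum):
    READ = "read"
    WRITE = "write"


class ConnectionPolicy:
    """Deny-by-default permissions: nothing is allowed until explicitly granted."""

    def __init__(self) -> None:
        self._granted: dict[str, set[Access]] = {}

    def grant(self, system: str, access: Access) -> None:
        self._granted.setdefault(system, set()).add(access)

    def allowed(self, system: str, access: Access) -> bool:
        return access in self._granted.get(system, set())

    def check(self, system: str, access: Access) -> None:
        """Raise PermissionError unless this access was granted for this system."""
        if not self.allowed(system, access):
            raise PermissionError(f"{access.value} access to {system} has not been granted")


# Week one: the agent may read the inbox and calendar, and nothing else.
policy = ConnectionPolicy()
policy.grant("inbox", Access.READ)
policy.grant("calendar", Access.READ)
```

With this setup, policy.check("inbox", Access.WRITE) raises until you add the grant yourself, which makes the move to write access an explicit decision instead of a default.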

Layer 6 — Verification. Knowing what to check. The worst failure mode isn’t that the agent fails; it’s that it works confidently and wrongly, and you ship the output before you notice. Every job has its own quick test. Drafted email? Check tone match and facts. Data analysis? Check the numbers. Three to five checks per output usually take under a minute, and the routine gets faster with practice.

Verification also runs at the system level. Periodically retro the whole OS — which skills are never being called, which context files are stale, which agents need updated instructions. The tools let you ask them this directly. Without the audit habit, the OS has a shelf life of about eight weeks. With it, the OS compounds.
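The system-level audit can also be partly scripted. The sketch below flags skills that have not been called recently, given a record of when each was last used. How that record gets collected is left open, and the 56-day default simply borrows the eight-week figure above as an assumed threshold; unused_skills is a hypothetical helper.

```python
from datetime import date


def unused_skills(
    skill_names: list[str],
    last_used: dict[str, date],
    today: date,
    window_days: int = 56,  # assumption: eight weeks, echoing the shelf-life estimate
) -> list[str]:
    """Return skills never used, or not used within the last window_days."""
    unused = []
    for name in skill_names:
        used_on = last_used.get(name)
        if used_on is None or (today - used_on).days > window_days:
            unused.append(name)
    return unused
```

The output is a prompt for a decision, not a delete list: an unused skill may be obsolete, or it may just have a trigger the agent never recognizes.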

Layer 7 — Automations. Optional but powerful. Things the agent runs when you’re not watching — daily 7am summaries, monitoring tasks pinging Slack. Open Claw has heartbeat and cron. Three rules: only automate things you’ve run manually enough times to trust, start with drafts you review (not outputs that go to other people), always log what ran and what it did. An agent running at 3am with a wrong answer can do damage before you wake up.
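Two of the three rules, drafts first and always log, fit in one small wrapper. This is a sketch under assumptions: run_automation, the drafts folder, and the JSON Lines log format are invented for illustration, and scheduling itself (cron, heartbeat, or the tool's own scheduler) is out of scope.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def run_automation(
    name: str, task: Callable[[], str], drafts_dir: Path, log_path: Path
) -> Path:
    """Run a task, save its output as a draft for review, and log the run either way."""
    drafts_dir.mkdir(parents=True, exist_ok=True)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    record = {"automation": name, "started": datetime.now(timezone.utc).isoformat()}
    try:
        draft_path = drafts_dir / f"{name}.draft.md"
        # The output goes to a file you review, never straight to other people.
        draft_path.write_text(task(), encoding="utf-8")
        record["status"] = "draft_written"
        record["draft"] = str(draft_path)
        return draft_path
    except Exception as error:
        record["status"] = "failed"
        record["error"] = repr(error)
        raise
    finally:
        # The log line is written on success and on failure.
        with log_path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
```

Because the log entry is written in the finally block, a 3am failure still leaves a trace to read in the morning.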

Why a chief of staff first

Newfar uses a chief-of-staff agent as the running example for every layer. Hers is called Chloe. The chief of staff reviews your inbox, preps you for meetings, tracks every commitment you make across calls, flags blind spots, drafts weekly updates, and knows your people and priorities. Eventually it becomes the agent that manages your other agents. It’s also the agent that helps most in the day-to-day, regardless of whether you’re an IC or an executive.

For Chloe specifically: identity captures your communication style and non-negotiables. Context covers stakeholders, strategy, operating principles. Skills include pre-read generation, daily briefs, voice-matching, commitment tracking. Memory holds decision logs (what was decided, why, what the alternatives were), working-process learnings, and per-relationship context. Connections start with read-only inbox and calendar; later, read-write on a personal task list; eventually, draft-Slack-with-approval.

The compounding return

The argument that lands the whole pitch: once the OS is built, agents become cheap.

“Your first agent is hard as you’re building the agent OS and the agent itself probably at the same time. Your chief of staff maybe took you a weekend. But the second agent… that takes you an afternoon because it inherits everything that is relevant.”

The third is faster. The fifth is faster still. Newfar’s own setup runs Chloe plus specialist agents for content, technical building, and platform work. They all share state through a central hub and run on the same Agent OS. Chloe sees what the specialists are doing.

The closer is a bet on convergence: the tools will keep changing, new ones will launch before you finish learning the last, but the OS travels. People who build the foundation now compound. Everyone else starts over with each new tool.

Key Takeaways

The seven layers, in order: Identity → Context → Skills → Memory → Connections → Verification → Automations.
Every tool reads roughly the same files under different names. Identity = claude.md / agents.md / soul / copilot-instructions. Point a new tool at the same folder, no migration.
Don’t write identity from scratch. Brain-dump to an AI, ask it to interview you with 15 questions, edit the draft, ship at 70%, patch over three weeks. Same loop for every layer.
Context is three to five one-page files, dated and updated. Not a 40-page novel. The discipline is context curation — every time you re-explain your situation to AI, write it down.
Every knowledge worker has 20-30 reusable workflows. Each one is a skill: when [trigger] do [process] using [sources] and produce [format].
Ask your tool how its memory works. Add specialized memory for high-value contexts (decision logs, relationship notes) — generic auto-memory misses the things you care about most.
Always start agent connections read-only. Add write access only after weeks of trust. Loose permissions + Slack access = real privacy incidents already happening.
Verification is per-task (check tone, check facts, check numbers) and per-system (audit which skills are unused, which context files are stale). Without audits, the OS goes stale in eight weeks.
Only automate workflows you’ve run manually enough times. Automations should produce drafts for review, not outputs that go directly to other people. Always log.
Chief-of-staff is the highest-leverage first agent — it eventually becomes the agent that manages your other agents.
The compounding play: first agent is a weekend, second is an afternoon, fifth is trivial. The OS is the asset; agents are cheap on top of it.

Claude’s Take

The framing is cleaner than most of what’s out there on this. The seven layers aren’t a novel taxonomy — most of this circulates as folk knowledge in the agentic-tools community — but the act of naming them and putting them in order is the value. It gives you a checklist to audit your own setup against, which is something most people who’ve been hacking with Claude Code for six months actually need.

The best ideas here are the operational ones, not the architectural ones. Context curation as a practice, not a project. Brain-dump-then-interview as the default authoring method for every layer. Ship MVP files at 70% and patch. Read-only first, write later. Audit the OS, not just the outputs. These are the bits that survive contact with reality.

The weak spot is memory. Newfar is honest that this layer is moving so fast that any specific advice will be outdated next month, and her actual recommendation collapses to “lean on the tool, ask it how it works, add specialized memory if you can.” That’s correct but thin. Worth watching how this layer matures over the next year.

The compounding pitch — first agent is a weekend, fifth is trivial — is real if you actually build the OS. Most people won’t. They’ll build agent one, never write the context files, and rebuild from scratch when they switch tools. The episode is essentially an argument for paying that one-time cost. Fair argument.

Score: 8/10. Practical, well-organized, transferable across whatever tool you happen to be using. Loses a point for the pitch-for-the-free-program closer and for hand-waving the memory layer.