Agentic Stack: One Brain Across 8 AI Coding Agents

ELI5/TLDR

Agentic Stack is a portable .agent/ directory: your AI coding agent’s brain in a folder. It plugs into eight different harnesses (Claude Code, Cursor, Windsurf, OpenCode, etc.), so the same memory, skills, and conventions follow you no matter which tool you open. The highlights are four memory layers, a nightly cron job that mines repeating patterns into proposed lessons, and skills that get flagged for a rewrite when they keep failing. 714 stars, Apache 2.0, install via Homebrew.

The Full Story

The problem it’s chasing

Every coding agent today is a walled garden. Claude Code learns your conventions, then you try Cursor for a week and that context evaporates. Switch back and Claude Code still has it, but Cursor never learned any of it. Agentic Stack treats the agent’s brain as separate from the tool it runs in. The brain lives in .agent/. The harness (Claude Code, Cursor, Windsurf, OpenCode, Open Claw, Hermes, Pie, or a standalone Python conductor) is just the body.

Four-layer memory

Borrowed straight from cognitive science vocabulary.

Working memory — current session: files edited, decisions made, what task you’re on. Decays fast.
Episodic memory — full task histories: what you tried, what worked, what didn’t, why. Decays slowly.
Semantic memory — graduated lessons. Proven patterns. Permanent unless rejected.
Personal memory — your preferences. Languages, testing style, commit conventions. Never decays.

Retrieval is query-aware. When the agent needs to recall something, it searches all four layers at once, weighting results by relevance and recency. Larger stores can optionally use SQLite FTS5 full-text indexing.
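To make “weighted by relevance and recency” concrete, here is a minimal sketch of a four-layer recall. The half-life values, the token-overlap relevance score, and the names Memory, recency_weight, and recall are all hypothetical; this is not recall.py’s actual implementation. Working and episodic memories decay exponentially, while semantic and personal memories keep full weight.

```python
import re
from dataclasses import dataclass

# Hypothetical half-lives in days. The article only says working memory decays
# fast, episodic slowly, and semantic/personal memory does not decay (None).
HALF_LIFE_DAYS = {"working": 0.5, "episodic": 30.0, "semantic": None, "personal": None}


@dataclass
class Memory:
    layer: str       # one of the four keys in HALF_LIFE_DAYS
    text: str
    age_days: float  # time since the memory was written


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def recency_weight(m: Memory) -> float:
    half_life = HALF_LIFE_DAYS[m.layer]
    if half_life is None:
        return 1.0
    return 0.5 ** (m.age_days / half_life)  # exponential decay


def relevance(query: str, m: Memory) -> float:
    # Jaccard overlap between query and memory tokens (a stand-in scorer).
    q, d = tokens(query), tokens(m.text)
    return len(q & d) / len(q | d) if q and d else 0.0


def recall(query: str, memories: list, k: int = 3) -> list:
    # Score every layer in one pass, then keep the top-k non-zero hits.
    scored = [(relevance(query, m) * recency_weight(m), m) for m in memories]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


store = [
    Memory("working", "editing auth middleware for login bug", 0.1),
    Memory("episodic", "login bug fixed by refreshing session token", 10),
    Memory("semantic", "always run pytest before committing", 200),
    Memory("personal", "prefers pytest over unittest", 400),
]
for m in recall("pytest login bug", store):
    print(m.layer, "|", m.text)
```

Multiplying the two scores means a memory with zero relevance never surfaces just because it is fresh, while a relevant working memory fades fast: at a 0.5-day half-life it keeps under 0.1% of its weight after five days.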

The “dream cycle”

This is the feature with the cute name. auto_dream.py runs nightly as a cron job. Crucially, it makes no LLM calls; it’s pure deterministic clustering. It walks episodic memory, finds repeating patterns using single-linkage Jaccard similarity, and stages them as candidate lessons.

The staging is purely mechanical. It identifies what keeps coming up, but it doesn’t decide whether those patterns should become permanent lessons. That decision requires human rationale.
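Here is a minimal sketch of that clustering step, assuming each episode is reduced to a set of lowercase word tokens. The 0.5 threshold, the min_support cutoff, and all function names are hypothetical; the article names the technique (single-linkage Jaccard similarity) but not auto_dream.py’s parameters. Single linkage puts two episodes in the same cluster if any chain of above-threshold pairs connects them, which union-find handles directly.

```python
import re
from itertools import combinations


def tokens(text: str) -> frozenset:
    return frozenset(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a: frozenset, b: frozenset) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0


def single_linkage_clusters(items: list, threshold: float) -> list:
    """Link every pair with Jaccard >= threshold; connected items form a cluster."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    sets = [tokens(t) for t in items]
    for i, j in combinations(range(len(items)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, item in enumerate(items):
        clusters.setdefault(find(i), []).append(item)
    return list(clusters.values())


def stage_candidates(episodes: list, threshold: float = 0.5, min_support: int = 2) -> list:
    # Deterministic, no LLM: only patterns seen at least min_support times are staged.
    return [c for c in single_linkage_clusters(episodes, threshold) if len(c) >= min_support]


episodes = [
    "tests failed missing fixture db",
    "tests failed missing fixture redis",
    "deploy blocked missing env var",
    "deploy blocked missing env var staging",
    "renamed variable in parser",
]
for cluster in stage_candidates(episodes):
    print(cluster)
```

Single linkage is cheap and fully reproducible, but it can chain loosely related episodes together through intermediate ones, so the threshold choice matters.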

In the morning, you review candidates from the CLI. The graduate command accepts a candidate and requires a written rationale; the reject command dismisses one and requires a reason. Rejected candidates keep their history, so if the same pattern keeps re-staging after rejection, the churn itself becomes a signal. Graduated lessons land in lessons.json, from which lessons.md is auto-rendered. Future sessions pull relevant lessons via recall.py before the agent acts.

The trick here is the discipline. No silent auto-promotion. Every permanent lesson has a paper trail explaining why it earned its slot.
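The review protocol can be sketched as a small state machine. Everything here is an assumption for illustration: the Candidate and Review classes, the churn rule (counting re-stagings that follow a rejection), and the JSON and Markdown layouts are hypothetical, not the project’s actual schema. What the sketch keeps from the description: graduating and rejecting both refuse an empty note, history is append-only, and the Markdown view is rendered from the same data that backs the JSON.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Candidate:
    pattern: str
    history: list = field(default_factory=list)  # (event, note) pairs, never cleared

    @property
    def restaged_after_rejection(self) -> int:
        seen_rejection, count = False, 0
        for event, _ in self.history:
            if event == "rejected":
                seen_rejection = True
            elif event == "staged" and seen_rejection:
                count += 1
        return count


class Review:
    def __init__(self):
        self.candidates = {}  # pattern -> Candidate
        self.lessons = []     # the data that would be written to lessons.json

    def stage(self, pattern: str) -> None:
        self.candidates.setdefault(pattern, Candidate(pattern)).history.append(("staged", ""))

    def graduate(self, pattern: str, rationale: str) -> None:
        if not rationale.strip():
            raise ValueError("graduate requires a written rationale")
        self.candidates[pattern].history.append(("graduated", rationale))
        self.lessons.append({"lesson": pattern, "rationale": rationale})

    def reject(self, pattern: str, reason: str) -> None:
        if not reason.strip():
            raise ValueError("reject requires a reason")
        self.candidates[pattern].history.append(("rejected", reason))

    def churn(self, min_restages: int = 2) -> list:
        """Patterns that keep coming back after being rejected."""
        return [c.pattern for c in self.candidates.values()
                if c.restaged_after_rejection >= min_restages]

    def lessons_json(self) -> str:
        return json.dumps(self.lessons, indent=2)

    def lessons_md(self) -> str:
        # Derived view: regenerated from the lesson data, never edited by hand.
        lines = ["# Lessons", ""]
        lines += [f"- {item['lesson']} (why: {item['rationale']})" for item in self.lessons]
        return "\n".join(lines) + "\n"


review = Review()
for reason in ("only the legacy repo needs this", "still a one-off"):
    review.stage("run migrations before tests")
    review.reject("run migrations before tests", reason)
review.stage("run migrations before tests")  # back again after two rejections
review.stage("pin node version in CI")
review.graduate("pin node version in CI", "three CI breakages traced to node upgrades")
print(review.churn())
print(review.lessons_md())
```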

Self-evolving skills

Five seed skills ship: Skill Forge (creates new skills from patterns), Memory Manager, Git Proxy (wraps git with safety checks), Debug Investigator, and Deploy Checklist. Each has a lightweight manifest (name and triggers) that is always loaded, while the full skill file loads only when triggered, which keeps context lean.
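A minimal sketch of that two-tier loading, assuming triggers are plain lowercase substrings matched against the user’s message. The SkillManifest and SkillRegistry names, the skill IDs, the trigger phrases, and the loader callback are hypothetical; a real harness would read skill files from .agent/ rather than from an in-memory dict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SkillManifest:
    name: str
    triggers: Tuple[str, ...]  # lowercase substrings that activate the skill


class SkillRegistry:
    def __init__(self, manifests: List[SkillManifest], load_body: Callable[[str], str]):
        self.manifests = list(manifests)  # always in context, so keep them tiny
        self._load_body = load_body       # reads a full skill file on demand
        self._cache: Dict[str, str] = {}

    def context_summary(self) -> str:
        # What every session sees up front: names and triggers only.
        return "\n".join(f"{m.name}: {', '.join(m.triggers)}" for m in self.manifests)

    def activate(self, message: str) -> List[str]:
        text = message.lower()
        bodies = []
        for m in self.manifests:
            if any(trigger in text for trigger in m.triggers):
                if m.name not in self._cache:  # load each full file at most once
                    self._cache[m.name] = self._load_body(m.name)
                bodies.append(self._cache[m.name])
        return bodies


# Stand-in for skill files on disk, plus a log of which files were actually read.
skill_files = {
    "git-proxy": "Full Git Proxy instructions...",
    "debug-investigator": "Full Debug Investigator instructions...",
}
loaded = []


def load_from_disk(name: str) -> str:
    loaded.append(name)
    return skill_files[name]


registry = SkillRegistry(
    [
        SkillManifest("git-proxy", ("git push", "rebase")),
        SkillManifest("debug-investigator", ("traceback", "stack trace")),
    ],
    load_from_disk,
)
print(registry.context_summary())
registry.activate("Why does git push keep failing?")
registry.activate("Try the git push again")
print(loaded)  # the Git Proxy file was read once; Debug Investigator never
```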

The interesting bit: each skill tracks its own success and failure rates. Three or more failures in a 14-day window flag the skill for rewrite. The agent reads the failure patterns and proposes an updated version. So skills aren’t static prompts; they’re prompts under selection pressure.
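The flagging rule is simple enough to sketch directly. The threshold and window come from the article; the SkillStats class and the choice to count a failure exactly 14 days old as still inside the window are assumptions.

```python
from datetime import datetime, timedelta
from typing import List

FAILURE_THRESHOLD = 3        # "three or more failures"
WINDOW = timedelta(days=14)  # "in a 14-day window"


class SkillStats:
    def __init__(self, name: str):
        self.name = name
        self.successes = 0
        self.failures: List[datetime] = []  # one timestamp per failed run

    def record(self, ok: bool, when: datetime) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures.append(when)

    def failure_rate(self) -> float:
        total = self.successes + len(self.failures)
        return len(self.failures) / total if total else 0.0

    def needs_rewrite(self, now: datetime) -> bool:
        # Assumption: a failure exactly 14 days old still counts.
        recent = [t for t in self.failures if timedelta(0) <= now - t <= WINDOW]
        return len(recent) >= FAILURE_THRESHOLD


start = datetime(2025, 1, 1)
stats = SkillStats("deploy-checklist")
stats.record(True, start)
for day in (0, 10, 13):
    stats.record(False, start + timedelta(days=day))
print(stats.needs_rewrite(start + timedelta(days=13)))  # True: three failures inside 14 days
print(stats.needs_rewrite(start + timedelta(days=20)))  # False: the day-0 failure has aged out
```

The sketch covers only the trigger; in the article’s description, a True result is where the agent takes over and proposes a rewrite.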

Eight harness adapters

Same .agent/ directory, eight surface translations.

Claude Code — claude.md + claude_settings.json, with post-tool-use and stop hooks that trigger memory writes and skill activations
Cursor — .cursor/rules/*.mdc files
Windsurf — .windsurf_rules
OpenCode — agents.md + opencode.json
Open Claw — system prompt includes
Hermes — agents.md (compatible with agent-skills.io)
Pie — agents.md with a symlinked .py_skills directory
Standalone Python — full programmatic hook control

Add a lesson in Cursor and it’s there the next time you open Claude Code.
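One way to picture “same directory, eight surface translations” is a set of thin renderers over one shared lesson list. This is a sketch under assumptions: the sync function, the .cursor/rules/agentic-stack.mdc file name, and the rendered layouts are hypothetical, and only three of the eight targets appear. The description and alwaysApply keys are fields Cursor reads from .mdc frontmatter.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def markdown_doc(lessons: List[str]) -> str:
    return "# Lessons from .agent/\n\n" + "".join(f"- {lesson}\n" for lesson in lessons)


def cursor_rule(lessons: List[str]) -> str:
    # Cursor .mdc rules open with a frontmatter block between --- lines.
    header = "---\ndescription: Lessons from .agent/\nalwaysApply: true\n---\n"
    return header + "".join(f"- {lesson}\n" for lesson in lessons)


# Hypothetical targets, named after the adapter list above (three of eight).
ADAPTERS: Dict[str, Callable[[List[str]], str]] = {
    "claude.md": markdown_doc,
    "agents.md": markdown_doc,
    ".cursor/rules/agentic-stack.mdc": cursor_rule,
}


def sync(project_root: Path, lessons: List[str]) -> List[str]:
    """Re-render every harness surface from the single shared lesson list."""
    written = []
    for rel_path, render in ADAPTERS.items():
        target = project_root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render(lessons), encoding="utf-8")
        written.append(rel_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    written = sync(root, ["Run pytest before committing"])
    cursor_text = (root / ".cursor/rules/agentic-stack.mdc").read_text(encoding="utf-8")
    agents_text = (root / "agents.md").read_text(encoding="utf-8")

print(written)
print(cursor_text)
```

Under this design, a lesson added from any harness goes into the shared .agent/ store first, and the next sync regenerates every surface. It also shows where drift creeps in: each renderer has to track its harness’s file format.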

Key Takeaways

.agent/ is the brain, the harness is the body — the whole architectural premise
Four memory layers with different decay policies (session / weeks / permanent / never)
Nightly cron does deterministic clustering only — no LLM in the unattended path, by design
Lesson graduation requires written human rationale; rejected candidates retain history so recurring churn becomes its own signal
Skills auto-flag for rewrite after 3 failures in 14 days
Skill manifests always load, full skill files load on trigger — context discipline
Apache 2.0, v0.7.0, Homebrew install on macOS/Linux, PowerShell on Windows
CLI surface: learn.py (teach), recall.py (retrieve), show.py (dashboard), auto_dream.py (cron)

Claude’s Take

Score: 6/10. The video itself is a thinly narrated repo walkthrough: Prism Labs reads the README at you in a synthesized voice. There’s no original analysis, no real testing, no “I tried this for a week and here’s what broke.” Treat it as a pointer, not a review.

But the project itself is genuinely interesting and worth a closer look. Three design choices stand out. First, the unattended dream cycle does zero LLM work — that’s the right call. Letting an LLM auto-write your agent’s permanent memory at 3am is how you wake up to garbage lessons. Second, requiring written rationale for graduation forces you to actually think about why a pattern is a lesson, not just that it recurred. Third, treating skills as artifacts under failure-driven selection pressure is the cleanest framing of “self-improving prompts” I’ve seen — concrete threshold (3 failures / 14 days), concrete action (propose rewrite), human in the loop.

The skeptical read: 714 GitHub stars at v0.7.0 means it’s early, so expect the APIs to move. Eight harness adapters is a lot of surface area to keep in sync; expect drift. And the whole premise rests on you actually using multiple agents enough that portability matters. For someone who lives in Claude Code, the cross-tool value is mostly theoretical.

For Shantum: probably worth cloning and reading the source for the memory architecture and dream cycle alone. Even if you never adopt the full thing, the four-layer model and the graduation protocol are good shapes to steal.