DarDev Studio — AI Orchestration Platform
An internal engineering program to build agentic orchestration from scratch, with 6/6 intent routing, 8/8 RAG IT, and a five-tier memory model validated on a local PC.
Problem
I needed a platform where AI-assisted engineering on real monorepos could ship through the same discipline as production CI: retrieval you can verify, agents with bounds, patches with review, and routing you can regression-test. A chat UI on top of a vector DB was not enough.
What failed first
Three failures forced the architecture you see on the systems page. Each pivot was gated by an eval runner, not intuition.
- Single retrieval path: session turns leaked into code questions and returned conversational fragments. That failure justified splitting memory into five tiers and hard-excluding the conversations/ path from org RAG ingest. eval:rag:it reached 8/8 after the tier split.
- Unbounded agent loops: runs called web_search, graph_lookup, and patch tools in one round, then looped again with degraded context. Iteration and per-round tool caps bounded each run, and eval:orchestrator-intents (6/6 on engineering-goal fixtures) made routing regressions visible in CI.
- Raw workspace writes: the agent edited files directly and produced diffs that were hard to review. propose_patch and verify_patch turned model edits into proposals that must pass a gate before they land, the same way tests gate code.
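The iteration and per-round tool caps from the second failure can be sketched as a small bounded loop. This is a minimal illustration under stated assumptions, not the DarDev Studio orchestrator: `runBoundedLoop`, `Planner`, `ToolRunner`, `LoopLimits`, and the cap values are hypothetical names chosen for the example, and the tools are stubbed.

```typescript
// Hypothetical sketch of a bounded agent loop. None of these names are the
// DarDev Studio API; tools are stubbed so the control flow is visible.

type ToolCall = { tool: string; args: Record<string, unknown> };

// A planner returns the tool calls it wants this round, or null when done.
type Planner = (round: number, history: string[]) => ToolCall[] | null;
type ToolRunner = (call: ToolCall) => string;

interface LoopLimits {
  maxIterations: number; // hard ceiling on planning rounds
  maxToolCallsPerRound: number; // ceiling on tool calls within one round
}

interface LoopResult {
  rounds: number;
  toolCalls: number;
  stoppedBy: "done" | "iteration_cap";
  truncatedRounds: number; // rounds where the planner asked for too many tools
}

function runBoundedLoop(plan: Planner, runTool: ToolRunner, limits: LoopLimits): LoopResult {
  const history: string[] = [];
  let toolCalls = 0;
  let truncatedRounds = 0;

  for (let round = 0; round < limits.maxIterations; round++) {
    const requested = plan(round, history);
    if (requested === null) {
      return { rounds: round, toolCalls, stoppedBy: "done", truncatedRounds };
    }
    // Enforce the per-round cap: excess calls are dropped, not deferred.
    const allowed = requested.slice(0, limits.maxToolCallsPerRound);
    if (allowed.length < requested.length) truncatedRounds++;
    for (const call of allowed) {
      history.push(`${call.tool}: ${runTool(call)}`);
      toolCalls++;
    }
  }
  // The iteration cap ends the run even if the planner never says it is done.
  return { rounds: limits.maxIterations, toolCalls, stoppedBy: "iteration_cap", truncatedRounds };
}

// Usage: a planner that never stops and asks for three tools every round.
const greedyPlanner: Planner = () => [
  { tool: "web_search", args: {} },
  { tool: "graph_lookup", args: {} },
  { tool: "propose_patch", args: {} },
];
const result = runBoundedLoop(greedyPlanner, (c) => `ok(${c.tool})`, {
  maxIterations: 4,
  maxToolCallsPerRound: 2,
});
console.log(result.stoppedBy, result.toolCalls); // prints "iteration_cap 8"
```

Dropping excess calls rather than queueing them is one possible design choice; it keeps each round's context small, which is the degradation described in the second failure.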
Constraints
- Built from scratch as a full TypeScript monorepo — not a thin API demo
- Local-first: SQLite embeddings, JSON indexes; no mandatory vector SaaS
- Separate chat (:8080) and embedding (:8081) inference planes
- Multi-surface product — web, admin, TUI, edge workers
- Validated on DarDev internal engineering workflows — no external user scale claimed
- Apache-2.0 OSS: Phase 1 export published on github.com/Theemiss/dardev-studio (platform-core, platform-api, platform-eval, architecture docs); the full monorepo is being published in phases
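The local-first constraint and the tier split from the first failure can be illustrated together. This is a minimal sketch that assumes a JSON-serializable in-memory index in place of the SQLite embedding store; the `Tier` values, the `Entry` shape, `ingest`, and `search` are hypothetical, only two of the five tiers are modeled, and only the `conversations/` exclusion and the separate-tier idea come from this page.

```typescript
// Hypothetical sketch of tier-scoped retrieval over a local JSON index.
// The real platform uses SQLite embeddings and five tiers; this models two.

type Tier = "org" | "session";

interface Entry {
  id: string;
  tier: Tier;
  path: string;
  vector: number[];
}

// Hard exclusion: anything under a conversations/ path segment never enters
// the org tier. It may still be ingested into the session tier.
function ingest(index: Entry[], entry: Entry): boolean {
  const isConversation = entry.path.split("/").some((seg) => seg === "conversations");
  if (entry.tier === "org" && isConversation) return false;
  index.push(entry);
  return true;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Scope to one tier before ranking, so session turns cannot outrank code
// chunks on an org query.
function search(index: Entry[], tier: Tier, query: number[], k: number): Entry[] {
  return index
    .filter((e) => e.tier === tier)
    .map((e) => ({ e, score: cosine(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((s) => s.e);
}

// Usage: the conversation file is rejected for org but accepted for session.
const index: Entry[] = [];
ingest(index, { id: "a", tier: "org", path: "packages/core/router.ts", vector: [1, 0] });
ingest(index, { id: "b", tier: "org", path: "conversations/session-1.json", vector: [1, 0] });
ingest(index, { id: "c", tier: "session", path: "conversations/session-1.json", vector: [1, 0] });
const restored: Entry[] = JSON.parse(JSON.stringify(index)); // plain JSON round-trip
console.log(search(restored, "org", [1, 0], 5).map((e) => e.id).join(",")); // prints "a"
```

Filtering by tier before ranking, rather than after, is what keeps a near-duplicate chat turn from outscoring a code chunk on a code question.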
My role
As architecture owner at DarDev, I designed and built DarDev Studio: five memory tiers, hybrid RAG, IT_ORCHESTRATOR as the default mode, MCP and skill packs, a safe patch pipeline, a benchmark platform, and the @dardev/platform-* monorepo. A companion article documents the engineering program and methodology.
Outcome
- eval:orchestrator-intents — 6/6 offline routing accuracy on engineering-goal fixtures
- check:integration:dev-team — RAG IT 8/8, plan mode 6/6, graph lookup 5/5
- validate:orchestrator-prompts — 5/5 live API prompt matrix (closeout gate)
- Context envelope (RTX 4050, Gemma E2B): LiteRT max 32K tokens; llama.cpp ~96K tokens before allocation failure
- BGE-M3 dogfood ingest — ~17k chunks indexed for org RAG v6 on the monorepo mirror
- 8 @dardev/platform-* packages; 6 client surfaces, including web, admin, TUI, Android, and Windows
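The eval gates above share a simple shape: run fixtures offline, count exact matches, and fail the build on any miss. The sketch below assumes that shape; `Fixture`, `runIntentEval`, `gate`, the intent labels, and the keyword router are hypothetical stand-ins, not the real eval:orchestrator-intents fixtures or intent router.

```typescript
// Hypothetical sketch of an offline routing eval with a CI gate.
// Fixture shape, intent labels, and the toy router are illustrative only.

interface Fixture {
  goal: string;
  expected: string;
}

type Router = (goal: string) => string;

interface EvalReport {
  passed: number;
  total: number;
  failures: string[];
}

function runIntentEval(route: Router, fixtures: Fixture[]): EvalReport {
  const failures: string[] = [];
  for (const f of fixtures) {
    const got = route(f.goal);
    if (got !== f.expected) failures.push(`"${f.goal}": expected ${f.expected}, got ${got}`);
  }
  return { passed: fixtures.length - failures.length, total: fixtures.length, failures };
}

// CI gate: any miss (or an empty fixture set) fails the run, so a router
// change that drops a case is visible instead of "it feels worse".
function gate(report: EvalReport): boolean {
  return report.total > 0 && report.passed === report.total;
}

// Toy keyword router standing in for the real intent router.
const keywordRouter: Router = (goal) => {
  const g = goal.toLowerCase();
  if (g.includes("fix") || g.includes("patch")) return "patch";
  if (g.includes("where") || g.includes("who calls")) return "graph_lookup";
  if (g.includes("latest") || g.includes("release notes")) return "web_search";
  return "rag_answer";
};

// Usage
const fixtures: Fixture[] = [
  { goal: "Fix the null check in the session loader", expected: "patch" },
  { goal: "Who calls resolveTier?", expected: "graph_lookup" },
  { goal: "Explain the ingest pipeline", expected: "rag_answer" },
];
const report = runIntentEval(keywordRouter, fixtures);
console.log(`${report.passed}/${report.total}`, gate(report) ? "PASS" : "FAIL"); // prints "3/3 PASS"
```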
Lessons learned
- Memory tiers beat one context blob. Early builds that merged session memory into RAG ingest returned chat noise on code queries. The fix was mechanical: separate tiers, separate indexes, and eval:rag:it at 8/8 before merge.
- Agentic systems without offline evals are demos. The orchestrator-intents eval caught routing regressions when the intent router changed; without it, "it feels worse" would have been the only signal.
- Safe patch propose/verify beats unconstrained writes. Letting the agent edit the workspace directly produced unreviewable diffs; the patch pipeline makes failure visible before integration.
- Outcome metrics beat sprint counts. Sprints S01–S13 matter internally; what external readers need is 6/6 intent routing, 8/8 RAG IT, and the envelope numbers that justify model-pack routing.
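The propose/verify lesson can be sketched as: a patch is plain data, verification checks it against the current file and a caller-supplied check, and only a verified patch is written. `PatchProposal`, `verifyPatch`, `applyIfVerified`, and the search/replace patch format are hypothetical; the real propose_patch and verify_patch tools may use diffs and run the test suite instead.

```typescript
// Hypothetical sketch of a propose/verify patch gate over an in-memory
// workspace. Not the signatures of the real propose_patch / verify_patch.

interface PatchProposal {
  path: string;
  search: string; // exact text expected in the current file
  replace: string; // literal replacement text
}

type Workspace = Map<string, string>; // path -> file contents

type Verdict = { ok: true; next: string } | { ok: false; reason: string };

function countOccurrences(haystack: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  let from = 0;
  while (true) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) return count;
    count++;
    from = at + needle.length;
  }
}

// Verify without touching the workspace: the anchor must match exactly once,
// and the patched text must pass the supplied check.
function verifyPatch(ws: Workspace, p: PatchProposal, check: (text: string) => boolean): Verdict {
  const current = ws.get(p.path);
  if (current === undefined) return { ok: false, reason: `unknown file ${p.path}` };
  const hits = countOccurrences(current, p.search);
  if (hits !== 1) return { ok: false, reason: `search text matched ${hits} times` };
  const at = current.indexOf(p.search);
  // Slice instead of String.replace so "$&" etc. in p.replace stay literal.
  const next = current.slice(0, at) + p.replace + current.slice(at + p.search.length);
  if (!check(next)) return { ok: false, reason: "checks failed" };
  return { ok: true, next };
}

// Only a verified patch is written back.
function applyIfVerified(ws: Workspace, p: PatchProposal, check: (text: string) => boolean): Verdict {
  const v = verifyPatch(ws, p, check);
  if (v.ok) ws.set(p.path, v.next);
  return v;
}

// Usage
const ws: Workspace = new Map([["src/limits.ts", "export const MAX_ROUNDS = 8;\n"]]);
const noTodo = (text: string) => !text.includes("TODO");
const good = applyIfVerified(ws, { path: "src/limits.ts", search: "MAX_ROUNDS = 8", replace: "MAX_ROUNDS = 4" }, noTodo);
const stale = applyIfVerified(ws, { path: "src/limits.ts", search: "MAX_ROUNDS = 8", replace: "MAX_ROUNDS = 2" }, noTodo);
console.log(good.ok, stale.ok); // prints "true false"
```

The second call fails because the first patch already changed the anchor text, so a stale proposal is rejected instead of silently overwriting newer content.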