DarDev Studio — AI Orchestration Platform

An internal engineering program to build agentic orchestration from scratch: 6/6 intent routing, 8/8 RAG IT, and a five-tier memory model, all validated on local PC hardware.

Problem

I needed a platform where AI-assisted engineering on real monorepos could ship through the same discipline as production CI: retrieval you can verify, agents with bounds, patches with review, and routing you can regression-test. A chat UI on top of a vector DB was not enough.

What failed first

Three failures forced the architecture you see on the systems page. Each pivot was gated by an eval runner, not intuition.

Single retrieve path — session turns leaked into code questions and returned conversational fragments. That concrete failure justified five memory tiers and hard-excluding conversations/ from org RAG ingest. eval:rag:it reached 8/8 after the tier split.
Unbounded agent loops — runs called web_search, graph_lookup, and patch tools in one round, then looped again with degraded context. Iteration and per-round tool caps plus eval:orchestrator-intents (6/6 on engineering-goal fixtures) made routing regressions visible in CI.
Raw workspace writes — the agent edited files directly and produced diffs that were hard to review. propose_patch and verify_patch turned model edits into a gate the same way tests gate code.
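
The iteration and per-round tool caps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the names runBounded, LoopLimits, and RunResult are hypothetical, and dropping calls beyond the cap (rather than queueing them) is one possible policy chosen for brevity.

```typescript
// Hypothetical types for illustration; not the platform's actual API.
type ToolCall = { tool: string; args: Record<string, unknown> };
type Round = { calls: ToolCall[]; done: boolean };

interface LoopLimits {
  maxIterations: number;    // hard stop on the number of rounds
  maxToolsPerRound: number; // calls beyond this count are dropped from the round
}

type RunResult = {
  rounds: number;
  executed: ToolCall[];
  dropped: ToolCall[];
  stoppedBy: "done" | "iteration_cap";
};

// Runs the planner until it reports done or the iteration cap is hit.
// Within each round, only the first maxToolsPerRound calls are executed.
function runBounded(
  plan: (round: number) => Round,
  execute: (call: ToolCall) => void,
  limits: LoopLimits,
): RunResult {
  const executed: ToolCall[] = [];
  const dropped: ToolCall[] = [];
  for (let round = 0; round < limits.maxIterations; round++) {
    const next = plan(round);
    const allowed = next.calls.slice(0, limits.maxToolsPerRound);
    dropped.push(...next.calls.slice(limits.maxToolsPerRound));
    for (const call of allowed) {
      execute(call);
      executed.push(call);
    }
    if (next.done) {
      return { rounds: round + 1, executed, dropped, stoppedBy: "done" };
    }
  }
  return {
    rounds: limits.maxIterations,
    executed,
    dropped,
    stoppedBy: "iteration_cap",
  };
}
```

Returning stoppedBy and dropped alongside the executed calls keeps a capped run distinguishable from a completed one, which is the kind of signal an eval runner can assert on.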

Constraints

Built from scratch as a full TypeScript monorepo — not a thin API demo
Local-first: SQLite embeddings, JSON indexes; no mandatory vector SaaS
Separate chat (:8080) and embedding (:8081) inference planes
Multi-surface product — web, admin, TUI, edge workers
Validated on DarDev internal engineering workflows — no external user scale claimed
Apache-2.0 OSS: Phase 1 export published on github.com/Theemiss/dardev-studio (platform-core, platform-api, platform-eval, architecture docs); full monorepo publishes in phases

My role

As architecture owner at DarDev, I designed and built DarDev Studio: five memory tiers, hybrid RAG, the IT_ORCHESTRATOR default mode, MCP and skill packs, the safe patch pipeline, the benchmark platform, and the @dardev/platform-* monorepo. A companion article documents the engineering program and methodology.

Outcome

eval:orchestrator-intents — 6/6 offline routing accuracy on engineering-goal fixtures
check:integration:dev-team — RAG IT 8/8, plan mode 6/6, graph lookup 5/5
validate:orchestrator-prompts — 5/5 live API prompt matrix (closeout gate)
Context envelope (RTX 4050, Gemma E2B): LiteRT tops out at 32K tokens; llama.cpp reaches ~96K before allocation fails
BGE-M3 dogfood ingest — ~17k chunks indexed for org RAG v6 on the monorepo mirror
8 @dardev/platform-* packages; 6 client surfaces (including web, admin, TUI, Android, and Windows)
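
A score such as 6/6 on eval:orchestrator-intents implies a fixture-driven offline check. The sketch below shows one way such a runner could be shaped. It is an assumption-laden illustration: the Fixture shape, the intent labels, the toyRouter keyword rules, and the function names are all hypothetical and do not come from the platform-eval package.

```typescript
// Hypothetical fixture and report shapes; intent labels are illustrative only.
type Intent = string;
type Fixture = { goal: string; expected: Intent };
type Failure = { goal: string; expected: Intent; actual: Intent };
type EvalReport = { passed: number; total: number; failures: Failure[] };

// Routes every fixture offline (no model or network call) and records mismatches.
function evalIntents(
  route: (goal: string) => Intent,
  fixtures: Fixture[],
): EvalReport {
  const failures: Failure[] = [];
  for (const f of fixtures) {
    const actual = route(f.goal);
    if (actual !== f.expected) {
      failures.push({ goal: f.goal, expected: f.expected, actual });
    }
  }
  return {
    passed: fixtures.length - failures.length,
    total: fixtures.length,
    failures,
  };
}

// Formats the score the way the outcome list reports it, e.g. "6/6".
function formatScore(report: EvalReport): string {
  return `${report.passed}/${report.total}`;
}

// A CI gate passes only when every fixture routes correctly.
function gatePasses(report: EvalReport): boolean {
  return report.passed === report.total;
}

// Toy keyword router standing in for the real intent router (assumption).
const toyRouter = (goal: string): Intent => {
  const g = goal.toLowerCase();
  if (g.includes("fix") || g.includes("patch")) return "patch";
  if (g.includes("where") || g.includes("find")) return "lookup";
  return "plan";
};

// Illustrative fixtures; the real engineering-goal fixtures are not shown in this document.
const sampleFixtures: Fixture[] = [
  { goal: "Fix the failing build", expected: "patch" },
  { goal: "Where is the intent router defined?", expected: "lookup" },
  { goal: "Plan the migration to v6", expected: "plan" },
];

const sampleReport = evalIntents(toyRouter, sampleFixtures);
console.log(formatScore(sampleReport), gatePasses(sampleReport));
```

Keeping the failures list in the report means a routing regression surfaces as a named fixture in CI output instead of as a lower number alone.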

Lessons learned

Memory tiers beat one context blob. Early builds that merged session into RAG ingest returned chat noise on code queries. The fix was mechanical: separate tiers, separate indexes, and eval:rag:it at 8/8 before merge.
Agentic systems without offline evals are demos. orchestrator-intents caught routing regressions when the intent router changed; without it, "it feels worse" would have been the only signal.
Safe patch propose/verify beats unconstrained writes. Letting the agent edit the workspace directly produced unreviewable diffs; the patch pipeline makes failure visible before integration.
Outcome metrics beat sprint counts. Sprints S01–S13 matter internally; what external readers need is 6/6 intent routing, 8/8 RAG IT, and envelope numbers that justify model-pack routing.
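
The propose/verify lesson can be illustrated with a small in-memory sketch. It assumes a patch records the file content it was computed against, so that a stale proposal is rejected before any write. The functions proposePatch, verifyPatch, and applyIfVerified below are hypothetical stand-ins and do not reflect the real signatures of the propose_patch and verify_patch tools.

```typescript
// In-memory stand-in for a workspace: path -> file content (assumption).
type Workspace = Map<string, string>;
type Patch = { path: string; before: string; after: string };
type Verdict = { ok: true } | { ok: false; reason: string };

// Captures the current content alongside the proposed content; writes nothing.
function proposePatch(ws: Workspace, path: string, after: string): Patch {
  return { path, before: ws.get(path) ?? "", after };
}

// Rejects the patch if the file changed since the proposal, or if the
// caller-supplied check (a stand-in for tests or a linter) fails.
function verifyPatch(
  ws: Workspace,
  patch: Patch,
  check: (content: string) => boolean,
): Verdict {
  if ((ws.get(patch.path) ?? "") !== patch.before) {
    return { ok: false, reason: "stale: file changed since proposal" };
  }
  if (!check(patch.after)) {
    return { ok: false, reason: "check failed on proposed content" };
  }
  return { ok: true };
}

// The only function that writes to the workspace, and only after verification.
function applyIfVerified(
  ws: Workspace,
  patch: Patch,
  check: (content: string) => boolean,
): Verdict {
  const verdict = verifyPatch(ws, patch, check);
  if (verdict.ok) {
    ws.set(patch.path, patch.after);
  }
  return verdict;
}
```

Because the write path goes through a single function, a rejected patch leaves the workspace untouched and the rejection reason can be reviewed like a failing test.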