Ahmed Belhaj

DarDev Studio — AI Orchestration Platform

An internal engineering program to build agentic orchestration from scratch: 6/6 intent routing, 8/8 RAG IT, and a five-tier memory model validated on local PC hardware.

Problem

I needed a platform where AI-assisted engineering on real monorepos could ship through the same discipline as production CI: retrieval you can verify, agents with bounds, patches with review, and routing you can regression-test. A chat UI on top of a vector DB was not enough.

What failed first

Three failures forced the architecture you see on the systems page. Each pivot was gated by an eval runner, not intuition.

  • Single retrieval path: session turns leaked into code questions, so retrieval returned conversational fragments. That concrete failure justified five memory tiers and hard-excluding conversations/ from org RAG ingest. eval:rag:it reached 8/8 after the tier split.
  • Unbounded agent loops: runs called web_search, graph_lookup, and patch tools in one round, then looped again with degraded context. Iteration caps and per-round tool caps bounded the loops, and eval:orchestrator-intents (6/6 on engineering-goal fixtures) made routing regressions visible in CI.
  • Raw workspace writes: the agent edited files directly and produced diffs that were hard to review. propose_patch and verify_patch put model edits behind a gate, the same way tests gate code.
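
The iteration and per-round tool caps from the second failure can be sketched as a small loop. This is a minimal illustration, not the DarDev Studio implementation: the type names, the planRound callback, and the cap values are all assumptions made for the example.

```typescript
// Hypothetical names throughout: ToolCall, RoundPlan, planRound, and the cap
// values are illustrative assumptions, not the DarDev Studio API.
type ToolCall = { tool: string; args: Record<string, unknown> };

interface LoopCaps {
  maxIterations: number;    // hard stop on rounds per run
  maxToolsPerRound: number; // hard stop on tool calls within one round
}

interface RoundPlan {
  calls: ToolCall[];
  done: boolean; // the planner signals that no further rounds are needed
}

interface LoopResult {
  accepted: ToolCall[]; // calls that fit under the per-round cap
  dropped: ToolCall[];  // calls rejected by the per-round cap
  rounds: number;
  stoppedBy: "done" | "iteration-cap";
}

function runBoundedLoop(
  planRound: (round: number) => RoundPlan,
  caps: LoopCaps,
): LoopResult {
  const accepted: ToolCall[] = [];
  const dropped: ToolCall[] = [];
  let rounds = 0;

  while (rounds < caps.maxIterations) {
    const plan = planRound(rounds);
    rounds += 1;
    // Keep only the first N calls of the round; record the rest as dropped.
    accepted.push(...plan.calls.slice(0, caps.maxToolsPerRound));
    dropped.push(...plan.calls.slice(caps.maxToolsPerRound));
    if (plan.done) {
      return { accepted, dropped, rounds, stoppedBy: "done" };
    }
  }
  return { accepted, dropped, rounds, stoppedBy: "iteration-cap" };
}

// Usage: a planner that never finishes and always asks for three tools.
const greedyPlanner = (): RoundPlan => ({
  calls: [
    { tool: "web_search", args: {} },
    { tool: "graph_lookup", args: {} },
    { tool: "propose_patch", args: {} },
  ],
  done: false,
});

const loopResult = runBoundedLoop(greedyPlanner, {
  maxIterations: 2,
  maxToolsPerRound: 2,
});
```

With these caps the greedy planner is cut off after two rounds, and the third tool call of each round is recorded as dropped rather than silently executed. Reporting why the loop stopped (stoppedBy) gives an eval runner something concrete to assert on.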

Constraints

  • Built from scratch as a full TypeScript monorepo — not a thin API demo
  • Local-first: SQLite embeddings, JSON indexes; no mandatory vector SaaS
  • Separate chat (:8080) and embedding (:8081) inference planes
  • Multi-surface product — web, admin, TUI, edge workers
  • Validated on DarDev internal engineering workflows — no external user scale claimed
  • Apache-2.0 OSS: the Phase 1 export is published at github.com/Theemiss/dardev-studio (platform-core, platform-api, platform-eval, architecture docs); the rest of the monorepo is being published in phases

My role

As architecture owner at DarDev, I designed and built DarDev Studio: five memory tiers, hybrid RAG, IT_ORCHESTRATOR as the default mode, MCP and skill packs, a safe patch pipeline, a benchmark platform, and the @dardev/platform-* monorepo. A companion article documents the engineering program and methodology.

Outcome

  • eval:orchestrator-intents — 6/6 offline routing accuracy on engineering-goal fixtures
  • check:integration:dev-team — RAG IT 8/8, plan mode 6/6, graph lookup 5/5
  • validate:orchestrator-prompts — 5/5 live API prompt matrix (closeout gate)
  • Context envelope (RTX 4050, Gemma E2B): LiteRT tops out at 32K tokens; llama.cpp reaches ~96K before allocation fails
  • BGE-M3 dogfood ingest — ~17k chunks indexed for org RAG v6 on the monorepo mirror
  • 8 @dardev/platform-* packages; 6 client surfaces (including web, admin, TUI, Android, and Windows)
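
An offline routing score such as 6/6 can come from a very small fixture runner. The sketch below shows the general shape under stated assumptions: the intent labels, the fixture format, and the keyword router are hypothetical stand-ins, since the real eval:orchestrator-intents fixtures and router are not shown here.

```typescript
// Hypothetical fixture shape and intent labels; the real fixtures and router
// behind eval:orchestrator-intents are not part of this case study.
type Intent = "plan" | "rag_lookup" | "patch";

interface IntentFixture {
  goal: string;
  expected: Intent;
}

interface EvalReport {
  passed: number;
  total: number;
  failures: { goal: string; expected: Intent; actual: Intent }[];
}

function evalIntents(
  route: (goal: string) => Intent,
  fixtures: IntentFixture[],
): EvalReport {
  const failures: EvalReport["failures"] = [];
  for (const { goal, expected } of fixtures) {
    const actual = route(goal);
    if (actual !== expected) failures.push({ goal, expected, actual });
  }
  return {
    passed: fixtures.length - failures.length,
    total: fixtures.length,
    failures,
  };
}

// A toy keyword router stands in for the real intent router.
const toyRouter = (goal: string): Intent => {
  if (/\b(fix|rename)\b/i.test(goal)) return "patch";
  if (/\b(where|find)\b/i.test(goal)) return "rag_lookup";
  return "plan";
};

const fixtures: IntentFixture[] = [
  { goal: "Fix the failing auth test", expected: "patch" },
  { goal: "Where is the retry policy defined?", expected: "rag_lookup" },
  { goal: "Design a migration for the index schema", expected: "plan" },
];

const report = evalIntents(toyRouter, fixtures);
// A CI gate would fail the build unless every fixture passes.
const gatePassed = report.passed === report.total;
```

Because the runner needs no model server, it can run on every change to the router. The failures array names the exact goal that regressed, which is more actionable than a bare pass rate.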

Lessons learned

  • Memory tiers beat one context blob. Early builds that merged session into RAG ingest returned chat noise on code queries. The fix was mechanical: separate tiers, separate indexes, and eval:rag:it at 8/8 before merge.
  • Agentic systems without offline evals are demos. orchestrator-intents caught routing regressions when the intent router changed; without it, "it feels worse" would have been the only signal.
  • Safe patch propose/verify beats unconstrained writes. Letting the agent edit the workspace directly produced unreviewable diffs; the patch pipeline makes failure visible before integration.
  • Outcome metrics beat sprint counts. Sprints S01–S13 matter internally; what external readers need is 6/6 intent routing, 8/8 RAG IT, and envelope numbers that justify model-pack routing.
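
The propose/verify lesson above can be illustrated with a gate that refuses to touch the workspace until every check passes. This is a sketch under assumptions: the case study names propose_patch and verify_patch but does not show their signatures, so the types, the two checks, and the in-memory workspace below are hypothetical.

```typescript
// Hypothetical shapes: PatchProposal, Workspace, and both checks are
// assumptions for illustration, not the DarDev Studio patch pipeline.
interface PatchProposal {
  id: string;
  file: string;
  before: string; // content the patch expects to find
  after: string;  // content the patch wants to write
}

type Workspace = Record<string, string>; // path -> file content
type Verdict = { ok: true } | { ok: false; reason: string };
type Check = (proposal: PatchProposal, workspace: Workspace) => Verdict;

// Reject absolute paths and any ".." segment.
const insideWorkspace: Check = (p) =>
  /(^|\/)\.\.(\/|$)/.test(p.file) || p.file.charAt(0) === "/"
    ? { ok: false, reason: `path escapes workspace: ${p.file}` }
    : { ok: true };

// Reject patches written against content that is no longer current.
const baseMatches: Check = (p, ws) =>
  ws[p.file] === p.before
    ? { ok: true }
    : { ok: false, reason: `stale base for ${p.file}` };

function verifyPatch(p: PatchProposal, ws: Workspace, checks: Check[]): Verdict {
  for (const check of checks) {
    const verdict = check(p, ws);
    if (!verdict.ok) return verdict; // first failing check decides
  }
  return { ok: true };
}

function applyIfVerified(
  p: PatchProposal,
  ws: Workspace,
  checks: Check[],
): { workspace: Workspace; verdict: Verdict } {
  const verdict = verifyPatch(p, ws, checks);
  if (!verdict.ok) return { workspace: ws, verdict };
  // Return a new workspace object instead of mutating the input.
  return { workspace: { ...ws, [p.file]: p.after }, verdict };
}

const workspace: Workspace = { "src/retry.ts": "export const RETRIES = 3;\n" };
const checks: Check[] = [insideWorkspace, baseMatches];

const acceptedPatch = applyIfVerified(
  {
    id: "p1",
    file: "src/retry.ts",
    before: "export const RETRIES = 3;\n",
    after: "export const RETRIES = 5;\n",
  },
  workspace,
  checks,
);

const rejectedPatch = applyIfVerified(
  { id: "p2", file: "../etc/passwd", before: "", after: "x" },
  workspace,
  checks,
);
```

The first proposal passes both checks and yields an updated workspace copy. The second is rejected on its path, and the workspace is returned unchanged. A real pipeline would presumably add checks such as running tests against the patched tree, but the gate structure stays the same.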