Platform architecture
DarDev Studio
Open-source AI-native engineering platform I architected from scratch at DarDev: hybrid RAG, five memory tiers, bounded agent loops, IT orchestrator, MCP and skill packs, safe patch pipeline, multi-surface clients, and offline eval gates. Validated on DarDev engineering workflows.
Overview
DarDev Studio is an agentic engineering platform I designed and built from scratch as architecture owner at DarDev. It is not a chat wrapper on a demo API. The product spans a TypeScript monorepo with a core brain, HTTP API, web and admin clients, terminal orchestrator, edge inference workers, eval infrastructure, and a full documentation and sprint program behind it.
The active product SKU is dev-team (PRODUCT_PROFILE=dev-team): IT modes, RAG over docs and code, orchestrated tool execution, sandbox and patch pipelines, and multi-backend inference. I validate it on DarDev internal engineering work before features ship.
DarDev Studio releases under Apache-2.0. A scrubbed Phase 1 slice (platform-core, platform-api, platform-eval, architecture docs) is on GitHub; the full monorepo including web, admin, workers, and ops tooling is developed locally and publishes in phases.
Vision
Most teams bolt an LLM onto an IDE and call it AI-assisted development. DarDev Studio treats AI orchestration as platform infrastructure: explicit memory tiers, hybrid retrieval, tool and policy boundaries, offline eval gates, and inference routing that engineering teams can own, extend, and run locally.
The north star is an engineering brain that knows your organization's context (session history, ingested docs and code, symbol graph, ephemeral web), routes natural-language goals to the right tools and specialist modes without manual recipe picking, and only ships changes when eval suites pass. It is open source so teams can inspect the orchestrator, eval runners, and architecture directly.
DarDev Studio is built for teams that already ship production software and want agentic workflows with the same rigor they apply to CI/CD. Success means shipped, verified software (integration gates green, deploy smoke recorded), not chat quality alone.
The platform is local-first by design: vectors in SQLite, structure in JSON indexes, no mandatory Pinecone or Neo4j in the default path. Chat inference and embedding inference run on separate planes so retrieval quality does not compete with generation for GPU budget.
Agentic systems without evals are demos, not platforms. DarDev Studio is built for teams that would not merge without tests.
Product scope
DarDev Studio is the dev-team SKU: an agentic engineering platform with IT, dev, and architect editions. Legacy education and LMS code paths exist in the monorepo but are frozen under the dev-team profile. The workspace ships production software through the platform sprint program (S01 through S15, of which S14 and S15 are still planned), not through ad-hoc feature branches.
- IT_ORCHESTRATOR mode: goal-only routing as default dev-team entry point
- Editions: it, dev, architect (active under dev-team profile)
- Dogfood corpora: monorepo mirror, team runbooks, optional OSS repo ingest
- Workflows, sandbox execution, dev worktree, and cron scheduler in platform.db
- Benchmark platform with admin UI, CLI matrix, and regression gates
- OpenAPI surface validated agent-ready (check:openapi, check:client-api)
The challenge
Engineering teams adopting LLMs hit the same wall: retrieval quality is inconsistent, context windows get stuffed, agents loop without quality gates, tools run without policy, and AI features ship as thin API wrappers instead of integrated workflows.
DarDev Studio addresses this as a full product: memory tiers with separate mechanisms, hybrid RAG with citation verify, bounded agent loops, MCP and skill packs behind policy gates, safe patch propose/verify flow, and offline eval suites before orchestrator changes merge.
Platform architecture
The monorepo uses npm workspaces under @dardev/platform-* packages and apps/*. Dependency direction flows from platform-api into platform-core and platform-inference; clients talk to the API via fetch and OpenAPI, not direct brain imports.
- platform-core — brain: RAG, agent loop, plan mode, prompts, policy, tools
- platform-api — Fastify HTTP API, SSE streaming, WebSocket hub, inference init
- platform-eval — offline eval runners (IT modes, RAG, graph, orchestrator)
- platform-inference — shared worker protocol types for edge devices
- platform-mcp-client + platform-mcp-server — MCP integration bridge
- platform-cli — ingest helpers and dry-run requests
- platform-tui — terminal orchestrator (Ink)
- platform-web — IT Studio chat UI (port 5173)
- platform-admin — operator console: agent capabilities, benchmarks, workflows (5174)
- android-worker, windows-desktop, windows-worker, desktop-worker — edge inference
Five memory tiers
Context is split by mechanism, not one giant prompt. Session history and episodic summaries cover the current thread (conversations/, never ingested into RAG). Org RAG runs hybrid BM25, lexical, and semantic retrieval with BGE-M3 embeddings in SQLite and sharded chunk indexes. A code graph tier resolves symbols and imports from indexes/code-graph.json. Web search adds ephemeral Brave/Tavily/Google context. A planner tier handles delegate_subtask, plan mode, and workflows.
Three on-disk corpora feed Tier 2: a dogfood mirror of the monorepo, team docs and runbooks under content/projects/, and optional OSS mirrors after ingest:repo. The live workspace you edit is not the RAG index; patch and read_workspace_file tools handle truth-on-disk.
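To make the tier split concrete, here is a minimal sketch of how a goal could be mapped to memory tiers. Every name in it (MemoryTier, TierRule, RULES, selectTiers) and the keyword triggers are assumptions for illustration; the platform's real routing logic is not reproduced here.

```typescript
// Hypothetical names throughout; the real tier and router types are not shown here.
type MemoryTier = "session" | "org_rag" | "code_graph" | "web" | "planner";

interface TierRule {
  tier: MemoryTier;
  pattern: RegExp; // toy keyword trigger, standing in for real intent routing
}

const RULES: TierRule[] = [
  { tier: "code_graph", pattern: /\b(import|imports|symbol|function|class)\b/ },
  { tier: "org_rag", pattern: /\b(doc|docs|runbook|guide)\b/ },
  { tier: "web", pattern: /\b(latest|upstream|changelog)\b/ },
  { tier: "planner", pattern: /\b(plan|refactor|migrate)\b/ },
];

function selectTiers(goal: string): MemoryTier[] {
  const text = goal.toLowerCase();
  // Session history always applies: it covers the current thread.
  const tiers: MemoryTier[] = ["session"];
  for (const rule of RULES) {
    if (rule.pattern.test(text)) tiers.push(rule.tier);
  }
  // With no specific signal, fall back to org RAG retrieval.
  if (tiers.length === 1) tiers.push("org_rag");
  return tiers;
}
```

Under these toy rules, a symbol question such as "Which modules import the tokenizer?" selects the session and code graph tiers and skips vector retrieval, which is the kind of tradeoff the tier split is meant to make explicit.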
IT orchestrator
IT_ORCHESTRATOR is the default dev-team mode. A natural-language goal enters the intent router (orchestrator-router.ts), which selects memory tiers and tools. prepareTutorTurn runs hybrid RAG retrieval, applies prompt budget rules, and registers patch context. The agent loop executes with bounded iterations and tools per round, streaming via SSE with retries.
On failure the orchestrator reprompts and re-routes rather than silently continuing. delegate_subtask supports parallel specialist work with depth limits. Edge vs central split: RAG, graph, web, and patch run on the API host; phone or desktop inference routes through registered workers.
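The bounded loop described above can be sketched as follows. This is a simplified illustration, not the platform-core implementation: the types, the function name runBoundedLoop, and the limit fields are hypothetical, and the loop is synchronous for brevity where a real one would be asynchronous and stream over SSE.

```typescript
// Hypothetical shapes; the real agent loop in platform-core is not reproduced here.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ModelStep =
  | { kind: "final"; text: string }
  | { kind: "tools"; calls: ToolCall[] };

interface LoopLimits {
  maxIterations: number;
  maxToolsPerRound: number;
}

interface LoopResult {
  status: "completed" | "iteration_limit";
  text: string;
  iterations: number;
  toolCallsRun: number;
}

function runBoundedLoop(
  goal: string,
  step: (transcript: string[]) => ModelStep,
  runTool: (call: ToolCall) => string,
  limits: LoopLimits
): LoopResult {
  const transcript: string[] = ["goal: " + goal];
  let toolCallsRun = 0;
  for (let i = 1; i <= limits.maxIterations; i++) {
    const next = step(transcript);
    if (next.kind === "final") {
      return { status: "completed", text: next.text, iterations: i, toolCallsRun };
    }
    // Cap tool calls per round; skipped calls are noted so the model can re-request them.
    const allowed = next.calls.slice(0, limits.maxToolsPerRound);
    const dropped = next.calls.length - allowed.length;
    for (const call of allowed) {
      transcript.push("tool " + call.name + ": " + runTool(call));
      toolCallsRun++;
    }
    if (dropped > 0) {
      transcript.push("note: " + dropped + " tool call(s) skipped by per-round cap");
    }
  }
  // The loop never runs past the iteration budget; the caller decides whether to reprompt.
  return { status: "iteration_limit", text: "", iterations: limits.maxIterations, toolCallsRun };
}
```

Returning an explicit iteration_limit status, rather than a partial answer, gives the caller a clear signal to reprompt or re-route instead of silently continuing.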
Capabilities — core brain
platform-core implements the full turn pipeline: corpus routing, hybrid retrieve, context assembly, policy gates, prompt budgeting, agent loop, and tool execution.
- Hybrid RAG — BM25 + lexical + semantic; BGE-M3 1024-dim embeddings; SQLite chunk store; citation post-check
- Agent loop — multi-iteration LLM execution with streaming, bounded retries, tool truncation
- Plan mode — structured planning, critique runs, plan export to files
- Policy engine — exam mode, tool allow/deny lists, course-scoped capabilities
- Code graph — symbol and import index; graph_lookup without vector search
- Turn preparation — episodic memory blocks, prompt-budget.ts, prompt-patch-registry
- Corpus router — dogfood mirror, team docs, OSS repos, workspace truth separation
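One common way to combine BM25, lexical, and semantic scores is min-max normalization followed by a weighted sum. The sketch below shows that technique under stated assumptions: the field names, the weight shape, and the function fuse are hypothetical, and nothing here is taken from the platform's retriever or from the actual layout of config/rag_weights.json.

```typescript
// Hypothetical shapes; the platform's actual fusion method and weight file layout are not shown here.
interface ChunkScores {
  id: string;
  bm25: number;
  lexical: number;
  semantic: number;
}

interface RagWeights {
  bm25: number;
  lexical: number;
  semantic: number;
}

// Rescale raw scores to [0, 1] so the three signals are comparable.
function minMax(values: number[]): number[] {
  const lo = Math.min.apply(null, values);
  const hi = Math.max.apply(null, values);
  return values.map((v) => (hi === lo ? 0 : (v - lo) / (hi - lo)));
}

function fuse(chunks: ChunkScores[], w: RagWeights, topK: number): { id: string; score: number }[] {
  const total = w.bm25 + w.lexical + w.semantic;
  if (total <= 0) throw new Error("weights must sum to a positive number");
  const bm25 = minMax(chunks.map((c) => c.bm25));
  const lexical = minMax(chunks.map((c) => c.lexical));
  const semantic = minMax(chunks.map((c) => c.semantic));
  return chunks
    .map((c, i) => ({
      id: c.id,
      score: (w.bm25 * bm25[i] + w.lexical * lexical[i] + w.semantic * semantic[i]) / total,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

Shifting weight toward the semantic signal promotes chunks that match by meaning over chunks that match by exact terms, which is the kind of tradeoff a weight-tuning run would explore.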
Capabilities — agent tools and integrations
The tool registry exposes IT engineering tools behind policy and admin visibility. MCP servers and skill packs extend the agent without hard-coding every integration.
- graph_lookup, search_code, search_docs, retrieve_sources, get_file_content
- web_search — Brave, Tavily, or Google Programmable Search
- propose_patch, verify_patch, PATCH /dev/patch/* safe patch pipeline
- sandbox_execute, run_sandbox, sandbox read/write — isolated shell execution
- delegate_subtask — parallel specialists with IT_DELEGATE_MAX_DEPTH
- MCP invoke via config/mcp-servers.json and platform-mcp-client
- Skill packs — content/skills/**/SKILL.md with config/skill-packs.json registry
- Browser tools — navigate, screenshot, snapshot (host allowlist gated)
- document_generator workflow, validate_document, request_human_review
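A policy gate in front of the tool registry can be as small as an allow/deny check plus a depth check for delegation. This is a minimal sketch with hypothetical names (ToolPolicy, checkTool, checkDelegate); the maximum depth is passed as a parameter because the value behind IT_DELEGATE_MAX_DEPTH is not stated in this document.

```typescript
// Hypothetical policy shape; the real policy engine is not reproduced here.
interface ToolPolicy {
  allow: string[]; // empty means "no allow list": anything not denied passes
  deny: string[];
}

type Decision = { allowed: true } | { allowed: false; reason: string };

function checkTool(policy: ToolPolicy, tool: string): Decision {
  // Deny always wins over allow.
  if (policy.deny.indexOf(tool) !== -1) {
    return { allowed: false, reason: tool + " is denied by policy" };
  }
  if (policy.allow.length > 0 && policy.allow.indexOf(tool) === -1) {
    return { allowed: false, reason: tool + " is not on the allow list" };
  }
  return { allowed: true };
}

// currentDepth is 0 for the top-level agent and grows by one per delegation.
function checkDelegate(currentDepth: number, maxDepth: number): Decision {
  if (currentDepth >= maxDepth) {
    return { allowed: false, reason: "delegate depth limit reached (" + maxDepth + ")" };
  }
  return { allowed: true };
}
```

Returning a reason with each refusal lets the agent loop surface why a tool was blocked instead of failing silently.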
Capabilities — surfaces and clients
DarDev Studio is a multi-surface product, not a headless API. Each client consumes the same OpenAPI contract.
- IT Studio web (platform-web) — chat UI, plan export panel, tool traces, skill chips, settings drawer
- Admin console (platform-admin) — agent capabilities dashboard, inference workers panel, benchmarks tab, workflows scheduler UI, skill-pack editor
- Terminal TUI (platform-tui) — Ink-based orchestrator for keyboard-first workflows
- Android worker — on-device LiteRT inference over WebSocket
- Windows desktop + worker — Compose UI and headless edge worker
- Async jobs — POST /api/v1/studio/jobs for long-running orchestration
- Streaming contract — SSE events documented in STREAMING_CONTRACT.md
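Clients that consume the SSE stream need to split the byte stream into events. The sketch below parses the standard text/event-stream framing (event and data fields, blank-line terminators, comment lines). The function name parseSse is hypothetical, and the event names used in any call are illustrative; the actual event vocabulary lives in STREAMING_CONTRACT.md and is not reproduced here.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no event field is present
  data: string;
}

// Parses complete events out of a buffer and returns the unfinished tail,
// so the caller can prepend it to the next network chunk.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() || ""; // last piece is incomplete until a blank line arrives
  const events: SseEvent[] = [];
  for (const block of blocks) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.charAt(0) === ":") continue; // comment or keep-alive line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.charAt(0) === " ") value = value.slice(1); // one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Events without data are not dispatched.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

Keeping the unfinished tail separate matters because network chunks rarely align with event boundaries.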
Inference and embedding plane
Chat and embedding run on separate ports and provider pools. Model packs M1 through M5 map modes to sampling profiles with provider failover. Central chat uses Gemma via llama.cpp or LiteRT; embeddings use a dedicated BGE-M3 pool on :8081 while chat stays on :8080.
- Providers: LiteRT server, Ollama, llama.cpp HTTP, OpenAI-compatible HTTP, mock runtime for CI
- Model tier track M1–M5 complete — registry, model-packs.json, envelope benchmarks
- Edge routing — central / self / federated worker modes with PII tier badges in admin
- Context envelope evidence — LiteRT 32K, llama.cpp ~96K verified on RTX 4050 class hardware
- Optional ONNX MiniLM via @xenova/transformers for local embed without GPU pool
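Provider failover within a model pack can be reduced to picking the first healthy provider in preference order. This is a sketch under assumptions: the ModelPack shape, the provider identifiers, and the loopback host are hypothetical (only the :8080 and :8081 ports come from the text above), and the real model-packs.json schema is not reproduced here.

```typescript
// Hypothetical identifiers for the provider kinds listed above.
type Provider = "litert" | "ollama" | "llamacpp" | "openai-compatible" | "mock";

interface ModelPack {
  id: string; // for example "M1"
  providers: Provider[]; // ordered by preference
  temperature: number; // illustrative sampling value, not a real profile
}

interface Planes {
  chat: string;
  embed: string;
}

// Ports come from the text above; the host is an assumption.
const PLANES: Planes = {
  chat: "http://127.0.0.1:8080",
  embed: "http://127.0.0.1:8081",
};

// Returns the first preferred provider that is currently healthy, or null if none is.
function resolveProvider(pack: ModelPack, healthy: Provider[]): Provider | null {
  for (const candidate of pack.providers) {
    if (healthy.indexOf(candidate) !== -1) return candidate;
  }
  return null;
}
```

Keeping the chat and embedding base URLs in separate fields mirrors the two-plane split: a failover on the chat side never touches the embedding pool.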
Capabilities — workflows, benchmarks, and ops
Beyond chat, DarDev Studio ships workflow orchestration, scheduled jobs, and a benchmark platform used to gate inference and RAG changes.
- Workflow engine + cron scheduler — platform.db, admin Workflows tab
- Benchmark platform phases A–G complete — API, CLI matrix, admin tab, regression gates
- check:integration:dev-team — dev-team integration gate before merge
- Retrieval cache, improvement metrics, training export CLI (phases E–F)
- RAG weight tuning — npm run tune:rag-weights + config/rag_weights.json
- Incremental ingest — npm run ingest:inc; repo ingest via ingest:repo
Measured outcomes
These numbers come from eval runners and envelope benchmarks run on a PC, not from sprint labels. I do not claim external user scale; gates run on internal engineering workflows and offline fixtures.
- eval:orchestrator-intents — 6/6 offline on engineering-goal fixtures
- check:integration:dev-team — RAG IT 8/8, plan mode 6/6, graph lookup 5/5
- validate:orchestrator-prompts — 5/5 live API prompt matrix at closeout
- RTX 4050 context envelope — LiteRT Gemma 32K max; llama.cpp ~96K before alloc fail
- BGE-M3 dogfood ingest — ~17k chunks on monorepo mirror (RAG v6)
- S01–S13 and API phases A–G shipped on PC (internal program tracking)
Eval and quality gates
Offline gates must pass before orchestrator and RAG changes merge. Eval fixtures live under evals/; runners in platform-eval.
- eval:orchestrator-intents — intent routing across representative engineering goals
- eval:rag:it — retrieval quality on IT corpora with citation verify
- eval:graph — code graph lookup correctness
- eval:it-modes — edition and mode matrix
- eval:plan-mode and adaptive orchestrator matrices
- validate:orchestrator-prompts — live API prompt matrix consistency
- benchmark:platform:gate — inference regression gate
- npm test — platform-core, platform-api, platform-eval on mock runtime
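The merge rule behind these gates is simple: every suite must pass all of its cases. A minimal sketch of such an aggregator, with hypothetical names (SuiteResult, GateReport, evaluateGate) and no claim about how platform-eval actually reports results:

```typescript
// Hypothetical result shape; platform-eval's real report format is not shown here.
interface SuiteResult {
  suite: string;
  passed: number;
  total: number;
}

interface GateReport {
  ok: boolean;
  failing: string[];
  summary: string[];
}

function evaluateGate(results: SuiteResult[]): GateReport {
  // A suite with zero cases counts as failing: an empty suite proves nothing.
  const failing = results
    .filter((r) => r.total === 0 || r.passed < r.total)
    .map((r) => r.suite);
  const summary = results.map((r) => r.suite + " " + r.passed + "/" + r.total);
  // An empty result set also fails, so a misconfigured run cannot pass by accident.
  return { ok: results.length > 0 && failing.length === 0, failing, summary };
}
```

Fed the figures reported under Measured outcomes (orchestrator intents 6/6, RAG IT 8/8, graph lookup 5/5), this aggregator returns ok; a single 5/6 suite would block the merge.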
Fifteen-minute walkthrough: SYSTEM_ARCHITECTURE.md, one prepareTutorTurn flow, orchestrator mode, eval output.
Production validation
DarDev Studio is developed and validated against DarDev internal engineering platforms. I ingest monorepo docs and code into org RAG, route natural goals through the IT orchestrator, and ship patches through propose/verify gates on the same repos that power dardev.net and the Studio monorepo itself.
Features that do not survive validation on real delivery work do not ship.
About this program
DarDev Studio is an internal engineering program I lead as architecture owner at DarDev — not a launched consumer product with external user scale. The long-form article documents methodology and falsifiable hypotheses; the case study tells what failed first and what the eval numbers were.
Writing: /writing/ai-orchestration-systems-research · Case study: /case-studies/dardev-studio-ai-orchestration
Apache-2.0 OSS on github.com/Theemiss/dardev-studio — Phase 1 export published (platform-core, platform-api, platform-eval, architecture docs). Web, admin, and edge workers follow in later phases.
Roadmap
Roadmap status is derived from PROGRAM_SCOPE_AND_STATUS and the roadmap-v1 track. Shipped, in-progress, and planned items are labeled as such.
- Now — Phase 1 public on GitHub; architecture docs in repo; theemis.cloud systems page and case study
- In progress — Phase H LiteRT 0.12; C1 Android on-device end-to-end
- Deferred — C4 MTP benchmark and safety matrix on device
- Planned — S14–S15 orchestrator and reliability sprints; C5 external deploy after device gates
- Future — Phase J university LMS bridge (separate from active dev-team SKU)
- Ops optional — full corpus RAG v6 re-ingest when embedding pool available
Results
- 6/6: Orchestrator intents (eval:orchestrator-intents offline fixtures)
- 8/8: RAG IT eval (check:integration:dev-team gate)
- 32K: LiteRT envelope (RTX 4050 max verified context)
- ~96K: llama.cpp envelope (Gemma E2B before alloc fail)
- 5: Memory tiers (session, org RAG, code graph, web, planner)
- 8: Platform packages (@dardev/platform-* monorepo)
Key decisions
Memory tiers over one context blob
Each tier uses a different retrieval or execution mechanism on purpose. Mixing session chat into org RAG pollutes code retrieval. Tiers make tradeoffs explicit.
Separate chat and embedding planes
BGE-M3 on :8081 and chat LLM on :8080 prevent retrieval and generation from competing for the same GPU budget.
Offline eval gates before merge
Orchestrator intents, RAG IT suites, and graph lookup evals run in CI-style gates. Agentic systems without evals are demos.
Safe patch pipeline over raw edits
propose_patch and verify_patch gate workspace changes. Agents that ship code need the same review discipline as human commits.
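The propose/verify discipline can be modeled as a small state machine in which a patch cannot be applied unless it has been verified. The sketch below is illustrative only: the Patch shape, the state names, and the two example checks are assumptions, not the behavior of the real propose_patch and verify_patch tools.

```typescript
// Hypothetical patch model; the real patch pipeline is not reproduced here.
type PatchState = "proposed" | "verified" | "rejected" | "applied";

interface Patch {
  id: string;
  file: string;
  diff: string;
  state: PatchState;
  failures: string[];
}

// A check returns null on success or a failure message.
type PatchCheck = (patch: Patch) => string | null;

const nonEmptyDiff: PatchCheck = (p) => (p.diff.length > 0 ? null : "diff is empty");
const insideWorkspace: PatchCheck = (p) =>
  p.file.charAt(0) === "/" || p.file.indexOf("..") !== -1 ? "path escapes the workspace" : null;

function proposePatch(id: string, file: string, diff: string): Patch {
  return { id, file, diff, state: "proposed", failures: [] };
}

function verifyPatch(patch: Patch, checks: PatchCheck[]): Patch {
  if (patch.state !== "proposed") throw new Error("only proposed patches can be verified");
  const failures: string[] = [];
  for (const check of checks) {
    const message = check(patch);
    if (message !== null) failures.push(message);
  }
  return { ...patch, state: failures.length === 0 ? "verified" : "rejected", failures };
}

function applyPatch(patch: Patch): Patch {
  // The gate: nothing reaches the workspace without passing verification first.
  if (patch.state !== "verified") throw new Error("patch " + patch.id + " is not verified");
  return { ...patch, state: "applied" };
}
```

Because applyPatch refuses anything that is not in the verified state, an agent cannot skip review by calling the apply step directly.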
Open source as distribution
Apache-2.0 on GitHub is the front door. Phased export publishes the core engine and evals first; surfaces and workers follow after hygiene review.
Production validation on DarDev engineering workflows
The platform ingests real monorepo corpora and routes real patch workflows. Not a sandbox disconnected from delivery.
Open-source proof (GitHub)
Theemiss/dardev-studio
License: Apache-2.0
Phase 1 Apache-2.0 export published: platform-core, platform-api, platform-eval, eval fixtures, and architecture docs. Surfaces and workers publish in later phases.
My role
- Architecture owner and primary builder — DarDev Studio from scratch at DarDev
- Monorepo design — platform-core brain, API, evals, inference protocol, MCP, CLI, TUI
- Hybrid RAG, five memory tiers, agent loop, IT orchestrator, and plan mode
- Multi-surface product — IT Studio web, admin console, edge workers, OpenAPI contract
- Eval suite and benchmark platform — orchestrator intents, RAG quality, regression gates
- Open-source export strategy and public architecture documentation (Apache-2.0)
- Production validation — AI-first development on DarDev.net and Studio monorepo delivery