Engineering Patterns
RAG loops, agent orchestration, reflection, multi-agent composition, memory graphs.
What this is
Building a thinking system is mostly about structuring existing parts — LLMs, retrievers, APIs, eval loops — into a stable pipeline. These are the patterns that keep showing up.
Core patterns
A. Retrieval-augmented loop
Purpose: ground reasoning in verifiable data.
Flow:
User Input → Embedding → Vector Search → Context Assembly → Generation → Output
Components:
- Embedding model (e.g., text-embedding-3-small)
- Vector store (pgvector, Pinecone, Weaviate, Chroma)
- Retriever
- Context assembler
- LLM
Things that matter:
- Chunking granularity drives relevance and cost.
- Embedding schema has to map to domain concepts.
- Combine semantic similarity with metadata filtering.
- Cache embeddings.
What you get: factual continuity. Fewer hallucinations.
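A minimal, self-contained sketch of the loop above follows. It is not a production retriever. `embed` is a hashed bag-of-words stand-in for a real embedding model, `VectorStore` is an in-memory stand-in for pgvector or Pinecone, and `generate` is a placeholder for the LLM call. All of these names are hypothetical. The sketch does show three items from the list: metadata filtering before similarity ranking, a cache so identical text is embedded once, and context assembly that keeps each chunk's source attached.

```python
from __future__ import annotations

import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # toy dimensionality; real embedding models use far more


def embed(text: str) -> list[float]:
    """Hashed bag-of-words embedding: a hypothetical stand-in for a real
    embedding model such as text-embedding-3-small."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, so dot product == cosine


@dataclass
class Chunk:
    text: str
    metadata: dict
    vector: list[float] = field(default_factory=list)


class VectorStore:
    """In-memory stand-in for pgvector, Pinecone, Weaviate or Chroma."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []
        self._cache: dict[str, list[float]] = {}  # embedding cache keyed by text

    def _embed(self, text: str) -> list[float]:
        if text not in self._cache:
            self._cache[text] = embed(text)
        return self._cache[text]

    def add(self, text: str, **metadata) -> None:
        self.chunks.append(Chunk(text, metadata, self._embed(text)))

    def search(self, query: str, k: int = 2, **filters) -> list[Chunk]:
        q = self._embed(query)
        # Metadata filter first, then rank the survivors by semantic similarity.
        hits = [c for c in self.chunks
                if all(c.metadata.get(key) == val for key, val in filters.items())]
        hits.sort(key=lambda c: sum(a * b for a, b in zip(q, c.vector)), reverse=True)
        return hits[:k]


def assemble_context(chunks: list[Chunk], max_chars: int = 500) -> str:
    """Number each chunk and keep its source so the answer can cite it."""
    lines, used = [], 0
    for i, c in enumerate(chunks, 1):
        line = f"[{i}] ({c.metadata.get('source', 'unknown')}) {c.text}"
        if used + len(line) > max_chars:
            break  # context budget reached
        lines.append(line)
        used += len(line)
    return "\n".join(lines)


def generate(question: str, context: str) -> str:
    # Hypothetical placeholder for the LLM call.
    return f"Q: {question}\nContext:\n{context}"


def rag_answer(store: VectorStore, question: str, **filters) -> str:
    return generate(question, assemble_context(store.search(question, **filters)))
```

Swapping in a real embedding model and store changes `embed` and `VectorStore`; the shape of `rag_answer` (retrieve, assemble, generate) stays the same.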
B. Agent-orchestrated workflow
Purpose: dynamic planning and tool use.
Flow:
Goal → Planner → Tool Calls → Feedback → Plan Update → Result
Components:
- Planner: the LLM deciding which tool to call.
- Tool registry: typed functions callable by schema.
- Sandbox: safe, isolated execution.
- Observation handler: captures results.
Things that matter:
- Strict, typed schemas (JSON Schema).
- Timeout and validation around tools.
- Memory connectors for plan state.
- Audit every action.
What you get: something that behaves like an intelligent process manager, not a stateless prompt.
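The planner, tool registry, and observation loop can be sketched in plain Python. The schema check is a deliberately simplified stand-in for JSON Schema validation (each parameter maps to a type name), and `planner` is a hypothetical rule-based function sitting where the LLM would be. Timeouts use `concurrent.futures`; a thread cannot be killed, so real sandboxing would need process or container isolation, as the comment notes. Every call, successful or not, lands in `audit_log`.

```python
from __future__ import annotations

import concurrent.futures
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Simplified stand-in for JSON Schema: each parameter maps to a type name.
PY_TYPES = {"number": (int, float), "string": str}


@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    schema: dict          # e.g. {"a": "number", "b": "number"}
    timeout_s: float = 2.0


class ToolRegistry:
    def __init__(self) -> None:
        self.tools: dict = {}
        self.audit_log: list = []
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, args: dict) -> dict:
        entry = {"tool": name, "args": args, "ts": time.time(), "ok": False}
        try:
            tool = self.tools[name]
            self._validate(tool, args)
            future = self._pool.submit(tool.fn, **args)
            # A timed-out thread keeps running in the pool; a real sandbox
            # would isolate tools in a separate process or container.
            entry["result"] = future.result(timeout=tool.timeout_s)
            entry["ok"] = True
        except Exception as exc:  # unknown tool, bad args, timeout, or tool error
            entry["error"] = f"{type(exc).__name__}: {exc}"
        self.audit_log.append(entry)  # audit every action, successful or not
        return entry

    @staticmethod
    def _validate(tool: Tool, args: dict) -> None:
        if set(args) != set(tool.schema):
            raise ValueError(f"expected {sorted(tool.schema)}, got {sorted(args)}")
        for key, type_name in tool.schema.items():
            if not isinstance(args[key], PY_TYPES[type_name]):
                raise TypeError(f"{key} must be a {type_name}")


def planner(goal: dict, observations: list) -> Optional[tuple]:
    """Hypothetical rule-based stand-in for the LLM planner.
    Returns the next (tool_name, args) pair, or None when the goal is met."""
    if not observations:
        return "add", {"a": goal["a"], "b": goal["b"]}
    last = observations[-1]
    if last["ok"] and last["tool"] == "add":
        return "double", {"x": last["result"]}
    return None


def run_agent(goal: dict, registry: ToolRegistry, max_steps: int = 5) -> Optional[dict]:
    observations: list = []
    for _ in range(max_steps):  # hard step limit guards against planning loops
        step = planner(goal, observations)
        if step is None:
            break
        observations.append(registry.call(*step))  # feedback for the next plan update
    return observations[-1] if observations else None
```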
C. Reflection and evaluation loop
Purpose: self-correction and monitoring.
Flow:
Action Result → Evaluator → Score → Memory Update → Next Iteration
Components:
- Evaluator: small model or human.
- Metrics engine: coherence, accuracy, success rate.
- Feedback store: log for retraining.
Things that matter:
- Use cheap models for evaluation.
- Dashboard the results.
- Add RL or weighted scoring rules if you're moving toward autonomy.
What you get: reasoning becomes adaptive instead of fixed.
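The loop can be sketched as below, assuming a trivially cheap evaluator: `evaluate` scores an output by the fraction of required terms it contains, standing in for a small model or a human reviewer. `attempt` is a hypothetical placeholder for the action step. Each iteration is logged to a `FeedbackStore`, the memory update carries the first unmet requirement into the next attempt, and `max_iters` bounds the loop.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeedbackStore:
    """Log of every evaluated attempt, kept for dashboards or later retraining."""
    records: list[dict] = field(default_factory=list)

    def log(self, **record) -> None:
        self.records.append(record)


def attempt(task: str, hints: list[str]) -> str:
    # Hypothetical action step; a real system would prompt an LLM with the hints.
    return task + "".join(f" [{h}]" for h in hints)


def evaluate(output: str, required: list[str]) -> float:
    # Cheap evaluator stand-in: fraction of required terms the output covers.
    if not required:
        return 1.0
    return sum(term in output for term in required) / len(required)


def reflection_loop(task: str, required: list[str], store: FeedbackStore,
                    threshold: float = 1.0, max_iters: int = 4) -> tuple[str, float]:
    hints: list[str] = []  # memory carried between iterations
    output, score = "", 0.0
    for i in range(max_iters):
        output = attempt(task, hints)
        score = evaluate(output, required)
        store.log(iteration=i, output=output, score=score)
        if score >= threshold:
            break
        # Memory update: feed the first unmet requirement into the next attempt.
        missing = [t for t in required if t not in output]
        if not missing:
            break
        hints.append(missing[0])
    return output, score
```

With `required=["cost", "latency"]` and the task `"summary"`, the loop converges in three iterations, logging scores of 0.0, 0.5 and 1.0.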
D. Multi-agent composition
Purpose: scale by specialising.
Flow:
Controller → Sub-Agent Delegation → Results → Aggregation → Final Response
Components:
- Controller: decomposes the goal.
- Sub-agents: retrieval, synthesis, evaluation.
- Message bus.
- Consensus protocol: voting, confidence scoring.
Things that matter:
- Clear interface contracts.
- Depth limits — no uncontrolled recursion.
- Latency tracking across hops.
- Trace IDs for observability.
What you get: composable intelligence that scales across domains.
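A minimal controller sketch, under a few assumptions: sub-agents are plain callables returning `(answer, confidence)`, decomposition is omitted so every sub-agent receives the same goal, and consensus is a confidence-weighted vote. The names `Controller`, `Result` and `MAX_DEPTH` are illustrative. A sub-agent that needs to delegate further would call `run(subgoal, depth + 1, trace_id)`, so the depth limit and the shared trace ID both propagate.

```python
from __future__ import annotations

import time
import uuid
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_DEPTH = 2  # delegation depth limit: no uncontrolled recursion

# A sub-agent takes a goal and returns (answer, confidence in [0, 1]).
SubAgent = Callable[[str], Tuple[str, float]]


@dataclass
class Result:
    agent: str
    answer: str
    confidence: float
    latency_ms: float   # per-hop latency tracking
    trace_id: str       # shared by every hop of one request


class Controller:
    def __init__(self, agents: dict[str, SubAgent]) -> None:
        self.agents = agents

    def run(self, goal: str, depth: int = 0,
            trace_id: Optional[str] = None) -> tuple[str, list[Result]]:
        if depth > MAX_DEPTH:
            raise RecursionError(f"delegation depth {depth} exceeds {MAX_DEPTH}")
        trace_id = trace_id or uuid.uuid4().hex
        results = []
        for name, agent in self.agents.items():
            start = time.perf_counter()
            answer, confidence = agent(goal)
            elapsed = (time.perf_counter() - start) * 1000
            results.append(Result(name, answer, confidence, elapsed, trace_id))
        return self.aggregate(results), results

    @staticmethod
    def aggregate(results: list[Result]) -> str:
        # Consensus protocol: confidence-weighted vote across identical answers.
        votes: dict[str, float] = defaultdict(float)
        for r in results:
            votes[r.answer] += r.confidence
        return max(votes, key=votes.get)
```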
E. Persistent memory graph
Purpose: context as a network, not a log.
Structure:
- Nodes: events, entities, decisions, observations.
- Edges: causal or semantic relationships.
- Queries: vector + symbolic hybrid.
Things that matter:
- Property graphs (Neo4j, ArangoDB) or hybrid vector-graph stores.
- Summarisation nodes for long history.
- Integrate with RAG.
What you get: memory that generalises, not just recalls.
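The structure can be sketched with an in-memory graph. This is an assumption-heavy stand-in for Neo4j or ArangoDB: similarity is bag-of-words cosine rather than learned embeddings, edges are `(source, relation, target)` triples, and `summarise` takes a hypothetical summariser callable (an LLM in practice). The query applies the symbolic filters first (node kind, edge neighbourhood) and then ranks by similarity.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


def bow(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Bag-of-words cosine: an illustrative stand-in for vector similarity.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Node:
    id: str
    kind: str  # event | entity | decision | observation | summary
    text: str


class MemoryGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: list[tuple[str, str, str]] = []  # (source, relation, target)

    def add_node(self, node_id: str, kind: str, text: str) -> None:
        self.nodes[node_id] = Node(node_id, kind, text)

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def neighbours(self, node_id: str, relation: Optional[str] = None) -> set[str]:
        return {d for s, r, d in self.edges
                if s == node_id and (relation is None or r == relation)}

    def query(self, text: str, kind: Optional[str] = None,
              linked_to: Optional[str] = None, k: int = 3) -> list[Node]:
        """Hybrid query: symbolic filters on kind and edges, then similarity ranking."""
        q = bow(text)
        allowed = self.neighbours(linked_to) if linked_to else set(self.nodes)
        candidates = [n for n in self.nodes.values()
                      if n.id in allowed and (kind is None or n.kind == kind)]
        candidates.sort(key=lambda n: cosine(q, bow(n.text)), reverse=True)
        return candidates[:k]

    def summarise(self, node_ids: list[str], summary_id: str,
                  summariser: Callable[[list[str]], str]) -> None:
        """Collapse a run of history into one summary node linked to its sources."""
        text = summariser([self.nodes[i].text for i in node_ids])
        self.add_node(summary_id, "summary", text)
        for i in node_ids:
            self.add_edge(summary_id, "summarises", i)
```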
Reference flow
A production system stitches these together:
1. Input from user or environment (Interface)
2. Intent parsed, context retrieved (Orchestration + RAG)
3. Reasoning plan generated (Agent)
4. Tools invoked (Action)
5. Results evaluated (Reflection)
6. Memory graph updated (Knowledge)
7. Response generated (Interface)
Closed loop. Feedback, grounding, continuity.
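The seven steps can be wired as a sequence of stage functions over a shared state dict. Every stage below is a hypothetical toy stand-in (keyword matching in place of retrieval, a one-step plan in place of an agent), chosen only so the sketch runs. What it shows is the closed loop: step 6 writes to `memory`, the next call's step 2 reads it back, and `trace` records which stages ran.

```python
from __future__ import annotations

from typing import Callable

State = dict
Stage = Callable[[State], State]


def interface_in(s: State) -> State:      # 1. Input (Interface)
    s["input"] = s["raw"].strip()
    return s


def orchestrate(s: State) -> State:       # 2. Intent + context (Orchestration + RAG)
    s["intent"] = "question" if s["input"].endswith("?") else "statement"
    words = set(s["input"].lower().rstrip("?").split())
    s["context"] = [m for m in s["memory"] if words & set(m.split())]
    return s


def plan(s: State) -> State:              # 3. Reasoning plan (Agent)
    s["plan"] = ["lookup"] if s["intent"] == "question" else ["store"]
    return s


def invoke_tools(s: State) -> State:      # 4. Tools invoked (Action)
    s["results"] = list(s["context"]) if "lookup" in s["plan"] else []
    return s


def reflect(s: State) -> State:           # 5. Results evaluated (Reflection)
    s["score"] = 1.0 if s["intent"] == "statement" or s["results"] else 0.0
    return s


def update_memory(s: State) -> State:     # 6. Memory updated (Knowledge)
    if "store" in s["plan"]:
        s["memory"].append(s["input"].lower())
    return s


def interface_out(s: State) -> State:     # 7. Response (Interface)
    if s["intent"] == "statement":
        s["response"] = "noted"
    elif s["score"] > 0:
        s["response"] = "; ".join(s["results"])
    else:
        s["response"] = "no grounded answer"  # decline rather than guess
    return s


PIPELINE: list[tuple[str, Stage]] = [
    ("interface_in", interface_in), ("orchestrate", orchestrate), ("plan", plan),
    ("invoke_tools", invoke_tools), ("reflect", reflect),
    ("update_memory", update_memory), ("interface_out", interface_out),
]


def run_pipeline(raw: str, memory: list[str]) -> State:
    state: State = {"raw": raw, "memory": memory, "trace": []}
    for name, stage in PIPELINE:
        state = stage(state)
        state["trace"].append(name)  # record which stages ran, in order
    return state
```

Stating "The deploy window is friday" and then asking "When is the deploy window?" returns the stored fact; asking about something never stated returns "no grounded answer" instead of guessing.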
Deployment
| Component | Hosting | Stack |
| -------------------- | ------------------------------- | ----------------------------------- |
| Frontend / Interface | Serverless (Vercel, Cloudflare) | Next.js + AI SDK |
| Agent Orchestrator | Stateful microservice | Node/Express, FastAPI, LangGraph |
| Vector Store | Managed DB | Supabase (pgvector) / Pinecone |
| Memory Graph | Persistent DB | Neo4j / RedisGraph |
| Observability | Logging + Metrics | OpenTelemetry, Prometheus |
| Security | AuthN/AuthZ, rate limiting | JWT, API Gateway |
Operational notes:
- Log every reasoning step.
- Version your pipelines.
- Sandbox tools.
- Treat token use and latency as first-class metrics.
Failure modes
| Risk | Cause | Mitigation |
| --------------------- | ------------------------- | --------------------------------------------------- |
| Hallucination | Weak retrieval | Tighten RAG relevance, enforce context injection |
| Looping | Unbounded recursion | Iteration limits, plan termination checks |
| Data drift | Outdated embeddings | Re-embed periodically |
| Context explosion | Oversized prompts | Summarise history dynamically |
| Latency spikes | Deep chains | Parallelise sub-agents, batch tool calls |
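For the context-explosion row, dynamic summarisation can be sketched as a budget check over the history. Token counting here is a whitespace-word proxy (an assumption; use the model's tokenizer in practice), and `summarise` is any callable, in practice an LLM call. Recent turns stay verbatim, older ones collapse into one summary, and if that still overflows, the oldest verbatim turns are dropped.

```python
from __future__ import annotations

from typing import Callable


def count_tokens(text: str) -> int:
    # Rough proxy: whitespace-separated words. Use the model's tokenizer in practice.
    return len(text.split())


def total(turns: list[str]) -> int:
    return sum(count_tokens(t) for t in turns)


def fit_history(history: list[str], budget: int,
                summarise: Callable[[list[str]], str], keep_recent: int = 2) -> list[str]:
    """Keep recent turns verbatim and fold older ones into a single summary
    whenever the history exceeds the token budget."""
    if total(history) <= budget:
        return history
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    compacted = ([summarise(older)] if older else []) + recent
    # Still over budget: drop the oldest verbatim turn, keeping the summary first.
    # The summary plus the newest turn (or the newest turn alone) is never dropped.
    first_droppable = 1 if older else 0
    while len(compacted) > first_droppable + 1 and total(compacted) > budget:
        compacted.pop(first_droppable)
    return compacted
```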
Governance
Transparent and auditable by design.
- Trace every chain (input → output → action → eval).
- Store traces for reproducibility.
- Feedback ledger for human review.
- Safety guardrails in orchestration, not afterthoughts.
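A trace ledger along these lines can be sketched as an append-only JSON Lines log. The record fields, the `pipeline_version` tag and the `review_threshold` of 0.7 are illustrative assumptions, not a standard format. Each record captures input, output, actions and eval score under one trace ID, and low-scoring traces are flagged into a review queue for human feedback.

```python
from __future__ import annotations

import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class TraceRecord:
    trace_id: str
    pipeline_version: str   # versioned pipelines make traces reproducible
    user_input: str
    output: str
    actions: list           # tool calls made along the way
    eval_score: float
    needs_review: bool = False


class TraceLedger:
    """Append-only JSON Lines ledger held in memory; a real deployment would
    write each line to durable storage."""

    def __init__(self, review_threshold: float = 0.7) -> None:
        self.lines: list[str] = []
        self.review_threshold = review_threshold

    def record(self, pipeline_version: str, user_input: str, output: str,
               actions: list, eval_score: float) -> str:
        rec = TraceRecord(uuid.uuid4().hex, pipeline_version, user_input, output,
                          actions, eval_score,
                          needs_review=eval_score < self.review_threshold)
        self.lines.append(json.dumps(asdict(rec), sort_keys=True))
        return rec.trace_id

    def _records(self) -> list[TraceRecord]:
        return [TraceRecord(**json.loads(line)) for line in self.lines]

    def load(self, trace_id: str) -> TraceRecord:
        for rec in self._records():
            if rec.trace_id == trace_id:
                return rec
        raise KeyError(trace_id)

    def review_queue(self) -> list[TraceRecord]:
        # Feedback ledger: low-scoring chains go to a human reviewer.
        return [rec for rec in self._records() if rec.needs_review]
```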
Telemetry I track:
- Token and cost per interaction
- Retrieval hit ratio
- Tool success/failure
- Task completion times
- Reflection score deltas
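Those metrics can roll up from per-interaction records roughly as follows. The prices are hypothetical placeholders, and "retrieval hit ratio" is assumed here to mean chunks actually used divided by chunks retrieved; adjust both to your own definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Interaction:
    prompt_tokens: int
    completion_tokens: int
    retrieved: int          # chunks retrieved
    retrieved_used: int     # chunks actually used in the answer ("hits")
    tool_calls: int
    tool_failures: int
    duration_s: float
    score_before: float     # reflection score on the first attempt
    score_after: float      # reflection score on the final attempt


# Hypothetical per-1K-token prices; substitute your provider's actual rates.
PRICE_IN_PER_1K = 0.0005
PRICE_OUT_PER_1K = 0.0015


def summarise_telemetry(items: list[Interaction]) -> dict:
    costs = [i.prompt_tokens / 1000 * PRICE_IN_PER_1K
             + i.completion_tokens / 1000 * PRICE_OUT_PER_1K for i in items]
    retrieved = sum(i.retrieved for i in items)
    calls = sum(i.tool_calls for i in items)
    return {
        "tokens_per_interaction": mean(i.prompt_tokens + i.completion_tokens for i in items),
        "cost_per_interaction": mean(costs),
        "retrieval_hit_ratio": sum(i.retrieved_used for i in items) / retrieved if retrieved else 0.0,
        "tool_success_rate": 1 - sum(i.tool_failures for i in items) / calls if calls else 1.0,
        "mean_completion_s": mean(i.duration_s for i in items),
        "mean_reflection_delta": mean(i.score_after - i.score_before for i in items),
    }
```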
The operational view of cognition.
Evolution path
| Stage | Description | Transition |
| ----------------- | ------------------------------------ | -------------------------------------------- |
| 1. Reactive | Static response to input | Add RAG |
| 2. Contextual | Recalls past context | Add persistent memory |
| 3. Procedural | Plans multi-step | Add agents |
| 4. Reflective | Evaluates own performance | Add feedback loops |
| 5. Adaptive | Improves autonomously | RL or retraining cycles |
Each stage builds on the last. Intelligence emerges from the stability of feedback between them.
The shift
Building a thinking system means moving from code-centric to system-centric design. The question isn't "which model do we use." It's:
How do information, reasoning, and memory interact to produce reliable understanding?
When that's clear, observable, and grounded, the result behaves less like a chatbot and more like a collaborator.