Engineering Blog
RAG, Agents, and Fine-Tuning — What Actually Works in Production (and What Doesn’t)
RAG, agents, and fine-tuning can work in production—if you build with measurement, constraints, and disciplined operations.
There’s no shortage of AI architecture diagrams showing Retrieval-Augmented Generation (RAG), agents, and fine-tuning layered together. In practice, many of these systems struggle once they hit production.
So what actually works?
RAG works — when it’s done properly
RAG is one of the most reliable ways to ground AI in real information. But production RAG systems succeed only when:
- Documents are chunked with meaning preserved
- Embeddings reflect the domain
- Retrieval quality is measured, not assumed
- Guardrails prevent irrelevant or sensitive content from leaking into prompts
Poor RAG doesn’t just reduce accuracy — it increases cost by inflating context and retry rates.
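Measuring retrieval quality can be as simple as computing recall@k against a small hand-labeled set of queries. This is a minimal sketch; the document IDs and evaluation set here are hypothetical placeholders, not real data:

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical labeled set: query -> (retriever's ranking, ground-truth relevant docs)
eval_set = {
    "refund policy":   (["d3", "d7", "d1", "d9"], {"d3", "d1"}),
    "api rate limits": (["d2", "d5", "d8", "d4"], {"d8", "d6"}),
}

scores = [recall_at_k(ranked, relevant, k=3) for ranked, relevant in eval_set.values()]
mean_recall = sum(scores) / len(scores)
```

Tracking a number like `mean_recall` over time turns "retrieval seems fine" into a regression test you can run on every index or embedding change.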
Agents need constraints
Agentic systems are powerful, but uncontrolled agents can be expensive and risky. In production, successful agent systems:
- Use explicit tool definitions
- Restrict credentials and destinations to the minimum each tool actually needs
- Log every action for auditability
- Fail safely when confidence is low
Agents should behave like well-designed software components, not autonomous experiments.
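Those constraints translate directly into code. Below is a hypothetical tool dispatcher sketch: the tool names, allowlist, and confidence threshold are illustrative assumptions, but the pattern — allowlist, audit log, fail safely — is the point:

```python
import logging

# Explicit tool definitions: only these names may ever be invoked.
ALLOWED_TOOLS = {"search_docs", "create_ticket"}

def run_tool(name, args, confidence):
    """Dispatch an agent tool call with allowlisting, audit logging,
    and a safe fallback when the model's confidence is low."""
    # Log every action for auditability before deciding anything.
    logging.info("tool_call name=%s args=%s confidence=%.2f", name, args, confidence)
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {name!r} is not allowlisted")
    if confidence < 0.7:  # hypothetical threshold: defer instead of acting
        return {"status": "deferred", "reason": "low confidence, escalate to human"}
    return {"status": "ok", "tool": name}
```

A call like `run_tool("delete_database", {}, 0.99)` fails loudly rather than silently doing damage, which is exactly how a well-designed software component should behave.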
Fine-tuning should be lightweight and continuous
Full model retraining is rarely practical. Instead, parameter-efficient fine-tuning methods (like LoRA) allow teams to:
- Adapt behaviour without retraining the full model
- Roll changes forward or back safely
- Maintain multiple variants for different use cases
The most effective systems treat fine-tuning as a continuous improvement loop, not a one-off event.
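The mechanics behind LoRA-style adaptation are simple enough to sketch in a few lines. This toy example (plain lists standing in for tensors) shows the core idea: the base weights `W` stay frozen while two small low-rank matrices `A` and `B` carry the learned change, so an adapter can be added, swapped, or rolled back without touching `W`:

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + alpha * B (A x).

    W is the frozen base weight matrix; A (r x d_in) and B (d_out x r)
    are the small trainable low-rank factors. With B initialized to
    zeros, the adapted layer is identical to the base layer, so
    rollback is just dropping the adapter.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + alpha * d for b, d in zip(base, delta)]
```

Because each use case only needs its own small `(A, B)` pair, maintaining multiple variants means storing megabytes of adapters, not full copies of the model.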
The production mindset
What separates production AI from demos is discipline:
- Measure latency, accuracy, and cost
- Test prompts and workflows like code
- Monitor real usage and adjust continuously
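Measurement starts with instrumentation on every model call. A minimal sketch, assuming a per-call cost figure you supply yourself (the helper name and metrics shape are illustrative, not from any particular library):

```python
import time

def track_call(fn, *args, cost_per_call=0.0, metrics=None, **kwargs):
    """Wrap any LLM or workflow call, recording latency and cost
    into a metrics list so usage can be monitored and tested."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency = time.perf_counter() - start
    if metrics is not None:
        metrics.append({"latency_s": latency, "cost_usd": cost_per_call})
    return result
```

In a real system the `metrics` sink would be your observability stack rather than a list, but the discipline is the same: every call is measured, so regressions in latency or cost show up in dashboards instead of invoices.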
AI works best when it’s treated as infrastructure, not magic.