RAG in Production — The Journey of Building a Real-world AI System

RAG in Production [P1]: Real-world Problem - When Does a Business Actually Need AI?

Explore the practical journey of building a RAG system for enterprises, starting from identifying the right business problem instead of just chasing technology.

Truong Pham · Software Engineer
Published: March 25, 2024
Stack: RAG · AI · LLM · Business Strategy

"We want to build an AI chatbot for our company." I hear this at least three times a week. But when I ask, "to solve what specific problem?", the most common answer is... silence.


Table of Contents

  1. Series Introduction
  2. Context: Why Did We Build RAG?
  3. Correctly Identifying the Business Problem
  4. Specific Pain Points
  5. Measuring Business Impact
  6. Is AI the Only Solution?
  7. When to Use and NOT Use AI
  8. Defining Scope & Success Metrics
  9. Stakeholder Buy-in
  10. Conclusion & Next Post

Series Introduction

Welcome to "RAG in Production — The Journey of Building a Real-world AI System".

This is not a tutorial series copy-pasted from documentation. It is the real journey my team and I went through while building a RAG (Retrieval-Augmented Generation) system for a mid-to-large enterprise — full of mistakes, lessons, and architectural decisions that had to be redone from scratch.

In the 10 posts of this series, we will go through the entire lifecycle of a production RAG system:

Post  Topic
 01   Business Problem — the post you are reading
 02   Why RAG — comparing different approaches
 03   Architecture Design
 04   Backend Implementation
 05   Vector Database Design
 06   LLM Inference Deployment
 07   Containerization & Orchestration
 08   Monitoring & Optimization
 09   Security Considerations
 10   Lessons Learned & Future Improvements

Reference Tech Stack throughout the series: Python · FastAPI · LangChain · Qdrant · OpenAI / vLLM · Docker · Kubernetes · Prometheus · Grafana

Level Note: The series is designed so that both Junior and Senior developers can read it. The basic part explains what, the advanced part explains why and the trade-offs.


Context

The Story Begins

In 2023, our company — a Fintech with ~500 employees — was facing a classic problem: knowledge silos (knowledge scattered and isolated in each department).

The Customer Support team receives 200–300 tickets/day. About 60% of them are repetitive questions about products, policies, and processes. Each question takes an average of 8–12 minutes for an agent to handle — not because it's difficult, but because they have to:

  1. Search in Confluence (more than 3,000 pages of documentation)
  2. Ask more experienced colleagues
  3. Dig through old email threads
  4. Sometimes misread and give the wrong answer

That was when our product manager said the familiar phrase: "Should we build an AI chatbot?"


Correctly Identifying the Business Problem

Most Common Mistake: Starting from the solution, not the problem

Many teams jump straight into "we will use GPT-4" or "we need a vector database" without spending time answering the core question:

"What is the actual problem we are solving?"

This is not a technical question. This is a business question.

Framework: 5 Whys to dig into the root cause

We applied 5 Whys — a root cause analysis technique from Toyota:

Observed Problem:
"Customer support agents spend too much time answering tickets"

Why 1: Why does it take a lot of time?
→ They have to manually search for information across multiple sources.

Why 2: Why do they have to search manually?
→ There is no tool that aggregates information from all sources.

Why 3: Why is there no aggregation tool yet?
→ Documentation is scattered across Confluence, SharePoint, email, and Slack.

Why 4: Why is the documentation so scattered?
→ Lack of a centralized knowledge management process.

Why 5: Why is there a lack of a process?
→ The company grew fast, knowledge was created ad-hoc, and there was no owner.

Root Cause: Knowledge fragmentation due to lack of governance

Key Conclusion: The problem is not "needing AI". The problem is "the need for a system that can intelligently aggregate and retrieve fragmented knowledge". AI is a means, not an end.


Specific Pain Points

After interviewing 15 end users (customer support agents, onboarding specialists, junior developers), we aggregated the specific pain points:

1. Information Overload

Number of documents on Confluence: 3,247 pages
Number of Slack channels: 89
Number of related email threads: ~15,000/month
Average time to find the right information: 6–8 minutes

Agents don't lack information — they are drowning in it. The problem is findability (the ability to find the right thing at the right time).

2. Knowledge Staleness

Documentation is updated but no one knows. An agent answers a customer based on an old policy from 6 months ago — causing churn.

Practical example:

  • Refund policy changed in March
  • Documentation on Confluence was updated
  • But 30% of agents are still answering according to the old policy because no one notified them

3. Implicit Knowledge

"Tips" for handling edge cases only exist in the heads of senior agents. When they leave the company or transfer teams, that knowledge is lost.

4. Inconsistent Answers

For the same question, 3 agents can answer in 3 different ways. This creates an inconsistent experience for customers.

5. Onboarding Bottleneck

New employees take 3–4 weeks to be confident enough to handle tickets independently. Most of that time is spent "absorbing" knowledge — a process that cannot scale.


Measuring Business Impact

One important thing when presenting to stakeholders: don't speak with emotion, speak with numbers.

Quantitative Metrics

We measured baseline metrics before doing anything:

📊 BASELINE METRICS (before the system)

Average Handle Time (AHT):        10.2 minutes/ticket
First Contact Resolution (FCR):   67%
Tickets/agent/day:                28 tickets
Cost per ticket:                   ~$4.2 USD
Customer Satisfaction (CSAT):     3.6/5
Onboarding time (new agent):       3.5 weeks
Escalation rate:                   23%

Projected Impact

Based on industry research and a small POC (Proof of Concept):

🎯 TARGET METRICS (after deployment)

AHT reduction:                     ~35% → ~6.6 minutes
FCR increase:                      ~75%
Cost per ticket reduction:          ~$2.8 USD (~33%)
CSAT increase:                     ~4.1/5
Onboarding time reduction:          ~1.5 weeks
Escalation rate reduction:          ~15%

💰 ROI ESTIMATE (1 year with 50 agents)
System cost:                       ~$80,000 (build + infra)
Savings from AHT:                  ~$210,000
Savings from onboarding:           ~$45,000
Net benefit:                       ~$175,000

Pro tip: When calculating ROI, be conservative. It's better to deliver more than expected rather than promising too much and disappointing.
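The ROI arithmetic above can be reproduced in a few lines. A minimal sketch — the ~$10/hour fully loaded agent cost is my illustrative assumption, chosen only so the output approximates the estimate above, not a figure from the original analysis:

```python
def annual_aht_savings(tickets_per_day, minutes_saved, hourly_cost, workdays=250):
    """Yearly savings (USD) from shaving minutes off each ticket."""
    return tickets_per_day * minutes_saved / 60 * hourly_cost * workdays

# Assumptions: 50 agents x 28 tickets/day, AHT drops 10.2 -> 6.6 minutes,
# fully loaded agent cost ~$10/hour (illustrative).
savings = annual_aht_savings(50 * 28, 10.2 - 6.6, 10)
net_benefit = savings + 45_000 - 80_000  # + onboarding savings - system cost
print(f"AHT savings: ${savings:,.0f}/year -> net benefit: ${net_benefit:,.0f}")
```

Putting the model in code also makes the conservative-estimate advice actionable: stakeholders can see exactly which assumption to stress-test.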

Qualitative Benefits

Not everything can be measured in money:

  • Reduced stress for agents when not having to handle repetitive questions
  • Increased job satisfaction when doing more complex work
  • Brand consistency — every customer receives a consistent answer
  • Institutional knowledge preservation — knowledge is not lost when people leave

Is AI the Only Solution?

This is an important question that many teams skip because they got hyped about AI too early.

Before deciding on AI, evaluate the alternatives:

Option 1: Improve existing search engine

Solution: Upgrade Confluence search, add tags, improve taxonomy.

Pros: Cheap, fast, low risk.

Cons: Still requires users to know exactly what they are looking for. Doesn't understand context. Can't aggregate information from multiple sources.

Conclusion: Keyword-based search improvement is not enough when the problem is semantic understanding.

Option 2: FAQ Database + Decision Tree

Solution: Build a structured FAQ, flow-based chatbot.

Pros: Simple, easy to control, no AI needed.

Cons: Can't handle questions outside the script. High maintenance effort. When the business changes, you have to manually update each node in the decision tree.

Conclusion: Suitable if use cases are very narrow and stable. Not suitable for fast-changing environments.

Option 3: Hire more people

Solution: Hire more senior agents to share knowledge.

Pros: No tech needed.

Cons: Not scalable. High hiring and training costs. Still doesn't solve the knowledge fragmentation problem.

Conclusion: Band-aid solution, not a systemic fix.

Option 4: RAG System ✅

Solution: A system that can understand semantic questions, retrieve information from multiple sources, and aggregate contextual answers.

Pros: Scalable. Automatically updates when documentation changes. Handles complex questions. Consistent.

Cons: More complex to build. Needs infrastructure. Needs monitoring. Can hallucinate if not designed right.

Conclusion: Suitable for our problem — large knowledge base, diverse questions, high frequency.


When to Use and NOT Use AI

✅ SHOULD use RAG when:

1. You have a large, unstructured knowledge base
2. User questions are diverse and unpredictable
3. Information changes frequently
4. Need to aggregate information from multiple sources
5. High volume of repetitive questions → saves significant time
6. Answer quality is more important than speed

❌ SHOULD NOT use RAG when:

1. Questions are simple and can be hardcoded → use FAQ
2. Need 100% accuracy (complex legal, financial) → needs human review
3. Knowledge base < 100 documents → over-engineering
4. Team doesn't have skills to maintain AI system
5. Budget is too low → not enough for infra + monitoring
6. Time is too short → POC is better than a rushed production system

🤔 NEED careful consideration when:

1. Highly specialized domain (medical, law) → high hallucination risk
2. Data contains PII (Personally Identifiable Information) → complex security
3. Real-time data is needed → basic RAG is not enough, needs hybrid
4. Multilingual → embedding model selection is more important
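For illustration, the checklist above can be folded into a rough scoring helper. The blockers, weights, and threshold here are my own illustrative encoding of the lists, not a validated methodology:

```python
def rag_fit_score(kb_docs, questions_diverse, info_changes_often,
                  needs_100_pct_accuracy, team_can_maintain, budget_ok):
    """Return 0 on a hard blocker; otherwise a rough fit score (>= 2: prototype RAG)."""
    if needs_100_pct_accuracy or not team_can_maintain or not budget_ok:
        return 0  # hard blockers from the "should NOT" list
    score = 2 if kb_docs >= 100 else -2  # tiny knowledge bases -> over-engineering
    score += 1 if questions_diverse else 0
    score += 1 if info_changes_often else 0
    return score

# Our case: 3,247 docs, diverse questions, frequent changes, no 100%-accuracy bar.
print(rag_fit_score(3247, True, True, False, True, True))  # scores 4
```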

Defining Scope & Success Metrics

Scope Definition: Start small

A classic mistake is making the scope too large right from the start. We applied the principle "Start Small, Prove Value, Scale":

Phase 1 (MVP - 6 weeks):
  - Ingest documents from Confluence only
  - For Customer Support team only (25 agents)
  - Interface: Simple Slack bot
  - Use case: Answering product questions

Phase 2 (3 months):
  - Expand to email, PDF handbook
  - Expand to onboarding team
  - Integrate into helpdesk software
  - Add feedback mechanism

Phase 3 (6+ months):
  - Company-wide
  - Multi-language
  - Proactive suggestions
  - Automated escalation routing

OKRs (Objectives & Key Results)

These are the OKRs we set for Phase 1:

Objective: Reduce ticket handle time for Customer Support

Key Results:
  KR1: AHT reduced by at least 20% after 4 weeks of pilot
  KR2: CSAT does not drop below 3.4 (baseline: 3.6)
  KR3: 80% of agents use it at least once a day after 2 weeks
  KR4: System uptime ≥ 99% during working hours
  KR5: P95 response time < 5 seconds
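KR5 targets a P95 response time under 5 seconds. As a minimal sketch of what that measurement means — nearest-rank percentile over raw latency samples; in production you would read this from Prometheus histograms instead — the sample data below is hypothetical:

```python
import math

def p95(latencies):
    """Nearest-rank P95: the ceil(0.95 * n)-th smallest sample."""
    if not latencies:
        raise ValueError("no samples")
    ordered = sorted(latencies)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

samples = [1.2, 0.8, 3.1, 4.9, 2.2, 1.1, 6.3, 0.9, 2.8, 1.7]  # seconds
print(p95(samples))  # 6.3 — one slow outlier already breaches the 5 s target
```

This is also why we chose P95 over the mean: a single slow query dominates the tail long before it moves the average.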

Acceptance Criteria for AI Responses

Defining what counts as a "good answer" is just as important as the OKRs:

✅ Response is acceptable when:
  - Relevant: On the topic the user asked
  - Accurate: Based on official documentation, no hallucinations
  - Concise: Not longer than necessary
  - Cited: Has source citations for agent verification
  - Actionable: Can be used immediately, without much editing

❌ Response is NOT acceptable when:
  - Answers based on information NOT in the knowledge base
  - Uses information from outdated sources
  - Doesn't answer the question (topic drift)
  - Inventing numbers, dates, product names
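Some of these checks can be automated. Below is an illustrative sketch only: the `Response` shape and the heuristics (citation present, word-count cap, source freshness flag) are my assumptions, and real hallucination or topic-drift detection needs a proper evaluation pipeline:

```python
from dataclasses import dataclass, field

@dataclass
class Response:
    text: str
    citations: list = field(default_factory=list)
    sources_current: bool = True  # every cited source is up to date

def is_acceptable(resp: Response, max_words: int = 150) -> bool:
    cited = len(resp.citations) > 0                # "Cited"
    concise = len(resp.text.split()) <= max_words  # "Concise"
    return cited and concise and resp.sources_current

ok = Response("Refunds are processed within 5 business days.",
              citations=["confluence/refund-policy"])
bad = Response("Probably around a week.")  # no citation -> rejected
print(is_acceptable(ok), is_acceptable(bad))  # True False
```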

Stakeholder Buy-in

Techniques for convincing stakeholders without a technical background

This is a soft skill but just as important as technical skills. Here's what worked for us:

1. Demo first, explain later

Don't spend 30 minutes explaining what RAG is. Build a simple prototype in 2 days and demo live with a real question that the stakeholder cares about.

Example demo script:
"I will ask the system the question that an agent spent 8 minutes 
searching for yesterday: [real question about refund policy]"

→ System answers in 3 seconds, with correct citations.

Stakeholder reaction: "Oh... that's good."

2. Speak their language

  • To CFO: ROI, cost reduction, payback period
  • To COO: Efficiency, SLA, scalability
  • To Head of Customer Success: CSAT, agent satisfaction, onboarding speed
  • To CTO: Tech stack, security, maintainability

3. Identify and address fears

Stakeholders are often afraid of things they don't say out loud:

Hidden Fear           → How to address
─────────────────────────────────────────────────────
"AI will replace agents?" → "No. AI is a copilot, agents
                             are still the final decision makers."

"Data leak?"          → "All data stays on our infra, 
                         not sent outside."

"What if AI is wrong?" → "Agents verify before sending.
                         Mechanism to report and fix."

"Stable cost?"        → "Cost cap and automated alerts."

4. Propose a small, clear Pilot

Instead of asking for a budget for the whole system, ask for a budget for a pilot with a specific time and metrics:

"We need 6 weeks and $15,000 to prove the concept. 
After 6 weeks, if AHT doesn't drop by at least 15%, we stop. 
If it does, we have enough data to build the full system."

This approach minimizes perceived risk and creates clear checkpoints.


Overview: From Problem to Solution

After all the analysis steps above, here's how we articulated our final problem statement:

┌─────────────────────────────────────────────────────────┐
│                   PROBLEM STATEMENT                      │
│                                                          │
│  "Customer Support agents at [Company] are losing        │
│   an average of 10.2 minutes/ticket due to manual       │
│   search across 3,000+ scattered documents. This leads   │
│   to a cost of $4.2/ticket, CSAT of 3.6/5, and an        │
│   escalation rate of 23%. We need a system that can     │
│   understand semantic questions and retrieve accurate    │
│   information from the knowledge base in < 5 seconds."   │
└─────────────────────────────────────────────────────────┘

        ↓ Solution ↓

┌─────────────────────────────────────────────────────────┐
│                   RAG SYSTEM                             │
│                                                          │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐          │
│  │ Documents│───▶│  Vector  │───▶│   LLM    │          │
│  │Confluence│    │   DB     │    │ Generate │          │
│  │SharePoint│    │ (Search) │    │ Answer   │          │
│  │   PDF    │    └──────────┘    └──────────┘          │
│  └──────────┘         ▲                │               │
│                        │                ▼               │
│                   [User Query]   [Cited Answer]         │
└─────────────────────────────────────────────────────────┘
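The diagram's boxes map onto three steps: embed documents, search a vector store, and generate a cited answer. The sketch below keeps that shape runnable with a toy bag-of-words "embedding"; the names (`VectorStore`, `answer`) are placeholders, not real LangChain or Qdrant APIs, and the LLM step is stubbed out:

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words vector, standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Stand-in for Qdrant: brute-force similarity search over documents."""
    def __init__(self, docs):
        self.docs = [(d, embed(d)) for d in docs]

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda p: cosine(q, p[1]), reverse=True)
        return [d for d, _ in ranked[:k]]

def answer(query, store):
    context = store.search(query)
    # A real system would prompt an LLM with this context;
    # we return the top document plus citations to keep the sketch runnable.
    return {"answer": context[0], "citations": context}

store = VectorStore([
    "Refunds are processed within 5 business days of approval.",
    "Accounts can be upgraded from the billing settings page.",
])
print(answer("How long do refunds take?", store))
```

The rest of the series replaces each stub with the production piece: real embeddings and Qdrant for retrieval, an LLM behind vLLM for generation.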

Checklist Before You Start Building

Before running any line of code, make sure you can answer all of the following questions:

✅ Problem Definition
   □ Clearly defined the problem (not the solution)?
   □ Found the root cause using 5 Whys or equivalent?
   □ Talked to at least 10 real users?

✅ Business Justification
   □ Have baseline metrics?
   □ Calculated ROI (even if estimated)?
   □ Stakeholders approved?

✅ Solution Validation
   □ Evaluated alternatives?
   □ Confirmed AI is the most suitable approach?
   □ Built or seen a POC?

✅ Scope & Success
   □ MVP scope clearly defined?
   □ Success metrics agreed upon?
   □ Realistic timeline?

✅ Risk Assessment
   □ Identified main risks?
   □ Have a mitigation plan?
   □ Have a rollback plan in case of failure?

Conclusion & Next Post

A clear business problem is the foundation of everything. Without it, you are building something no one needs — or building the right thing in the wrong way.

3 takeaways from this post:

  1. Start from the pain, not from the solution. "We need AI" is not a problem statement.

  2. Measure everything that can be measured. Baseline metrics are the strongest persuasion tool.

  3. Start small, prove value, scale. A small pilot with clear metrics is better than a grand plan no one believes in.


👉 Next Post: [Post 02] What is RAG & Why Not Fine-tuning or Prompt Engineering?

In the next post, we will dive deep into Why RAG — a detailed comparison between approaches: Prompt Engineering, Fine-tuning, RAG, and Hybrid. When to use what? What are the real trade-offs?


📬 If this post was helpful, please share it with colleagues who are about to start an AI project. And if you are in a similar situation, leave a comment — I'll try to answer.


Author: Truong Pham
Series: RAG in Production — The Journey of Building a Real-world AI System
Tags: RAG · AI · LLM · System Design · Production · Business Analysis

Read more articles