# ViduraRAG

## Knowledge Base
The RAG (Retrieval-Augmented Generation) knowledge base stores pre-processed AWS Redshift documentation as vector embeddings. When a user asks a conceptual question about Redshift internals, best practices, or configuration options, the agent queries this knowledge base to retrieve authoritative documentation context before synthesizing an answer.
### Source Documents
| File | Size | Chunks | Topics Covered |
|---|---|---|---|
| redshift-dg.txt | 5.2 MB | ~5,200 | SQL syntax, query optimization, distribution styles, sort keys, WLM configuration, system tables and views, window functions, COPY and UNLOAD commands |
| redshift-mgmt.txt | 2.3 MB | ~2,300 | Cluster management, maintenance windows, snapshots, resizing operations, security and IAM, parameter groups, VPC configuration, monitoring and logging |
### Processing Pipeline
Each source document goes through the following processing steps before being stored in Milvus:
- Text Extraction: Raw text files are loaded directly (documents were pre-converted from AWS official PDF documentation to plain text).
- Chunking: Split into overlapping chunks using LangChain's `RecursiveCharacterTextSplitter`:
  - Chunk size: 1,000 characters
  - Overlap: 150 characters (ensures context continuity at chunk boundaries)
- Embedding: Each chunk is embedded using all-MiniLM-L6-v2, a 384-dimensional sentence transformer model from HuggingFace. Chosen for its balance of embedding quality and inference speed.
- Indexing: Embeddings are stored in Milvus under an IVF_FLAT index (1,024 cluster cells) with the COSINE similarity metric.
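The chunking step above can be sketched in plain Python. This is a simplified stand-in for LangChain's `RecursiveCharacterTextSplitter` (which prefers paragraph and sentence boundaries rather than fixed offsets), shown only to make the chunk-size and overlap parameters concrete:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 150) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Simplified illustration of the pipeline's 1,000-character chunks
    with 150-character overlap; the real splitter also respects
    natural text boundaries.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # each chunk starts 850 chars after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # final chunk reached the end of the document
    return chunks
```

The overlap means the last 150 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still retrievable in full from at least one chunk.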
### Retrieval Configuration
| Parameter | Value | Description |
|---|---|---|
| embedding_model | all-MiniLM-L6-v2 | 384-dimensional sentence transformer from HuggingFace |
| vector_db | Milvus | Open-source vector database for similarity search |
| index_type | IVF_FLAT | Inverted file index with flat quantization — exact search within cells |
| nlist | 1,024 | Number of cluster cells in the IVF index |
| metric_type | COSINE | Cosine similarity for semantic distance measurement |
| top_k | 5 | Number of most similar chunks returned per query |
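The ranking these settings describe can be sketched without Milvus. The toy search below scores every stored vector by cosine similarity and keeps the top k; Milvus's IVF_FLAT index produces the same ranking but only scans a subset of the 1,024 cluster cells per query (controlled by its `nprobe` search parameter) instead of every vector:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_search(query_vec: list[float],
                 index: list[tuple[str, list[float]]],
                 k: int = 5) -> list[str]:
    """Return the ids of the k chunks most similar to the query.

    Brute-force illustration of the COSINE / top_k=5 retrieval step;
    not how Milvus is actually queried.
    """
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in index]
    scored.sort(reverse=True)  # highest similarity first
    return [chunk_id for _, chunk_id in scored[:k]]
```

In the real pipeline the query vector comes from embedding the user's question with the same all-MiniLM-L6-v2 model used at indexing time, so query and chunks share one 384-dimensional space.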
### When the Agent Uses `book_qa`
The agent calls `book_qa(question)` when the user asks conceptual questions where documentation context is more reliable than real-time data. Typical triggers:
- "What is the difference between KEY and EVEN distribution styles?"
- "How does WLM queue concurrency scaling work?"
- "What are the best practices for choosing a sort key?"
- "Explain what the stl_alert_event_log table records"
- "When should I use VACUUM SORT ONLY vs. VACUUM DELETE ONLY?"
### RAG vs. Real-Time Data
For operational questions ("which tables need vacuuming right now?"), the agent uses the Redshift performance tools to query live data. For conceptual questions about how Redshift works, the agent uses `book_qa` to retrieve documentation. The agent's system prompt guides it to choose the appropriate tool type for each question category.
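In ViduraRAG this routing decision is made by the LLM itself, steered by the system prompt. Purely as an illustration of the split, a hypothetical keyword heuristic might look like this (the cue list and function name are invented for the sketch, not part of the agent):

```python
# Hypothetical sketch only: the real agent's LLM chooses the tool from
# its system prompt; this heuristic just illustrates the
# operational-vs-conceptual split described above.
OPERATIONAL_CUES = ("right now", "currently", "today", "my cluster", "is running")

def route(question: str) -> str:
    """Pick a tool category for a question (illustrative heuristic)."""
    q = question.lower()
    if any(cue in q for cue in OPERATIONAL_CUES):
        return "performance_tools"  # live queries against system tables
    return "book_qa"                # documentation retrieval from Milvus
```

A prompt-driven router is more robust than any fixed cue list, since "is my WLM configuration sensible?" mixes both categories; the heuristic only shows which direction each category pulls.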