ViduraRAG

Knowledge Base

The RAG (Retrieval-Augmented Generation) knowledge base stores pre-processed Amazon Redshift documentation as vector embeddings. When a user asks a conceptual question about Redshift internals, best practices, or configuration options, the agent queries this knowledge base to retrieve authoritative documentation context before synthesizing an answer.

Source Documents

| File | Size | Chunks | Topics Covered |
| --- | --- | --- | --- |
| redshift-dg.txt | 5.2 MB | ~5,200 | SQL syntax, query optimization, distribution styles, sort keys, WLM configuration, system tables and views, window functions, COPY and UNLOAD commands |
| redshift-mgmt.txt | 2.3 MB | ~2,300 | Cluster management, maintenance windows, snapshots, resizing operations, security and IAM, parameter groups, VPC configuration, monitoring and logging |

Processing Pipeline

Each source document goes through the following processing steps before being stored in Milvus:

  1. Text Extraction: Raw text files are loaded directly (the documents were pre-converted from the official AWS PDF documentation to plain text).
  2. Chunking: Split into overlapping chunks using LangChain's RecursiveCharacterTextSplitter:
    • Chunk size: 1,000 characters
    • Overlap: 150 characters (ensures context continuity at chunk boundaries)
  3. Embedding: Each chunk is embedded using all-MiniLM-L6-v2, a 384-dimensional sentence transformer model from HuggingFace. Chosen for its balance of embedding quality and inference speed.
  4. Indexing: Embeddings are stored in Milvus using an IVF_FLAT index (1,024 cells) with the COSINE similarity metric.
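The chunking step above can be sketched in a few lines. The real pipeline uses LangChain's RecursiveCharacterTextSplitter, which splits on separators (paragraphs, then sentences) rather than at fixed offsets; this fixed-stride version only illustrates the size/overlap arithmetic:

```python
# Minimal sketch of the chunk/overlap scheme (1,000 chars, 150 overlap).
# Illustrative only; the actual pipeline uses RecursiveCharacterTextSplitter.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 150) -> list[str]:
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # 850-character stride between chunk starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With a 150-character overlap, the tail of each chunk is repeated at the head of the next, so a sentence straddling a boundary still appears whole in at least one chunk.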

Retrieval Configuration

| Parameter | Value | Description |
| --- | --- | --- |
| embedding_model | all-MiniLM-L6-v2 | 384-dimensional sentence transformer from HuggingFace |
| vector_db | Milvus | Open-source vector database for similarity search |
| index_type | IVF_FLAT | Inverted file index with flat quantization — exact search within cells |
| nlist | 1,024 | Number of cluster cells in the IVF index |
| metric_type | COSINE | Cosine similarity for semantic distance measurement |
| top_k | 5 | Number of most similar chunks returned per query |
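Expressed in the parameter-dict shape that pymilvus expects, the configuration above might look like the following. The `nprobe` value is an assumption (it is not specified in this document); it sets how many of the 1,024 cells are scanned at query time and trades recall against latency:

```python
# Index/search parameters matching the table above, in pymilvus dict form.
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",
    "params": {"nlist": 1024},   # number of IVF cluster cells
}
search_params = {
    "metric_type": "COSINE",
    "params": {"nprobe": 16},    # assumed value, not from this document
}
TOP_K = 5  # most similar chunks returned per query
```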

When the Agent Uses book_qa

The agent calls book_qa(question) when the user asks conceptual questions where documentation context is more reliable than real-time data. Typical triggers:

  • "What is the difference between KEY and EVEN distribution styles?"
  • "How does WLM queue concurrency scaling work?"
  • "What are the best practices for choosing a sort key?"
  • "Explain what the stl_alert_event_log table records"
  • "When should I use VACUUM SORT ONLY vs. VACUUM DELETE ONLY?"
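For each of these questions, the retrieval step reduces to embedding the question and ranking stored chunk vectors by cosine similarity. Milvus does this at scale with the IVF index; a stdlib-only sketch of the top-k ranking itself, with toy vectors standing in for the 384-dimensional embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 5):
    """Return the k chunks most similar to the query vector."""
    scored = [(cosine(query, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

The retrieved chunk texts are then passed to the LLM as context for answer synthesis.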

RAG vs. Real-Time Data

For operational questions ("which tables need vacuuming right now?"), the agent uses the Redshift performance tools to query live data. For conceptual questions about how Redshift works, the agent uses book_qa to retrieve documentation. The agent's system prompt guides it to choose the appropriate tool type for each question category.
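In the actual agent this routing is performed by the LLM itself, guided by the system prompt; there is no hard-coded classifier. Purely to make the two categories concrete, here is a hypothetical keyword heuristic (the function and hint list are illustrative, not part of the system):

```python
# Toy stand-in for the prompt-driven tool selection described above.
# OPERATIONAL_HINTS and pick_tool are hypothetical, for illustration only.
OPERATIONAL_HINTS = ("right now", "currently", "my cluster", "is running")

def pick_tool(question: str) -> str:
    q = question.lower()
    if any(hint in q for hint in OPERATIONAL_HINTS):
        return "redshift_performance_tools"  # query live system tables
    return "book_qa"  # retrieve documentation context via RAG
```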