RAG Knowledge Base

The RAG (Retrieval-Augmented Generation) system provides the LangChain agent with access to AWS Redshift documentation. When a user asks a conceptual question about Redshift best practices, the book_qa() tool searches the vector database for relevant documentation chunks and returns them as context.

Knowledge Base

The knowledge base consists of two AWS documentation files totaling approximately 7.5 MB:

File                Size     Content
redshift-dg.txt     5.2 MB   Amazon Redshift Developer Guide — SQL syntax, query optimization, distribution styles, sort keys, and performance tuning
redshift-mgmt.txt   2.3 MB   Amazon Redshift Management Guide — cluster management, maintenance windows, snapshots, resizing, and security

Vector Database Configuration

Milvus Collection Schema
from pymilvus import Collection, CollectionSchema, FieldSchema, DataType

fields = [
    FieldSchema(
        name="id",
        dtype=DataType.INT64,
        is_primary=True,
        auto_id=True        # Milvus auto-generates IDs
    ),
    FieldSchema(
        name="embedding",
        dtype=DataType.FLOAT_VECTOR,
        dim=384             # MiniLM-L6-v2 output dimension
    ),
    FieldSchema(
        name="text",
        dtype=DataType.VARCHAR,
        max_length=65535    # Full chunk text storage
    ),
    FieldSchema(
        name="source",
        dtype=DataType.VARCHAR,
        max_length=500      # Source filename
    ),
]

schema = CollectionSchema(fields, description="Redshift Documentation")
collection = Collection("redshift_books", schema)
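Creating the collection assumes an active Milvus connection. A minimal setup sketch that could precede the listing above (host, port, and the existence check are assumptions about a local deployment, not part of the original pipeline):

```python
from pymilvus import Collection, connections, utility

# Assumed local-deployment defaults; adjust to your Milvus instance
connections.connect(alias="default", host="localhost", port="19530")

# Avoid clobbering an existing collection on re-runs
if not utility.has_collection("redshift_books"):
    collection = Collection("redshift_books", schema)
else:
    collection = Collection("redshift_books")
```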

Index Configuration

IVF_FLAT Index
# Index type: IVF_FLAT
# Inverted File Index with flat (brute-force) search within clusters
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",  # Cosine similarity (normalized dot product)
    "params": {"nlist": 1024} # Number of Voronoi cells/clusters
}

collection.create_index(field_name="embedding", index_params=index_params)

# Search parameters
search_params = {
    "metric_type": "COSINE",
    "params": {"nprobe": 10}  # Search 10 nearest clusters
}
# nprobe=10 means: find 10 most relevant clusters, then brute-force search within them
# Higher nprobe = more accurate but slower
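To make the nlist/nprobe tradeoff concrete, here is a simplified pure-NumPy sketch of IVF-style search. It is an illustration, not the Milvus implementation: centroids are random rather than trained, and L2 distance stands in for the cosine metric.

```python
import numpy as np

def ivf_search(vectors, centroids, query, nprobe, top_k=5):
    """Simplified IVF: assign vectors to their nearest centroid (the
    'inverted file'), then brute-force search only the nprobe clusters
    whose centroids are closest to the query."""
    assign = np.argmin(
        np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1
    )
    probe = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.where(np.isin(assign, probe))[0]
    dists = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(dists)[:top_k]]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 8))
centroids = rng.normal(size=(32, 8))   # stand-in for nlist=32 trained cells
query = rng.normal(size=8)

# Small nprobe searches a fraction of the data; nprobe == nlist is exact
approx = ivf_search(vectors, centroids, query, nprobe=4)
exact = ivf_search(vectors, centroids, query, nprobe=32)
```

With nprobe equal to nlist, every cluster is probed and the result matches plain brute-force search; shrinking nprobe trades recall for speed.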

Ingestion Pipeline

Document Ingestion (ingest_books.py)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer

# Text splitter configuration
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Max characters per chunk
    chunk_overlap=150,    # Overlap to preserve context across chunks
    separators=["\n\n", "\n", " ", ""]  # Split hierarchy: paragraphs, lines, words, characters
)

# Embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
# Output: 384-dimensional float vectors
# Model size: ~22M parameters, runs efficiently on CPU

def ingest_file(filepath: str, collection: Collection):
    with open(filepath, 'r', encoding='utf-8') as f:
        text = f.read()

    chunks = splitter.split_text(text)
    embeddings = model.encode(chunks, batch_size=32, show_progress_bar=True)

    # Insert into Milvus
    collection.insert([
        embeddings.tolist(),      # float32 vectors
        chunks,                   # original text
        [filepath] * len(chunks)  # source filename
    ])

Retrieval Algorithm

book_qa() Retrieval
def get_book_context(query: str, top_k: int = 5) -> str:
    # 1. Embed the query using the same model
    query_embedding = model.encode([query])[0].tolist()

    # 2. Search Milvus for nearest neighbors
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param=search_params,     # nprobe=10, COSINE metric
        limit=top_k,             # Return top 5 chunks
        output_fields=["text", "source"]
    )

    # 3. Format results
    contexts = []
    for hit in results[0]:
        score = hit.score                 # COSINE similarity (-1 to 1, higher = better)
        text = hit.entity.get("text")
        source = hit.entity.get("source")
        contexts.append(f"[Score: {score:.3f} | Source: {source}]\n{text}")

    return "\n\n---\n\n".join(contexts)
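The formatting step can be exercised in isolation with stand-in hits (the Hit namedtuple below is a hypothetical substitute for real collection.search() results, used here only to show the output layout):

```python
from collections import namedtuple

# Hypothetical stand-in for a Milvus search hit, for illustration only
Hit = namedtuple("Hit", ["score", "text", "source"])

def format_contexts(hits) -> str:
    """Render hits the same way get_book_context() does:
    a score/source header line, then the chunk text,
    with chunks separated by a --- rule."""
    contexts = [
        f"[Score: {h.score:.3f} | Source: {h.source}]\n{h.text}"
        for h in hits
    ]
    return "\n\n---\n\n".join(contexts)

hits = [
    Hit(0.912, "DISTKEY determines row placement across slices.", "redshift-dg.txt"),
    Hit(0.874, "Snapshots are stored in Amazon S3.", "redshift-mgmt.txt"),
]
formatted = format_contexts(hits)
```

Keeping the score and source in the header lets the agent cite which guide a passage came from and lets you eyeball retrieval quality in logs.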

Cosine Similarity Explained

Cosine similarity measures the angle between two vectors in the embedding space, regardless of their magnitude:

# Cosine similarity formula:
# sim(A, B) = (A · B) / (||A|| × ||B||)
#
# Where:
# - A · B is the dot product of the two vectors
# - ||A|| is the Euclidean norm of vector A
# - Result ranges from -1 (opposite) to 1 (identical)
#
# For sentence embeddings, this measures semantic similarity:
# - 0.9+ = Very similar meaning
# - 0.7-0.9 = Related topic
# - < 0.5 = Likely unrelated
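The formula maps directly to a few lines of NumPy; this standalone check is not part of the pipeline, but shows the key property that the score depends only on direction, not magnitude:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # sim(A, B) = (A · B) / (||A|| * ||B||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0])
assert cosine_similarity(a, a) == 1.0                       # identical direction
assert cosine_similarity(a, np.array([-1.0, 0.0])) == -1.0  # opposite direction
assert cosine_similarity(a, np.array([0.0, 1.0])) == 0.0    # orthogonal
```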

Why MiniLM-L6-v2?

The all-MiniLM-L6-v2 model from Sentence Transformers offers an excellent speed/quality tradeoff for retrieval tasks. At 384 dimensions, it's compact enough to run on CPU while maintaining high semantic accuracy for technical documentation search.