RAG Knowledge Base
The RAG (Retrieval-Augmented Generation) system provides the LangChain agent with access to AWS Redshift documentation. When a user asks a conceptual question about Redshift best practices, the book_qa() tool searches the vector database for relevant documentation chunks and returns them as context.
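The routing step described above can be sketched in plain Python. This is a hypothetical dispatch table for illustration only, not the project's actual LangChain wiring; the canned `book_qa` body stands in for the real Milvus-backed retrieval shown later in this document.

```python
# Hypothetical sketch: a minimal tool-dispatch table. In the real system,
# book_qa() is registered as a LangChain tool and queries Milvus.
def book_qa(question: str) -> str:
    # Stand-in for the real retrieval: embed the question, search Milvus,
    # and return matching documentation chunks as context.
    return f"[Source: redshift-dg.txt]\nContext relevant to: {question}"

TOOLS = {
    "book_qa": book_qa,  # conceptual questions about Redshift docs
}

def route(tool_name: str, user_input: str) -> str:
    # Look up the named tool and pass the user's question through.
    handler = TOOLS.get(tool_name)
    if handler is None:
        raise KeyError(f"unknown tool: {tool_name}")
    return handler(user_input)

print(route("book_qa", "What is a distribution style?"))
```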
Knowledge Base
The knowledge base consists of two AWS documentation files totaling approximately 7.5 MB:
| File | Size | Content |
|---|---|---|
| redshift-dg.txt | 5.2 MB | Amazon Redshift Developer Guide — SQL syntax, query optimization, distribution styles, sort keys, and performance tuning |
| redshift-mgmt.txt | 2.3 MB | Amazon Redshift Management Guide — Cluster management, maintenance windows, snapshots, resizing, and security |
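A back-of-the-envelope estimate of how many chunks this corpus yields, using the `chunk_size=1000` / `chunk_overlap=150` settings from the ingestion pipeline. The exact count depends on where the splitter finds separator boundaries, so this is only an order-of-magnitude sketch:

```python
# Rough chunk-count estimate for the ~7.5 MB corpus: with a 150-character
# overlap, each chunk advances the window by roughly 850 new characters.
corpus_bytes = 5_200_000 + 2_300_000   # redshift-dg.txt + redshift-mgmt.txt
chunk_size = 1000
overlap = 150
stride = chunk_size - overlap          # effective new characters per chunk

approx_chunks = corpus_bytes // stride
print(approx_chunks)                   # on the order of ~8,800 chunks
```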
Vector Database Configuration
Milvus Collection Schema
from pymilvus import Collection, CollectionSchema, FieldSchema, DataType
fields = [
    FieldSchema(
        name="id",
        dtype=DataType.INT64,
        is_primary=True,
        auto_id=True  # Milvus auto-generates IDs
    ),
    FieldSchema(
        name="embedding",
        dtype=DataType.FLOAT_VECTOR,
        dim=384  # MiniLM-L6-v2 output dimension
    ),
    FieldSchema(
        name="text",
        dtype=DataType.VARCHAR,
        max_length=65535  # Full chunk text storage
    ),
    FieldSchema(
        name="source",
        dtype=DataType.VARCHAR,
        max_length=500  # Source filename
    ),
]
schema = CollectionSchema(fields, description="Redshift Documentation")
collection = Collection("redshift_books", schema)
Index Configuration
IVF_FLAT Index
# Index type: IVF_FLAT
# Inverted File Index with flat (brute-force) search within clusters
index_params = {
    "index_type": "IVF_FLAT",
    "metric_type": "COSINE",   # Cosine similarity (normalized dot product)
    "params": {"nlist": 1024}  # Number of Voronoi cells/clusters
}
collection.create_index(field_name="embedding", index_params=index_params)
# Search parameters
search_params = {
    "metric_type": "COSINE",
    "params": {"nprobe": 10}  # Search the 10 nearest clusters
}
# nprobe=10 means: find 10 most relevant clusters, then brute-force search within them
# Higher nprobe = more accurate but slower
Ingestion Pipeline
Document Ingestion (ingest_books.py)
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer
# Text splitter configuration
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Max characters per chunk
    chunk_overlap=150,  # Overlap to preserve context across chunks
    separators=["\n\n", "\n", " ", ""]  # Split hierarchy
)
# Embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
# Output: 384-dimensional float vectors
# Model size: ~22MB, runs on CPU efficiently
def ingest_file(filepath: str, collection: Collection):
    with open(filepath, 'r') as f:
        text = f.read()
    chunks = splitter.split_text(text)
    embeddings = model.encode(chunks, batch_size=32, show_progress_bar=True)
    # Insert into Milvus (column order matches the schema: embedding, text, source;
    # the auto_id primary key is generated by Milvus)
    collection.insert([
        embeddings.tolist(),       # float32 vectors
        chunks,                    # original text
        [filepath] * len(chunks),  # source filename
    ])
Retrieval Algorithm
book_qa() Retrieval
def get_book_context(query: str, top_k: int = 5) -> str:
    # 1. Embed the query using the same model
    query_embedding = model.encode([query])[0].tolist()
    # 2. Search Milvus for nearest neighbors
    results = collection.search(
        data=[query_embedding],
        anns_field="embedding",
        param=search_params,  # nprobe=10, COSINE metric
        limit=top_k,          # Return top_k chunks (default 5)
        output_fields=["text", "source"]
    )
    # 3. Format results
    contexts = []
    for hit in results[0]:
        score = hit.score  # COSINE similarity (-1 to 1, higher = more similar)
        text = hit.entity.get("text")
        source = hit.entity.get("source")
        contexts.append(f"[Score: {score:.3f} | Source: {source}]\n{text}")
    return "\n---\n".join(contexts)
Cosine Similarity Explained
Cosine similarity measures the angle between two vectors in the embedding space, regardless of their magnitude:
# Cosine similarity formula:
# sim(A, B) = (A · B) / (||A|| × ||B||)
#
# Where:
# - A · B is the dot product of the two vectors
# - ||A|| is the Euclidean norm of vector A
# - Result ranges from -1 (opposite) to 1 (identical)
#
# For sentence embeddings, this measures semantic similarity:
# - 0.9+ = Very similar meaning
# - 0.7-0.9 = Related topic
# - < 0.5 = Likely unrelated
Why MiniLM-L6-v2?
The all-MiniLM-L6-v2 model from Sentence Transformers offers an excellent speed/quality tradeoff for retrieval tasks. At 384 dimensions, its embeddings are compact enough to index and search efficiently, and the model itself runs well on CPU while maintaining high semantic accuracy for technical documentation search.
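The cosine similarity formula from the previous section can be verified with a small pure-Python sketch. The toy 2-d and 3-d vectors here are for illustration only, not real 384-dimensional embeddings:

```python
import math

def cosine_similarity(a, b):
    # sim(A, B) = (A . B) / (||A|| * ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Same direction -> ~1.0 (magnitude is ignored: [2,4,6] is [1,2,3] scaled)
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Orthogonal -> ~0.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))
# Opposite direction -> ~-1.0
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))
```

Because similarity depends only on the angle between vectors, a long chunk and a short query can still score highly when they discuss the same topic.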