Architecture
Vidura is a LangChain ReAct agent backed by GPT-4o-mini and deployed as a FastAPI service. It provides 26 tools across three categories: Redshift performance tools, AWS Cost Intelligence Dashboard (CID) tools, and a RAG tool for Redshift documentation lookups.
System Diagram
┌─────────────────────────────────────────────────────────────┐
│ Quper Web Frontend │
│ POST /api/vidura → FastAPI /ask │
└─────────────────────────┬───────────────────────────────────┘
│ HTTP (JSON)
┌─────────────────────────▼───────────────────────────────────┐
│ FastAPI Server (main.py) │
│ POST /ask ──► LangChain ReAct Agent (GPT-4o-mini, temp=0) │
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Redshift Tools │ │ CID Tools │ │
│ │ (19 tools) │ │ (6 tools) │ │
│ │ psycopg2 driver │ │ boto3 Athena │ │
│ └────────────────────┘ └────────────────────┘ │
│ ┌────────────────────┐ │
│ │ RAG Tool (1) │ │
│ │ Milvus + MiniLM │ │
│ └────────────────────┘ │
└──────────────┬──────────────────────┬───────────────────────┘
│ psycopg2 │ boto3
┌──────────────▼──────┐ ┌───────────▼─────────────────────┐
│ Amazon Redshift │ │ AWS Athena (CID Views) │
│ (cluster metrics, │ │ (s3://cid-data/cudos/kpi/...) │
│ system tables) │ │ │
└─────────────────────┘ └─────────────────────────────────────┘

Component Breakdown
FastAPI Server
The entry point is a FastAPI application (main.py) that exposes a single POST endpoint at /ask. It accepts a JSON body with a prompt field, invokes the LangChain agent, and returns the final response. The agent is run in a thread pool (asyncio.to_thread) to avoid blocking the async event loop during synchronous tool execution.
LangChain ReAct Agent
The agent uses LangChain's create_tool_calling_agent with GPT-4o-mini as the reasoning model (temperature=0 for deterministic analytical outputs). The ReAct pattern alternates between:
- Thought — The LLM reasons about what information is needed
- Action — A tool is selected and called with arguments
- Observation — The tool result is fed back to the LLM
The cycle repeats until the LLM emits a Final Answer instead of another tool call.
Redshift Tools (19 tools)
Python functions decorated with LangChain's @tool decorator. Each tool connects to Redshift via psycopg2 and executes a targeted SQL query against Redshift system views. Tools are organized into three categories: query performance, health monitoring, and storage analysis. See the Redshift Tools section for full tool documentation.
CID Tools (6 tools)
Tools that query AWS Cost Intelligence Dashboard (CID) views via boto3 Athena client. The CID data lives in S3 as Parquet files; Athena provides SQL access. The list_cid_views() tool discovers available views at runtime, and query_cid_view() executes parameterized queries against them.
RAG Tool — Milvus + MiniLM
A single book_qa(question) tool that performs semantic search over pre-embedded Redshift documentation. The knowledge base contains ~7,500 chunks from two source documents, embedded with all-MiniLM-L6-v2 (384 dimensions) and stored in Milvus with an IVF_FLAT index. The tool retrieves the top-k relevant chunks and returns them as context for the agent to synthesize an answer.
Temperature = 0
The model runs at temperature=0, which makes outputs as close to deterministic as the API allows (greedy decoding). For analytical tasks such as identifying savings opportunities, diagnosing performance issues, and calculating metrics, reproducible outputs are essential so users can trust and re-run the analysis. Creative variation is not desirable in a FinOps context.