Architecture
Vidura is a LangChain ReAct agent backed by GPT-4o-mini and deployed as a FastAPI service. It provides 26 tools across three categories: Redshift performance tools, AWS Cost Intelligence Dashboard (CID) tools, and a RAG tool for Redshift documentation lookups.
System Diagram
┌─────────────────────────────────────────────────────────────┐
│ Quper Web Frontend │
│ POST /api/vidura → FastAPI /ask │
└─────────────────────────┬───────────────────────────────────┘
│ HTTP (JSON)
┌─────────────────────────▼───────────────────────────────────┐
│ FastAPI Server (main.py) │
│ POST /ask ──► LangChain ReAct Agent (GPT-4o-mini, temp=0) │
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Redshift Tools │ │ CID Tools │ │
│ │ (19 tools) │ │ (6 tools) │ │
│ │ psycopg2 driver │ │ boto3 Athena │ │
│ └────────────────────┘ └────────────────────┘ │
│ ┌────────────────────┐ │
│ │ RAG Tool (1) │ │
│ │ Milvus + MiniLM │ │
│ └────────────────────┘ │
└──────────────┬──────────────────────┬───────────────────────┘
│ psycopg2 │ boto3
┌──────────────▼──────┐ ┌───────────▼─────────────────────┐
│ Amazon Redshift │ │ AWS Athena (CID Views) │
│ (cluster metrics, │ │ (s3://cid-data/cudos/kpi/...) │
│ system tables) │ │ │
└─────────────────────┘ └─────────────────────────────────────┘

Component Breakdown
FastAPI Server
The entry point is a FastAPI application (main.py) that exposes a single POST endpoint at /ask. It accepts a JSON body with a prompt field, invokes the LangChain agent, and returns the final response. The agent is run in a thread pool (asyncio.to_thread) to avoid blocking the async event loop during synchronous tool execution.
LangChain ReAct Agent
The agent uses LangChain's create_tool_calling_agent with GPT-4o-mini as the reasoning model (temperature=0 for deterministic analytical outputs). The ReAct pattern alternates between:
- Thought — The LLM reasons about what information is needed
- Action — A tool is selected and called with arguments
- Observation — The tool result is fed back to the LLM
The cycle repeats until the LLM emits a Final Answer instead of another tool call.
Redshift Tools (19 tools)
Python functions decorated with LangChain's @tool decorator. Each tool connects to Redshift via psycopg2 and executes a targeted SQL query against Redshift system views. Tools are organized into three categories: query performance, health monitoring, and storage analysis. See the Redshift Tools section for full tool documentation.
CID Tools (6 tools)
Tools that query AWS Cost Intelligence Dashboard (CID) views via boto3 Athena client. The CID data lives in S3 as Parquet files; Athena provides SQL access. The list_cid_views() tool discovers available views at runtime, and query_cid_view() executes parameterized queries against them.
RAG Tool — Milvus + MiniLM
A single book_qa(question) tool that performs semantic search over pre-embedded Redshift documentation. The knowledge base contains ~7,500 chunks from two source documents, embedded with all-MiniLM-L6-v2 (384 dimensions) and stored in Milvus with an IVF_FLAT index. The tool retrieves the top-k relevant chunks and returns them as context for the agent to synthesize an answer.
Temperature = 0
The model runs at temperature=0, which makes outputs as close to deterministic as the API allows (greedy decoding). For analytical tasks such as identifying savings opportunities, diagnosing performance issues, and calculating metrics, reproducible outputs are essential so users can trust and re-run the analysis. Creative variation is not desirable in a FinOps context.