
LangChain Agent Configuration

Vidura uses a LangChain ReAct agent (Reasoning + Acting) configured with GPT-4o-mini as the reasoning model and 25 tools for data retrieval. The agent iteratively reasons about user queries, selects tools, executes them, and synthesizes responses.

Agent Initialization

agent.py — Agent Creation
import os

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent

# Language model configuration
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,          # Deterministic outputs for analytical tasks
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Combine all tools
TOOLS = REDSHIFT_TOOLS + CID_TOOLS  # 19 Redshift tools + 6 CID tools = 25

# Create the agent with tool-calling capability.
# (create_tool_calling_agent returns a runnable that plans tool calls;
# it is wrapped in an executor before invocation so that the planned
# tool calls are actually run in a loop.)
agent = create_tool_calling_agent(
    llm=llm,
    tools=TOOLS,
    prompt=system_prompt    # FinOps analyst persona
)

System Prompt (FinOps Persona)

The system prompt establishes the agent's identity and behavioral rules. It directs the agent to act as an AWS FinOps analyst who communicates clearly in structured markdown:

System Prompt Structure
You are an expert AWS FinOps Analyst and Redshift Performance Engineer.

PRIMARY FOCUS: AWS Cost Intelligence Dashboard (CID) analysis
SECONDARY FOCUS: Redshift cluster performance optimization

BEHAVIORAL RULES:
1. NEVER dump raw data — always analyze and synthesize findings
2. ALWAYS calculate potential savings in $/month AND $/year
3. Limit large datasets to top items by financial impact
4. Use markdown formatting consistently:
   - ## for main sections
   - ### for subsections
   - Tables for comparative data
   - Bold for key numbers and recommendations
5. Status indicators:
   - 💰 = Savings opportunity
   - ⚠️ = Warning / attention needed
   - 🔴 = Critical / immediate action
   - ✅ = Optimized / healthy
   - 📊 = Data visualization note
6. Always compare against benchmarks or targets
7. Provide action items in priority order with ROI

RESPONSE STRUCTURE:
## Executive Summary (2-3 sentences)
## Key Findings (data tables)
## Top Opportunities (ranked by savings)
## KPI Scorecard (current vs. target)
## Recommended Actions (this week / this month / this quarter)
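Behavioral rule 2 above amounts to simple annualization arithmetic. A hypothetical helper (not from the Vidura codebase) that response-formatting code could share, assuming savings are reported as flat monthly figures:

```python
def format_savings(monthly_usd: float) -> str:
    """Render a savings figure as both monthly and annualized amounts,
    matching behavioral rule 2 of the system prompt."""
    yearly_usd = monthly_usd * 12
    return f"**${monthly_usd:,.0f}/month (${yearly_usd:,.0f}/year)**"
```

For example, `format_savings(4200)` yields `**$4,200/month ($50,400/year)**`.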

ReAct Pattern

The agent uses the ReAct (Reasoning + Acting) pattern, alternating between reasoning steps and tool calls:

Example ReAct Trace
User: "What's causing our EC2 costs to spike this week?"

Thought: I need to look at EC2 cost trends. The CID summary_view
         should show cost by service. Then I'll use the EC2-specific
         view to break it down by instance type.

Action: query_cid_view("cid_ec2_running_cost", limit=20)

Observation: [Returns table of EC2 costs by instance type]
             m5.4xlarge: $8,400 (+$2,100 vs last week, +33%)
             c5.2xlarge: $4,200 (+$800, +24%)
             ...

Thought: The m5.4xlarge instances account for most of the spike.
         I should check Compute Optimizer for rightsizing recommendations.

Action: query_cid_view("co_ec2_instance_options", limit=10)

Observation: [Returns rightsizing recommendations]
             m5.4xlarge → m5.2xlarge: potential savings $4,200/month
             ...

Thought: I have enough data to provide a comprehensive answer
         with savings recommendations.

Final Answer: [Structured markdown response with tables and recommendations]
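The thought/action/observation loop in the trace above can be sketched in plain Python. Everything here is illustrative: the scripted stub "LLM" and the one-entry tool registry stand in for GPT-4o-mini and the 25 real tools, but the control flow mirrors what the agent's executor does on each turn.

```python
from typing import Callable

# Stand-in tool registry (the real agent has 25 tools).
TOOLS: dict[str, Callable[[], str]] = {
    "query_cid_view": lambda: "m5.4xlarge: $8,400 (+33% vs last week)",
}

def react_loop(llm_step: Callable[[str], dict], question: str,
               max_steps: int = 5) -> str:
    """Alternate reasoning and tool execution until the model answers."""
    transcript = f"User: {question}"
    for _ in range(max_steps):
        step = llm_step(transcript)            # model decides the next move
        if "final_answer" in step:
            return step["final_answer"]
        observation = TOOLS[step["action"]]()  # execute the chosen tool
        transcript += f"\nObservation: {observation}"
    return "Stopped: step limit reached."

# Scripted stub standing in for the reasoning model.
def stub_llm(transcript: str) -> dict:
    if "Observation" not in transcript:
        return {"action": "query_cid_view"}    # first step: fetch data
    return {"final_answer": "m5.4xlarge instances drive the spike."}
```

Running `react_loop(stub_llm, "What's causing our EC2 costs to spike?")` performs one tool call, folds the observation back into the transcript, and returns the final answer, just as in the trace.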

Tool Calling Protocol

The agent uses LangChain's tool-calling interface, which maps to OpenAI function calling under the hood. Each tool is defined with a name, description, and input schema:

Tool Definition
from langchain_core.tools import tool

@tool
def vacuum_stats() -> str:
    """
    Returns the top 5 tables requiring vacuum based on unsorted row
    percentage. Includes reclaimable space estimates and last vacuum
    timestamps. Use this when asked about vacuum scheduling or table
    maintenance priorities.
    """
    # Implementation...

Tool Descriptions Matter

The docstring of each tool function is critical — it's what the LLM reads to decide whether to call the tool. Poorly written descriptions lead to incorrect tool selection. The descriptions are written to be precise about what the tool returns and when to use it.
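For context, under OpenAI function calling the `vacuum_stats` tool above is advertised to the model as a JSON schema roughly like the following (a hand-written illustration; LangChain's exact serialization may differ):

```python
# Approximate function-calling spec the LLM sees for vacuum_stats.
# The description is lifted straight from the tool's docstring — it is
# the model's only signal for deciding when to call this tool.
vacuum_stats_spec = {
    "type": "function",
    "function": {
        "name": "vacuum_stats",
        "description": (
            "Returns the top 5 tables requiring vacuum based on unsorted "
            "row percentage. Includes reclaimable space estimates and last "
            "vacuum timestamps. Use this when asked about vacuum scheduling "
            "or table maintenance priorities."
        ),
        # vacuum_stats takes no arguments, so the schema is empty.
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}
```

Because the model never sees the implementation, only this spec, the docstring effectively *is* the tool's interface from the LLM's point of view.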

Agent Invocation

FastAPI Handler
import asyncio

from fastapi import HTTPException

@app.post("/ask")
async def ask_endpoint(request: AskRequest) -> AskResponse:
    """
    Main entry point for natural language analytics queries.
    """
    try:
        # Run agent in a thread pool to avoid blocking the async event loop
        result = await asyncio.to_thread(
            agent.invoke,
            {"messages": [{"role": "user", "content": request.prompt}]}
        )

        # Extract the final message's content as the answer
        response_text = result["messages"][-1].content

        return AskResponse(response=response_text)

    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Agent execution failed: {str(e)}"
        )
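The handler references `AskRequest` and `AskResponse` without defining them. A minimal sketch of what those Pydantic models presumably look like, with field names inferred from the handler code above:

```python
from pydantic import BaseModel

class AskRequest(BaseModel):
    prompt: str    # natural-language question from the user

class AskResponse(BaseModel):
    response: str  # agent's final markdown answer
```

FastAPI uses these models to validate the request body and serialize the response, so `/ask` rejects payloads missing the `prompt` field before the agent ever runs.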