LangChain Agent Configuration
Vidura uses a LangChain ReAct agent (Reasoning + Acting) configured with GPT-4o-mini as the reasoning model and 25 tools for data retrieval. The agent iteratively reasons about user queries, selects tools, executes them, and synthesizes responses.
Agent Initialization
agent.py — Agent Creation
import os

from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent

# Language model configuration
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,  # Deterministic outputs for analytical tasks
    api_key=os.getenv("OPENAI_API_KEY"),
)

# Combine all tools
TOOLS = REDSHIFT_TOOLS + CID_TOOLS  # 19 + 6 = 25 tools

# Create the agent with tool-calling capability
agent = create_tool_calling_agent(
    llm=llm,
    tools=TOOLS,
    prompt=system_prompt,  # ChatPromptTemplate carrying the FinOps analyst persona
)

System Prompt (FinOps Persona)
The system prompt establishes the agent's identity and behavioral rules. It directs the agent to act as an AWS FinOps analyst who communicates clearly in structured markdown:
System Prompt Structure
You are an expert AWS FinOps Analyst and Redshift Performance Engineer.
PRIMARY FOCUS: AWS Cost Intelligence Dashboard (CID) analysis
SECONDARY FOCUS: Redshift cluster performance optimization
BEHAVIORAL RULES:
1. NEVER dump raw data — always analyze and synthesize findings
2. ALWAYS calculate potential savings in $/month AND $/year
3. Limit large datasets to top items by financial impact
4. Use markdown formatting consistently:
   - ## for main sections
   - ### for subsections
   - Tables for comparative data
   - Bold for key numbers and recommendations
5. Status indicators:
   - 💰 = Savings opportunity
   - ⚠️ = Warning / attention needed
   - 🔴 = Critical / immediate action
   - ✅ = Optimized / healthy
   - 📊 = Data visualization note
6. Always compare against benchmarks or targets
7. Provide action items in priority order with ROI
RESPONSE STRUCTURE:
## Executive Summary (2-3 sentences)
## Key Findings (data tables)
## Top Opportunities (ranked by savings)
## KPI Scorecard (current vs. target)
## Recommended Actions (this week / this month / this quarter)

ReAct Pattern
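Because the prompt mandates a fixed response skeleton, a downstream check can spot-check agent output against it. A minimal sketch of such a guard; the helper name `missing_sections` and the sample response are illustrative, not part of Vidura:

```python
# Section headers the system prompt requires in every agent response
REQUIRED_SECTIONS = [
    "## Executive Summary",
    "## Key Findings",
    "## Top Opportunities",
    "## KPI Scorecard",
    "## Recommended Actions",
]

def missing_sections(markdown: str) -> list[str]:
    """Return the required headers absent from an agent response."""
    return [s for s in REQUIRED_SECTIONS if s not in markdown]

sample = (
    "## Executive Summary\nEC2 spend rose 33% week over week.\n"
    "## Key Findings\n| type | cost |\n"
    "## Top Opportunities\n1. Rightsize m5.4xlarge\n"
    "## KPI Scorecard\ncurrent vs. target\n"
    "## Recommended Actions\nThis week: review rightsizing.\n"
)
print(missing_sections(sample))  # → []
```

A check like this can gate responses before they reach the UI, or feed an automatic retry when a section is missing.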
The agent uses the ReAct (Reasoning + Acting) pattern, alternating between reasoning steps and tool calls:
Example ReAct Trace
User: "What's causing our EC2 costs to spike this week?"
Thought: I need to look at EC2 cost trends. The CID summary_view
should show cost by service. Then I'll use the EC2-specific
view to break it down by instance type.
Action: query_cid_view("cid_ec2_running_cost", limit=20)
Observation: [Returns table of EC2 costs by instance type]
m5.4xlarge: $8,400 (+$2,100 vs last week, +33%)
c5.2xlarge: $4,200 (+$800, +24%)
...
Thought: The m5.4xlarge instances account for most of the spike.
I should check Compute Optimizer for rightsizing recommendations.
Action: query_cid_view("co_ec2_instance_options", limit=10)
Observation: [Returns rightsizing recommendations]
m5.4xlarge → m5.2xlarge: potential savings $4,200/month
...
Thought: I have enough data to provide a comprehensive answer
with savings recommendations.
Final Answer: [Structured markdown response with tables and recommendations]

Tool Calling Protocol
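The control flow behind the trace above can be sketched in plain Python. This is illustrative only: `fake_llm` scripts the two tool calls from the example in place of GPT-4o-mini, and `query_cid_view` is a stub rather than the real CID tool:

```python
def query_cid_view(view: str, limit: int = 20) -> str:
    # Stub observation, echoing the shape of the real tool's output
    return f"[rows from {view}, limit={limit}]"

TOOLS = {"query_cid_view": query_cid_view}

def fake_llm(scratchpad: list[str]) -> dict:
    # A real agent asks the LLM each turn; here we script two tool
    # calls, then a final answer, keyed off how many observations exist.
    calls = [
        {"action": "query_cid_view", "args": {"view": "cid_ec2_running_cost"}},
        {"action": "query_cid_view", "args": {"view": "co_ec2_instance_options"}},
    ]
    n = sum("Observation" in line for line in scratchpad)
    if n < len(calls):
        return calls[n]
    return {"final": "Rightsizing m5.4xlarge could save $4,200/month."}

def react_loop(question: str, max_steps: int = 5) -> str:
    # Reason → act → observe, until the model emits a final answer
    scratchpad = [f"Question: {question}"]
    for _ in range(max_steps):
        step = fake_llm(scratchpad)
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["action"]](**step["args"])
        scratchpad.append(f"Observation: {observation}")
    return "Step budget exhausted."

print(react_loop("What's causing our EC2 costs to spike?"))
```

The `max_steps` bound mirrors the iteration limit a real agent executor enforces so a confused model cannot loop on tool calls forever.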
The agent uses LangChain's tool-calling interface, which maps to OpenAI function calling under the hood. Each tool is defined with a name, description, and input schema:
@tool
def vacuum_stats() -> str:
    """
    Returns the top 5 tables requiring vacuum based on unsorted row
    percentage. Includes reclaimable space estimates and last vacuum
    timestamps. Use this when asked about vacuum scheduling or table
    maintenance priorities.
    """
    # Implementation...

Tool Descriptions Matter
The docstring of each tool function is critical — it's what the LLM reads to decide whether to call the tool. Poorly written descriptions lead to incorrect tool selection. The descriptions are written to be precise about what the tool returns and when to use it.
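To see why the docstring matters, it helps to look at roughly what the model receives. The sketch below approximates, in pure Python, how a function's name, docstring, and signature become an OpenAI-style function definition; `tool_schema` is an illustrative stand-in, not LangChain's actual implementation (which also maps type annotations to a full JSON schema):

```python
import inspect

def tool_schema(func) -> dict:
    """Build an OpenAI-style function definition from a Python function.
    Simplified: every parameter is typed as a string."""
    params = {
        name: {"type": "string"}
        for name in inspect.signature(func).parameters
    }
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.cleandoc(func.__doc__ or ""),
            "parameters": {"type": "object", "properties": params},
        },
    }

def vacuum_stats() -> str:
    """Returns the top 5 tables requiring vacuum based on unsorted row
    percentage."""
    return "..."

schema = tool_schema(vacuum_stats)
print(schema["function"]["name"])  # → vacuum_stats
```

Since the description field is all the LLM sees when ranking candidate tools, a vague docstring degrades tool selection exactly as the paragraph above warns.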
Agent Invocation
FastAPI Handler
@app.post("/ask")
async def ask_endpoint(request: AskRequest) -> AskResponse:
    """
    Main entry point for natural language analytics queries.
    """
    try:
        # Run agent in thread pool to avoid blocking the async event loop
        result = await asyncio.to_thread(
            agent.invoke,
            {"messages": [{"role": "user", "content": request.prompt}]},
        )
        # Extract final message content
        response_text = result["messages"][-1].content
        return AskResponse(response=response_text)
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Agent execution failed: {str(e)}",
        )