# Agents
Agents are specialized workers that answer different types of questions.
SQLAgent writes queries. VegaLiteAgent creates charts. ChatAgent answers questions. Each agent has a specific job.
Most users never customize agents. The eight default agents handle typical data exploration needs.
## Skip to

- See which agents exist: what each agent does
- Add custom agents: extend Lumen with new capabilities
- Remove agents: use only some agents
- Configure agent models: control which LLM each agent uses
## Default agents
Lumen includes eight agents automatically. You don't need to configure anything.
| Agent | What it does |
|---|---|
| SQLAgent | Writes and runs SQL queries |
| VegaLiteAgent | Creates charts and visualizations |
| AnalystAgent | Explains query results and finds insights |
| ChatAgent | Answers questions and provides guidance |
| TableListAgent | Lists available tables and columns |
| DocumentListAgent | Manages uploaded documents |
| SourceAgent | Handles data uploads |
| ValidationAgent | Checks if results answer the question |
These agents work together automatically. The coordinator picks which agents to use for each question.
## Use specific agents only
Include only the agents you need:
```python
import lumen.ai as lmai
from lumen.ai.agents import ChatAgent, SQLAgent, VegaLiteAgent

ui = lmai.ExplorerUI(
    data='penguins.csv',
    default_agents=[ChatAgent, SQLAgent, VegaLiteAgent]
)
ui.servable()
```
Why limit agents?
- Faster planning (fewer options to consider)
- Lower costs (fewer agents = fewer LLM calls during planning)
- Simpler behavior (predictable agent selection)
Most users should keep all default agents. Only customize if you have specific needs.
## Add a custom agent
Add your own agent for specialized tasks:
```python
import lumen.ai as lmai
from lumen.ai.context import ContextModel
from pydantic import Field

class MyInputs(ContextModel):
    data: dict = Field(description="The data to process")

class MyOutputs(ContextModel):
    summary: str = Field(description="Summary result")

class SummaryAgent(lmai.agents.Agent):
    purpose = "Creates executive summaries of data"

    input_schema = MyInputs
    output_schema = MyOutputs

    async def respond(self, messages, context, **kwargs):
        # Your logic here
        return [outputs], context

ui = lmai.ExplorerUI(
    data='penguins.csv',
    agents=[SummaryAgent()]  # adds your agent alongside the default agents
)
ui.servable()
```
See Creating custom agents below for complete examples.
## Use different models per agent
Configure which LLM model each agent uses:
```python
import lumen.ai as lmai

model_config = {
    "default": {"model": "gpt-4o-mini"},  # Cheap model for most agents
    "sql": {"model": "gpt-4o"},           # Powerful model for SQL
    "vega_lite": {"model": "gpt-4o"},     # Powerful model for charts
    "analyst": {"model": "gpt-4o"},       # Powerful model for analysis
}

llm = lmai.llm.OpenAI(model_kwargs=model_config)
ui = lmai.ExplorerUI(data='penguins.csv', llm=llm)
ui.servable()
```
Model types match agent names:

- SQLAgent uses the `"sql"` model
- VegaLiteAgent uses the `"vega_lite"` model
- AnalystAgent uses the `"analyst"` model
- ChatAgent uses the `"chat"` model (falls back to `"default"` if not specified)

Agent class names are converted to model keys automatically (e.g., SQLAgent → "sql", VegaLiteAgent → "vega_lite").
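The name-to-key conversion can be approximated with a small helper (a sketch of the convention, not Lumen's actual implementation):

```python
import re

def agent_model_key(class_name: str) -> str:
    """Approximate the AgentClassName -> model-key conversion:
    strip a trailing 'Agent' suffix, then snake_case the rest."""
    base = class_name.removesuffix("Agent")
    # Insert underscores at word boundaries, keeping acronym
    # runs like "SQL" together.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "_", base)
    return snake.lower()

print(agent_model_key("SQLAgent"))       # sql
print(agent_model_key("VegaLiteAgent"))  # vega_lite
```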
See LLM Providers for complete details.
## Creating custom agents
Custom agents let you add specialized capabilities to Lumen.
### When to create a custom agent
Create a custom agent when:
- You need domain-specific analysis (financial metrics, scientific calculations)
- You want to integrate external APIs or services
- You need specialized data transformations
- Built-in agents don't match your workflow
Don't create a custom agent when:
- You can solve it with custom analyses (simpler approach)
- You can use tools instead (tools don't require async/await)
- A built-in agent already handles it
### Basic custom agent structure
```python
import lumen.ai as lmai
from lumen.ai.context import ContextModel
from pydantic import Field

# Define what the agent needs
class MyInputs(ContextModel):
    pipeline: object = Field(description="Data pipeline to process")

# Define what the agent provides
class MyOutputs(ContextModel):
    summary: str = Field(description="Summary of findings")

class MyAgent(lmai.agents.Agent):
    purpose = "Summarizes data in executive format"

    input_schema = MyInputs    # agent requires `pipeline` in context to run
    output_schema = MyOutputs  # agent adds `summary` to context after running

    prompts = {
        "main": {
            "template": "Summarize this data: {{ memory['data'] }}"
        }
    }

    async def respond(self, messages, context, **kwargs):
        # Render prompt
        system = await self._render_prompt("main", messages, context)
        # Get LLM response
        response = await self.llm.invoke(messages, system=system)
        # Return outputs and updated context
        return [response], {"summary": str(response)}
```
### Complete working example
This agent calculates statistical metrics:
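The agent itself follows the structure shown above; the core computation it would run inside `respond()` can be sketched as a standalone helper (illustrative names, not part of Lumen's API):

```python
import statistics

def summarize_columns(rows: list[dict]) -> dict:
    """Compute basic statistics (mean, median, stdev, min, max)
    for every numeric column in a list of row dicts."""
    stats = {}
    if not rows:
        return stats
    for col in rows[0]:
        values = [r[col] for r in rows if isinstance(r[col], (int, float))]
        if len(values) < 2:
            continue  # need at least two values for a standard deviation
        stats[col] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "stdev": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    return stats

rows = [
    {"bill_length_mm": 39.1, "species": "Adelie"},
    {"bill_length_mm": 39.5, "species": "Adelie"},
    {"bill_length_mm": 40.3, "species": "Adelie"},
]
print(summarize_columns(rows))
```

The agent's `respond()` would call this on `context['pipeline'].data` and return the result as its output.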
Now you can ask "What are the statistics for this dataset?" and the agent will run.
### Agent components explained

- `purpose`: One-sentence description of what the agent does. The coordinator uses this to decide when to invoke the agent.
- `input_schema`: Context model defining what data the agent needs from context. The agent can only run when these requirements are met.
- `output_schema`: Context model defining what data the agent adds to context. Other agents can use these outputs.
- `prompts`: Dictionary of prompt templates. Most agents only need a `"main"` prompt.
- `respond()`: The async method that does the work. Must return `(outputs_list, updated_context_dict)`.
### Control when agents are used

Use `conditions` to specify when the agent should run:
```python
import param
import lumen.ai as lmai

class ReportAgent(lmai.agents.Agent):
    purpose = "Creates PDF reports"

    conditions = param.List(default=[
        "Use when user explicitly asks for a report or PDF",
        "Use after data analysis is complete",
        "NOT for simple questions or queries"
    ])

    input_schema = MyInputs
    output_schema = MyOutputs
```
The coordinator reads these conditions when deciding which agent to use.
### Prevent agent conflicts

Use `not_with` to prevent agents from being used together:

```python
import param
import lumen.ai as lmai

class FastSummaryAgent(lmai.agents.Agent):
    purpose = "Quick data summaries"

    not_with = param.List(default=["DetailedAnalysisAgent"])
```
### Common patterns

**Call an external API:**

```python
import httpx
import lumen.ai as lmai

class WeatherAgent(lmai.agents.Agent):
    purpose = "Fetches current weather data"

    async def respond(self, messages, context, **kwargs):
        async with httpx.AsyncClient() as client:
            response = await client.get("https://api.weather.gov/...")
            weather_data = response.json()
        summary = f"Current temperature: {weather_data['temp']}°F"
        return [summary], {"weather": summary}
```
**Process uploaded documents:**

```python
import lumen.ai as lmai

class PDFAgent(lmai.agents.Agent):
    purpose = "Extracts text from PDF documents"

    async def respond(self, messages, context, **kwargs):
        documents = context.get('documents', [])
        extracted_text = []
        for doc in documents:
            if doc['type'] == 'pdf':
                # extract_pdf_text is your own helper (e.g. built on pypdf)
                text = extract_pdf_text(doc['content'])
                extracted_text.append(text)
        return [extracted_text], {"pdf_text": extracted_text}
```
**Validate data quality:**

```python
import great_expectations as gx
import lumen.ai as lmai

class DataQualityAgent(lmai.agents.Agent):
    purpose = "Checks data quality using Great Expectations"

    async def respond(self, messages, context, **kwargs):
        df = context['pipeline'].data
        # Run validations (run_quality_checks is your own helper)
        results = run_quality_checks(df)
        # Summarize findings
        system = await self._render_prompt(
            "main", messages, context, results=results
        )
        summary = await self.llm.invoke(messages, system=system)
        return [summary], {"quality_report": str(summary)}
```
## Common issues

### "Agent has unmet requirements"

The agent's `input_schema` requires data that doesn't exist in context.
How to fix:
```python
from typing import NotRequired

class MyInputs(ContextModel):
    pipeline: object            # Required
    analysis: NotRequired[str]  # Optional
```
Or ensure another agent provides the required data first.
### Agent never gets invoked
The coordinator doesn't think the agent is relevant.
How to fix:
- Make the `purpose` more specific and clear
- Add `conditions` that describe when to use it
- Check that `input_schema` requirements can be satisfied
- Enable `log_level='DEBUG'` in the UI to see coordinator decisions
### Agent fails with "KeyError"
The agent tried to access context data that doesn't exist.
**Always check before accessing context.**
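A defensive access pattern can be sketched with plain dict lookups (`require_context` is an illustrative helper, not part of Lumen's API):

```python
def require_context(context: dict, key: str):
    """Fetch a required key from context, raising a descriptive
    error instead of a bare KeyError."""
    value = context.get(key)
    if value is None:
        raise ValueError(
            f"Agent requires {key!r} in context, but it is missing. "
            "Ensure an upstream agent provides it first."
        )
    return value

context = {"pipeline": "my-pipeline"}
print(require_context(context, "pipeline"))  # my-pipeline
```

Inside `respond()`, you can use the same pattern to return a helpful message instead of raising.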
## Best practices
**Keep agents focused.** One agent should do one thing well. Don't create a "do everything" agent.

**Write clear purposes.** The coordinator uses `purpose` to decide when to invoke agents. Make it specific and actionable.

**Test with real queries.** Different LLM models behave differently. Test your agent with your actual LLM.

**Handle missing data gracefully.** Always check for required data before using it. Provide helpful error messages.

**Use tools for simple functions.** If your agent doesn't need async/await or complex prompting, use a tool instead.

**Don't duplicate built-ins.** Check if a built-in agent already does what you need before creating a custom one.