Skip to content

Agents

Agents are specialized workers that answer different types of questions.

SQLAgent writes queries. VegaLiteAgent creates charts. ChatAgent answers questions. Each agent has a specific job.

Most users never customize agents. The eight default agents handle typical data exploration needs.

Skip to

Default agents

Lumen includes eight agents automatically. You don't need to configure anything.

Agent What it does
SQLAgent Writes and runs SQL queries
VegaLiteAgent Creates charts and visualizations
AnalystAgent Explains query results and finds insights
ChatAgent Answers questions and provides guidance
TableListAgent Lists available tables and columns
DocumentListAgent Manages uploaded documents
SourceAgent Handles data uploads
ValidationAgent Checks if results answer the question

These agents work together automatically. The coordinator picks which agents to use for each question.

Use specific agents only

Include only the agents you need:

Limit to specific agents
import lumen.ai as lmai
from lumen.ai.agents import ChatAgent, SQLAgent, VegaLiteAgent

ui = lmai.ExplorerUI(
    data='penguins.csv',
    default_agents=[ChatAgent, SQLAgent, VegaLiteAgent]
)
ui.servable()

Why limit agents?

  • Faster planning (fewer options to consider)
  • Lower costs (fewer agents = fewer LLM calls during planning)
  • Simpler behavior (predictable agent selection)

Most users should keep all default agents. Only customize if you have specific needs.

Add a custom agent

Add your own agent for specialized tasks:

Minimal custom agent
import lumen.ai as lmai
from lumen.ai.context import ContextModel
from pydantic import Field

class MyInputs(ContextModel):
    data: dict = Field(description="The data to process")

class MyOutputs(ContextModel):
    summary: str = Field(description="Summary result")

class SummaryAgent(lmai.agents.Agent):
    purpose = "Creates executive summaries of data"

    input_schema = MyInputs
    output_schema = MyOutputs

    async def respond(self, messages, context, **kwargs):
        # Your logic here
        return [outputs], context

ui = lmai.ExplorerUI(
    data='penguins.csv',
    agents=[SummaryAgent()]  # (1)!
)
ui.servable()
  1. Adds your agent alongside the default agents

See Creating custom agents below for complete examples.

Use different models per agent

Configure which LLM model each agent uses:

Different models per agent
import lumen.ai as lmai

model_config = {
    "default": {"model": "gpt-4o-mini"},  # Cheap model for most agents
    "sql": {"model": "gpt-4o"},           # Powerful model for SQL
    "vega_lite": {"model": "gpt-4o"},     # Powerful model for charts
    "analyst": {"model": "gpt-4o"},       # Powerful model for analysis
}

llm = lmai.llm.OpenAI(model_kwargs=model_config)

ui = lmai.ExplorerUI(data='penguins.csv', llm=llm)
ui.servable()

Model types match agent names:

  • SQLAgent uses the "sql" model
  • VegaLiteAgent uses the "vega_lite" model
  • AnalystAgent uses the "analyst" model
  • ChatAgent uses the "chat" model (falls back to "default" if not specified)

Agent class names are converted to model keys automatically (e.g., SQLAgent"sql", VegaLiteAgent"vega_lite").

See LLM Providers for complete details.

Creating custom agents

Custom agents let you add specialized capabilities to Lumen.

When to create a custom agent

Create a custom agent when:

  • You need domain-specific analysis (financial metrics, scientific calculations)
  • You want to integrate external APIs or services
  • You need specialized data transformations
  • Built-in agents don't match your workflow

Don't create a custom agent when:

  • You can solve it with custom analyses (simpler approach)
  • You can use tools instead (tools don't require async/await)
  • A built-in agent already handles it

Basic custom agent structure

Custom agent structure
import lumen.ai as lmai
from lumen.ai.context import ContextModel
from pydantic import Field

# Define what the agent needs
class MyInputs(ContextModel):
    pipeline: object = Field(description="Data pipeline to process")

# Define what the agent provides
class MyOutputs(ContextModel):
    summary: str = Field(description="Summary of findings")

class MyAgent(lmai.agents.Agent):
    purpose = "Summarizes data in executive format"

    input_schema = MyInputs  # (1)!
    output_schema = MyOutputs  # (2)!

    prompts = {
        "main": {
            "template": "Summarize this data: {{ memory['data'] }}"
        }
    }

    async def respond(self, messages, context, **kwargs):
        # Render prompt
        system = await self._render_prompt("main", messages, context)

        # Get LLM response
        response = await self.llm.invoke(messages, system=system)

        # Return outputs and updated context
        return [response], {"summary": str(response)}
  1. Agent requires pipeline in context to run
  2. Agent adds summary to context after running

Complete working example

This agent calculates statistical metrics:

Statistics agent
import lumen.ai as lmai
from lumen.ai.context import ContextModel
from pydantic import Field
import pandas as pd

class StatsInputs(ContextModel):
    pipeline: object = Field(description="Data pipeline")

class StatsOutputs(ContextModel):
    statistics: str = Field(description="Statistical summary")

class StatisticsAgent(lmai.agents.Agent):
    purpose = "Calculates descriptive statistics for numerical columns"

    input_schema = StatsInputs
    output_schema = StatsOutputs

    prompts = {
        "main": {
            "template": """
Analyze these statistics and explain key findings:

{{ stats }}

Focus on:

- Notable values (very high/low)
- Spread and variability  
- Potential outliers
"""
        }
    }

    async def respond(self, messages, context, **kwargs):
        # Get data
        pipeline = context['pipeline']
        df = pipeline.data

        # Calculate stats
        stats = df.describe().to_string()

        # Get LLM interpretation
        system = await self._render_prompt("main", messages, context, stats=stats)
        interpretation = await self.llm.invoke(messages, system=system)

        # Return results
        return [interpretation], {"statistics": str(interpretation)}

# Use the agent
ui = lmai.ExplorerUI(
    data='penguins.csv',
    agents=[StatisticsAgent()]
)
ui.servable()

Now you can ask "What are the statistics for this dataset?" and the agent will run.

Agent components explained

purpose - One-sentence description of what the agent does. The coordinator uses this to decide when to invoke the agent.

input_schema - TypedDict defining what data the agent needs from context. The agent can only run when these requirements are met.

output_schema - TypedDict defining what data the agent adds to context. Other agents can use these outputs.

prompts - Dictionary of prompt templates. Most agents only need a "main" prompt.

respond() - The async method that does the work. Must return (outputs_list, updated_context_dict).

Control when agents are used

Use conditions to specify when the agent should run:

Agent with conditions
import param

class ReportAgent(lmai.agents.Agent):
    purpose = "Creates PDF reports"

    conditions = param.List(default=[
        "Use when user explicitly asks for a report or PDF",
        "Use after data analysis is complete",
        "NOT for simple questions or queries"
    ])

    input_schema = MyInputs
    output_schema = MyOutputs

The coordinator reads these conditions when deciding which agent to use.

Prevent agent conflicts

Use not_with to prevent agents from being used together:

Prevent conflicting agents
class FastSummaryAgent(lmai.agents.Agent):
    purpose = "Quick data summaries"

    not_with = param.List(default=["DetailedAnalysisAgent"])

Common patterns

Call external APIs
import httpx

class WeatherAgent(lmai.agents.Agent):
    purpose = "Fetches current weather data"

    async def respond(self, messages, context, **kwargs):
        async with httpx.AsyncClient() as client:
            response = await client.get("https://api.weather.gov/...")
            weather_data = response.json()

        summary = f"Current temperature: {weather_data['temp']}°F"
        return [summary], {"weather": summary}
Extract PDF text
class PDFAgent(lmai.agents.Agent):
    purpose = "Extracts text from PDF documents"

    async def respond(self, messages, context, **kwargs):
        documents = context.get('documents', [])

        extracted_text = []
        for doc in documents:
            if doc['type'] == 'pdf':
                text = extract_pdf_text(doc['content'])
                extracted_text.append(text)

        return [extracted_text], {"pdf_text": extracted_text}
Data quality checks
import great_expectations as gx

class DataQualityAgent(lmai.agents.Agent):
    purpose = "Checks data quality using Great Expectations"

    async def respond(self, messages, context, **kwargs):
        df = context['pipeline'].data

        # Run validations
        results = run_quality_checks(df)

        # Summarize findings
        system = await self._render_prompt(
            "main", messages, context, results=results
        )
        summary = await self.llm.invoke(messages, system=system)

        return [summary], {"quality_report": str(summary)}

Common issues

"Agent has unmet requirements"

The agent's input_schema requires data that doesn't exist in context.

How to fix:

Make fields optional
from typing import NotRequired

class MyInputs(ContextModel):
    pipeline: object  # Required
    analysis: NotRequired[str]  # Optional

Or ensure another agent provides the required data first.

Agent never gets invoked

The coordinator doesn't think the agent is relevant.

How to fix:

  1. Make the purpose more specific and clear
  2. Add conditions that describe when to use it
  3. Check that input_schema requirements can be satisfied
  4. Enable log_level='DEBUG' in the UI to see coordinator decisions

Agent fails with "KeyError"

The agent tried to access context data that doesn't exist.

Always check before accessing context

# Bad - assumes 'data' exists
data = context['data']  # ❌ KeyError if missing

# Good - checks first
data = context.get('data')  # ✅ Returns None if missing
if data is None:
    return [{"error": "No data available"}], context

Best practices

Keep agents focused. One agent should do one thing well. Don't create a "do everything" agent.

Write clear purposes. The coordinator uses purpose to decide when to invoke agents. Make it specific and actionable.

Test with real queries. Different LLM models behave differently. Test your agent with your actual LLM.

Handle missing data gracefully. Always check for required data before using it. Provide helpful error messages.

Use tools for simple functions. If your agent doesn't need async/await or complex prompting, use a tool instead.

Don't duplicate built-ins. Check if a built-in agent already does what you need before creating a custom one.