# Data Sources

Connect Lumen to files, databases, or data warehouses.
## Quick start

Point Lumen at CSV, Parquet, or JSON files, loaded from local paths or URLs:

```python
import lumen.ai as lmai

ui = lmai.ExplorerUI(data=['penguins.csv', 'earthquakes.parquet'])
ui.servable()
```
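Save this to a script (e.g. `app.py`) and launch it with `panel serve app.py`; calling `servable()` marks the UI so Panel's server can display it.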
## Supported sources

| Source | Use it for |
|---|---|
| Files | CSV, Parquet, JSON (local or URL) |
| DuckDB | Local SQL queries on files |
| Snowflake | Cloud data warehouse |
| BigQuery | Google's data warehouse |
| PostgreSQL | PostgreSQL via SQLAlchemy |
| MySQL | MySQL via SQLAlchemy |
| SQLite | SQLite via SQLAlchemy |
| Oracle | Oracle via SQLAlchemy |
| MSSQL | Microsoft SQL Server via SQLAlchemy |
| Intake | Data catalogs |
## Database connections

### Snowflake

```python
import lumen.ai as lmai
from lumen.sources.snowflake import SnowflakeSource

source = SnowflakeSource(
    account='your-account',
    database='your-database',
    authenticator='externalbrowser',  # SSO
)

ui = lmai.ExplorerUI(data=source)
ui.servable()
```
Authentication options:

- `authenticator='externalbrowser'` - SSO (recommended)
- `authenticator='snowflake'` - username/password (needs `password=`; see the sketch below)
- `authenticator='oauth'` - OAuth token (needs `token=`)
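For username/password authentication, a minimal sketch (assuming the source accepts a `user` parameter alongside `password`, mirroring the Snowflake connector; the environment variable name is illustrative):

```python
import os

from lumen.sources.snowflake import SnowflakeSource

# Username/password login; `user` mirrors the Snowflake connector's
# parameter name (an assumption) and the password comes from the
# environment rather than source code.
source = SnowflakeSource(
    account='your-account',
    database='your-database',
    authenticator='snowflake',
    user='your-username',
    password=os.environ['SNOWFLAKE_PASSWORD'],
)
```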
Select specific tables:

```python
source = SnowflakeSource(
    account='your-account',
    database='your-database',
    tables=['CUSTOMERS', 'ORDERS']
)
```
### BigQuery

```python
import lumen.ai as lmai
from lumen.sources.bigquery import BigQuerySource

source = BigQuerySource(
    project_id='your-project-id',
    tables=['dataset.table1', 'dataset.table2']
)

ui = lmai.ExplorerUI(data=source)
ui.servable()
```
Authentication uses your Google Cloud application default credentials. Log in with the `gcloud` CLI (`gcloud auth application-default login`), or point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at a service account key file.
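For example, to select a service account key from Python before creating the source (the key path is illustrative):

```python
import os

# Google client libraries pick up credentials from this variable;
# the path is illustrative.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/service-account.json'
```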
### PostgreSQL

```python
import lumen.ai as lmai
from lumen.sources.sqlalchemy import SQLAlchemySource

source = SQLAlchemySource(
    url='postgresql://user:password@localhost:5432/database'
)

ui = lmai.ExplorerUI(data=source)
ui.servable()
```
Or use individual parameters:

```python
source = SQLAlchemySource(
    drivername='postgresql+psycopg2',
    username='user',
    password='password',
    host='localhost',
    port=5432,
    database='mydb'
)
```
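Avoid hard-coding credentials where you can; a minimal sketch that reads the password from an environment variable (`PG_PASSWORD` is an illustrative name):

```python
import os

from lumen.sources.sqlalchemy import SQLAlchemySource

# Read the password from the environment instead of committing it;
# PG_PASSWORD is an illustrative variable name.
source = SQLAlchemySource(
    drivername='postgresql+psycopg2',
    username='user',
    password=os.environ['PG_PASSWORD'],
    host='localhost',
    port=5432,
    database='mydb'
)
```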
### MySQL

```python
from lumen.sources.sqlalchemy import SQLAlchemySource

source = SQLAlchemySource(
    url='mysql+pymysql://user:password@localhost:3306/database'
)
```
### SQLite

```python
from lumen.sources.sqlalchemy import SQLAlchemySource

source = SQLAlchemySource(url='sqlite:///data.db')
```
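If you don't have a database on hand, you can seed one from a DataFrame first (a minimal sketch using pandas and the standard-library `sqlite3` module; the file and table names are illustrative):

```python
import sqlite3

import pandas as pd

# Write a small table into a local SQLite file so Lumen has data to query.
df = pd.DataFrame({'species': ['Adelie', 'Gentoo'], 'count': [152, 124]})
with sqlite3.connect('data.db') as conn:
    df.to_sql('penguins', conn, if_exists='replace', index=False)
```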
## Advanced file handling

### DuckDB for SQL on files

Run SQL directly on CSV/Parquet files:

```python
from lumen.sources.duckdb import DuckDBSource

source = DuckDBSource(
    tables={
        'penguins': 'penguins.csv',
        'quakes': "read_csv('https://earthquake.usgs.gov/data.csv')",
    }
)
```
Load remote files:

```python
source = DuckDBSource(
    tables=['https://datasets.holoviz.org/penguins/v1/penguins.csv'],
    initializers=[
        'INSTALL httpfs;',  # required for HTTP/S3 access
        'LOAD httpfs;'
    ]
)
```
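S3 paths work the same way; a sketch assuming an illustrative bucket and DuckDB's `httpfs` S3 settings:

```python
from lumen.sources.duckdb import DuckDBSource

# The bucket, key, and region are illustrative; credentials follow
# DuckDB's httpfs configuration (e.g. SET s3_access_key_id=...).
source = DuckDBSource(
    tables=['s3://my-bucket/events.parquet'],
    initializers=[
        'INSTALL httpfs;',
        'LOAD httpfs;',
        "SET s3_region='us-east-1';",
    ]
)
```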
### Multiple sources

```python
import lumen.ai as lmai
from lumen.sources.duckdb import DuckDBSource
from lumen.sources.snowflake import SnowflakeSource

snowflake = SnowflakeSource(account='...', database='...')
local = DuckDBSource(tables=['local.csv'])

ui = lmai.ExplorerUI(data=[snowflake, local])
ui.servable()
```
### Custom table names

```python
source = DuckDBSource(
    tables={
        'customers': 'customer_data.csv',  # query as 'customers', not 'customer_data.csv'
        'orders': 'order_history.parquet',
    }
)
```
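To sanity-check the mapping before launching the UI, continue the example above and read a table directly (a sketch assuming Lumen's `Source.get_tables`/`Source.get` methods, which list table names and return a table as a DataFrame):

```python
# List the registered names and read one table back as a DataFrame.
print(source.get_tables())    # expected: ['customers', 'orders']
df = source.get('customers')  # resolves to customer_data.csv
print(df.head())
```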
## Troubleshooting

- **"Table not found"** - Table names are case-sensitive; check the exact names.
- **"Connection failed"** - Verify credentials and network access.
- **"File not found"** - Use absolute paths or URLs; relative paths resolve against the directory you run the command from.
- **Slow queries** - DuckDB on local files is fast; slowness usually comes from the remote database or network, not Lumen.
## Best practices

- **Start with files** for development; move to databases for production.
- **Use URLs** for shared datasets that don't change often.
- **Limit tables** when possible - fewer tables mean faster planning and lower LLM costs.
- **Name tables clearly** - meaningful names beat generic file names.