
Transforms

lumen.transforms

base

The Transform components allow transforming tables in arbitrary ways.

DataFrame = pd.DataFrame | dDataFrame module-attribute

Series = pd.Series | dSeries module-attribute

pd_version = Version(pd.__version__) module-attribute

Aggregate

Bases: Transform

Aggregate one or more columns or indexes, see pandas.DataFrame.groupby.

by must be provided.

df.groupby(<by>)[<columns>].<method>()[.reset_index()]

by = param.ListSelector(doc='\n Columns or indexes to group by.') class-attribute instance-attribute
columns = param.ListSelector(allow_None=True, doc='\n Columns to aggregate.') class-attribute instance-attribute
kwargs = param.Dict(default={}, doc='\n Keyword arguments to the aggregation method.') class-attribute instance-attribute
method = param.String(default='mean', doc='\n Name of the pandas aggregation method, e.g. max, min, count.') class-attribute instance-attribute
transform_type = 'aggregate' class-attribute
with_index = param.Boolean(default=True, doc='\n Whether to make the groupby columns indexes.') class-attribute instance-attribute
apply(table)
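The groupby pattern above can be sketched in plain pandas (illustrative data, not Lumen's implementation):

```python
import pandas as pd

# Mirrors df.groupby(<by>)[<columns>].<method>()
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})
agg = df.groupby("group")[["value"]].mean()
# with_index=False corresponds to the optional .reset_index() step
flat = agg.reset_index()
```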

Astype

Bases: Transform

Astype transforms the type of one or more columns.

dtypes = param.Dict(doc='Mapping from column name to new type.') class-attribute instance-attribute
transform_type = 'as_type' class-attribute
apply(table)

Columns

Bases: Transform

Columns selects a subset of columns.

df[<columns>]

columns = param.ListSelector(doc='\n The subset of columns to select.') class-attribute instance-attribute
transform_type = 'columns' class-attribute
apply(table)

Compute

Bases: Transform

Compute turns a dask.dataframe.DataFrame into a pandas.DataFrame.

transform_type = 'compute' class-attribute
apply(table)

Corr

Bases: Transform

Corr computes pairwise correlation of columns, excluding NA/null values.

method = param.Selector(default='pearson', objects=['pearson', 'kendall', 'spearman'], doc='\n Method of correlation.') class-attribute instance-attribute
min_periods = param.Integer(default=1, doc='\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.') class-attribute instance-attribute
numeric_only = param.Boolean(default=False, doc='\n Include only `float`, `int` or `boolean` data.') class-attribute instance-attribute
transform_type = 'corr' class-attribute
apply(table)

Count

Bases: Transform

Counts non-nan values in each column of the DataFrame and returns a new DataFrame with a single row with a count for each original column, see pandas.DataFrame.count.

df.count(axis=, level=, numeric_only=).to_frame().T

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to count along. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, list, str), doc='\n If the axis is a MultiIndex, count along a particular level.') class-attribute instance-attribute
numeric_only = param.Boolean(default=False, doc='\n Include only float, int or boolean data.') class-attribute instance-attribute
transform_type = 'count' class-attribute
apply(table)
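The .to_frame().T step turns the per-column counts into a single-row DataFrame; a minimal pandas sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4, 5, 6]})
counts = df.count().to_frame().T  # one row, one non-NA count per original column
```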

DropNA

Bases: Transform

DropNA drops rows with any missing values.

df.dropna(axis=<axis>, how=<how>, thresh=<thresh>, subset=<subset>)

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis along which missing values are dropped. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
how = param.Selector(default='any', objects=['any', 'all'], doc='\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.') class-attribute instance-attribute
subset = param.ListSelector(default=None, doc='\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.') class-attribute instance-attribute
thresh = param.Integer(default=None, doc='\n Require that many non-NA values.') class-attribute instance-attribute
transform_type = 'dropna' class-attribute
apply(table)

Eval

Bases: Transform

Applies an eval assignment expression to a DataFrame. The expression can reference columns on the original table via table.<column> and must assign to a variable that becomes a new column in the DataFrame. For example, to divide a value column by one thousand and store the result in a new kilo_value column, write an expr like:

kilo_value = table.value / 1000

See pandas.eval for more information.

expr = param.String(doc='\n The expression to apply to the table.') class-attribute instance-attribute
transform_type = 'eval' class-attribute
apply(table)
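The kilo_value expression above can be reproduced with pandas.eval directly; a sketch assuming illustrative data:

```python
import pandas as pd

table = pd.DataFrame({"value": [1000.0, 2500.0]})
# target= makes the assignment produce a new column on a copy of the table;
# the expression references columns via table.<column>, as described above.
result = pd.eval("kilo_value = table.value / 1000", target=table)
```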

Filter

Bases: Transform

The Filter transform implements the filtering behavior of Filter components.

The filter conditions must be declared as a list of tuples, each containing the name of the column to be filtered and one of the following:

  • scalar: A scalar value will be matched using equality operators
  • tuple: A tuple value specifies a numeric or date range.
  • list: A list value specifies a set of categories to match against.
  • list(tuple): A list of tuples specifies a list of ranges.

conditions = param.List(doc='\n List of filter conditions expressed as tuples of the column\n name and the filter value.') class-attribute instance-attribute
apply(df)
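The four condition forms translate to the following pandas masks (illustrative data; the real transform builds these internally):

```python
import pandas as pd

df = pd.DataFrame({"year": [2019, 2020, 2021], "kind": ["a", "b", "a"]})

scalar_mask = df["year"] == 2020             # scalar: equality
range_mask = df["year"].between(2019, 2020)  # tuple: (start, end) range
set_mask = df["kind"].isin(["a"])            # list: set of categories
# list of tuples: union of several ranges
ranges_mask = pd.concat(
    [df["year"].between(lo, hi) for lo, hi in [(2019, 2019), (2021, 2021)]],
    axis=1,
).any(axis=1)
```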

HistoryTransform

Bases: Transform

HistoryTransform accumulates a history of the queried data.

The internal buffer accumulates data up to the supplied length and (optionally) adds a date_column to the data.

date_column = param.Selector(doc='\n If defined adds a date column with the supplied name.') class-attribute instance-attribute
length = param.Integer(default=10, bounds=(1, None), doc='\n Maximum number of entries to accumulate in the history buffer.') class-attribute instance-attribute
transform_type = 'history' class-attribute instance-attribute
apply(table)

Accumulates a history of the data in a buffer up to the declared length and optionally adds the current datetime to the declared date_column.

Parameters:

Name Type Description Default
table DataFrame

The queried table as a DataFrame.

required

Returns:

Type Description
DataFrame

A DataFrame containing the buffered history of the data.
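The buffering behaviour can be sketched with a bounded deque (plain values stand in for the queried tables):

```python
from collections import deque

length = 3
buffer = deque(maxlen=length)  # oldest entries fall out past `length`
for row in (1, 2, 3, 4):
    buffer.append(row)
history = list(buffer)
```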

Iloc

Bases: Transform

Iloc allows selecting the data with integer indexing, see pandas.DataFrame.iloc.

df.iloc[<start>:<end>]

end = param.Integer(default=None) class-attribute instance-attribute
start = param.Integer(default=None) class-attribute instance-attribute
transform_type = 'iloc' class-attribute
apply(table)

Melt

Bases: Transform

Melt applies the pandas.melt operation given the id_vars and value_vars.

id_vars = param.ListSelector(default=[], doc='\n Column(s) to use as identifier variables.') class-attribute instance-attribute
ignore_index = param.Boolean(default=True, doc='\n If True, original index is ignored. If False, the original\n index is retained. Index labels will be repeated as\n necessary.') class-attribute instance-attribute
transform_type = 'melt' class-attribute
value_name = param.String(default='value', doc="\n Name to use for the 'value' column.") class-attribute instance-attribute
value_vars = param.ListSelector(default=None, doc='\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.') class-attribute instance-attribute
var_name = param.String(default=None, doc="\n Name to use for the 'variable' column. If None it uses\n ``frame.columns.name`` or 'variable'.") class-attribute instance-attribute
apply(table)
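A minimal pandas.melt example with illustrative data, using the parameters above:

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})
tidy = pd.melt(wide, id_vars=["id"], value_vars=["x", "y"],
               var_name="metric", value_name="reading")
```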

Pivot

Bases: Transform

Pivot applies pandas.DataFrame.pivot given an index, columns, and values.

columns = param.String(default=None, doc="\n Column to use to make new frame's columns.") class-attribute instance-attribute
index = param.String(default=None, doc="\n Column to use to make new frame's index.\n If None, uses existing index.") class-attribute instance-attribute
transform_type = 'pivot' class-attribute
values = param.ListSelector(default=None, doc="\n Column(s) to use for populating new frame's values.\n If not specified, all remaining columns will be used\n and the result will have hierarchically indexed columns.") class-attribute instance-attribute
apply(table)
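A minimal pandas pivot example with illustrative data, showing how index, columns, and values map to the reshaped frame:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "metric": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})
wide = df.pivot(index="date", columns="metric", values="value")
```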

PivotTable

Bases: Transform

PivotTable applies pandas.pivot_table to the data.

aggfunc = param.String(default='mean', doc="\n Aggregation function to apply, e.g. 'mean', 'sum' or 'count'.") class-attribute instance-attribute
columns = param.ListSelector(default=[], doc='\n Column, Grouper, array, or list of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.') class-attribute instance-attribute
index = param.ListSelector(default=[], doc='\n Column, Grouper, array, or list of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.') class-attribute instance-attribute
values = param.ListSelector(default=[], doc='\n Column or columns to aggregate.') class-attribute instance-attribute
apply(table)

Query

Bases: Transform

Query applies the pandas.DataFrame.query method.

df.query(<query>)

query = param.String(doc='\n The query to apply to the table.') class-attribute instance-attribute
transform_type = 'query' class-attribute
apply(table)

Rename

Bases: Transform

Rename renames columns or indexes, see pandas.DataFrame.rename.

df.rename(mapper=, columns=, index=, level=, axis=, copy=)

axis = param.ClassSelector(default=None, class_=(int, str), doc="\n The axis to rename. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
columns = param.Dict(default=None, doc='\n Alternative to specifying axis (`mapper, axis=1` is equivalent to\n `columns=mapper`).') class-attribute instance-attribute
copy = param.Boolean(default=False, doc='\n Also copy underlying data.') class-attribute instance-attribute
index = param.Dict(default=None, doc='\n Alternative to specifying axis (`mapper, axis=0` is equivalent to\n `index=mapper`).') class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, str), doc='\n In case of a MultiIndex, only rename labels in the specified level.') class-attribute instance-attribute
mapper = param.Dict(default=None, doc="\n Dict to apply to that axis' values. Use either `mapper` and `axis` to\n specify the axis to target with `mapper`, or `index` and `columns`.") class-attribute instance-attribute
transform_type = 'rename' class-attribute
apply(table)

RenameAxis

Bases: Transform

Set the name of the axis for the index or columns, see pandas.DataFrame.rename_axis.

df.rename_axis(mapper=, columns=, index=, axis=, copy=)

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to rename. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
columns = param.ClassSelector(default=None, class_=(str, list, dict), doc="\n A scalar, list-like, dict-like to apply to that axis' values.\n Note that the columns parameter is not allowed if the object\n is a Series. This parameter only apply for DataFrame type objects.\n Use either mapper and axis to specify the axis to target with\n mapper, or index and/or columns.") class-attribute instance-attribute
copy = param.Boolean(default=True, doc='\n Also copy underlying data.') class-attribute instance-attribute
index = param.ClassSelector(default=None, class_=(str, list, dict), doc="\n A scalar, list-like, dict-like to apply to that axis' values.\n Note that the columns parameter is not allowed if the object\n is a Series. This parameter only apply for DataFrame type objects.\n Use either mapper and axis to specify the axis to target with\n mapper, or index and/or columns.") class-attribute instance-attribute
mapper = param.ClassSelector(default=None, class_=(str, list), doc='\n Value to set the axis name attribute.') class-attribute instance-attribute
transform_type = 'rename_axis' class-attribute
apply(table)

ResetIndex

Bases: Transform

ResetIndex resets DataFrame indexes to columns or drops them, see pandas.DataFrame.reset_index.

df.reset_index(drop=<drop>, col_fill=<col_fill>, col_level=<col_level>, level=<level>)

col_fill = param.String(default='', doc='\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.') class-attribute instance-attribute
col_level = param.ClassSelector(default=0, class_=(int, str), doc='\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the\n first level.') class-attribute instance-attribute
drop = param.Boolean(default=False, doc='\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.') class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, str, list), doc='\n Only remove the given levels from the index. Removes all levels\n by default.') class-attribute instance-attribute
transform_type = 'reset_index' class-attribute
apply(table)

Sample

Bases: Transform

Sample returns a random sample of items.

df.sample(n=<n>, frac=<frac>, replace=<replace>)

frac = param.Number(default=None, bounds=(0, 1), doc='\n Fraction of axis items to return.') class-attribute instance-attribute
n = param.Integer(default=None, doc='\n Number of items to return.') class-attribute instance-attribute
replace = param.Boolean(default=False, doc='\n Sample with or without replacement.') class-attribute instance-attribute
transform_type = 'sample' class-attribute
apply(table)

SetIndex

Bases: Transform

SetIndex promotes DataFrame columns to indexes, see pandas.DataFrame.set_index.

df.set_index(<keys>, drop=<drop>, append=<append>, verify_integrity=<verify_integrity>)

append = param.Boolean(default=False, doc='\n Whether to append columns to existing index.') class-attribute instance-attribute
drop = param.Boolean(default=True, doc='\n Delete columns to be used as the new index.') class-attribute instance-attribute
keys = param.ClassSelector(default=None, class_=(str, list), doc='\n This parameter can be either a single column key or a list\n containing column keys.') class-attribute instance-attribute
transform_type = 'set_index' class-attribute
verify_integrity = param.Boolean(default=False, doc='\n Check the new index for duplicates. Otherwise defer the check\n until necessary. Setting to False will improve the performance\n of this method.') class-attribute instance-attribute
apply(table)

Sort

Bases: Transform

Sort on one or more columns, see pandas.DataFrame.sort_values.

df.sort_values(<by>, ascending=<ascending>)

ascending = param.ClassSelector(default=True, class_=(bool, list), doc='\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.') class-attribute instance-attribute
by = param.ListSelector(default=[], doc='\n Columns or indexes to sort by.') class-attribute instance-attribute
transform_type = 'sort' class-attribute
apply(table)

Stack

Bases: Transform

Stack applies pandas.DataFrame.stack to the declared level.

df.stack(<level>)

dropna = param.Boolean(default=True, doc='\n Whether to drop rows in the resulting Frame/Series with missing values.\n Stacking a column level onto the index axis can create combinations of\n index and column values that are missing from the original\n dataframe.') class-attribute instance-attribute
level = param.ClassSelector(default=(-1), class_=(int, list, str), doc='\n The indexes to stack.') class-attribute instance-attribute
transform_type = 'stack' class-attribute
apply(table)

Sum

Bases: Transform

Sums numeric values in each column of the DataFrame and returns a new DataFrame with a single row containing the sum for each original column, see pandas.DataFrame.sum.

df.sum(axis=, level=).to_frame().T

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to sum along. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, list, str), doc='\n If the axis is a MultiIndex, sum along a particular level.') class-attribute instance-attribute
transform_type = 'sum' class-attribute
apply(table)

Transform

Bases: MultiTypeComponent

Transform components implement transforms of DataFrame objects.

control_panel property
controls = param.List(default=[], doc='\n Parameters that should be exposed as widgets in the UI.') class-attribute instance-attribute
transform_type = None class-attribute
apply(table)

Given a table transform it in some way and return it.

Parameters:

Name Type Description Default
table DataFrame

The queried table as a DataFrame.

required

Returns:

Type Description
DataFrame

A DataFrame containing the transformed data.

apply_to(table, **kwargs) classmethod

Instantiates the transform from the given keyword arguments and calls its apply method.

Parameters:

Name Type Description Default
table DataFrame
required

Returns:

Type Description
A DataFrame with the results of the transformation.
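The apply_to pattern, instantiating the transform from keyword arguments and then applying it, can be sketched with illustrative classes (not Lumen's implementation):

```python
class SketchTransform:
    """Illustrative stand-in for the Transform base class."""

    def __init__(self, **params):
        self.params = params

    def apply(self, table):
        raise NotImplementedError

    @classmethod
    def apply_to(cls, table, **kwargs):
        # Build the transform from the keyword arguments, then apply it
        return cls(**kwargs).apply(table)


class Scale(SketchTransform):
    def apply(self, table):
        return [v * self.params["factor"] for v in table]


out = Scale.apply_to([1, 2, 3], factor=3)
```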
from_spec(spec) classmethod

Resolves a Transform specification.

Parameters:

Name Type Description Default
spec dict[str, Any] | str

Specification declared as a dictionary of parameter values.

required

Returns:

Type Description
The resolved Transform object.

Unstack

Bases: Transform

Unstack applies pandas.DataFrame.unstack to the declared level.

df.unstack(<level>)

fill_value = param.ClassSelector(default=None, class_=(int, str, dict), doc='\n Replace NaN with this value if the unstack produces missing values.') class-attribute instance-attribute
level = param.ClassSelector(default=(-1), class_=(int, list, str), doc='\n The indexes to unstack.') class-attribute instance-attribute
transform_type = 'unstack' class-attribute
apply(table)

project_lnglat

Bases: Transform

project_lnglat projects the given longitude/latitude columns to Web Mercator.

Converts longitude and latitude values into Web Mercator (EPSG:3857) coordinates (meters East of Greenwich and meters North of the Equator).

latitude = param.String(default='latitude', doc='Latitude column') class-attribute instance-attribute
longitude = param.String(default='longitude', doc='Longitude column') class-attribute instance-attribute
transform_type = 'project_lnglat' class-attribute
apply(table)
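The projection math can be sketched as a standalone function (illustrative only, not Lumen's exact implementation):

```python
import math

R = 6378137.0  # WGS84 equatorial radius in meters

def lnglat_to_mercator(longitude, latitude):
    """Project a lng/lat pair to Web Mercator (EPSG:3857) meters."""
    x = longitude * R * math.pi / 180.0
    y = math.log(math.tan((90.0 + latitude) * math.pi / 360.0)) * R
    return x, y

x0, y0 = lnglat_to_mercator(0.0, 0.0)  # origin maps to (0, 0)
```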

sql

SQLColumns

Bases: SQLTransform

columns = param.List(default=[], doc='Columns to return.') class-attribute instance-attribute
transform_type = 'sql_columns' class-attribute
apply(sql_in)

SQLCount

Bases: SQLTransform

transform_type = 'sql_count' class-attribute
apply(sql_in)

SQLDistinct

Bases: SQLTransform

columns = param.List(default=[], doc='Columns to return distinct values for.') class-attribute instance-attribute
transform_type = 'sql_distinct' class-attribute
apply(sql_in)

SQLFilter

Bases: SQLFilterBase

Apply WHERE clause filtering to the entire query result.

This transform wraps the input query in a subquery and applies filters to the result set.

conditions = param.List(doc='\n List of filter conditions expressed as tuples of the column\n name and the filter value.') class-attribute instance-attribute
transform_type = 'sql_filter' class-attribute
apply(sql_in)

SQLFilterBase

Bases: SQLTransform

Base class for SQL filtering transforms that provides common filtering logic.

SQLFormat

Bases: SQLTransform

Format SQL expressions with parameterized replacements.

This transform allows for replacing placeholders in SQL queries using either Python string format-style placeholders {name} or sqlglot-style placeholders :name.

parameters = param.Dict(default={}, doc='\n Dictionary of parameter names and values to replace in the SQL template.') class-attribute instance-attribute
transform_type = 'sql_format' class-attribute
apply(sql_in)

Apply the formatting to the input SQL, replacing placeholders with values.

Parameters:

Name Type Description Default
sql_in str

The input SQL query to format. This is used as a base query that will have the formatted sql_expr applied to it, typically as a subquery.

required

Returns:

Type Description
str

The formatted SQL query with all placeholders replaced.
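The two placeholder styles reduce to string substitution; a rough sketch (the real transform parses and quotes via sqlglot):

```python
parameters = {"year": 2021}

# Python format-style placeholder {name}
fmt_sql = "SELECT * FROM tbl WHERE year = {year}".format(**parameters)

# sqlglot-style :name placeholder, naively substituted here
colon_sql = "SELECT * FROM tbl WHERE year = :year".replace(
    ":year", str(parameters["year"]))
```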

SQLGroupBy

Bases: SQLTransform

Performs a GROUP BY and aggregation.

aggregates = param.Dict(doc='\n Mapping of aggregate functions to use to which column(s) to use them on,\n e.g. {"AVG": "col1", "SUM": ["col1", "col2"]}.') class-attribute instance-attribute
by = param.List(doc='Columns to group by.') class-attribute instance-attribute
transform_type = 'sql_group_by' class-attribute
apply(sql_in)
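A rough rendering of the by/aggregates spec into SQL text, using a hypothetical helper (the real transform builds the query with sqlglot):

```python
def render_group_by(sql_in, by, aggregates):
    # {"AVG": "col"} or {"AVG": ["col1", "col2"]} -> AVG(col) AS col, ...
    aggs = []
    for fn, cols in aggregates.items():
        for col in ([cols] if isinstance(cols, str) else cols):
            aggs.append(f"{fn}({col}) AS {col}")
    select = ", ".join(by + aggs)
    return f"SELECT {select} FROM ({sql_in}) GROUP BY {', '.join(by)}"

sql = render_group_by("SELECT * FROM tbl", ["region"], {"AVG": "price"})
```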

SQLLimit

Bases: SQLTransform

Performs a LIMIT SQL operation on the query. If the query already has a LIMIT clause, it will only be applied if the existing limit is less than the new limit.

limit = param.Integer(default=1000, allow_None=True, doc='Limit on the number of rows to return') class-attribute instance-attribute
transform_type = 'sql_limit' class-attribute
apply(sql_in)

SQLMinMax

Bases: SQLTransform

columns = param.List(default=[], doc='Columns to return min/max values for.') class-attribute instance-attribute
transform_type = 'sql_minmax' class-attribute
apply(sql_in)

SQLOverride

Bases: SQLTransform

override = param.String() class-attribute instance-attribute
apply(sql_in)

SQLPreFilter

Bases: SQLFilterBase

Apply filtering conditions to source tables before executing the main query.

This transform wraps source tables in subqueries with WHERE clauses, allowing filtering to be applied even when the main query doesn't select the filter columns.

For example, with conditions [("obs", [("obs_id", ["cell1", "cell2"])])]:

Input: "SELECT n_genes FROM obs"
Output: "SELECT n_genes FROM (SELECT * FROM obs WHERE obs_id IN ('cell1', 'cell2'))"

conditions = param.List(doc='\n List of filter conditions expressed as tuples of (table_name, filter_conditions)\n where filter_conditions is a list of (column_name, filter_value) tuples.\n Example: [("obs", [("obs_id", ["cell1", "cell2"])])]') class-attribute instance-attribute
transform_type = 'sql_prefilter' class-attribute
apply(sql_in)

SQLRemoveSourceSeparator

Bases: SQLTransform

Removes the source prefix and its separator from table references in the SQL query.

separator = param.String(default=SOURCE_TABLE_SEPARATOR, doc='\n Separator used to split the source and table name in the SQL query.') class-attribute instance-attribute
apply(sql_in)

Exclude the source and separator from the SQL query.

Parameters:

Name Type Description Default
sql_in str

The initial SQL query to be manipulated.

required

Returns:

Type Description
string

New SQL query derived from the above query.

SQLSample

Bases: SQLTransform

Samples rows from a SQL query using TABLESAMPLE or similar functionality, depending on the dialect's support.

percent = param.Number(default=10.0, bounds=(0.0, 100.0), doc='\n percent of rows to sample. Must be between 0 and 100.') class-attribute instance-attribute
sample_kwargs = param.Dict(default={}, doc='\n Other keyword arguments, like method, bucket_numerator, bucket_denominator, bucket_field.') class-attribute instance-attribute
seed = param.Integer(default=None, allow_None=True, doc='\n Random seed for reproducible sampling.') class-attribute instance-attribute
size = param.Integer(default=None, allow_None=True, doc='\n Absolute number of rows to sample. If specified, takes precedence over percent.') class-attribute instance-attribute
transform_type = 'sql_sample' class-attribute
apply(sql_in)

SQLSelectFrom

Bases: SQLFormat

sql_expr = param.String(default='SELECT * FROM {table}', doc='\n The SQL expression to use if the sql_in does NOT\n already contain a SELECT statement.') class-attribute instance-attribute
tables = param.ClassSelector(default=None, class_=(list, dict), doc='\n Dictionary of tables to replace or use in the SQL expression.\n If None, the original table will be used.') class-attribute instance-attribute
transform_type = 'sql_select_from' class-attribute
apply(sql_in)
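The fallback behaviour, wrapping bare table names in sql_expr while leaving full queries alone, sketched naively (the real transform parses the input with sqlglot rather than string-matching):

```python
def select_from(sql_in, sql_expr="SELECT * FROM {table}"):
    # Crude check for an existing SELECT statement; illustrative only
    if "select" in sql_in.lower():
        return sql_in
    return sql_expr.format(table=sql_in)

wrapped = select_from("obs")
untouched = select_from("SELECT n_genes FROM obs")
```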

SQLTransform

Bases: Transform

Base class for SQL transforms using sqlglot.

comments = param.Boolean(default=False, doc='Whether to include comments in the output SQL') class-attribute instance-attribute
error_level = param.ClassSelector(class_=(sqlglot.ErrorLevel), default=(sqlglot.ErrorLevel.RAISE), doc='Error level for parsing') class-attribute instance-attribute
identify = param.Boolean(default=False, doc='\n Delimit all identifiers, e.g. turn `FROM database.table` into `FROM "database"."table"`.\n This is useful for dialects that don\'t support unquoted identifiers.') class-attribute instance-attribute
optimize = param.Boolean(default=False, doc="\n Whether to optimize the generated SQL query; may produce invalid results, especially with\n duckdb's read_* functions.") class-attribute instance-attribute
pretty = param.Boolean(default=False, doc='Prettify output SQL, i.e. add newlines and indentation') class-attribute instance-attribute
read = param.String(default=None, doc='Source dialect for parsing; if None, automatically detects') class-attribute instance-attribute
unsupported_level = param.ClassSelector(class_=(sqlglot.ErrorLevel), default=(sqlglot.ErrorLevel.WARN), doc='When using `to_sql`, how to handle unsupported dialect features.') class-attribute instance-attribute
write = param.String(default=None, doc='Target dialect for output; if None, defaults to read dialect') class-attribute instance-attribute
apply(sql_in)

Given an SQL statement, manipulate it, and return a new SQL statement.

Parameters:

Name Type Description Default
sql_in str

The initial SQL query to be manipulated.

required

Returns:

Type Description
string

New SQL query derived from the above query.

apply_to(sql_in, **kwargs) classmethod

Instantiates the transform from the given keyword arguments and calls its apply method.

Parameters:

Name Type Description Default
sql_in str
required

Returns:

Type Description
SQL statement after application of transformation.
parse_sql(sql_in)

Parse SQL string into sqlglot AST.

Parameters:

Name Type Description Default
sql_in str

SQL string to parse

required

Returns:

Type Description
Expression

Parsed SQL expression

to_sql(expression)

Convert sqlglot expression back to SQL string.

Parameters:

Name Type Description Default
expression Expression

Expression to convert to SQL

required

Returns:

Type Description
string

SQL string representation