Transforms¶
lumen.transforms
¶
base
¶
The Transform components allow transforming tables in arbitrary ways.
DataFrame = pd.DataFrame | dDataFrame
module-attribute
¶
Series = pd.Series | dSeries
module-attribute
¶
pd_version = Version(pd.__version__)
module-attribute
¶
Aggregate
¶
Bases:
Aggregate one or more columns or indexes, see pandas.DataFrame.groupby.
by must be provided.
df.groupby(<by>)[<columns>].<method>()[.reset_index()]
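A minimal sketch of how this maps onto a pandas groupby; the column names are illustrative and the import path assumes Aggregate is exposed from lumen.transforms:

```python
import pandas as pd
from lumen.transforms import Aggregate

df = pd.DataFrame({"region": ["a", "a", "b"], "sales": [1, 2, 3]})

agg = Aggregate(by=["region"], columns=["sales"], method="mean", with_index=False)
result = agg.apply(df)
# roughly equivalent to df.groupby(["region"])[["sales"]].mean().reset_index()
```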
by = param.ListSelector(doc='\n Columns or indexes to group by.')
class-attribute
instance-attribute
¶
columns = param.ListSelector(allow_None=True, doc='\n Columns to aggregate.')
class-attribute
instance-attribute
¶
kwargs = param.Dict(default={}, doc='\n Keyword arguments to the aggregation method.')
class-attribute
instance-attribute
¶
method = param.String(default='mean', doc='\n Name of the pandas aggregation method, e.g. max, min, count.')
class-attribute
instance-attribute
¶
transform_type = 'aggregate'
class-attribute
¶
with_index = param.Boolean(default=True, doc='\n Whether to make the groupby columns indexes.')
class-attribute
instance-attribute
¶
apply(table)
¶
Astype
¶
Columns
¶
Compute
¶
Corr
¶
Bases:
Corr computes pairwise correlation of columns, excluding NA/null values.
method = param.Selector(default='pearson', objects=['pearson', 'kendall', 'spearman'], doc='\n Method of correlation.')
class-attribute
instance-attribute
¶
min_periods = param.Integer(default=1, doc='\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.')
class-attribute
instance-attribute
¶
numeric_only = param.Boolean(default=False, doc='\n Include only `float`, `int` or `boolean` data.')
class-attribute
instance-attribute
¶
transform_type = 'corr'
class-attribute
¶
apply(table)
¶
Count
¶
Bases:
Counts non-nan values in each column of the DataFrame and returns
a new DataFrame with a single row with a count for each original
column, see pandas.DataFrame.count.
df.count(axis=&lt;axis&gt;, level=&lt;level&gt;, numeric_only=&lt;numeric_only&gt;)
axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to count along. 0 or 'index', 1 or 'columns'")
class-attribute
instance-attribute
¶
level = param.ClassSelector(default=None, class_=(int, list, str), doc='\n If the axis is a MultiIndex, count along a particular level.')
class-attribute
instance-attribute
¶
numeric_only = param.Boolean(default=False, doc='\n Include only float, int or boolean data.')
class-attribute
instance-attribute
¶
transform_type = 'count'
class-attribute
¶
apply(table)
¶
DropNA
¶
Bases:
DropNA drops rows with any missing values.
df.dropna(axis=<axis>, how=<how>, thresh=<thresh>, subset=<subset>)
axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis along which missing values are removed. 0 or 'index', 1 or 'columns'")
class-attribute
instance-attribute
¶
how = param.Selector(default='any', objects=['any', 'all'], doc='\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.')
class-attribute
instance-attribute
¶
subset = param.ListSelector(default=None, doc='\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.')
class-attribute
instance-attribute
¶
thresh = param.Integer(default=None, doc='\n Require that many non-NA values.')
class-attribute
instance-attribute
¶
transform_type = 'dropna'
class-attribute
¶
apply(table)
¶
Eval
¶
Bases:
Applies an eval assignment expression to a DataFrame. The
expression can reference columns of the original table as
table.&lt;column&gt; and must assign to a variable that will become a
new column in the DataFrame. For example, to divide a value column
by one thousand and assign the result to a new column called
kilo_value you can write an expr like:
kilo_value = table.value / 1000
See pandas.eval for more information.
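A minimal sketch of the example above; the expr parameter name is inferred from the description and the column names are illustrative:

```python
import pandas as pd
from lumen.transforms import Eval

df = pd.DataFrame({"value": [1500, 2500, 500]})

# Assigns a new kilo_value column computed from the existing value column.
result = Eval(expr="kilo_value = table.value / 1000").apply(df)
```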
Filter
¶
Bases:
The Filter transform implements the filtering behavior of Filter components.
The filter conditions must be declared as a list of tuples, each containing
the name of the column to be filtered and one of the following (see the
sketch after the list):
- scalar: A scalar value will be matched using equality operators
- tuple: A tuple value specifies a numeric or date range.
- list: A list value specifies a set of categories to match against.
- list(tuple): A list of tuples specifies a list of ranges.
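A hedged sketch of the condition formats listed above; the conditions parameter name and the column names are assumptions for illustration:

```python
import pandas as pd
from lumen.transforms import Filter

df = pd.DataFrame({
    "country": ["US", "FR"], "year": [2005, 2015],
    "category": ["a", "c"], "price": [5.0, 50.0],
})

filt = Filter(conditions=[
    ("country", "US"),               # scalar: matched by equality
    ("year", (2000, 2010)),          # tuple: numeric or date range
    ("category", ["a", "b"]),        # list: set of categories to match
    ("price", [(0, 10), (90, 100)]), # list of tuples: multiple ranges
])
result = filt.apply(df)
```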
HistoryTransform
¶
Bases:
HistoryTransform accumulates a history of the queried data.
The internal buffer accumulates data up to the supplied length
and (optionally) adds a date_column to the data.
date_column = param.Selector(doc='\n If defined adds a date column with the supplied name.')
class-attribute
instance-attribute
¶
length = param.Integer(default=10, bounds=(1, None), doc='\n Accumulates a history of data.')
class-attribute
instance-attribute
¶
transform_type = 'history'
class-attribute
instance-attribute
¶
apply(table)
¶
Accumulates a history of the data in a buffer up to the
declared length and optionally adds the current datetime to
the declared date_column.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DataFrame | The queried table as a DataFrame. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A DataFrame containing the buffered history of the data. |
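A minimal sketch of how repeated apply calls accumulate into the buffer; the column names are illustrative:

```python
import pandas as pd
from lumen.transforms import HistoryTransform

history = HistoryTransform(length=3, date_column="timestamp")

snapshot = pd.DataFrame({"value": [1.0]})
for _ in range(5):
    buffered = history.apply(snapshot)

# buffered holds at most the last three snapshots, each stamped in the
# 'timestamp' column with the datetime at which it was appended.
```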
Iloc
¶
Bases:
Iloc allows selecting the data with integer indexing, see pandas.DataFrame.iloc.
df.iloc[<start>:<end>]
Melt
¶
Bases:
Melt applies the pandas.melt operation given the id_vars and value_vars.
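A minimal sketch, assuming Melt is importable from lumen.transforms; column names are illustrative:

```python
import pandas as pd
from lumen.transforms import Melt

df = pd.DataFrame({"date": ["2024-01", "2024-02"], "a": [1, 2], "b": [3, 4]})

melted = Melt(id_vars=["date"], value_vars=["a", "b"]).apply(df)
# roughly equivalent to pd.melt(df, id_vars=["date"], value_vars=["a", "b"])
```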
id_vars = param.ListSelector(default=[], doc='\n Column(s) to use as identifier variables.')
class-attribute
instance-attribute
¶
ignore_index = param.Boolean(default=True, doc='\n If True, original index is ignored. If False, the original\n index is retained. Index labels will be repeated as\n necessary.')
class-attribute
instance-attribute
¶
transform_type = 'melt'
class-attribute
¶
value_name = param.String(default='value', doc="\n Name to use for the 'value' column.")
class-attribute
instance-attribute
¶
value_vars = param.ListSelector(default=None, doc='\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.')
class-attribute
instance-attribute
¶
var_name = param.String(default=None, doc="\n Name to use for the 'variable' column. If None it uses\n ``frame.columns.name`` or 'variable'.")
class-attribute
instance-attribute
¶
apply(table)
¶
Pivot
¶
Bases:
Pivot applies pandas.DataFrame.pivot given an index, columns, and values.
columns = param.String(default=None, doc="\n Column to use to make new frame's columns.")
class-attribute
instance-attribute
¶
index = param.String(default=None, doc="\n Column to use to make new frame's index.\n If None, uses existing index.")
class-attribute
instance-attribute
¶
transform_type = 'pivot'
class-attribute
¶
values = param.ListSelector(default=None, doc="\n Column(s) to use for populating new frame's values.\n If not specified, all remaining columns will be used\n and the result will have hierarchically indexed columns.")
class-attribute
instance-attribute
¶
apply(table)
¶
PivotTable
¶
Bases:
PivotTable applies pandas.pivot_table to the data.
aggfunc = param.String(default='mean', doc="\n Function, list of functions, dict, default 'mean'")
class-attribute
instance-attribute
¶
columns = param.ListSelector(default=[], doc='\n Column, Grouper, array, or list of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.')
class-attribute
instance-attribute
¶
index = param.ListSelector(default=[], doc='\n Column, Grouper, array, or list of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.')
class-attribute
instance-attribute
¶
values = param.ListSelector(default=[], doc='\n Column or columns to aggregate.')
class-attribute
instance-attribute
¶
apply(table)
¶
Query
¶
Rename
¶
Bases:
Rename renames columns or indexes, see pandas.DataFrame.rename.
df.rename(mapper=&lt;mapper&gt;, index=&lt;index&gt;, columns=&lt;columns&gt;, axis=&lt;axis&gt;, copy=&lt;copy&gt;, level=&lt;level&gt;)
axis = param.ClassSelector(default=None, class_=(int, str), doc="\n The axis to rename. 0 or 'index', 1 or 'columns'")
class-attribute
instance-attribute
¶
columns = param.Dict(default=None, doc='\n Alternative to specifying axis (`mapper, axis=1` is equivalent to\n `columns=mapper`).')
class-attribute
instance-attribute
¶
copy = param.Boolean(default=False, doc='\n Also copy underlying data.')
class-attribute
instance-attribute
¶
index = param.Dict(default=None, doc='\n Alternative to specifying axis (`mapper, axis=0` is equivalent to\n `index=mapper`).')
class-attribute
instance-attribute
¶
level = param.ClassSelector(default=None, class_=(int, str), doc='\n In case of a MultiIndex, only rename labels in the specified level.')
class-attribute
instance-attribute
¶
mapper = param.Dict(default=None, doc="\n Dict to apply to that axis' values. Use either `mapper` and `axis` to\n specify the axis to target with `mapper`, or `index` and `columns`.")
class-attribute
instance-attribute
¶
transform_type = 'rename'
class-attribute
¶
apply(table)
¶
RenameAxis
¶
Bases:
Set the name of the axis for the index or columns,
see pandas.DataFrame.rename_axis.
df.rename_axis(mapper=&lt;mapper&gt;, index=&lt;index&gt;, columns=&lt;columns&gt;, axis=&lt;axis&gt;, copy=&lt;copy&gt;)
axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to rename. 0 or 'index', 1 or 'columns'")
class-attribute
instance-attribute
¶
columns = param.ClassSelector(default=None, class_=(str, list, dict), doc="\n A scalar, list-like, dict-like to apply to that axis' values.\n Note that the columns parameter is not allowed if the object\n is a Series. This parameter only apply for DataFrame type objects.\n Use either mapper and axis to specify the axis to target with\n mapper, or index and/or columns.")
class-attribute
instance-attribute
¶
copy = param.Boolean(default=True, doc='\n Also copy underlying data.')
class-attribute
instance-attribute
¶
index = param.ClassSelector(default=None, class_=(str, list, dict), doc="\n A scalar, list-like, dict-like to apply to that axis' values.\n Note that the columns parameter is not allowed if the object\n is a Series. This parameter only apply for DataFrame type objects.\n Use either mapper and axis to specify the axis to target with\n mapper, or index and/or columns.")
class-attribute
instance-attribute
¶
mapper = param.ClassSelector(default=None, class_=(str, list), doc='\n Value to set the axis name attribute.')
class-attribute
instance-attribute
¶
transform_type = 'rename_axis'
class-attribute
¶
apply(table)
¶
ResetIndex
¶
Bases:
ResetIndex resets DataFrame indexes to columns or drops them, see pandas.DataFrame.reset_index
df.reset_index(drop=<drop>, col_fill=<col_fill>, col_level=<col_level>, level=<level>)
col_fill = param.String(default='', doc='\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.')
class-attribute
instance-attribute
¶
col_level = param.ClassSelector(default=0, class_=(int, str), doc='\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the\n first level.')
class-attribute
instance-attribute
¶
drop = param.Boolean(default=False, doc='\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.')
class-attribute
instance-attribute
¶
level = param.ClassSelector(default=None, class_=(int, str, list), doc='\n Only remove the given levels from the index. Removes all levels\n by default.')
class-attribute
instance-attribute
¶
transform_type = 'reset_index'
class-attribute
¶
apply(table)
¶
Sample
¶
Bases:
Sample returns a random sample of items.
df.sample(n=<n>, frac=<frac>, replace=<replace>)
frac = param.Number(default=None, bounds=(0, 1), doc='\n Fraction of axis items to return.')
class-attribute
instance-attribute
¶
n = param.Integer(default=None, doc='\n Number of items to return.')
class-attribute
instance-attribute
¶
replace = param.Boolean(default=False, doc='\n Sample with or without replacement.')
class-attribute
instance-attribute
¶
transform_type = 'sample'
class-attribute
¶
apply(table)
¶
SetIndex
¶
Bases:
SetIndex promotes DataFrame columns to indexes, see pandas.DataFrame.set_index.
df.set_index(<keys>, drop=<drop>, append=<append>, verify_integrity=<verify_integrity>)
append = param.Boolean(default=False, doc='\n Whether to append columns to existing index.')
class-attribute
instance-attribute
¶
drop = param.Boolean(default=True, doc='\n Delete columns to be used as the new index.')
class-attribute
instance-attribute
¶
keys = param.ClassSelector(default=None, class_=(str, list), doc='\n This parameter can be either a single column key or a list\n containing column keys.')
class-attribute
instance-attribute
¶
transform_type = 'set_index'
class-attribute
¶
verify_integrity = param.Boolean(default=False, doc='\n Check the new index for duplicates. Otherwise defer the check\n until necessary. Setting to False will improve the performance\n of this method.')
class-attribute
instance-attribute
¶
apply(table)
¶
Sort
¶
Bases:
Sort on one or more columns, see pandas.DataFrame.sort_values.
df.sort_values(<by>, ascending=<ascending>)
ascending = param.ClassSelector(default=True, class_=(bool, list), doc='\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.')
class-attribute
instance-attribute
¶
by = param.ListSelector(default=[], doc='\n Columns or indexes to sort by.')
class-attribute
instance-attribute
¶
transform_type = 'sort'
class-attribute
¶
apply(table)
¶
Stack
¶
Bases:
Stack applies pandas.DataFrame.stack to the declared level.
df.stack(<level>)
dropna = param.Boolean(default=True, doc='\n Whether to drop rows in the resulting Frame/Series with missing values.\n Stacking a column level onto the index axis can create combinations of\n index and column values that are missing from the original\n dataframe.')
class-attribute
instance-attribute
¶
level = param.ClassSelector(default=(-1), class_=(int, list, str), doc='\n The indexes to stack.')
class-attribute
instance-attribute
¶
transform_type = 'stack'
class-attribute
¶
apply(table)
¶
Sum
¶
Bases:
Sums numeric values in each column of the DataFrame and returns a
new DataFrame with a single row containing the sum for each
original column, see pandas.DataFrame.sum.
df.sum(axis=&lt;axis&gt;, level=&lt;level&gt;)
axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to sum over. 0 or 'index', 1 or 'columns'")
class-attribute
instance-attribute
¶
level = param.ClassSelector(default=None, class_=(int, list, str), doc='\n If the axis is a MultiIndex, sum along a particular level.')
class-attribute
instance-attribute
¶
transform_type = 'sum'
class-attribute
¶
apply(table)
¶
Transform
¶
Bases:
Transform components implement transforms of DataFrame objects.
control_panel
property
¶
controls = param.List(default=[], doc='\n Parameters that should be exposed as widgets in the UI.')
class-attribute
instance-attribute
¶
transform_type = None
class-attribute
¶
apply(table)
¶
Given a table, transform it in some way and return it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DataFrame | The queried table as a DataFrame. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A DataFrame containing the transformed data. |
apply_to(table, **kwargs)
classmethod
¶
Constructs the transform from the supplied keyword arguments and calls its apply method.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| table | DataFrame | The queried table as a DataFrame. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A DataFrame with the results of the transformation. |
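A minimal usage sketch with a concrete subclass; the column names are illustrative:

```python
import pandas as pd
from lumen.transforms import Aggregate

df = pd.DataFrame({"region": ["a", "a", "b"], "sales": [1, 2, 3]})

# The transform is defined by the keyword arguments and applied in one call.
result = Aggregate.apply_to(df, by=["region"], method="sum")
```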
from_spec(spec)
classmethod
¶
Resolves a Transform specification.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| spec | dict | Specification declared as a dictionary of parameter values. | required |

Returns:

| Type | Description |
|---|---|
| Transform | The resolved Transform object. |
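A minimal sketch; the 'type' key is an assumption here, taken to select the transform via the transform_type values listed above, with the remaining keys setting parameter values:

```python
from lumen.transforms import Transform

spec = {"type": "aggregate", "by": ["region"], "method": "mean"}
transform = Transform.from_spec(spec)
```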
Unstack
¶
Bases:
Unstack applies pandas.DataFrame.unstack to the declared level.
df.unstack(<level>)
fill_value = param.ClassSelector(default=None, class_=(int, str, dict), doc='\n Replace NaN with this value if the unstack produces missing values.')
class-attribute
instance-attribute
¶
level = param.ClassSelector(default=(-1), class_=(int, list, str), doc='\n The indexes to unstack.')
class-attribute
instance-attribute
¶
transform_type = 'unstack'
class-attribute
¶
apply(table)
¶
project_lnglat
¶
Bases:
project_lnglat projects the given longitude/latitude columns to Web Mercator.
Converts longitude and latitude values (WGS84 degrees) into Web Mercator coordinates (meters east of Greenwich and meters north of the Equator).
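The underlying math, as a standalone sketch of the standard spherical Web Mercator formulas (not the transform's actual implementation or parameter names):

```python
import numpy as np

R = 6378137.0  # Earth radius used by spherical Web Mercator, in meters

def lnglat_to_mercator(longitude, latitude):
    """Project WGS84 longitude/latitude (in degrees) to Web Mercator meters."""
    easting = longitude * (R * np.pi / 180)
    northing = np.log(np.tan((90 + latitude) * np.pi / 360)) * R
    return easting, northing

easting, northing = lnglat_to_mercator(np.array([-122.4]), np.array([37.8]))
```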
sql
¶
SQLColumns
¶
SQLDistinct
¶
SQLFilter
¶
Bases:
Apply WHERE clause filtering to the entire query result.
This transform wraps the input query in a subquery and applies filters to the result set.
SQLFilterBase
¶
Bases:
Base class for SQL filtering transforms that provides common filtering logic.
SQLFormat
¶
Bases:
Format SQL expressions with parameterized replacements.
This transform allows for replacing placeholders in SQL queries using either Python string format-style placeholders {name} or sqlglot-style placeholders :name.
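A minimal sketch using the format-style placeholder; the table and column names are made up, the :name style works analogously, and exactly how sql_expr is combined with the incoming query follows the apply description below:

```python
from lumen.transforms.sql import SQLFormat

transform = SQLFormat(
    sql_expr="SELECT * FROM events WHERE year >= {year}",
    parameters={"year": 2024},
)
sql_out = transform.apply("SELECT * FROM events")
```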
parameters = param.Dict(default={}, doc='\n Dictionary of parameter names and values to replace in the SQL template.')
class-attribute
instance-attribute
¶
transform_type = 'sql_format'
class-attribute
¶
apply(sql_in)
¶
Apply the formatting to the input SQL, replacing placeholders with values.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sql_in | str | The input SQL query to format. This is used as a base query that will have the formatted sql_expr applied to it, typically as a subquery. | required |

Returns:

| Type | Description |
|---|---|
| str | The formatted SQL query with all placeholders replaced. |
SQLGroupBy
¶
Bases:
Performs a GROUP BY and aggregation on the query.
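A minimal sketch using the aggregates mapping documented below; the table and column names are illustrative and the emitted SQL is only approximate:

```python
from lumen.transforms.sql import SQLGroupBy

sql_out = SQLGroupBy(
    by=["region"],
    aggregates={"AVG": "price", "SUM": ["units", "revenue"]},
).apply("SELECT * FROM sales")
# roughly: SELECT region, AVG(price), SUM(units), SUM(revenue)
#          FROM (SELECT * FROM sales) GROUP BY region
```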
aggregates = param.Dict(doc='\n Mapping of aggregate functions to use to which column(s) to use them on,\n e.g. {"AVG": "col1", "SUM": ["col1", "col2"]}.')
class-attribute
instance-attribute
¶
by = param.List(doc='Columns to group by.')
class-attribute
instance-attribute
¶
transform_type = 'sql_group_by'
class-attribute
¶
apply(sql_in)
¶
SQLLimit
¶
Bases:
Performs a LIMIT SQL operation on the query. If the query already has a LIMIT clause, it will only be applied if the existing limit is less than the new limit.
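A minimal sketch; the limit parameter name and the table name are assumptions for illustration:

```python
from lumen.transforms.sql import SQLLimit

sql_out = SQLLimit.apply_to("SELECT * FROM sales", limit=100)
# Per the rule above, an existing LIMIT clause is only replaced when it is
# smaller than the new limit.
```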
SQLMinMax
¶
SQLOverride
¶
SQLPreFilter
¶
Bases:
Apply filtering conditions to source tables before executing the main query.
This transform wraps source tables in subqueries with WHERE clauses, allowing filtering to be applied even when the main query doesn't select the filter columns.
For example, given the input query "SELECT n_genes FROM obs" and the conditions [("obs", [("obs_id", ["cell1", "cell2"])])], the output is "SELECT n_genes FROM (SELECT * FROM obs WHERE obs_id IN ('cell1', 'cell2'))".
conditions = param.List(doc='\n List of filter conditions expressed as tuples of (table_name, filter_conditions)\n where filter_conditions is a list of (column_name, filter_value) tuples.\n Example: [("obs", [("obs_id", ["cell1", "cell2"])])]')
class-attribute
instance-attribute
¶
transform_type = 'sql_prefilter'
class-attribute
¶
apply(sql_in)
¶
SQLRemoveSourceSeparator
¶
Bases:
Removes the source name and separator from table references in the SQL query.
separator = param.String(default=SOURCE_TABLE_SEPARATOR, doc='\n Separator used to split the source and table name in the SQL query.')
class-attribute
instance-attribute
¶
apply(sql_in)
¶
Exclude the source and separator from the SQL query.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sql_in | str | The initial SQL query to be manipulated. | required |

Returns:

| Type | Description |
|---|---|
| str | New SQL query derived from the above query. |
SQLSample
¶
Bases:
Samples rows from a SQL query using TABLESAMPLE or similar functionality, depending on the dialect's support.
percent = param.Number(default=10.0, bounds=(0.0, 100.0), doc='\n percent of rows to sample. Must be between 0 and 100.')
class-attribute
instance-attribute
¶
sample_kwargs = param.Dict(default={}, doc='\n Other keyword arguments, like method, bucket_numerator, bucket_denominator, bucket_field.')
class-attribute
instance-attribute
¶
seed = param.Integer(default=None, allow_None=True, doc='\n Random seed for reproducible sampling.')
class-attribute
instance-attribute
¶
size = param.Integer(default=None, allow_None=True, doc='\n Absolute number of rows to sample. If specified, takes precedence over percent.')
class-attribute
instance-attribute
¶
transform_type = 'sql_sample'
class-attribute
¶
apply(sql_in)
¶
SQLSelectFrom
¶
Bases:
sql_expr = param.String(default='SELECT * FROM {table}', doc='\n The SQL expression to use if the sql_in does NOT\n already contain a SELECT statement.')
class-attribute
instance-attribute
¶
tables = param.ClassSelector(default=None, class_=(list, dict), doc='\n Dictionary of tables to replace or use in the SQL expression.\n If None, the original table will be used.')
class-attribute
instance-attribute
¶
transform_type = 'sql_select_from'
class-attribute
¶
apply(sql_in)
¶
SQLTransform
¶
Bases:
Base class for SQL transforms using sqlglot.
comments = param.Boolean(default=False, doc='Whether to include comments in the output SQL')
class-attribute
instance-attribute
¶
error_level = param.ClassSelector(class_=(sqlglot.ErrorLevel), default=(sqlglot.ErrorLevel.RAISE), doc='Error level for parsing')
class-attribute
instance-attribute
¶
identify = param.Boolean(default=False, doc='\n Delimit all identifiers, e.g. turn `FROM database.table` into `FROM "database"."table"`.\n This is useful for dialects that don\'t support unquoted identifiers.')
class-attribute
instance-attribute
¶
optimize = param.Boolean(default=False, doc="\n Whether to optimize the generated SQL query; may produce invalid results, especially with\n duckdb's read_* functions.")
class-attribute
instance-attribute
¶
pretty = param.Boolean(default=False, doc='Prettify output SQL, i.e. add newlines and indentation')
class-attribute
instance-attribute
¶
read = param.String(default=None, doc='Source dialect for parsing; if None, automatically detects')
class-attribute
instance-attribute
¶
unsupported_level = param.ClassSelector(class_=(sqlglot.ErrorLevel), default=(sqlglot.ErrorLevel.WARN), doc='When using `to_sql`, how to handle unsupported dialect features.')
class-attribute
instance-attribute
¶
write = param.String(default=None, doc='Target dialect for output; if None, defaults to read dialect')
class-attribute
instance-attribute
¶
apply(sql_in)
¶
Given an SQL statement, manipulate it, and return a new SQL statement.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sql_in | str | The initial SQL query to be manipulated. | required |

Returns:

| Type | Description |
|---|---|
| str | New SQL query derived from the above query. |
apply_to(sql_in, **kwargs)
classmethod
¶
Constructs the transform from the supplied keyword arguments and calls its apply method.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sql_in | str | The initial SQL query to be manipulated. | required |

Returns:

| Type | Description |
|---|---|
| str | SQL statement after application of transformation. |
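A minimal sketch showing how the dialect parameters above can be passed through apply_to on a concrete subclass; the table and column names are illustrative:

```python
from lumen.transforms.sql import SQLGroupBy

sql_out = SQLGroupBy.apply_to(
    "SELECT * FROM sales",
    by=["region"],
    aggregates={"COUNT": "order_id"},
    read="duckdb",     # source dialect used for parsing
    write="postgres",  # target dialect for the emitted SQL
    pretty=True,       # add newlines and indentation to the output
)
```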
parse_sql(sql_in)
¶
Parse SQL string into sqlglot AST.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| sql_in | str | SQL string to parse | required |

Returns:

| Type | Description |
|---|---|
| | Parsed SQL expression |
to_sql(expression)
¶
Convert sqlglot expression back to SQL string.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| expression | | Expression to convert to SQL | required |

Returns:

| Type | Description |
|---|---|
| str | SQL string representation |