
Transforms

lumen.transforms

base

The Transform components allow transforming tables in arbitrary ways.

DataFrame = pd.DataFrame | dDataFrame module-attribute

Series = pd.Series | dSeries module-attribute

pd_version = Version(pd.__version__) module-attribute

Aggregate

Bases: Transform

Aggregate one or more columns or indexes, see pandas.DataFrame.groupby.

by must be provided.

df.groupby(<by>)[<columns>].<method>()[.reset_index()]

by = param.ListSelector(doc='\n Columns or indexes to group by.') class-attribute instance-attribute
columns = param.ListSelector(allow_None=True, doc='\n Columns to aggregate.') class-attribute instance-attribute
kwargs = param.Dict(default={}, doc='\n Keyword arguments to the aggregation method.') class-attribute instance-attribute
method = param.String(default='mean', doc='\n Name of the pandas aggregation method, e.g. max, min, count.') class-attribute instance-attribute
transform_type = 'aggregate' class-attribute
with_index = param.Boolean(default=True, doc='\n Whether to make the groupby columns indexes.') class-attribute instance-attribute
apply(table)
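The groupby pattern above can be sketched in plain pandas (illustrative data, not Lumen's implementation):

```python
import pandas as pd

# Mirrors df.groupby(<by>)[<columns>].<method>()
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]})
agg = df.groupby("group")[["value"]].mean()
# with_index=False corresponds to the optional .reset_index() step
flat = agg.reset_index()
```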

Astype

Bases: Transform

Astype transforms the type of one or more columns.

dtypes = param.Dict(doc='Mapping from column name to new type.') class-attribute instance-attribute
transform_type = 'as_type' class-attribute
apply(table)

Columns

Bases: Transform

Columns selects a subset of columns.

df[<columns>]

columns = param.ListSelector(doc='\n The subset of columns to select.') class-attribute instance-attribute
transform_type = 'columns' class-attribute
apply(table)

Compute

Bases: Transform

Compute turns a dask.dataframe.DataFrame into a pandas.DataFrame.

transform_type = 'compute' class-attribute
apply(table)

Corr

Bases: Transform

Corr computes pairwise correlation of columns, excluding NA/null values.

method = param.Selector(default='pearson', objects=['pearson', 'kendall', 'spearman'], doc='\n Method of correlation.') class-attribute instance-attribute
min_periods = param.Integer(default=1, doc='\n Minimum number of observations required per pair of columns\n to have a valid result. Currently only available for Pearson\n and Spearman correlation.') class-attribute instance-attribute
numeric_only = param.Boolean(default=False, doc='\n Include only `float`, `int` or `boolean` data.') class-attribute instance-attribute
transform_type = 'corr' class-attribute
apply(table)

Count

Bases: Transform

Counts non-nan values in each column of the DataFrame and returns a new DataFrame with a single row with a count for each original column, see pandas.DataFrame.count.

df.count(axis=, level=, numeric_only=).to_frame().T

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to count along. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, list, str), doc='\n If the axis is a MultiIndex, count along a particular level.') class-attribute instance-attribute
numeric_only = param.Boolean(default=False, doc='\n Include only float, int or boolean data.') class-attribute instance-attribute
transform_type = 'count' class-attribute
apply(table)
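The .to_frame().T step turns the per-column counts into a single-row DataFrame; a minimal pandas sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [4, 5, 6]})
counts = df.count().to_frame().T  # one row, one non-NA count per original column
```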

DropNA

Bases: Transform

DropNA drops rows with any missing values.

df.dropna(axis=<axis>, how=<how>, thresh=<thresh>, subset=<subset>)

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis along which missing values are dropped. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
how = param.Selector(default='any', objects=['any', 'all'], doc='\n Determine if row or column is removed from DataFrame, when we have\n at least one NA or all NA.') class-attribute instance-attribute
subset = param.ListSelector(default=None, doc='\n Labels along other axis to consider, e.g. if you are dropping rows\n these would be a list of columns to include.') class-attribute instance-attribute
thresh = param.Integer(default=None, doc='\n Require that many non-NA values.') class-attribute instance-attribute
transform_type = 'dropna' class-attribute
apply(table)

Eval

Bases: Transform

Applies an eval assignment expression to a DataFrame. The expression can reference columns on the original table via table.<column> and must assign to a variable that becomes a new column in the DataFrame. For example, to divide a value column by one thousand and store the result in a new kilo_value column, write an expr like:

kilo_value = table.value / 1000

See pandas.eval for more information.

expr = param.String(doc='\n The expression to apply to the table.') class-attribute instance-attribute
transform_type = 'eval' class-attribute
apply(table)
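The kilo_value expression above can be reproduced with pandas.eval directly; a sketch assuming illustrative data:

```python
import pandas as pd

table = pd.DataFrame({"value": [1000.0, 2500.0]})
# target= makes the assignment produce a new column on a copy of the table;
# the expression references columns via table.<column>, as described above.
result = pd.eval("kilo_value = table.value / 1000", target=table)
```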

Filter

Bases: Transform

The Filter transform implements the filtering behavior of Filter components.

The filter conditions must be declared as a list of tuples, each containing the name of the column to be filtered and one of the following:

  • scalar: A scalar value will be matched using equality operators
  • tuple: A tuple value specifies a numeric or date range.
  • list: A list value specifies a set of categories to match against.
  • list(tuple): A list of tuples specifies a list of ranges.

conditions = param.List(doc='\n List of filter conditions expressed as tuples of the column\n name and the filter value.') class-attribute instance-attribute
apply(df)
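The four condition forms translate to the following pandas masks (illustrative data; the real transform builds these internally):

```python
import pandas as pd

df = pd.DataFrame({"year": [2019, 2020, 2021], "kind": ["a", "b", "a"]})

scalar_mask = df["year"] == 2020             # scalar: equality
range_mask = df["year"].between(2019, 2020)  # tuple: (start, end) range
set_mask = df["kind"].isin(["a"])            # list: set of categories
# list of tuples: union of several ranges
ranges_mask = pd.concat(
    [df["year"].between(lo, hi) for lo, hi in [(2019, 2019), (2021, 2021)]],
    axis=1,
).any(axis=1)
```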

HistoryTransform

Bases: Transform

HistoryTransform accumulates a history of the queried data.

The internal buffer accumulates data up to the supplied length and (optionally) adds a date_column to the data.

date_column = param.Selector(doc='\n If defined adds a date column with the supplied name.') class-attribute instance-attribute
length = param.Integer(default=10, bounds=(1, None), doc='\n Maximum number of entries to accumulate in the history buffer.') class-attribute instance-attribute
transform_type = 'history' class-attribute instance-attribute
apply(table)

Accumulates a history of the data in a buffer up to the declared length and optionally adds the current datetime to the declared date_column.

Parameters:

Name Type Description Default
table DataFrame

The queried table as a DataFrame.

required

Returns:

Type Description
DataFrame

A DataFrame containing the buffered history of the data.
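The buffering behaviour can be sketched with a bounded deque (plain values stand in for the queried tables):

```python
from collections import deque

length = 3
buffer = deque(maxlen=length)  # oldest entries fall out past `length`
for row in (1, 2, 3, 4):
    buffer.append(row)
history = list(buffer)
```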

Iloc

Bases: Transform

Iloc allows selecting the data with integer indexing, see pandas.DataFrame.iloc.

df.iloc[<start>:<end>]

end = param.Integer(default=None) class-attribute instance-attribute
start = param.Integer(default=None) class-attribute instance-attribute
transform_type = 'iloc' class-attribute
apply(table)

Melt

Bases: Transform

Melt applies the pandas.melt operation given the id_vars and value_vars.

id_vars = param.ListSelector(default=[], doc='\n Column(s) to use as identifier variables.') class-attribute instance-attribute
ignore_index = param.Boolean(default=True, doc='\n If True, original index is ignored. If False, the original\n index is retained. Index labels will be repeated as\n necessary.') class-attribute instance-attribute
transform_type = 'melt' class-attribute
value_name = param.String(default='value', doc="\n Name to use for the 'value' column.") class-attribute instance-attribute
value_vars = param.ListSelector(default=None, doc='\n Column(s) to unpivot. If not specified, uses all columns that\n are not set as `id_vars`.') class-attribute instance-attribute
var_name = param.String(default=None, doc="\n Name to use for the 'variable' column. If None it uses\n ``frame.columns.name`` or 'variable'.") class-attribute instance-attribute
apply(table)
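A minimal pandas.melt example with illustrative data, using the parameters above:

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})
tidy = pd.melt(wide, id_vars=["id"], value_vars=["x", "y"],
               var_name="metric", value_name="reading")
```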

Pivot

Bases: Transform

Pivot applies pandas.DataFrame.pivot given an index, columns, and values.

columns = param.String(default=None, doc="\n Column to use to make new frame's columns.") class-attribute instance-attribute
index = param.String(default=None, doc="\n Column to use to make new frame's index.\n If None, uses existing index.") class-attribute instance-attribute
transform_type = 'pivot' class-attribute
values = param.ListSelector(default=None, doc="\n Column(s) to use for populating new frame's values.\n If not specified, all remaining columns will be used\n and the result will have hierarchically indexed columns.") class-attribute instance-attribute
apply(table)
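A minimal pandas pivot example with illustrative data, showing how index, columns, and values map to the reshaped frame:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "metric": ["x", "y", "x", "y"],
    "value": [1, 2, 3, 4],
})
wide = df.pivot(index="date", columns="metric", values="value")
```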

PivotTable

Bases: Transform

PivotTable applies pandas.pivot_table to the data.

aggfunc = param.String(default='mean', doc="\n Aggregation function to apply, e.g. 'mean', 'sum' or 'count'.") class-attribute instance-attribute
columns = param.ListSelector(default=[], doc='\n Column, Grouper, array, or list of the previous\n Keys to group by on the pivot table column. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.') class-attribute instance-attribute
index = param.ListSelector(default=[], doc='\n Column, Grouper, array, or list of the previous\n Keys to group by on the pivot table index. If a list is passed,\n it can contain any of the other types (except list). If an array is\n passed, it must be the same length as the data and will be used in\n the same manner as column values.') class-attribute instance-attribute
values = param.ListSelector(default=[], doc='\n Column or columns to aggregate.') class-attribute instance-attribute
apply(table)

Query

Bases: Transform

Query applies the pandas.DataFrame.query method.

df.query(<query>)

query = param.String(doc='\n The query to apply to the table.') class-attribute instance-attribute
transform_type = 'query' class-attribute
apply(table)

Rename

Bases: Transform

Rename renames columns or indexes, see pandas.DataFrame.rename.

df.rename(mapper=, columns=, index=, level=, axis=, copy=)

axis = param.ClassSelector(default=None, class_=(int, str), doc="\n The axis to rename. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
columns = param.Dict(default=None, doc='\n Alternative to specifying axis (`mapper, axis=1` is equivalent to\n `columns=mapper`).') class-attribute instance-attribute
copy = param.Boolean(default=False, doc='\n Also copy underlying data.') class-attribute instance-attribute
index = param.Dict(default=None, doc='\n Alternative to specifying axis (`mapper, axis=0` is equivalent to\n `index=mapper`).') class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, str), doc='\n In case of a MultiIndex, only rename labels in the specified level.') class-attribute instance-attribute
mapper = param.Dict(default=None, doc="\n Dict to apply to that axis' values. Use either `mapper` and `axis` to\n specify the axis to target with `mapper`, or `index` and `columns`.") class-attribute instance-attribute
transform_type = 'rename' class-attribute
apply(table)

RenameAxis

Bases: Transform

Set the name of the axis for the index or columns, see pandas.DataFrame.rename_axis.

df.rename_axis(mapper=, columns=, index=, axis=, copy=)

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to rename. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
columns = param.ClassSelector(default=None, class_=(str, list, dict), doc="\n A scalar, list-like, dict-like to apply to that axis' values.\n Note that the columns parameter is not allowed if the object\n is a Series. This parameter only apply for DataFrame type objects.\n Use either mapper and axis to specify the axis to target with\n mapper, or index and/or columns.") class-attribute instance-attribute
copy = param.Boolean(default=True, doc='\n Also copy underlying data.') class-attribute instance-attribute
index = param.ClassSelector(default=None, class_=(str, list, dict), doc="\n A scalar, list-like, dict-like to apply to that axis' values.\n Note that the columns parameter is not allowed if the object\n is a Series. This parameter only apply for DataFrame type objects.\n Use either mapper and axis to specify the axis to target with\n mapper, or index and/or columns.") class-attribute instance-attribute
mapper = param.ClassSelector(default=None, class_=(str, list), doc='\n Value to set the axis name attribute.') class-attribute instance-attribute
transform_type = 'rename_axis' class-attribute
apply(table)

ResetIndex

Bases: Transform

ResetIndex resets DataFrame indexes to columns or drops them, see pandas.DataFrame.reset_index.

df.reset_index(drop=<drop>, col_fill=<col_fill>, col_level=<col_level>, level=<level>)

col_fill = param.String(default='', doc='\n If the columns have multiple levels, determines how the other\n levels are named. If None then the index name is repeated.') class-attribute instance-attribute
col_level = param.ClassSelector(default=0, class_=(int, str), doc='\n If the columns have multiple levels, determines which level the\n labels are inserted into. By default it is inserted into the\n first level.') class-attribute instance-attribute
drop = param.Boolean(default=False, doc='\n Do not try to insert index into dataframe columns. This resets\n the index to the default integer index.') class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, str, list), doc='\n Only remove the given levels from the index. Removes all levels\n by default.') class-attribute instance-attribute
transform_type = 'reset_index' class-attribute
apply(table)

Sample

Bases: Transform

Sample returns a random sample of items.

df.sample(n=<n>, frac=<frac>, replace=<replace>)

frac = param.Number(default=None, bounds=(0, 1), doc='\n Fraction of axis items to return.') class-attribute instance-attribute
n = param.Integer(default=None, doc='\n Number of items to return.') class-attribute instance-attribute
replace = param.Boolean(default=False, doc='\n Sample with or without replacement.') class-attribute instance-attribute
transform_type = 'sample' class-attribute
apply(table)

SetIndex

Bases: Transform

SetIndex promotes DataFrame columns to indexes, see pandas.DataFrame.set_index.

df.set_index(<keys>, drop=<drop>, append=<append>, verify_integrity=<verify_integrity>)

append = param.Boolean(default=False, doc='\n Whether to append columns to existing index.') class-attribute instance-attribute
drop = param.Boolean(default=True, doc='\n Delete columns to be used as the new index.') class-attribute instance-attribute
keys = param.ClassSelector(default=None, class_=(str, list), doc='\n This parameter can be either a single column key or a list\n containing column keys.') class-attribute instance-attribute
transform_type = 'set_index' class-attribute
verify_integrity = param.Boolean(default=False, doc='\n Check the new index for duplicates. Otherwise defer the check\n until necessary. Setting to False will improve the performance\n of this method.') class-attribute instance-attribute
apply(table)

Sort

Bases: Transform

Sort on one or more columns, see pandas.DataFrame.sort_values.

df.sort_values(<by>, ascending=<ascending>)

ascending = param.ClassSelector(default=True, class_=(bool, list), doc='\n Sort ascending vs. descending. Specify list for multiple sort\n orders. If this is a list of bools, must match the length of\n the by.') class-attribute instance-attribute
by = param.ListSelector(default=[], doc='\n Columns or indexes to sort by.') class-attribute instance-attribute
transform_type = 'sort' class-attribute
apply(table)

Stack

Bases: Transform

Stack applies pandas.DataFrame.stack to the declared level.

df.stack(<level>)

dropna = param.Boolean(default=True, doc='\n Whether to drop rows in the resulting Frame/Series with missing values.\n Stacking a column level onto the index axis can create combinations of\n index and column values that are missing from the original\n dataframe.') class-attribute instance-attribute
level = param.ClassSelector(default=(-1), class_=(int, list, str), doc='\n The indexes to stack.') class-attribute instance-attribute
transform_type = 'stack' class-attribute
apply(table)

Sum

Bases: Transform

Sums numeric values in each column of the DataFrame and returns a new DataFrame with a single row containing the sum for each original column, see pandas.DataFrame.sum.

df.sum(axis=, level=).to_frame().T

axis = param.ClassSelector(default=0, class_=(int, str), doc="\n The axis to sum along. 0 or 'index', 1 or 'columns'") class-attribute instance-attribute
level = param.ClassSelector(default=None, class_=(int, list, str), doc='\n If the axis is a MultiIndex, sum along a particular level.') class-attribute instance-attribute
transform_type = 'sum' class-attribute
apply(table)

Transform

Bases: MultiTypeComponent

Transform components implement transforms of DataFrame objects.

control_panel property
controls = param.List(default=[], doc='\n Parameters that should be exposed as widgets in the UI.') class-attribute instance-attribute
transform_type = None class-attribute
apply(table)

Given a table transform it in some way and return it.

Parameters:

Name Type Description Default
table DataFrame

The queried table as a DataFrame.

required

Returns:

Type Description
DataFrame

A DataFrame containing the transformed data.

apply_to(table, **kwargs) classmethod

Instantiates the transform from the given keyword arguments and calls its apply method.

Parameters:

Name Type Description Default
table DataFrame
required

Returns:

Type Description
A DataFrame with the results of the transformation.
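The apply_to pattern, instantiating the transform from keyword arguments and then applying it, can be sketched with illustrative classes (not Lumen's implementation):

```python
class SketchTransform:
    """Illustrative stand-in for the Transform base class."""

    def __init__(self, **params):
        self.params = params

    def apply(self, table):
        raise NotImplementedError

    @classmethod
    def apply_to(cls, table, **kwargs):
        # Build the transform from the keyword arguments, then apply it
        return cls(**kwargs).apply(table)


class Scale(SketchTransform):
    def apply(self, table):
        return [v * self.params["factor"] for v in table]


out = Scale.apply_to([1, 2, 3], factor=3)
```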
from_spec(spec) classmethod

Resolves a Transform specification.

Parameters:

Name Type Description Default
spec dict[str, Any] | str

Specification declared as a dictionary of parameter values.

required

Returns:

Type Description
The resolved Transform object.

Unstack

Bases: Transform

Unstack applies pandas.DataFrame.unstack to the declared level.

df.unstack(<level>)

fill_value = param.ClassSelector(default=None, class_=(int, str, dict), doc='\n Replace NaN with this value if the unstack produces missing values.') class-attribute instance-attribute
level = param.ClassSelector(default=(-1), class_=(int, list, str), doc='\n The indexes to unstack.') class-attribute instance-attribute
transform_type = 'unstack' class-attribute
apply(table)

project_lnglat

Bases: Transform

project_lnglat projects the given longitude/latitude columns to Web Mercator.

Converts longitude and latitude values into Web Mercator (EPSG:3857) coordinates (meters East of Greenwich and meters North of the Equator).

latitude = param.String(default='latitude', doc='Latitude column') class-attribute instance-attribute
longitude = param.String(default='longitude', doc='Longitude column') class-attribute instance-attribute
transform_type = 'project_lnglat' class-attribute
apply(table)
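The projection math can be sketched as a standalone function (illustrative only, not Lumen's exact implementation):

```python
import math

R = 6378137.0  # WGS84 equatorial radius in meters

def lnglat_to_mercator(longitude, latitude):
    """Project a lng/lat pair to Web Mercator (EPSG:3857) meters."""
    x = longitude * R * math.pi / 180.0
    y = math.log(math.tan((90.0 + latitude) * math.pi / 360.0)) * R
    return x, y

x0, y0 = lnglat_to_mercator(0.0, 0.0)  # origin maps to (0, 0)
```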

sql

SQLColumns

Bases: SQLTransform

columns = param.List(default=[], doc='Columns to return.') class-attribute instance-attribute
transform_type = 'sql_columns' class-attribute
apply(sql_in)

SQLCount

Bases: SQLTransform

transform_type = 'sql_count' class-attribute
apply(sql_in)

SQLDistinct

Bases: SQLTransform

columns = param.List(default=[], doc='Columns to return distinct values for.') class-attribute instance-attribute
transform_type = 'sql_distinct' class-attribute
apply(sql_in)

SQLFilter

Bases: SQLFilterBase

Apply WHERE clause filtering to the entire query result.

This transform wraps the input query in a subquery and applies filters to the result set.

conditions = param.List(doc='\n List of filter conditions expressed as tuples of the column\n name and the filter value.') class-attribute instance-attribute
transform_type = 'sql_filter' class-attribute
apply(sql_in)

SQLFilterBase

Bases: SQLTransform

Base class for SQL filtering transforms that provides common filtering logic.

SQLFormat

Bases: SQLTransform

Format SQL expressions with parameterized replacements.

This transform allows for replacing placeholders in SQL queries using either Python string format-style placeholders {name} or sqlglot-style placeholders :name.

parameters = param.Dict(default={}, doc='\n Dictionary of parameter names and values to replace in the SQL template.') class-attribute instance-attribute
transform_type = 'sql_format' class-attribute
apply(sql_in)

Apply the formatting to the input SQL, replacing placeholders with values.

Parameters:

Name Type Description Default
sql_in str

The input SQL query to format. This is used as a base query that will have the formatted sql_expr applied to it, typically as a subquery.

required

Returns:

Type Description
str

The formatted SQL query with all placeholders replaced.
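The two placeholder styles reduce to string substitution; a rough sketch (the real transform parses and quotes via sqlglot):

```python
parameters = {"year": 2021}

# Python format-style placeholder {name}
fmt_sql = "SELECT * FROM tbl WHERE year = {year}".format(**parameters)

# sqlglot-style :name placeholder, naively substituted here
colon_sql = "SELECT * FROM tbl WHERE year = :year".replace(
    ":year", str(parameters["year"]))
```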

SQLGroupBy

Bases: SQLTransform

Performs a GROUP BY and aggregation.

aggregates = param.Dict(doc='\n Mapping of aggregate functions to use to which column(s) to use them on,\n e.g. {"AVG": "col1", "SUM": ["col1", "col2"]}.') class-attribute instance-attribute
by = param.List(doc='Columns to group by.') class-attribute instance-attribute
transform_type = 'sql_group_by' class-attribute
apply(sql_in)
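A rough rendering of the by/aggregates spec into SQL text, using a hypothetical helper (the real transform builds the query with sqlglot):

```python
def render_group_by(sql_in, by, aggregates):
    # {"AVG": "col"} or {"AVG": ["col1", "col2"]} -> AVG(col) AS col, ...
    aggs = []
    for fn, cols in aggregates.items():
        for col in ([cols] if isinstance(cols, str) else cols):
            aggs.append(f"{fn}({col}) AS {col}")
    select = ", ".join(by + aggs)
    return f"SELECT {select} FROM ({sql_in}) GROUP BY {', '.join(by)}"

sql = render_group_by("SELECT * FROM tbl", ["region"], {"AVG": "price"})
```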

SQLLimit

Bases: SQLTransform

Performs a LIMIT SQL operation on the query. If the query already has a LIMIT clause, it will only be applied if the existing limit is less than the new limit.

limit = param.Integer(default=1000, allow_None=True, doc='Limit on the number of rows to return') class-attribute instance-attribute
transform_type = 'sql_limit' class-attribute
apply(sql_in)

SQLMinMax

Bases: SQLTransform

columns = param.List(default=[], doc='Columns to return min/max values for.') class-attribute instance-attribute
transform_type = 'sql_minmax' class-attribute
apply(sql_in)

SQLOverride

Bases: SQLTransform

override = param.String() class-attribute instance-attribute
apply(sql_in)

SQLPreFilter

Bases: SQLFilterBase

Apply filtering conditions to source tables before executing the main query.

This transform wraps source tables in subqueries with WHERE clauses, allowing filtering to be applied even when the main query doesn't select the filter columns.

For example, with conditions [("obs", [("obs_id", ["cell1", "cell2"])])]:

Input: "SELECT n_genes FROM obs"
Output: "SELECT n_genes FROM (SELECT * FROM obs WHERE obs_id IN ('cell1', 'cell2'))"

conditions = param.List(doc='\n List of filter conditions expressed as tuples of (table_name, filter_conditions)\n where filter_conditions is a list of (column_name, filter_value) tuples.\n Example: [("obs", [("obs_id", ["cell1", "cell2"])])]') class-attribute instance-attribute
transform_type = 'sql_prefilter' class-attribute
apply(sql_in)

SQLRemoveSourceSeparator

Bases: SQLTransform

Removes the source prefix and its separator from table references in the SQL query.

separator = param.String(default=SOURCE_TABLE_SEPARATOR, doc='\n Separator used to split the source and table name in the SQL query.') class-attribute instance-attribute
apply(sql_in)

Exclude the source and separator from the SQL query.

Parameters:

Name Type Description Default
sql_in str

The initial SQL query to be manipulated.

required

Returns:

Type Description
string

New SQL query derived from the above query.

SQLSample

Bases: SQLTransform

Samples rows from a SQL query using TABLESAMPLE or similar functionality, depending on the dialect's support.

percent = param.Number(default=10.0, bounds=(0.0, 100.0), doc='\n percent of rows to sample. Must be between 0 and 100.') class-attribute instance-attribute
sample_kwargs = param.Dict(default={}, doc='\n Other keyword arguments, like method, bucket_numerator, bucket_denominator, bucket_field.') class-attribute instance-attribute
seed = param.Integer(default=None, allow_None=True, doc='\n Random seed for reproducible sampling.') class-attribute instance-attribute
size = param.Integer(default=None, allow_None=True, doc='\n Absolute number of rows to sample. If specified, takes precedence over percent.') class-attribute instance-attribute
transform_type = 'sql_sample' class-attribute
apply(sql_in)

SQLSelectFrom

Bases: SQLFormat

sql_expr = param.String(default='SELECT * FROM {table}', doc='\n The SQL expression to use if the sql_in does NOT\n already contain a SELECT statement.') class-attribute instance-attribute
tables = param.ClassSelector(default=None, class_=(list, dict), doc='\n Dictionary of tables to replace or use in the SQL expression.\n If None, the original table will be used.') class-attribute instance-attribute
transform_type = 'sql_select_from' class-attribute
apply(sql_in)
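The fallback behaviour, wrapping bare table names in sql_expr while leaving full queries alone, sketched naively (the real transform parses the input with sqlglot rather than string-matching):

```python
def select_from(sql_in, sql_expr="SELECT * FROM {table}"):
    # Crude check for an existing SELECT statement; illustrative only
    if "select" in sql_in.lower():
        return sql_in
    return sql_expr.format(table=sql_in)

wrapped = select_from("obs")
untouched = select_from("SELECT n_genes FROM obs")
```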

SQLTransform

Bases: Transform

Base class for SQL transforms using sqlglot.

comments = param.Boolean(default=False, doc='Whether to include comments in the output SQL') class-attribute instance-attribute
error_level = param.ClassSelector(class_=(sqlglot.ErrorLevel), default=(sqlglot.ErrorLevel.RAISE), doc='Error level for parsing') class-attribute instance-attribute
identify = param.Boolean(default=False, doc='\n Delimit all identifiers, e.g. turn `FROM database.table` into `FROM "database"."table"`.\n This is useful for dialects that don\'t support unquoted identifiers.') class-attribute instance-attribute
optimize = param.Boolean(default=False, doc="\n Whether to optimize the generated SQL query; may produce invalid results, especially with\n duckdb's read_* functions.") class-attribute instance-attribute
pretty = param.Boolean(default=False, doc='Prettify output SQL, i.e. add newlines and indentation') class-attribute instance-attribute
read = param.String(default=None, doc='Source dialect for parsing; if None, automatically detects') class-attribute instance-attribute
unsupported_level = param.ClassSelector(class_=(sqlglot.ErrorLevel), default=(sqlglot.ErrorLevel.WARN), doc='When using `to_sql`, how to handle unsupported dialect features.') class-attribute instance-attribute
write = param.String(default=None, doc='Target dialect for output; if None, defaults to read dialect') class-attribute instance-attribute
apply(sql_in)

Given an SQL statement, manipulate it, and return a new SQL statement.

Parameters:

Name Type Description Default
sql_in str

The initial SQL query to be manipulated.

required

Returns:

Type Description
string

New SQL query derived from the above query.

apply_to(sql_in, **kwargs) classmethod

Instantiates the transform from the given keyword arguments and calls its apply method.

Parameters:

Name Type Description Default
sql_in str
required

Returns:

Type Description
SQL statement after application of transformation.
parse_sql(sql_in)

Parse SQL string into sqlglot AST.

Parameters:

Name Type Description Default
sql_in str

SQL string to parse

required

Returns:

Type Description
Expression

Parsed SQL expression

to_sql(expression)

Convert sqlglot expression back to SQL string.

Parameters:

Name Type Description Default
expression Expression

Expression to convert to SQL

required

Returns:

Type Description
string

SQL string representation