Inspection Reductions
Each Datashader canvas function call accepts an agg argument, which is a Reduction used to aggregate the values in each pixel (histogram bin) into the result returned to the user. Each Reduction falls into one of two categories:
- Mathematical combination of data, such as the count of data points per pixel or the mean of a column of the supplied dataset.
- Selection of data from a column of the supplied dataset, or the index of the corresponding row in the dataset.
This notebook explains how to use selection reductions.
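To make the two categories concrete, here is a plain-Python sketch (not Datashader code) of what each kind of reduction produces for a single pixel into which two data points with values 9 and 8 fall. The list `pixel_values` is a hypothetical input, and the comments only indicate which Datashader reduction each line is analogous to.

```python
import statistics

# Hypothetical values of one column for the data points landing in a single pixel,
# listed in the order the rows appear in the dataset.
pixel_values = [9, 8]

# Mathematical reductions combine the values into a new number.
count = len(pixel_values)                 # analogous to ds.count()
mean = statistics.mean(pixel_values)      # analogous to ds.mean(...); 8.5 is not in the data

# Selection reductions return a value that actually occurs in the data.
first = pixel_values[0]                   # analogous to ds.first(...)
maximum = max(pixel_values)               # analogous to ds.max(...)

print(count, mean, first, maximum)        # 2 8.5 9 9
```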
1. first and last selection reductions
The simplest selection reduction is the first reduction. This returns, for each pixel in the canvas, the value of a particular column in the dataset corresponding to the first data point that maps to that pixel. This is best illustrated with an example.
Firstly create a sample dataset:
import datashader as ds
import pandas as pd
df = pd.DataFrame(dict(
x = [ 0, 0, 1, 1, 0, 0, 2, 2],
y = [ 0, 0, 0, 0, 1, 1, 1, 1],
value = [ 9, 8, 7, 6, 2, 3, 4, 5],
other = [11, 12, 13, 14, 15, 16, 17, 18],
))
There are 8 rows in the dataset, with columns for the x and y coordinates as well as a value and an other column.
Next, create a Datashader canvas with a height of 2 pixels and a width of 3 pixels:
canvas = ds.Canvas(plot_height=2, plot_width=3)
Two rows of the dataset map to each canvas pixel, except pixels [0, 2] and [1, 1] (indexed as [y, x]), which do not have any rows mapped to them.
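The pixel assignment can be checked by hand. The sketch below is a simplified model, not Datashader's internal code. It assumes each coordinate is scaled linearly from the data range (x from 0 to 2, y from 0 to 1, matching the x_range and y_range attributes in the outputs of this section) onto the pixel grid, with the upper edge clamped into the last pixel. The helper `to_pixel` is hypothetical.

```python
from collections import defaultdict

# Sample coordinates from the DataFrame above.
xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

WIDTH, HEIGHT = 3, 2                        # plot_width, plot_height
X_RANGE, Y_RANGE = (0.0, 2.0), (0.0, 1.0)   # ranges reported in x_range / y_range

def to_pixel(v, lo, hi, n):
    """Hypothetical helper: map v in [lo, hi] linearly onto bins 0..n-1.

    The upper edge (v == hi) is clamped into the last bin so no point is dropped.
    """
    return min(int((v - lo) / (hi - lo) * n), n - 1)

# Collect the row indices that land in each (row, col) pixel, i.e. [y, x].
rows_per_pixel = defaultdict(list)
for i, (x, y) in enumerate(zip(xs, ys)):
    rows_per_pixel[(to_pixel(y, *Y_RANGE, HEIGHT), to_pixel(x, *X_RANGE, WIDTH))].append(i)

print(dict(rows_per_pixel))
# {(0, 0): [0, 1], (0, 1): [2, 3], (1, 0): [4, 5], (1, 2): [6, 7]}

# Pixel centres: these match the x and y coordinates of the returned DataArrays.
x_centres = [X_RANGE[0] + (i + 0.5) * (X_RANGE[1] - X_RANGE[0]) / WIDTH for i in range(WIDTH)]
y_centres = [Y_RANGE[0] + (j + 0.5) * (Y_RANGE[1] - Y_RANGE[0]) / HEIGHT for j in range(HEIGHT)]
print([round(c, 4) for c in x_centres], y_centres)
# [0.3333, 1.0, 1.6667] [0.25, 0.75]
```

Pixels [0, 2] and [1, 1] never appear as keys, which is why they end up empty.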
Now call canvas.points using a first reduction:
canvas.points(df, 'x', 'y', ds.first('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 9.,  7., nan],
       [ 2., nan,  4.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
The returned xarray.DataArray has the same shape as the canvas and contains values taken from the 'value' column corresponding to the first row that maps to each pixel. Pixels that do not have any rows mapped to them contain NaN values.
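Conceptually, first takes, for each pixel, the value from the earliest row (in dataset order) that lands there. The plain-Python sketch below reproduces the array above under that assumption. It is an illustration, not Datashader's implementation. The grouping of rows into pixels is hardcoded from the sample data: rows 0-1 fall in pixel [0, 0], rows 2-3 in [0, 1], rows 4-5 in [1, 0] and rows 6-7 in [1, 2]. The names `rows_per_pixel` and `select_first` are hypothetical.

```python
import math

values = [9, 8, 7, 6, 2, 3, 4, 5]   # the 'value' column
# Row indices that fall into each (y, x) pixel, in dataset order.
rows_per_pixel = {(0, 0): [0, 1], (0, 1): [2, 3], (1, 0): [4, 5], (1, 2): [6, 7]}
HEIGHT, WIDTH = 2, 3

def select_first(column):
    """Return a HEIGHT x WIDTH grid holding the first value per pixel, NaN if empty."""
    grid = [[math.nan] * WIDTH for _ in range(HEIGHT)]
    for (row, col), rows in rows_per_pixel.items():
        grid[row][col] = float(column[rows[0]])
    return grid

print(select_first(values))
# [[9.0, 7.0, nan], [2.0, nan, 4.0]]
```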
Here are the results using a last selection reduction:
canvas.points(df, 'x', 'y', ds.last('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 8.,  6., nan],
       [ 3., nan,  5.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
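first and last differ only in which row wins when several rows share a pixel. The plain-Python sketch below (an illustration, not Datashader's implementation) visits rows in dataset order and updates each pixel in place. last always overwrites, while first only fills a pixel that is still empty. In this sample dataset every x and y coordinate happens to equal its pixel column and row index, so the sketch uses them directly as indices. `stream_select` is a hypothetical name.

```python
import math

xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
values = [9, 8, 7, 6, 2, 3, 4, 5]

def stream_select(column, keep_last):
    """Visit rows in dataset order and update each pixel of a 2 x 3 grid in place.

    x and y are used directly as pixel indices, which holds for this sample data.
    """
    grid = [[math.nan] * 3 for _ in range(2)]
    for x, y, v in zip(xs, ys, column):
        # last: always overwrite; first: only fill pixels that are still empty.
        if keep_last or math.isnan(grid[y][x]):
            grid[y][x] = float(v)
    return grid

print(stream_select(values, keep_last=True))    # [[8.0, 6.0, nan], [3.0, nan, 5.0]]
print(stream_select(values, keep_last=False))   # [[9.0, 7.0, nan], [2.0, nan, 4.0]]
```

Because the winner is decided by visiting order, the results of first and last depend on the order of rows in the dataset.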
2. max and min selection reductions
A max selection reduction returns, for each pixel in the canvas, the maximum value of the specified column over all rows that map to that pixel. For example:
canvas.points(df, 'x', 'y', ds.max('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 9.,  7., nan],
       [ 3., nan,  5.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
The corresponding min selection reduction gives:
canvas.points(df, 'x', 'y', ds.min('value'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 8.,  6., nan],
       [ 2., nan,  4.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
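max and min can be modelled the same way, keeping whichever value is preferred as rows stream in. This is a plain-Python sketch, not Datashader's implementation. As before, it relies on the fact that in this sample dataset x and y equal the pixel column and row indices. `select_extreme` is a hypothetical helper that takes Python's built-in `max` or `min` as the comparison.

```python
import math

xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
values = [9, 8, 7, 6, 2, 3, 4, 5]

def select_extreme(column, pick):
    """Keep, per pixel, whichever value `pick` (max or min) prefers; empty pixels stay NaN.

    x and y are used directly as pixel indices, which holds for this sample data.
    """
    grid = [[math.nan] * 3 for _ in range(2)]
    for x, y, v in zip(xs, ys, column):
        current = grid[y][x]
        grid[y][x] = float(v) if math.isnan(current) else float(pick(current, v))
    return grid

print(select_extreme(values, max))  # [[9.0, 7.0, nan], [3.0, nan, 5.0]]
print(select_extreme(values, min))  # [[8.0, 6.0, nan], [2.0, nan, 4.0]]
```

Unlike first and last, the result of max or min does not depend on the order in which rows are visited.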
3. first_n, last_n, max_n and min_n selection reductions
These provide the same functionality as the first, last, max and min reductions, except that they return multiple values per pixel. For example, the max_n reduction with n=3 returns the 3 largest values, in descending order, for each pixel:
canvas.points(df, 'x', 'y', ds.max_n('value', n=3))
<xarray.DataArray (y: 2, x: 3, n: 3)> Size: 144B
array([[[ 9.,  8., nan],
        [ 7.,  6., nan],
        [nan, nan, nan]],

       [[ 3.,  2., nan],
        [nan, nan, nan],
        [ 5.,  4., nan]]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
  * n        (n) int64 24B 0 1 2
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
The returned xarray.DataArray has shape (ny, nx, n), which is (2, 3, 3) in this example. For each pixel, the third dimension holds up to n of the largest values in descending order. Where fewer than n values are available, the remaining entries are NaN, as usual.
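The layout of that third dimension can be reproduced with a short plain-Python sketch (not Datashader's implementation). It collects every value per pixel, sorts in descending order, keeps the first n and pads with NaN up to length n. As in the sample data, x and y are used directly as pixel indices. `select_max_n` is a hypothetical name.

```python
import math

xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
values = [9, 8, 7, 6, 2, 3, 4, 5]

def select_max_n(column, n):
    """Return a 2 x 3 x n nested list of the n largest values per pixel, NaN-padded."""
    buckets = [[[] for _ in range(3)] for _ in range(2)]
    for x, y, v in zip(xs, ys, column):   # x, y equal pixel indices in this dataset
        buckets[y][x].append(float(v))
    return [
        [sorted(b, reverse=True)[:n] + [math.nan] * max(0, n - len(b)) for b in row]
        for row in buckets
    ]

result = select_max_n(values, n=3)
print(result[0][0], result[1][2])   # [9.0, 8.0, nan] [5.0, 4.0, nan]
```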
4. where selection reductions
A where reduction takes two arguments: a selector reduction and a lookup_column name. The selector reduction, such as first or max, selects which row of the dataset to return information about for each pixel, but the information returned comes from the lookup_column rather than from the column used by the selector.
Again, this is best illustrated with an example:
canvas.points(df, 'x', 'y', ds.where(ds.max('value'), 'other'))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[11., 13., nan],
       [16., nan, 18.]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
This returns, for each pixel, the value of the 'other' column corresponding to the maximum of the 'value' column of the data points that map to that pixel.
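The two-step logic of where (choose a row with the selector, then read a different column from that row) can be sketched in plain Python. This is an illustration, not Datashader's implementation. Ties are resolved in favour of the earlier row here, which is a choice made for this sketch only. As in the sample data, x and y are used directly as pixel indices. `where_max` is a hypothetical helper.

```python
import math

xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
values = [9, 8, 7, 6, 2, 3, 4, 5]
others = [11, 12, 13, 14, 15, 16, 17, 18]

def where_max(selector_column, lookup_column):
    """For each pixel, find the row with the largest selector value and return
    that row's lookup value (NaN for empty pixels)."""
    best_row = [[None] * 3 for _ in range(2)]
    for i, (x, y) in enumerate(zip(xs, ys)):   # x, y equal pixel indices here
        j = best_row[y][x]
        if j is None or selector_column[i] > selector_column[j]:
            best_row[y][x] = i
    return [[math.nan if j is None else float(lookup_column[j]) for j in row]
            for row in best_row]

print(where_max(values, others))   # [[11.0, 13.0, nan], [16.0, nan, 18.0]]
```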
Although it is possible to use first or last as a selector with a lookup_column, such as
ds.where(ds.first('value'), 'other')
this is unnecessary, as it is identical to the simpler
ds.first('other')
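The two forms agree because first picks a row by its position in the dataset rather than by the values in the column. The plain-Python sketch below checks this on the sample data. It treats NaN entries as missing and skips them, which is an assumption made for illustration. The sample data contain no missing values, so every pixel simply gets its earliest row. `first_rows` is a hypothetical helper, and x and y are used directly as pixel indices as in the sample data.

```python
import math

xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
values = [9, 8, 7, 6, 2, 3, 4, 5]
others = [11, 12, 13, 14, 15, 16, 17, 18]

def first_rows(column):
    """Row index of the first non-NaN entry of `column` in each pixel, or -1 if none."""
    grid = [[-1] * 3 for _ in range(2)]
    for i, (x, y, v) in enumerate(zip(xs, ys, column)):   # x, y equal pixel indices here
        if grid[y][x] == -1 and not math.isnan(v):
            grid[y][x] = i
    return grid

# Selecting via 'value' or via 'other' chooses the same rows here.
print(first_rows(values) == first_rows(others))   # True

# So reading 'other' from the rows chosen via 'value' gives first('other').
via_where = [[others[i] if i >= 0 else math.nan for i in row] for row in first_rows(values)]
print(via_where)   # [[11, 13, nan], [15, nan, 17]]
```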
5. where selection reductions returning a row index
The lookup_column argument to where is optional. If it is not specified, where defaults to returning, for each pixel, the index of the row in the dataset corresponding to the selector:
canvas.points(df, 'x', 'y', ds.where(ds.max('value')))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 0,  2, -1],
       [ 5, -1,  7]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
There are 8 rows in the DataFrame, so the returned row indices are in the range 0 to 7. An index of -1 is returned for pixels that have no data points mapped to them.
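Because the result holds integer row indices rather than floats, NaN is not available as an empty marker, and -1 plays that role instead. The plain-Python sketch below (not Datashader's implementation) tracks the index of the row holding the maximum 'value' per pixel and starts every pixel at -1. As in the sample data, x and y are used directly as pixel indices. `where_max_index` is a hypothetical name.

```python
xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
values = [9, 8, 7, 6, 2, 3, 4, 5]

def where_max_index(column):
    """Row index of the maximum of `column` per pixel, with -1 marking empty pixels."""
    best = [[-1] * 3 for _ in range(2)]
    for i, (x, y) in enumerate(zip(xs, ys)):   # x, y equal pixel indices here
        j = best[y][x]
        if j == -1 or column[i] > column[j]:
            best[y][x] = i
    return best

print(where_max_index(values))   # [[0, 2, -1], [5, -1, 7]]
```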
first and last can also be used as selectors in where reductions that return row indices, for example:
canvas.points(df, 'x', 'y', ds.where(ds.first('value')))
<xarray.DataArray (y: 2, x: 3)> Size: 48B
array([[ 0,  2, -1],
       [ 4, -1,  6]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
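A row index can be turned back into any column's values after the fact, so a single selection can be inspected across several columns. The plain-Python sketch below copies the index array from the output above and looks up both sample columns. `lookup` is a hypothetical helper, not part of Datashader.

```python
import math

# Sample columns and the index grid returned by ds.where(ds.first('value')) above.
columns = {
    "value": [9, 8, 7, 6, 2, 3, 4, 5],
    "other": [11, 12, 13, 14, 15, 16, 17, 18],
}
index_grid = [[0, 2, -1], [4, -1, 6]]

def lookup(grid, column):
    """Map each row index to that row's entry in `column`; -1 becomes NaN."""
    return [[math.nan if i < 0 else float(column[i]) for i in row] for row in grid]

for name, column in columns.items():
    print(name, lookup(index_grid, column))
# value [[9.0, 7.0, nan], [2.0, nan, 4.0]]
# other [[11.0, 13.0, nan], [15.0, nan, 17.0]]
```

The 'value' lookup matches the earlier ds.first('value') result, as expected.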
where reductions can also use a selector that is a first_n, last_n, max_n or min_n reduction, for example:
canvas.points(df, 'x', 'y', ds.where(ds.first_n('value', 3)))
<xarray.DataArray (y: 2, x: 3, n: 3)> Size: 144B
array([[[ 0,  1, -1],
        [ 2,  3, -1],
        [-1, -1, -1]],

       [[ 4,  5, -1],
        [-1, -1, -1],
        [ 6,  7, -1]]])
Coordinates:
  * x        (x) float64 24B 0.3333 1.0 1.667
  * y        (y) float64 16B 0.25 0.75
  * n        (n) int64 24B 0 1 2
Attributes:
    x_range:  (0.0, 2.0)
    y_range:  (0.0, 1.0)
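Combining the two ideas, a where reduction with a first_n selector yields up to n row indices per pixel, padded with -1. The plain-Python sketch below is an illustration, not Datashader's implementation. It reproduces the array above by keeping the first n row indices seen per pixel in dataset order, using x and y directly as pixel indices as in the sample data. `where_first_n_index` is a hypothetical name.

```python
xs = [0, 0, 1, 1, 0, 0, 2, 2]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def where_first_n_index(n):
    """First n row indices per pixel in dataset order, padded with -1 to length n."""
    grid = [[[] for _ in range(3)] for _ in range(2)]
    for i, (x, y) in enumerate(zip(xs, ys)):   # x, y equal pixel indices here
        if len(grid[y][x]) < n:
            grid[y][x].append(i)
    return [[cell + [-1] * (n - len(cell)) for cell in row] for row in grid]

print(where_first_n_index(3))
# [[[0, 1, -1], [2, 3, -1], [-1, -1, -1]], [[4, 5, -1], [-1, -1, -1], [6, 7, -1]]]
```

With n=1 the sketch collapses to the single-index result of ds.where(ds.first('value')), apart from the extra length-1 dimension.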