Penguin Crossfilter

import numpy as np
import pandas as pd
import panel as pn

import holoviews as hv
import hvplot.pandas # noqa

pn.extension(template='fast')

pn.state.template.logo = 'https://github.com/allisonhorst/palmerpenguins/raw/main/man/figures/logo.png'

Introduction

welcome = "## Welcome and meet the Palmer penguins!"

penguins_art = pn.pane.PNG('https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/man/figures/palmerpenguins.png', height=160)

credit = "### Artwork by @allison_horst"

instructions = """
Use the box-select and lasso-select tools to select a subset of penguins
and reveal more information about the selected subgroup through the power
of cross-filtering.
"""

license = """
### License

Data are available by CC-0 license in accordance with the Palmer Station LTER Data Policy and the LTER Data Access Policy for Type I data.
"""

art = pn.Column(
    welcome, penguins_art, credit, instructions, license,
    sizing_mode='stretch_width'
).servable(area='sidebar')

art

Building some plots

Let us first load the Palmer penguins dataset (Gorman et al.), which contains measurements for three penguin species. We drop the rows with no recorded sex and sort the remainder by species:

penguins = pd.read_csv('https://datasets.holoviz.org/penguins/v1/penguins.csv')
penguins = penguins[~penguins.sex.isnull()].reset_index().sort_values('species')

penguins.head()
    index  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
 0      0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
94    100   Adelie     Biscoe            35.0           17.9              192.0       3725.0  female  2009
95    101   Adelie     Biscoe            41.0           20.0              203.0       4725.0    male  2009
96    102   Adelie     Biscoe            37.7           16.0              183.0       3075.0  female  2009
97    103   Adelie     Biscoe            37.8           20.0              190.0       4250.0    male  2009
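
To make the cleaning step concrete, here is a minimal sketch of the same filter, `reset_index`, and `sort_values` chain on a tiny frame. The rows in `toy` are hypothetical and are not taken from the real dataset; they only show why the output above has both an unnamed row label and an `index` column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not taken from the real dataset.
toy = pd.DataFrame({
    "species": ["Gentoo", "Adelie", "Chinstrap", "Adelie"],
    "sex": ["female", np.nan, "male", "male"],
    "body_mass_g": [5000.0, 3700.0, 3800.0, 3900.0],
})

# Drop rows with unknown sex, move the original row labels into an
# "index" column, then order the rows by species.
cleaned = toy[~toy.sex.isnull()].reset_index().sort_values("species")

print(cleaned)
```

The `index` column keeps the row labels from before the filter, while the unnamed labels on the left are the ones assigned by `reset_index` and then reordered by the sort.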

Next we will set up a linked selections instance, which will let us cross-filter the plots we create in the next step:

ls = hv.link_selections.instance()

def count(selected):
    return f"## {len(selected)}/{len(penguins)} penguins selected"

selected = pn.pane.Markdown(
    pn.bind(count, ls.selection_param(penguins)),
    align='center', width=400, margin=(0, 100, 0, 0)
)

header = pn.Row(
    pn.layout.HSpacer(), selected,
    sizing_mode='stretch_width'
).servable(area='header')

selected
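
Conceptually, cross-filtering means that a selection made in one view becomes a filter on the shared dataset, and every other view is recomputed from the filtered rows. The sketch below illustrates that idea in plain Python. It is not how HoloViews implements linked selections, and the records and the helpers `box_select` and `count_text` are hypothetical names introduced only for this example.

```python
from collections import Counter

# Hypothetical toy records, not real measurements.
records = [
    {"species": "Adelie", "bill_length_mm": 38.0, "body_mass_g": 3700},
    {"species": "Adelie", "bill_length_mm": 40.5, "body_mass_g": 3900},
    {"species": "Chinstrap", "bill_length_mm": 48.5, "body_mass_g": 3800},
    {"species": "Gentoo", "bill_length_mm": 46.0, "body_mass_g": 5000},
    {"species": "Gentoo", "bill_length_mm": 50.5, "body_mass_g": 5600},
]


def box_select(rows, column, low, high):
    """Return the rows whose value in `column` lies in [low, high]."""
    return [row for row in rows if low <= row[column] <= high]


def count_text(subset, total):
    """Format the header text from the selected and total row counts."""
    return f"## {len(subset)}/{len(total)} penguins selected"


# A selection made along one axis of one "plot" ...
subset = box_select(records, "bill_length_mm", 45.0, 50.0)

# ... drives every other view, here a header and a per-species count.
header_text = count_text(subset, records)
species_counts = Counter(row["species"] for row in subset)

print(header_text)
print(dict(species_counts))
```

In the app itself, `ls.selection_param(penguins)` plays the role of `subset`: it supplies the currently selected rows to the bound `count` function.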

Now we can start plotting the data with hvPlot, which offers an API familiar to users of pandas .plot but generates interactive plots. We then pass the plots to the linked selections object to enable cross-filtering across them:

colors = {
    'Adelie': '#1f77b4',
    'Gentoo': '#ff7f0e',
    'Chinstrap': '#2ca02c'
}

scatter = penguins.hvplot.points(
    'bill_length_mm', 'bill_depth_mm', c='species',
    cmap=colors, responsive=True, min_height=300
)

histogram = penguins.hvplot.hist(
    'body_mass_g', by='species', color=hv.dim('species').categorize(colors),
    legend=False, alpha=0.5, responsive=True, min_height=300
)

bars = penguins.hvplot.bar(
    'species', 'index', c='species', cmap=colors,
    responsive=True, min_height=300, ylabel=''
).aggregate(function=np.count_nonzero)

violin = penguins.hvplot.violin(
    'flipper_length_mm', by=['species', 'sex'], cmap='Category20',
    responsive=True, min_height=300, legend='bottom_right'
).opts(split='sex')

plots = pn.pane.HoloViews(
    ls(scatter.opts(show_legend=False) + bars + histogram + violin).opts(sizing_mode='stretch_both').cols(2)
).servable(title='Palmer Penguins')

plots
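
As a cross-check on the bar chart, the per-species counts can also be computed directly with pandas. The sketch below uses hypothetical toy rows and contrasts a plain row count with `np.count_nonzero` applied to the raw `index` values, which ignores an entry equal to 0. How HoloViews dispatches `np.count_nonzero` inside `aggregate` is not assumed here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; "index" mimics the column created by reset_index().
toy = pd.DataFrame({
    "index": [0, 1, 2, 3, 4],
    "species": ["Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"],
})

# Rows per species, independent of the values stored in any column.
rows_per_species = toy.groupby("species").size()

# np.count_nonzero on the raw values skips the entry equal to 0.
nonzero_per_species = toy.groupby("species")["index"].apply(
    lambda values: int(np.count_nonzero(values.to_numpy()))
)

print(rows_per_species.to_dict())
print(nonzero_per_species.to_dict())
```

If the two results differ for the real data, counting rows with `size` is the safer reference value.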