Interactive Pipelines#
The plots built up over the first few tutorials were all highly interactive in the web browser, with interactivity provided either by Bokeh plotting tools within the plots or, in some cases, by HoloViews generating a Bokeh widget to select values for a groupby over a categorical variable. However, when you are exploring a dataset, you might want to see how any aspect of the data or plot changes when it is varied interactively. Luckily, hvPlot makes this easy, letting you explore almost any parameter or setting in your code.
hvPlot registers the .interactive() method on many of the PyData data structures, e.g. a Pandas, GeoPandas, or Dask DataFrame, or an Xarray Dataset. Calling .interactive() returns an interactive object (e.g. an interactive Pandas DataFrame) that can be used as if it were the original object (e.g. by calling regular Pandas methods) and whose output (e.g. a DataFrame view) is recomputed every time one of its inputs changes. The inputs are widgets (e.g. a drop-down list) that replace values you would otherwise hard-code and update manually to observe how they affect the output. When such an interactive object is displayed in a notebook, it includes the widgets that you have used together with the regular output.
Panel widgets#
Before using .interactive() we will need a widget library, and here we will be using Panel to generate Bokeh widgets under user control, just as hvPlot uses Panel to generate widgets for a groupby as shown previously. Let’s first get hold of a Panel widget to see how they work. Here, let’s create a Panel floating-point number slider to specify an earthquake magnitude between zero and nine:
import pathlib
import holoviews as hv
import hvplot.pandas # noqa
import numpy as np
import pandas as pd
import panel as pn
pn.extension(sizing_mode='stretch_width')
mag_slider = pn.widgets.FloatSlider(name='Minimum Magnitude', start=0, end=9, value=6)
mag_slider
The widget is a JavaScript object, but there are bidirectional connections between JS and Python that let us see and change the value of this slider using its value parameter:
mag_slider.value
6
mag_slider.value = 7
Exercise#
Try moving the slider around and rerunning the mag_slider.value cell above to access the current slider value. As you can see, you can easily get the value of any widget to use in subsequent cells, but you’d need to re-run any cell that accesses that value for it to get updated.
hvPlot .interactive()#
hvPlot provides an easy way to connect widgets directly into an expression you want to control.
First, let’s read in our data:
%%time
df = pd.read_parquet(pathlib.Path('../data/earthquakes-projected.parq'))
df = df.set_index('time').tz_localize(None)
CPU times: user 3.4 s, sys: 619 ms, total: 4.02 s
Wall time: 2.41 s
Now, let’s do a little filtering that we might want to control with such a widget, such as selecting the highest-magnitude events:
WEB_MERCATOR_LIMITS = (-20037508.342789244, 20037508.342789244)
df2 = df[['mag', 'depth', 'latitude', 'longitude', 'place', 'type']][df['northing'] < WEB_MERCATOR_LIMITS[1]]
df2[df2['mag'] > 5].head()
time | mag | depth | latitude | longitude | place | type |
---|---|---|---|---|---|---|
2000-01-31 07:25:59.740 | 5.4 | 33.0 | 38.114 | 88.604 | southern Xinjiang, China | earthquake |
2000-01-29 08:13:10.730 | 5.4 | 60.7 | -8.633 | 111.137 | Java, Indonesia | earthquake |
2000-01-29 02:53:54.890 | 5.1 | 100.0 | 4.857 | 126.259 | Kepulauan Talaud, Indonesia | earthquake |
2000-01-28 22:57:51.700 | 5.6 | 83.4 | -9.691 | 118.764 | Sumbawa region, Indonesia | earthquake |
2000-01-28 22:42:26.250 | 5.5 | 10.0 | -1.347 | 89.083 | South Indian Ocean | earthquake |
What if instead of ‘5’, we want the output above always to reflect the current value of mag_slider? We can do that by using hvPlot’s .interactive() support, passing in a widget almost anywhere we want in a pipeline:
dfi = df2.interactive()
dfi[dfi['mag'] > mag_slider].head()
Here, .interactive is a wrapper around your DataFrame or Xarray object that lets you provide Panel widgets almost anywhere you’d otherwise be using a number. Just as importing hvplot.pandas provides a .hvplot() method or object on your DataFrame, it also provides a .interactive method or object that gives you a general-purpose interactive DataFrame driven by widgets. .interactive stores a copy of your pipeline (the series of method calls or other expressions on your data) and dynamically replays the pipeline whenever one of its widgets changes.
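The record-and-replay idea can be illustrated without hvPlot at all. The following is only a conceptual sketch, not hvPlot’s actual implementation: the Slider and Pipeline classes and their methods are hypothetical stand-ins invented for this illustration, and a plain Python list replaces the DataFrame. The pipeline records each step as a function and re-runs all recorded steps whenever the slider value changes.

```python
class Slider:
    """Hypothetical stand-in for a widget: holds a value and notifies watchers."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        for callback in self._watchers:
            callback()

    def watch(self, callback):
        self._watchers.append(callback)


class Pipeline:
    """Hypothetical recorder: stores steps and replays them on demand."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = list(steps)

    def pipe(self, func, *args):
        # Return a new pipeline with one more recorded step; nothing runs yet.
        return Pipeline(self.source, self.steps + [(func, args)])

    def evaluate(self):
        # Replay every step, resolving Slider arguments to their current values.
        result = self.source
        for func, args in self.steps:
            resolved = [a.value if isinstance(a, Slider) else a for a in args]
            result = func(result, *resolved)
        return result


def above(rows, threshold):
    return [r for r in rows if r > threshold]


def head(rows, n):
    return rows[:n]


magnitudes = [5.4, 6.1, 7.2, 5.1, 6.8]
slider = Slider(6)
pipeline = Pipeline(magnitudes).pipe(above, slider).pipe(head, 2)

outputs = []
slider.watch(lambda: outputs.append(pipeline.evaluate()))

first = pipeline.evaluate()   # evaluated with the slider at 6
slider.value = 7              # triggers a replay via the watcher
```

The key design point is that the widget itself, rather than its current value, is stored in the recorded step, so each replay sees whatever the widget holds at that moment.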
.interactive supports just about any output you might want to get out of such a pipeline, such as text or numbers:
dfi[dfi['mag'] > mag_slider].shape
Or Matplotlib plots:
dfi[dfi['mag'] > mag_slider].plot(y='depth', kind='hist', bins=np.linspace(0, 50, 51))
Each time you drag the widget, hvPlot replays the pipeline and updates the output shown.
Of course, .interactive also supports .hvplot(), here with a new copy of a widget so that it will be independent of the other cells above:
mag_slider2 = pn.widgets.FloatSlider(name='Minimum magnitude', start=0, end=9, value=6)
dfi[dfi['mag'] > mag_slider2].hvplot(y='depth', kind='hist', bins=np.linspace(0, 50, 51))
You can see that the depth distribution varies dramatically as you vary the minimum magnitude, with the lowest-magnitude events apparently only detectable at shallow depths. There also seems to be some artifact at depth 10, which is the largest bin regardless of the filtering for all but the largest magnitudes.
Date widgets#
A .interactive() pipeline can contain any number of widgets, including any from the Panel reference gallery. For instance, let’s make a widget to specify a date range covering the dates found in this data:
date = pn.widgets.DateRangeSlider(name='Date', start=df.index[0], end=df.index[-1])
date
Now we can access the value of this slider:
date.value
(Timestamp('2000-01-31 23:52:00.619000'),
Timestamp('2018-12-01 00:00:13.284000'))
As this widget specifies a range, this time the value is returned as a tuple. If you prefer, you can get the components of the tuple directly via the value_start and value_end parameters respectively:
f'Start is at {date.value_start} and the end is at {date.value_end}'
'Start is at 2000-01-31 23:52:00.619000 and the end is at 2018-12-01 00:00:13.284000'
Once again, try specifying different ranges with the widget and rerunning the cell above.
Now let’s use this widget to expand our expression to filter by date as well as magnitude:
mag = pn.widgets.FloatSlider(name='Minimum magnitude', start=0, end=9, value=6)
filtered = dfi[
    (dfi['mag'] > mag) &
    (dfi.index >= date.param.value_start) &
    (dfi.index <= date.param.value_end)]
filtered.head()
You can now use either the magnitude or the date range (or both) to filter the data, and the output will update. Note that here you want to move the start date of the range slider rather than the end; otherwise, you may not see the table change because the earthquakes are displayed in date order.
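For comparison, the same combined filter can be written in plain Pandas with fixed values in place of the widgets. This is a minimal sketch on a small made-up DataFrame: the rows and the min_mag, start, and end values are invented for illustration and do not come from the earthquake dataset. With .interactive(), widgets stand in for these hard-coded values.

```python
import pandas as pd

# Made-up sample data indexed by time, standing in for the earthquake table.
sample = pd.DataFrame(
    {'mag': [5.4, 6.5, 7.1, 6.2]},
    index=pd.to_datetime(['2000-01-05', '2000-02-10', '2000-03-15', '2000-04-20']),
)

# Hard-coded values that the widgets would otherwise supply.
min_mag = 6
start, end = pd.Timestamp('2000-02-01'), pd.Timestamp('2000-03-31')

# Each comparison yields a boolean mask; & combines them element-wise, and the
# parentheses are needed because & binds more tightly than the comparisons.
mask = (sample['mag'] > min_mag) & (sample.index >= start) & (sample.index <= end)
static_filtered = sample[mask]
```

Only the rows that satisfy all three conditions survive, which is why narrowing either the magnitude or the date range shrinks the output.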
Exercise#
To specify the minimum earthquake magnitude, notice that we supplied the whole mag widget but .interactive() used only the value parameter of this widget by default. To be explicit, you may use mag.param.value instead if you wish. Try it!
Exercise#
For readability, six columns (plus the time index) were chosen before displaying the DataFrame. Have a look at df.columns and pick a different set of columns for display.
Functions as inputs#
Quite often the data structure you want to explore in a pipeline may itself be the outcome of another pipeline. It may for instance be a Pandas DataFrame created by extracting and transforming the output of a database or an API call, or it could be the dynamic output of some simulation or pre-processing. With hvplot.bind you can start with an arbitrary custom function that returns the data structure you want to explore and then bind that function’s arguments to widgets. Then when those widgets change, the function will get called to get the updated output.
To keep this example self-contained we’ll illustrate this process using a simple function that filters the earthquakes dataset by event type and returns a DataFrame. Of course, this function could include any computation that returns a DataFrame, including selecting data files on disk or making a query to a database.
def input_function(event_type):
    # Keep the display columns, then select only rows of the requested event type
    df2 = df[['mag', 'depth', 'latitude', 'longitude', 'place', 'type']]
    return df2[df2['type'] == event_type]
We can then create a Panel Select widget with a few options and bind it to input_function. Calling .interactive() on the bound object is what allows it to be used in an interactive pipeline, as we previously did with dfi.
event_types = pn.widgets.Select(options=['earthquake', 'quarry blast', 'explosion', 'ice quake'])
inputi = hvplot.bind(input_function, event_types).interactive()
inputi[inputi['mag'] > mag].head(2)
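Conceptually, binding defers the function call until the output is needed and looks up the current widget value at that moment. The sketch below imitates this with plain Python. The Select class and bind helper here are hypothetical stand-ins written for this illustration, not the Panel or hvPlot implementations, and a made-up list of tuples replaces the DataFrame.

```python
class Select:
    """Hypothetical stand-in for a selection widget with a current value."""

    def __init__(self, options):
        self.options = options
        self.value = options[0]


def bind(func, *widgets):
    # Return a zero-argument callable that reads each widget's value
    # at call time, so later widget changes are picked up automatically.
    def bound():
        return func(*(w.value for w in widgets))
    return bound


# Made-up (event type, magnitude) records for illustration only.
events = [('earthquake', 6.1), ('explosion', 2.3), ('earthquake', 5.0)]


def input_function(event_type):
    return [e for e in events if e[0] == event_type]


event_types = Select(['earthquake', 'explosion'])
bound_input = bind(input_function, event_types)

before = bound_input()           # uses the initial selection, 'earthquake'
event_types.value = 'explosion'
after = bound_input()            # same callable, now filtered by 'explosion'
```

Unlike this sketch, where the caller must ask for the result again, the tutorial’s bound object is re-evaluated for you when the widget changes.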
.interactive() and HoloViews#
.interactive() lets you work naturally with the compositional HoloViews plots provided by .hvplot(). Here, let’s combine such plots using the HoloViews + operator:
mag_hist = filtered.hvplot(y='mag', kind='hist', width=300)
depth_hist = filtered.hvplot(y='depth', kind='hist', width=300)
mag_hist + depth_hist
These are the same two histograms we saw earlier, but now we can filter them on data dimensions like time that aren’t even explicitly shown in the plot, using the Panel widgets.
Filtering earthquakes on a map#
To display the earthquakes on a map, we will first create a subset of the data to make it quick to update without needing Datashader:
subset_df = df[
    (df.northing < WEB_MERCATOR_LIMITS[1]) &
    (df.mag > 4) &
    (df.index >= pd.Timestamp('2017-01-01')) &
    (df.index <= pd.Timestamp('2018-01-01'))]
Now we can make a new interactive DataFrame from this new subselection:
subset_dfi = subset_df.interactive(sizing_mode='stretch_width')
And now we can declare our widgets and use them to filter the interactive DataFrame as before:
date_subrange = pn.widgets.DateRangeSlider(
    name='Date', start=subset_df.index[0], end=subset_df.index[-1])
mag_subrange = pn.widgets.FloatSlider(name='Magnitude', start=3, end=9, value=3)
filtered_subrange = subset_dfi[
    (subset_dfi.mag > mag_subrange) &
    (subset_dfi.index >= date_subrange.param.value_start) &
    (subset_dfi.index <= date_subrange.param.value_end)]
Now we can plot the earthquakes on an ESRI tilesource, including the filtering widgets as follows:
geo = filtered_subrange.hvplot(
    'easting', 'northing', color='mag', kind='points',
    xaxis=None, yaxis=None, responsive=True, min_height=500, tiles='ESRI')
geo
Terminating methods for .interactive#
The examples above all illustrate cases where you can display the output of .interactive() and not worry about its type, which is no longer a DataFrame or a HoloViews object, but an Interactive object:
type(geo)
hvplot.interactive.Interactive
What if you need to work with some part of the interactive pipeline, e.g. to feed it to some function or object that does not understand Interactive objects? In such a case, you can use what is called a terminating method on your Interactive object to get at the underlying object for you to use.
For instance, let’s create magnitude and depth histograms on this subset of the data as in an earlier notebook and see if we can enable linked selections on them:
mag_subhist = filtered_subrange.hvplot(y='mag', kind='hist', responsive=True, min_height=200)
depth_subhist = filtered_subrange.hvplot(y='depth', kind='hist', responsive=True, min_height=200)
combined = mag_subhist + depth_subhist
combined
Note that this looks like a HoloViews layout with some widgets, but this object is not a HoloViews object. Instead it is still an Interactive object:
type(combined)
hvplot.interactive.Interactive
link_selections does not currently understand Interactive objects, and so it will raise an exception when given one. If we need a HoloViews Layout, e.g. for calling link_selections, we can build a layout from the constituent objects using the .holoviews() terminating method on Interactive:
layout = mag_subhist.holoviews() + depth_subhist.holoviews()
layout
This is now a HoloViews object, so we can use it with link_selections:
print(type(layout))
ls = hv.link_selections.instance()
ls(mag_subhist.holoviews()) + ls(depth_subhist.holoviews())
<class 'holoviews.core.layout.Layout'>
You can use the box selection tool to see how selections compare between these plots. However, you will note that the widgets are no longer displayed. To address this, we can display the widgets separately using a different terminating method, namely .widgets():
filtered_subrange.widgets()
For reference, the terminating methods for an Interactive object are:
.holoviews(): Give me a HoloViews object
.panel(): Give me a Panel ParamFunction
.widgets(): Give me a layout of widgets associated with this interactive object
.layout(): Give me the layout of the widgets and display, pn.Column(obj.widgets(), obj.panel()), where pn.Column will be described in the Dashboards notebook.
Conclusion#
Using the techniques above, you can build up a collection of plots and other outputs with Panel widgets to control individual bits of computation and display.
What if you want to collect these pieces and put them together into a standalone app or dashboard? The next tutorial will show you how to do so!