ML Annotators#
Annotating ML data in Python with Bokeh and HoloViews#
The Bokeh Python plotting library lets users build interactive apps and plots in a web browser for a very wide variety of data types. The high-level library HoloViews builds on Bokeh, making it easier to use for common data-processing tasks, and the corresponding GeoViews library adds support for plotting in geographic coordinate systems.
These tools now (as of development releases in 1/2020) all support interactively collecting data from the user, not just interacting with existing data, with components provided by HoloViews (and by GeoViews for data on maps) that make it simple to get data into Python ready to process and use for tagging data for machine-learning pipelines (or any other purpose!). These “annotation” and “drawing” tools can be used to annotate existing data sets or geographic locations, to classify each example or regions into categories or with numeric or other labels.
These tools make it possible to work directly with data in its native values (as data) and then immediately use it for further processing in Python. Other tools like labelImg will usually be faster and easier to use for the specific things they do, so if one of those meets your need, use it! Meanwhile, use the Bokeh/HoloViews annotation tools if you want to quickly create a fully custom app for special purposes, especially if you want to stay working with data you are already using in Python, in its native coordinates.
Here, we will focus only on the easy-to-use, high-level “annotator” components from HoloViews; fully custom control is always available by using Bokeh’s drawing tools directly.
import holoviews as hv
import geoviews as gv
hv.extension('bokeh')
Basic HoloViews/GeoViews usage#
Let’s imagine our task is to mark the locations of trees on satellite images. For convenience, we’ll use a tile-based web mapping server where these images have been cleaned up and put into geographic lat,lon coordinates already. In GeoViews, you can easily get a tile source to work with:
tiles = gv.tile_sources.EsriImagery()
tiles
Next, we’ll need an object to collect some lat,lon locations. Here’s an example with three points already identified:
pts = dict(
Longitude = [-121.932619100, -121.932362392, -121.933530027],
Latitude = [ 36.631164244, 36.629475356, 36.630623206])
opts = dict(size=10, line_color='black', padding=0.1, min_height=400)
points = gv.Points(pts).opts(**opts)
points
These particular points are locations of actual trees somewhere in Monterey, California, as you can see if you overlay them onto the tiles (where *
means “overlay” in HoloViews):
tiles * points
The Bokeh tools in the tool bar let you pan and zoom on this plot interactively, but the data in it is fixed. What if we wanted to label all the trees that we can see here, i.e. add more data points? That’s where the HoloViews Annotators come in.
HoloViews Point Annotator#
A HoloViews (or GeoViews) annotator lets you add, change, or add information to data in a Bokeh plot, then get the data back into Python easily. Here, let’s make an annotator for the points, then overlay the annotated points on the map tiles like we had before:
points_annotator = hv.annotate.instance()
hv.annotate.compose(tiles, points_annotator(points, annotations=dict(Size=int, Type=str)))
You’ll see that there is now a table of coordinates and also that there is now a PointDraw tool in the toolbar:
Once you select that tool, you should be able to click and drag any of the existing points and see the location update in the table. Whether you click on the table or the points, the same object should be selected in each, so that you can see how the graphical and tabular representations relate.
The PointDraw tool also allows us to add completely new points; once the tool is selected, just click on the plot above in locations not already containing a point and you can see a new point and a new table row appear ready for editing. You can also delete points by selecting them in the plot or the table then moving back to the plot (if needed) and pressing Backspace or Delete (depending on operating system).
Whether for existing or newly added points, you can use the table to edit the latitude and longitude values numerically or add an optional “Size” estimate or “Type” description for each point.
There are also save and restore tools that help make sure you don’t lose work once you’ve added a lot of data, but we won’t have time to cover those here.
Now that we’ve added some points, let’s get the data back into Python as a Pandas DataFrame:
points_annotator.annotated.dframe()
Longitude | Latitude | Size | Type | |
---|---|---|---|---|
0 | -121.932619 | 36.631164 | 0 | |
1 | -121.932362 | 36.629475 | 0 | |
2 | -121.933530 | 36.630623 | 0 |
You should see that you can access the current set of user-provided or user-modified points and their user-provided labels from within Python, ready for saving to disk or any subsequent processing you need to do.
We can also access the currently selected
points, in case we care only about a subset of the points (which will be empty if no points/rows are selected):
points_annotator.selected.dframe()
Longitude | Latitude | Size | Type |
---|
HoloViews Rectangle Annotator#
HoloViews data types that can currently be annotated include:
Points
/Scatter
Curve
Path
Polygons
Rectangles
Let’s look at the Rectangles
annotator, which behaves very similarly to the Points annotator:
Select the
BoxEdit
tool in the toolbar:Click and drag on an existing Rectangle to move it
Double click to start drawing a new Rectangle at one corner, and double click to complete the rectangle at the opposite corner
Select a rectangle and press the Backspace or Delete key (depending on OS) to delete it, while pointing at the plot (not the table)
Edit the box coordinates in the table to resize it
rectangles = gv.Rectangles([(0, 0, 3, 3), (12, 12, 15, 15)]).opts(fill_alpha=0.2)
box_annotator = hv.annotate.instance()
labels = gv.tile_sources.StamenLabels()
layout = box_annotator(rectangles, name="Rectangles")
hv.annotate.compose(tiles, layout, labels)
/home/runner/work/examples/examples/ml_annotators/envs/default/lib/python3.10/site-packages/geoviews/plotting/bokeh/plot.py:106: UserWarning: BoxEditTool requires Bokeh >= 3.3
fig = super().initialize_plot(ranges, plot, plots, **opts)
As for Points, we can access the data using the annotated
property on the annotator instance, and then use these coordinates as part of any subsequent workflow:
box_annotator.annotated.dframe()
x0 | y0 | x1 | y1 | |
---|---|---|---|---|
0 | 0 | 0 | 3 | 3 |
1 | 12 | 12 | 15 | 15 |
Annotating paths/polygons#
Annotated points associate data with each point, and annotated Rectangles associate data with the entire Rectangle. Annotated Paths and Polygons allow both, i.e. associating one value with the entire object (“this polygon is Arizona”), and associating specific values with each vertex of the object (“this position along the border has elevation X”). This capability makes these annotators more complex (see the HoloViews Annotators user guide and the PolyDraw and PolyEdit docs for more details), but we’ll do a brief demo here.
Drawing/Selecting Deleting Paths/Polygons#
Select the PolyDraw tool in the toolbar:
Double click to start a new object, single click to add each vertex, and double-click to complete it.
Delete paths/polygons by selecting and pressing Delete key (OSX) or Backspace key (PC)
Editing Paths/Polygons#
Select the PolyEdit tool in the toolbar:
Double click a Path/Polygon to start editing
Drag vertices to edit them, delete vertices by selecting them
To edit and annotate the vertices, use the draw tool or the first table to select a particular path/polygon and then navigate to the Vertices tab.
path = gv.Path([([-3.208222, -3.203861, -3.203865, -3.202945, -3.205764, -3.208222],
[55.868081, 55.867272, 55.867866, 55.868922, 55.869360, 55.868081]),
([-3.208646, -3.206124, -3.208234, -3.211137, -3.208646],
[55.864370, 55.863135, 55.861888, 55.862793, 55.864370])])
path_annotator = hv.annotate.instance()
layout = path_annotator(path, annotations=['Label'], vertex_annotations=['Value'])
hv.annotate.compose(tiles, layout)
To access the data we can make use of the iloc method on Path
objects to access a particular path, and then access the .data
or convert it to a dataframe:
path_annotator.annotated.iloc[0].dframe()
Longitude | Latitude | Label | Value | |
---|---|---|---|---|
0 | -3.208222 | 55.868081 | ||
1 | -3.203861 | 55.867272 | ||
2 | -3.203865 | 55.867866 | ||
3 | -3.202945 | 55.868922 | ||
4 | -3.205764 | 55.869360 | ||
5 | -3.208222 | 55.868081 |
By the way, for any of the annotators but most usefully for paths and polygons, we can also get the data back as a Shapely geometry if that’s more convenient to work with:
path_annotator.annotated.geom()
path_annotator.annotated.iloc[0].geom()
Integrating Annotators into your workflows#
As you can see above, it’s fairly straightforward to build an annotator to collect a specific type of data. To collect data at a large scale, you’ll want to focus on usability, which will often mean creating a special-purpose app to collect data across multiple images, multiple datasets, by multiple raters, etc. Doing so is beyond the scope of this introduction, but can be straightforward using the separate Panel library for building apps, also based on Bokeh and having full support for HoloViews. The annotator objects can be included directly in a Panel layout and connected to other Panel objects for seamless updating and integration into a larger workflow.
For more details, see:
HoloViz.org for a full tutorial on all the various libraries in the HoloViz ecosystem (including HoloViews, GeoViews, and Panel).
EarthML.pyviz.org for Earth-science related ML examples
EarthSim.pyviz.org for hydrology-related simulation examples
And please let us know if you find any rough edges or missing functionality in the annotators; these are relatively new to Bokeh, HoloViews, and GeoViews, and can probably be improved as more people try them out for new tasks as long as we know what’s needed!