Resampling Grids#
import numpy as np
import xarray as xr
import geoviews as gv
import datashader as dsh
from geoviews import opts
gv.extension('bokeh', 'matplotlib')
opts.defaults(
    opts.Image(width=600, height=400, colorbar=True),
    opts.Feature(apply_ranges=False),
    opts.QuadMesh(width=600, height=400, colorbar=True))
In geographical applications, grids and meshes of various kinds are very common, and visualization and analysis often require resampling them in different ways. Regridding can refer to either upsampling or downsampling a grid or mesh, achieved through interpolation and aggregation respectively.
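As a rough illustration of these two directions, the sketch below uses plain NumPy on a made-up 1D profile. The sample values and the factor of 2 are assumptions chosen only for the example; real regridding tools work on 2D grids and handle edges more carefully.

```python
import numpy as np

# Hypothetical coarse samples at x = 0, 1, 2, 3
x_coarse = np.arange(4.0)
values = np.array([0.0, 2.0, 4.0, 6.0])

# Upsampling by interpolation: evaluate the profile on a finer axis
x_fine = np.linspace(0.0, 3.0, 7)
upsampled = np.interp(x_fine, x_coarse, values)

# Downsampling by aggregation: average each block of 2 neighbouring samples
downsampled = values.reshape(-1, 2).mean(axis=1)
```

Interpolation invents values between existing samples, while aggregation summarizes several samples into one, which is why the two are paired with upsampling and downsampling.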
Naive approaches to regridding treat the space as flat, which is often simpler but can give less accurate results when working with a spherical space such as the Earth. In this user guide we will summarize how to work with different grid types, including rectilinear grids, curvilinear grids and trimeshes. We will also discuss two approaches to regridding: one that assumes a flat Earth (using datashader) and one that assumes a spherical Earth (using xESMF).
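To see why the flat assumption can matter, consider that cells on a regular latitude/longitude grid shrink towards the poles, so an unweighted mean over-represents high latitudes. The sketch below compares a naive mean with one weighted by the cosine of latitude, a common approximation of relative cell area. The temperature profile is made up and the 5 degree spacing is an assumption for illustration.

```python
import numpy as np

# Cell-centre latitudes in degrees, 5 degrees apart
lat = np.arange(-87.5, 90.0, 5.0)

# Hypothetical profile: warm at the equator, cold at the poles
field = 300.0 - 40.0 * np.abs(np.sin(np.deg2rad(lat)))

# "Flat" mean: every latitude band counts equally
flat_mean = field.mean()

# "Spherical" mean: bands are weighted by cos(latitude)
weights = np.cos(np.deg2rad(lat))
sphere_mean = np.average(field, weights=weights)
```

With this profile the weighted mean is several degrees warmer than the flat one, because the cold polar bands cover far less area than the flat calculation assumes.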
Rectilinear grids#
Rectilinear grids are one of the most standard formats and are defined by regularly sampled coordinates along the two axes. The air_temperature dataset provided by xarray and used throughout the GeoViews documentation provides a good example.
ds = xr.tutorial.open_dataset('air_temperature').load().isel(time=slice(0, 100))
ds
<xarray.Dataset> Size: 1MB
Dimensions:  (lat: 25, time: 100, lon: 53)
Coordinates:
  * lat      (lat) float32 100B 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
  * lon      (lon) float32 212B 200.0 202.5 205.0 207.5 ... 325.0 327.5 330.0
  * time     (time) datetime64[ns] 800B 2013-01-01 ... 2013-01-25T18:00:00
Data variables:
    air      (time, lat, lon) float64 1MB 241.2 242.5 243.5 ... 295.4 294.9
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day). These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...
As we already know from the Gridded Dataset sections, an xarray Dataset of this kind can easily be wrapped in a GeoViews dataset:
gvds = gv.Dataset(ds).redim.range(air=(230, 300))
gvds
:Dataset [lat,lon,time] (4xDaily Air temperature at sigma level 995)
We can also easily plot this data by using the .to method to group the data into a set of Image elements indexed by ‘time’:
images = gvds.to(gv.Image, ['lon', 'lat'], dynamic=True)
Note that the longitude coordinates above are defined in the range (0, 360), while GeoViews generally expects them to be in the range (-180, 180). To correct this we can apply a simple fix:
ds['lon'] = np.where(ds.lon>180, ds.lon-360, ds.lon)
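The effect of this fix can be checked on a plain array. The sketch below rebuilds the longitude values from the printed output above (200.0 to 330.0 in steps of 2.5) and compares the `np.where` approach with a modular formula; the modular variant is an alternative shown for comparison, not something the guide itself uses.

```python
import numpy as np

# Longitudes reconstructed from the dataset summary: 200.0, 202.5, ..., 330.0
lon = np.arange(200.0, 332.5, 2.5)

# The fix used above: shift anything beyond 180 degrees by a full turn
wrapped_where = np.where(lon > 180, lon - 360, lon)

# Alternative: wrap with modular arithmetic
wrapped_mod = ((lon + 180) % 360) - 180

# For this dataset the wrapped coordinate is still strictly increasing
is_increasing = bool(np.all(np.diff(wrapped_where) > 0))
```

The two variants differ only at exactly 180 degrees, which the first keeps as 180 and the second maps to -180. A grid that straddles 180 degrees would no longer be monotonic after wrapping and would need to be re-sorted by longitude.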
Now we are ready to display the data. Note that throughout this user guide we will be using Bokeh but we could easily switch to matplotlib if needed.
opts.defaults(opts.Image(cmap='viridis'))
images * gv.feature.coastline
Datashader#
HoloViews provides high-level wrappers around the datashader library, which make it possible to quickly resample or aggregate datasets of different kinds. Datashader knows nothing about non-flat coordinate systems, but provides a very fast, parallelized regridding operation for rectilinear grids. Here we will import the regrid operation and pass it our stack of images from above. While this dataset is fairly small and regridding will actually upsample the image to match the dimensions of the plot, regrid can very quickly downsample very large datasets.
One important thing to note about the resampling operations we will be working with in this user guide is that they are dynamic and linked to the plot dimensions and axis ranges. This means that whenever we zoom or pan, the data will be resampled. If we want to disable this linked behavior and supply an explicit width and height, we can disable the streams by passing streams=[] as a keyword argument.
from holoviews.operation.datashader import regrid
regrid(images) * gv.feature.coastline
xESMF#
The xESMF library is specifically designed to provide an easy way to accurately resample grids defined in geographic coordinate systems. It differs significantly from the approach used by datashader, which applies simple upsampling and downsampling. xESMF is a wrapper around the ESMF regridding algorithms, which compute an interpolation weight matrix that is then applied to remap the values of the source grid onto the destination grid.
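The idea of a weight matrix can be sketched in one dimension with plain NumPy. The helper `linear_weights` below is hypothetical and written only for this illustration: it builds a dense matrix of linear interpolation weights on a flat axis, whereas ESMF computes sparse weights for 2D grids on the sphere.

```python
import numpy as np

def linear_weights(src_x, dst_x):
    """Dense (n_dst, n_src) matrix of 1D linear interpolation weights."""
    weights = np.zeros((dst_x.size, src_x.size))
    # Index of the source point to the left of each destination point
    left = np.searchsorted(src_x, dst_x, side="right") - 1
    left = np.clip(left, 0, src_x.size - 2)
    # Fractional position of each destination point between its neighbours
    frac = (dst_x - src_x[left]) / (src_x[left + 1] - src_x[left])
    rows = np.arange(dst_x.size)
    weights[rows, left] = 1.0 - frac
    weights[rows, left + 1] = frac
    return weights

src_x = np.array([0.0, 1.0, 2.0, 3.0])
dst_x = np.array([0.0, 0.5, 1.5, 3.0])

# The weights depend only on the two grids, so they are computed once
W = linear_weights(src_x, dst_x)

# ... and then reused for every field defined on the source grid (one per row)
fields = np.array([[10.0, 20.0, 30.0, 40.0],
                   [1.0, 4.0, 9.0, 16.0]])
remapped = fields @ W.T
```

Because the matrix depends only on the source and destination grids, remapping another time step is a single matrix product, which is what makes caching the weights worthwhile.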
In GeoViews these algorithms are made available via the weighted_regrid operation, which supports several interpolation modes, including ‘bilinear’, ‘nearest_s2d’, ‘nearest_d2s’ and ‘conservative’. Since generating the sparse weight matrix takes much longer than applying it, the operation will cache the weight matrix on disk for later use; this optimization can be disabled via the reuse_weights parameter or customized by defining a custom file_pattern.
from geoviews.operation.regrid import weighted_regrid
weighted_regrid(images) * gv.feature.coastline
Since this operation creates local weight files, we will want to clean up after ourselves once we are done. To do so we can call the weighted_regrid.clean_weight_files method.
weighted_regrid.clean_weight_files()
Deleted 0 weight files
Onto an existing grid#
The operation also allows us to define a target grid, which we can either define manually or by using a utility provided by the xESMF library. Here we will define a $2^{\circ}\times2^{\circ}$ grid.
import xesmf as xe
grid = xe.util.grid_2d(-160, -35, 2, 15, 70, 2)
grid
<xarray.Dataset> Size: 55kB
Dimensions:  (y: 27, x: 62, y_b: 28, x_b: 63)
Coordinates:
    lon      (y, x) float64 13kB -159.0 -157.0 -155.0 ... -41.0 -39.0 -37.0
    lat      (y, x) float64 13kB 16.0 16.0 16.0 16.0 ... 68.0 68.0 68.0 68.0
    lon_b    (y_b, x_b) float64 14kB -160.0 -158.0 -156.0 ... -40.0 -38.0 -36.0
    lat_b    (y_b, x_b) float64 14kB 15.0 15.0 15.0 15.0 ... 69.0 69.0 69.0 69.0
Dimensions without coordinates: y, x, y_b, x_b
Data variables:
    *empty*
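For readers defining a target grid manually, the following NumPy sketch reproduces the coordinate values printed above. It is reconstructed from that output and is not xESMF's own implementation; the variable names simply mirror the coordinate names in the summary.

```python
import numpy as np

# Cell boundaries, 2 degrees apart, matching lon_b and lat_b in the output
lon_b_1d = np.arange(-160.0, -35.0, 2.0)   # -160, -158, ..., -36
lat_b_1d = np.arange(15.0, 70.0, 2.0)      # 15, 17, ..., 69

# Cell centres sit halfway between neighbouring boundaries
lon_1d = (lon_b_1d[:-1] + lon_b_1d[1:]) / 2
lat_1d = (lat_b_1d[:-1] + lat_b_1d[1:]) / 2

# Broadcast to 2D arrays with (y, x) layout, like the lon/lat coordinates
lon, lat = np.meshgrid(lon_1d, lat_1d)
lon_b, lat_b = np.meshgrid(lon_b_1d, lat_b_1d)
```

Note that in the printed grid the last boundaries are -36 and 69 rather than the requested -35 and 70, since the requested extents are not whole multiples of the 2 degree step.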
Since the grid has 2D coordinate arrays the regridded data will be wrapped in and displayed as a QuadMesh:
target = gv.Dataset(grid, kdims=['lon', 'lat'])
weighted_regrid(images, target=target, streams=[]) * gv.feature.coastline
Curvilinear Grids#
Curvilinear grids are another very common mesh type, which are usually defined by multi-dimensional coordinate arrays:
ds = xr.tutorial.open_dataset('rasm').load()
ds
<xarray.Dataset> Size: 17MB
Dimensions:  (time: 36, y: 205, x: 275)
Coordinates:
  * time     (time) object 288B 1980-09-16 12:00:00 ... 1983-08-17 00:00:00
    xc       (y, x) float64 451kB 189.2 189.4 189.6 189.7 ... 17.4 17.15 16.91
    yc       (y, x) float64 451kB 16.53 16.78 17.02 17.27 ... 28.01 27.76 27.51
Dimensions without coordinates: y, x
Data variables:
    Tair     (time, y, x) float64 16MB nan nan nan nan ... 28.66 28.19 28.21
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       netCDF Operators version 4.7.9 (Homepage = htt...
    history:                   Fri Aug 7 17:57:38 2020: ncatted -a bounds,,d...
As with rectilinear grids, GeoViews understands this kind of data natively, so we again wrap this dataset in a gv.Dataset and define a fixed range for the air temperature (Tair) values:
gvds = gv.Dataset(ds).redim.range(Tair=(-25, 25))
gvds
:Dataset [time,xc,yc] (Surface air temperature)
Now we can plot this data directly as a gv.QuadMesh. However, this is generally quite slow, especially when working with bokeh, where each grid point is rendered as a distinct polygon. We will therefore downsample the data by a factor of 3 along both dimensions:
opts.defaults(opts.Image(cmap='RdBu_r'), opts.QuadMesh(cmap='RdBu_r'))
quadmeshes = gvds.to(gv.QuadMesh, ['xc', 'yc'], dynamic=True)
quadmeshes.apply(lambda x: x.clone(x.data.Tair[::3, ::3])) * gv.feature.coastline
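The `[::3, ::3]` slice keeps every third row and column rather than averaging neighbours. The sketch below shows its effect on an array with the same y and x sizes as the rasm dataset; the array contents are a stand-in for one Tair time step, not real data.

```python
import numpy as np

# y and x sizes taken from the dataset summary above
ny, nx = 205, 275

# Stand-in for a single time step of Tair
frame = np.arange(ny * nx, dtype=float).reshape(ny, nx)

# Stride slicing keeps rows and columns 0, 3, 6, ...
decimated = frame[::3, ::3]

# How many times fewer grid points have to be drawn
reduction = frame.size / decimated.size
```

This cuts the number of grid points by roughly a factor of nine, at the cost of discarding the skipped values entirely, which is acceptable for a quick preview but not for analysis.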
The problem is less severe when plotting using matplotlib, but even then plotting can be fairly slow given a large enough mesh.
gv.output(quadmeshes.opts(cmap='RdBu_r') * gv.feature.coastline, backend='matplotlib', size=300)
If we want to explore a very large grid, it therefore often makes sense to resample the data onto a rectilinear grid, which can be rendered much more efficiently. Once again we have the option of using the datashader-based approach or the more accurate xESMF-based approach.
Datashader#
To regrid a QuadMesh using GeoViews we can import the rasterize operation. In the background the operation will convert the QuadMesh into a TriMesh, which datashader understands. To optimize this conversion so it occurs only when aggregating the QuadMesh for the first time we can activate the precompute option. Additionally we have to define an aggregator, in this case to compute the mean Tair value in a pixel:
from holoviews.operation.datashader import rasterize
rasterize(gv.project(quadmeshes), precompute=True, aggregator=dsh.mean('Tair')) * gv.feature.coastline
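To make the role of a mean aggregator concrete, the sketch below averages scattered samples per pixel with plain NumPy. This is a simplification: datashader rasterizes the triangles of the mesh, whereas this example bins point samples, and all positions and values are made up.

```python
import numpy as np

# Hypothetical samples: x, y positions in [0, 2) and a value at each
x = np.array([0.2, 0.7, 1.5, 1.6])
y = np.array([0.3, 0.6, 0.2, 1.8])
value = np.array([10.0, 20.0, 5.0, 7.0])

# Pixel edges of a 2 x 2 output grid
edges = np.array([0.0, 1.0, 2.0])

# Sum of values and number of samples per pixel, indexed as [y, x]
total, _, _ = np.histogram2d(y, x, bins=[edges, edges], weights=value)
count, _, _ = np.histogram2d(y, x, bins=[edges, edges])

# Mean per pixel; pixels that received no samples stay NaN
mean = np.full(total.shape, np.nan)
np.divide(total, count, out=mean, where=count > 0)
```

The pixel that receives two samples reports their average, and the pixel that receives none is left empty, which is the behaviour one would expect from a mean aggregation over a sparse region.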
xESMF#
Now we will once again use the xESMF-based regridding, for which we can still use the weighted_regrid operation, since it supports both rectilinear and curvilinear grids. Since the original data doesn’t have a very high resolution we will also disable the streams linking the operation to the plot dimensions and axis ranges.
from geoviews.operation.regrid import weighted_regrid
weighted_regrid(quadmeshes, streams=[]) * gv.feature.coastline
Finally, let's clean up after ourselves one last time:
weighted_regrid.clean_weight_files()
Deleted 0 weight files