Seattle Lidar#

datashaderhvplotbokehdaskxarray-spatial

Published: April 20, 2017 · Modified: November 22, 2024

Visualize Lidar Scattered Point Elevation Data#

This notebook uses hvPlot and Datashader to visualize Lidar elevation data from the Puget Sound Lidar consortium, a source of Lidar data for the Puget Sound region of the state of Washington, U.S.A.

Lidar Elevation Data#

Example X,Y,Z scattered point elevation data from the unpacked 7zip files (unpacked as .gnd files)

! head ../data/q47122d2101.gnd

X,Y,Z
1291149.60,181033.64,467.95
1291113.29,181032.53,460.24
1291065.38,181035.74,451.41
1291113.16,181037.32,455.51
1291116.68,181037.42,456.20
1291162.42,181038.90,467.81
1291111.90,181038.15,454.89
1291066.62,181036.73,451.41
1291019.10,181035.20,451.64

The Seattle area example below loads 25 .gnd elevation files like the one above. We’ll download, cache and read the data using intake.

NOTE: Downloading the data for the first time takes about 2 mins due to its large size.

import intake

cat = intake.open_catalog('./catalog.yml')
list(cat)

['seattle_lidar']

lidar = cat.seattle_lidar()
ddf_original = lidar.to_dask()
print(len(ddf_original))
ddf_original.head(2)

94507551

	X	Y	Z
0	1293274.13	181399.80	435.79
1	1293278.38	181393.05	434.58

Geographic Metadata#

Since the data are geographic, we need to know the coordinate reference system (CRS) of the X and Y. All the geographic metadata is stored in the data source entry in the intake catalog.

lidar.metadata['crs']

'State Plane Coordinate System Washington North FIPS 4601'

We intend to display the data overlayed on top of a web basemap (aka tiled web map) which usually uses the Web Mercator (EPSG:3857) projection. We convert the data now to this projection so we only have to do it once. Note that we could have installed GeoViews and called hvPlot with geo=True to enable on-the-fly projection of the data to the right system. However, since we are going to generate many plots in this notebook, it is worth pre-processing the data beforehand not to pay the projection cost on every new plot.

from pyproj.transformer import Transformer

# Washington State Plane North EPSG code and Mercator projection EPSG code
transformer = Transformer.from_crs('epsg:2855','epsg:3857')

FT_2_M = 0.3048 # conversion factor from feet to meters

def convert_coords(df):
    lon, lat = transformer.transform(df.X.values * FT_2_M, df.Y.values * FT_2_M)
    df['meterswest'], df['metersnorth'] = lon, lat
    return df[['meterswest', 'metersnorth', 'Z']]

Try out the convert_coords function on a subset of the data

convert_coords(ddf_original.head(2))

	meterswest	metersnorth	Z
0	-1.360741e+07	6.022197e+06	435.79
1	-1.360741e+07	6.022194e+06	434.58

Convert the coordinates#

Since our real dataset is large and partitioned using dask, we need to think about how to apply the convert_coords function to our data.

import dask
import dask.distributed
import dask.delayed
from dask.config import set

set({'distributed.worker.memory.target': 0.8,
     'distributed.worker.memory.spill': 0.9,
     'distributed.worker.memory.pause': 0.95,
     'distributed.worker.memory.terminate': 0.98})

<dask.config.set at 0x7fd2d4c99cd0>

dask.distributed.Client(memory_limit='12GB')

Client

Client-9bdfdd53-a8c3-11ef-8909-000d3a3bbecc

Connection method: Cluster object	Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

LocalCluster

bc5c4321

Dashboard: http://127.0.0.1:8787/status	Workers: 4
Total threads: 4	Total memory: 44.70 GiB
Status: running	Using processes: True

Scheduler Info

Scheduler

Scheduler-55583ede-04d6-46b3-bbe9-f7032d80d567

Comm: tcp://127.0.0.1:38899	Workers: 4
Dashboard: http://127.0.0.1:8787/status	Total threads: 4
Started: Just now	Total memory: 44.70 GiB

Workers

Worker: 0

Comm: tcp://127.0.0.1:39009	Total threads: 1
Dashboard: http://127.0.0.1:34771/status	Memory: 11.18 GiB
Nanny: tcp://127.0.0.1:34547
Local directory: /tmp/dask-scratch-space/worker-m247sa7o

Worker: 1

Comm: tcp://127.0.0.1:38713	Total threads: 1
Dashboard: http://127.0.0.1:46287/status	Memory: 11.18 GiB
Nanny: tcp://127.0.0.1:46519
Local directory: /tmp/dask-scratch-space/worker-tnppqz2_

Worker: 2

Comm: tcp://127.0.0.1:46141	Total threads: 1
Dashboard: http://127.0.0.1:40499/status	Memory: 11.18 GiB
Nanny: tcp://127.0.0.1:39561
Local directory: /tmp/dask-scratch-space/worker-5qwkac4x

Worker: 3

Comm: tcp://127.0.0.1:34709	Total threads: 1
Dashboard: http://127.0.0.1:37271/status	Memory: 11.18 GiB
Nanny: tcp://127.0.0.1:37931
Local directory: /tmp/dask-scratch-space/worker-842ry64e

Explore the task graph to figure out a performant way to split up the coords conversion. First we’ll try with using dask.delayed.

dask.delayed(convert_coords)(ddf_original).visualize(engine='cytoscape')

We can see that even though we thought dask.delayed would help, in actuality we would be requiring all the processes to be done first and then the conversion would happen on the whole dataset in one go. Another approach would be to use dask.map_partitions to do the conversion on each piece of the data.

ddf = ddf_original.map_partitions(convert_coords)
ddf.visualize(engine='cytoscape', tasks=True)

Now that we have set up the task graph, we can use ddf directly to do the computations on the fly. However, since this dataset fits in memory (~8 GB), we will do the computation and keep the output in memory for quick use when plotting.

NOTE: This next cell takes about a minute to run.

%%time
df = ddf.compute()

CPU times: user 1.68 s, sys: 2.8 s, total: 4.48 s
Wall time: 46.7 s

df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 94507551 entries, 0 to 3822295
Data columns (total 3 columns):
 #   Column       Dtype  
---  ------       -----  
 0   meterswest   float64
 1   metersnorth  float64
 2   Z            float64
dtypes: float64(3)
memory usage: 2.8 GB

If all the data doesn’t fit in memory on your machine, try downsampling the data from each file to only keep 1/100th of the total data. To avoid unnecessary computation, it is better to do the downsampling first and then convert the coordinates.

ddf_small = ddf_original.sample(frac=0.01).map_partitions(convert_coords)
ddf_small.visualize(engine='cytoscape', tasks=True)

%%time
df_small = ddf_small.compute()

CPU times: user 600 ms, sys: 160 ms, total: 760 ms
Wall time: 16.1 s

Plot#

import hvplot.dask # noqa
import hvplot.pandas # noqa

To simplify the exploration of the time required to display different data, define a function that accepts a regular pandas dataframe or a dask delayed dataframe.

def plot(data, **kwargs):
    """Plot point elevation data, rasterizing by mean elevation"""
    options = dict(cmap='blues_r', width=800, height=800, xaxis=None, yaxis=None, tiles='ESRI')
    options |= kwargs
    return data.hvplot.points('meterswest', 'metersnorth', color='Z', rasterize=True,
                              aggregator='mean', **options)

Then we will construct the plot using the df_small dataset.

Benchmarking#

Here, we measure the execution time of the display function from IPython. Without including display, the timing would only reflect the rapid creation of the HoloViews DynamicMap objects. By including display, we also capture the duration of various internal Python operations required to render the plot on the screen, such as the rasterization process.

from IPython.display import display

%%time

display(plot(df_small))

CPU times: user 1.48 s, sys: 377 ms, total: 1.85 s
Wall time: 2.1 s

%%time

display(plot(df))

CPU times: user 3.13 s, sys: 229 ms, total: 3.36 s
Wall time: 3.24 s

%%time

display(plot(ddf))

CPU times: user 9.28 s, sys: 1.3 s, total: 10.6 s
Wall time: 2min 29s

Visualize Raster of Point Data#

If we compute a raster of the point data then we don’t need to store as much data in memory, which should allow faster interactions, and allow use with lots of other tools.

%%time
raster = plot(df, dynamic=False, width=1000, height=1000, tiles=None).data

CPU times: user 2.01 s, sys: 215 ms, total: 2.22 s
Wall time: 2.14 s

The result is an xarray.Dataset with x and y coordinates and a 2D array of Z values.

raster.metersnorth[1] - raster.metersnorth[0]

<xarray.DataArray 'metersnorth' ()> Size: 8B
array(10.29223724)

With these data we can use the geo tools found in the xarray-spatial library to compute and visualize the elevation using hillshading for instance. See Datashader User Guide for more datashader and xarray-spatial geo tooling.

from xrspatial import hillshade
import hvplot.xarray # noqa

illuminated = hillshade(raster.get('meterswest_metersnorth Z'))
illuminated.hvplot.image(
    'meterswest', 'metersnorth', data_aspect=1, cmap='blues', tiles='ESRI',
    width=600, height=600, padding=0, xlabel='Longitude', ylabel='Latitude', colorbar=False,
)

Ideas for future work#

It’d be nice to have a rasterio writer from xarray so that we could easily write chunked geotiffs from xarray objects.

Something like:

raster.to_rasterio(path=None, mode='w', compute=True)

This web page was generated from a Jupyter notebook and not all interactivity will work on this website.