Seattle Lidar
Visualize Lidar Scattered Point Elevation Data
This notebook uses Datashader to visualize Lidar elevation data from the Puget Sound Lidar Consortium, a source of Lidar data for the Puget Sound region of the state of Washington, U.S.A.
Lidar Elevation Data
Example X, Y, Z scattered point elevation data from the unpacked 7z archives (each archive unpacks to a .gnd file):
! head ../data/q47122d2101.gnd
X,Y,Z
1291149.60,181033.64,467.95
1291113.29,181032.53,460.24
1291065.38,181035.74,451.41
1291113.16,181037.32,455.51
1291116.68,181037.42,456.20
1291162.42,181038.90,467.81
1291111.90,181038.15,454.89
1291066.62,181036.73,451.41
1291019.10,181035.20,451.64
The Seattle area example below loads 25 .gnd elevation files like the one above. We'll download, cache, and read the data using intake.
import intake
cat = intake.open_catalog('./catalog.yml')
list(cat)
['seattle_lidar']
lidar = cat.seattle_lidar()
df = lidar.to_dask()
df.head()
|   | X          | Y         | Z      |
|---|------------|-----------|--------|
| 0 | 1293274.13 | 181399.80 | 435.79 |
| 1 | 1293278.38 | 181393.05 | 434.58 |
| 2 | 1293272.96 | 181401.08 | 435.86 |
| 3 | 1293271.02 | 181401.13 | 435.73 |
| 4 | 1293270.85 | 181407.80 | 436.15 |
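Since the coordinate conversion later in this notebook will operate per dask partition, it can be useful to see how many partitions back the lazy dataframe — a quick check, not part of the original run:

# Number of dask partitions backing the lazy dataframe (typically one per
# .gnd file, unless intake/dask splits the files differently).
df.npartitions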
Geographic Metadata
Since the data are geographic, we need to know the coordinate reference system (CRS) of the X and Y columns. All the geographic metadata is stored in the data source entry in the intake catalog.
from pyproj.transformer import Transformer
lidar.metadata['crs']
'State Plane Coordinate System Washington North FIPS 4601'
# EPSG:2855 is Washington State Plane North; EPSG:3857 is Web Mercator
transformer = Transformer.from_crs('epsg:2855', 'epsg:3857')
FT_2_M = 0.3048  # conversion factor from feet to meters
def convert_coords(df):
    """Convert from Washington State Plane coordinates (feet) to Web Mercator (meters)."""
    lon, lat = transformer.transform(df.X.values * FT_2_M, df.Y.values * FT_2_M)
    df['meterswest'], df['metersnorth'] = lon, lat
    return df[['meterswest', 'metersnorth', 'Z']]
Try out the convert_coords function on a subset of the data
convert_coords(df.head())
|   | meterswest    | metersnorth  | Z      |
|---|---------------|--------------|--------|
| 0 | -1.360741e+07 | 6.022197e+06 | 435.79 |
| 1 | -1.360741e+07 | 6.022194e+06 | 434.58 |
| 2 | -1.360742e+07 | 6.022198e+06 | 435.86 |
| 3 | -1.360742e+07 | 6.022198e+06 | 435.73 |
| 4 | -1.360742e+07 | 6.022201e+06 | 436.15 |
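As a quick sanity check (not in the original notebook), the first converted point can be projected back to longitude and latitude; it should land in the Seattle area, at roughly 122.2°W, 47.5°N:

# Invert one Web Mercator point back to lon/lat (EPSG:4326).
to_lonlat = Transformer.from_crs('epsg:3857', 'epsg:4326', always_xy=True)
lon, lat = to_lonlat.transform(-1.360741e+07, 6.022197e+06)
print(f"lon={lon:.2f}, lat={lat:.2f}")  # expect approximately -122.2, 47.5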
Convert the coordinates
Since our real dataset is large and partitioned using dask, we need to think about how to apply the convert_coords function to our data.
import dask
import dask.distributed
import dask.delayed
dask.distributed.Client()
Client: Client-0bbc3429-7a42-11ee-8b5d-6045bdd7b999
Cluster: LocalCluster with 2 workers, 2 threads, 6.76 GiB memory
Dashboard: http://127.0.0.1:8787/status
Explore the task graph to figure out a performant way to split up the coordinate conversion. First we'll try using dask.delayed.
dask.delayed(convert_coords)(df).visualize()
We can see that even though we thought dask.delayed would help, it actually requires all of the file-reading tasks to finish first, after which the conversion happens on the whole dataset in one go. Another approach is to use dask's map_partitions to do the conversion on each partition of the data.
df_merc = df.map_partitions(convert_coords)
df_merc.visualize()
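As written, dask calls convert_coords on a small dummy partition to infer the output columns and dtypes. If you prefer to skip that inference step, map_partitions accepts an explicit meta argument. A minimal optional sketch, not part of the original notebook:

import pandas as pd

# Optional: declare the output schema up front so dask does not need to call
# convert_coords on a dummy partition to infer column names and dtypes.
meta = pd.DataFrame({'meterswest': pd.Series(dtype='float64'),
                     'metersnorth': pd.Series(dtype='float64'),
                     'Z': pd.Series(dtype='float64')})
df_merc = df.map_partitions(convert_coords, meta=meta)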
Now that we have set up the task graph, we can use df_merc directly to do the computations on the fly. However, since this dataset fits in memory on my computer, I will do the computation and keep the output in memory for quick use when plotting.

NOTE: This next cell takes about a minute to run. Take a look at the dask dashboard at the location specified above to watch the task progression.
%%time
dataset = df_merc.compute()
2023-11-03 12:12:33,731 - distributed.worker.memory - WARNING - Unmanaged memory use is high. This may indicate a memory leak or the memory may not be released to the OS; see https://distributed.dask.org/en/latest/worker-memory.html#memory-not-released-back-to-the-os for more information. -- Unmanaged memory: 2.34 GiB -- Worker memory limit: 3.38 GiB
CPU times: user 4.5 s, sys: 10.3 s, total: 14.8 s
Wall time: 1min 43s
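If the converted data were too large for the client process but still fit across the workers' combined memory, an alternative (not used here) would be to persist the dask dataframe, keeping the converted partitions cached on the workers instead of collecting them into a single pandas dataframe:

# Cache the converted partitions in distributed (worker) memory; later
# operations reuse them without re-reading and re-projecting the files.
df_merc_cached = df_merc.persist()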
If all the data doesn’t fit in memory on your machine, try downsampling the data from each file to only keep 1/100th of the total data. To avoid unnecessary computation, it is better to do the downsampling first and then convert the coordinates.
small = df.sample(frac=0.01).map_partitions(convert_coords)
small.visualize()
%%time
small_dataset = small.compute()
CPU times: user 1.3 s, sys: 500 ms, total: 1.8 s
Wall time: 33.9 s
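To confirm the downsampled result really is small enough to work with comfortably, you can check its in-memory footprint — a quick check, not part of the original run:

# Approximate in-memory size of the downsampled point data.
n_bytes = small_dataset.memory_usage(deep=True).sum()
print(f"{len(small_dataset):,} points, ~{n_bytes / 1e6:.0f} MB")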
Plot
import geoviews as gv
import holoviews as hv
from holoviews import opts
from holoviews.operation.datashader import rasterize, rd
hv.extension('bokeh')
To simplify the exploration of the time required to display different data, define a function that accepts either a regular pandas dataframe or a lazy dask dataframe.
def plot(data, **kwargs):
    """Plot point elevation data, rasterizing by mean elevation"""
    points = hv.Points(data, kdims=['meterswest', 'metersnorth'], vdims=['Z'])
    image_opts = opts.Image(tools=['hover'], cmap='blues_r', colorbar=True,
                            width=800, height=800, xaxis=None, yaxis=None)
    return rasterize(points, aggregator=rd.mean('Z'), precompute=True, **kwargs).options(image_opts)
We’ll also declare a tiler to use for background imagery.
tiles = gv.tile_sources.EsriImagery()
Then we will construct the plot out of rasterized point data and tiles.
tiles * plot(small_dataset)
Benchmarking

Note that each cell below only constructs the (lazy) HoloViews object; the actual rasterization happens when the plot is displayed, so the timings stay in the millisecond range regardless of how large the underlying data is.
%%time
tiles * plot(small_dataset)
CPU times: user 8.6 ms, sys: 53 µs, total: 8.66 ms
Wall time: 8.64 ms
%%time
tiles * plot(dataset)
CPU times: user 8.99 ms, sys: 0 ns, total: 8.99 ms
Wall time: 8.89 ms
%%time
tiles * plot(df_merc)
CPU times: user 11 ms, sys: 0 ns, total: 11 ms
Wall time: 11.4 ms
Visualize Raster of Point Data
If we compute a raster of the point data, then we don't need to keep as much data in memory, which should allow faster interactions and make the result usable with many other tools.
%%time
raster = plot(dataset, dynamic=False, width=1000, height=1000).data
CPU times: user 2 s, sys: 689 ms, total: 2.69 s
Wall time: 7.98 s
The result is an xarray.Dataset with x and y coordinates and a 2D array of Z values.
raster.metersnorth[1] - raster.metersnorth[0]
<xarray.DataArray 'metersnorth' ()> array(10.29223724)
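Because the raster is an ordinary xarray.Dataset, it can also be written to disk and reloaded later without re-aggregating the point cloud. A minimal sketch using xarray's NetCDF writer (the filename is made up here; if a data variable name contains characters NetCDF rejects, rename it first with raster.rename):

import xarray as xr

# Persist the aggregated elevation grid to disk (hypothetical filename).
raster.to_netcdf('seattle_lidar_elevation.nc')

# Reload it later without touching the original .gnd files.
reloaded = xr.open_dataset('seattle_lidar_elevation.nc')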
With these data we can use the geo tools in xrspatial to compute and visualize the elevation using hillshading, for instance. See Datashader User Guide #8 for more datashader and xrspatial geo tooling.
from xrspatial import hillshade
illuminated = hillshade(raster.get('meterswest_metersnorth Z'))
hv_shaded = hv.Image((raster.meterswest, raster.metersnorth, illuminated))
tiles * rasterize(hv_shaded.options(opts.Image(cmap='blues', height=600, width=600)))
Ideas for future work
It'd be nice to have a rasterio writer from xarray so that we could easily write chunked geotiffs from xarray objects.
Something like:
raster.to_rasterio(path=None, mode='w', compute=True)
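In the meantime, the rioxarray extension provides something close to this via its .rio accessor. A rough sketch (not part of the original notebook), assuming rioxarray is installed and that the spatial dimensions, CRS, and output path are supplied explicitly:

import rioxarray  # registers the .rio accessor on xarray objects

# Pick the aggregated elevation variable, declare its spatial dims and CRS,
# and write it out as a GeoTIFF (illustrative variable choice and filename).
elevation = (raster[list(raster.data_vars)[0]]
             .rio.set_spatial_dims(x_dim='meterswest', y_dim='metersnorth')
             .rio.write_crs('epsg:3857'))
elevation.rio.to_raster('seattle_lidar_elevation.tif')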