Benchmarking different raster extraction methods

An experiment comparing the runtime of current raster extraction methods in R

Introduction

This script compares methods to extract raster values in polygons.

  • I consider a one band raster (one Landsat 7 band) for the analysis
  • On-disk methods, as for example on stars_proxy objects are not included in the benchmarking.

What could be evaluated in another experiment is the performance

  • on multi layer rasters (e.g. long time series data cube)
  • using on-disk methods such as stars_proxy objects

Libraries

Make sure to use stars >= 0.5.4.

library(stars)
library(sf)
library(raster)
library(terra)
library(exactextractr)
library(dplyr)
library(microbenchmark)

Get data for testing

Raster data

One Landsat 7 band is used for the benchmarking. It is included in the stars package.


# raster data
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x_ras = raster::raster(tif)
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition
x_ras = x_ras$L7_ETMs
x_stars = stars::read_stars(tif)
x_stars = x_stars %>% dplyr::slice(band, 1)
x_terra = terra::rast(tif)
x_terra = x_terra$L7__1

Polygon data

Create voronois on the raster. And create objects of the different spatial classes that are needed for the extraction.

set.seed(1331)
p = st_sample(st_as_sfc(st_bbox(x_stars)), 5) %>% st_combine()
v_sfc = st_voronoi(p) %>% st_collection_extract("POLYGON")
v_sf = st_as_sf(v_sfc)
v_sf$id = 1:nrow(v_sf)
v_vect = vect(v_sf)
v_sp <- as(v_sf, "Spatial")

Plot the overlay

plot(x_stars, reset = FALSE)
plot(v_sfc, add = TRUE, border = 'red', lwd = 2)

Extract

Benchmark

Benchmark different methods of extracting and aggregating raster values into polygons.

mbm = microbenchmark(
  ext_raster = raster::extract(x = x_ras, y = v_sp, fun = mean, na.rm = TRUE),
  ext_sf = st_extract(x = x_stars, at = v_sf, FUN = mean, na.rm = TRUE) %>% st_as_sf(),
  ext_sf_exact = aggregate(x_stars, v_sfc, mean, exact = TRUE), # stars v 0.5.4
  ext_exactextractr = exact_extract(x = x_terra, y = v_sf, fun = "mean"),
  ext_terra = terra::extract(x = x_terra, y = v_vect, fun = mean, na.rm = TRUE),
  times=10)

Here are the timings.

mbm
## Unit: milliseconds
##               expr       min        lq      mean    median        uq       max
##         ext_raster 4744.3957 4873.0278 4973.2116 4920.0947 5126.3314 5382.8099
##             ext_sf 5718.2951 5847.0078 6149.7796 6193.7863 6413.7143 6504.5277
##       ext_sf_exact  186.8516  220.1880  255.9041  241.0568  282.8818  385.0357
##  ext_exactextractr  139.7931  149.3529  175.2251  161.3526  175.5298  300.5211
##          ext_terra  491.6398  503.5755  569.9807  552.4099  601.8735  737.8837
##  neval
##     10
##     10
##     10
##     10
##     10

And here are the timings plotted.

ggplot2::autoplot(mbm)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Apply fastest solution

Apply the exatctextractr::exact_extract(x = x_terra, y = v_sf, fun = "mean") method.

ext_exactextractr = exact_extract(x = x_terra, y = v_sf, fun = "mean")

Plot the result

v_sf$aggregated = ext_exactextractr
v_sf$id = NULL
plot(v_sf)

Conclusion

  • The exactextractr::exact_extract() is the most performant method in the tested case. Here it takes half the time of the second best method.
  • stars uses exactextractr when setting the parameter exact = TRUE in the function aggregate(). It is a little slower and currently only supports mean and sum as aggregation functions. The upside is that it’s integrated within the stars package, so no conversion of objects is needed when you are anyway using stars.
  • In a different experiment I have compared the run times for the same methods as used here on a larger scale. MODIS 250m data covering the whole alps. There terra::extract() performed better than aggregate.stars(..., exact = TRUE).

Note: The exactextractr and aggregate(..., exact = TRUE) yield slightly different results than the other methods since they take into account the area of cells that are crossed by the polygon boundary.

Note: I did this test to identify the most suitable extraction method for NUTS2 and NUTS3 regions over the Alps on 20 year MODIS time series with a 4 daily temporal resolution. In this application the pure exactextractr::exact_extract() method is advisable since the run time is magnitude(s) lower than the other merhods.

Avatar
Peter Zellner
Institute for Earth Observation

Trying to provide, optimize and automitize remote sensing and gis workflows and tools.

comments powered by Disqus