This script compares methods to extract raster values in polygons.
- I consider a one band raster (one Landsat 7 band) for the analysis
- On-disk methods, as for example on
stars_proxyobjects are not included in the benchmarking.
What could be evaluated in another experiment is the performance
- on multi layer rasters (e.g. long time series data cube)
- using on-disk methods such as
Make sure to use
stars >= 0.5.4.
library(stars) library(sf) library(raster) library(terra) library(exactextractr) library(dplyr) library(microbenchmark)
Get data for testing
One Landsat 7 band is used for the benchmarking. It is included in the
# raster data tif = system.file("tif/L7_ETMs.tif", package = "stars") x_ras = raster::raster(tif) ## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj = ## prefer_proj): Discarded datum unknown in Proj4 definition x_ras = x_ras$L7_ETMs x_stars = stars::read_stars(tif) x_stars = x_stars %>% dplyr::slice(band, 1) x_terra = terra::rast(tif) x_terra = x_terra$L7__1
Create voronois on the raster. And create objects of the different spatial classes that are needed for the extraction.
set.seed(1331) p = st_sample(st_as_sfc(st_bbox(x_stars)), 5) %>% st_combine() v_sfc = st_voronoi(p) %>% st_collection_extract("POLYGON") v_sf = st_as_sf(v_sfc) v_sf$id = 1:nrow(v_sf) v_vect = vect(v_sf) v_sp <- as(v_sf, "Spatial")
Plot the overlay
plot(x_stars, reset = FALSE) plot(v_sfc, add = TRUE, border = 'red', lwd = 2)
Benchmark different methods of extracting and aggregating raster values into polygons.
mbm = microbenchmark( ext_raster = raster::extract(x = x_ras, y = v_sp, fun = mean, na.rm = TRUE), ext_sf = st_extract(x = x_stars, at = v_sf, FUN = mean, na.rm = TRUE) %>% st_as_sf(), ext_sf_exact = aggregate(x_stars, v_sfc, mean, exact = TRUE), # stars v 0.5.4 ext_exactextractr = exact_extract(x = x_terra, y = v_sf, fun = "mean"), ext_terra = terra::extract(x = x_terra, y = v_vect, fun = mean, na.rm = TRUE), times=10)
Here are the timings.
mbm ## Unit: milliseconds ## expr min lq mean median uq max ## ext_raster 4744.3957 4873.0278 4973.2116 4920.0947 5126.3314 5382.8099 ## ext_sf 5718.2951 5847.0078 6149.7796 6193.7863 6413.7143 6504.5277 ## ext_sf_exact 186.8516 220.1880 255.9041 241.0568 282.8818 385.0357 ## ext_exactextractr 139.7931 149.3529 175.2251 161.3526 175.5298 300.5211 ## ext_terra 491.6398 503.5755 569.9807 552.4099 601.8735 737.8837 ## neval ## 10 ## 10 ## 10 ## 10 ## 10
And here are the timings plotted.
ggplot2::autoplot(mbm) ## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Apply fastest solution
exatctextractr::exact_extract(x = x_terra, y = v_sf, fun = "mean") method.
ext_exactextractr = exact_extract(x = x_terra, y = v_sf, fun = "mean")
Plot the result
v_sf$aggregated = ext_exactextractr v_sf$id = NULL plot(v_sf)
exactextractr::exact_extract()is the most performant method in the tested case. Here it takes half the time of the second best method.
exactextractrwhen setting the parameter
exact = TRUEin the function
aggregate(). It is a little slower and currently only supports
sumas aggregation functions. The upside is that it’s integrated within the
starspackage, so no conversion of objects is needed when you are anyway using
- In a different experiment I have compared the run times for the same methods as used here on a larger scale. MODIS 250m data covering the whole alps. There
terra::extract()performed better than
aggregate.stars(..., exact = TRUE).
aggregate(..., exact = TRUE)yield slightly different results than the other methods since they take into account the area of cells that are crossed by the polygon boundary.
Note: I did this test to identify the most suitable extraction method for NUTS2 and NUTS3 regions over the Alps on 20 year MODIS time series with a 4 daily temporal resolution. In this application the pure
exactextractr::exact_extract()method is advisable since the run time is magnitude(s) lower than the other merhods.