Introduction
This script compares methods to extract raster values in polygons.
- I consider a one band raster (one Landsat 7 band) for the analysis
- On-disk methods, as for example on
stars_proxy
objects are not included in the benchmarking.
What could be evaluated in another experiment is the performance
- on multi layer rasters (e.g. long time series data cube)
- using on-disk methods such as
stars_proxy
objects
Libraries
Make sure to use stars >= 0.5.4
.
library(stars)
library(sf)
library(raster)
library(terra)
library(exactextractr)
library(dplyr)
library(microbenchmark)
Get data for testing
Raster data
One Landsat 7 band is used for the benchmarking. It is included in the stars
package.
# raster data
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x_ras = raster::raster(tif)
## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in Proj4 definition
x_ras = x_ras$L7_ETMs
x_stars = stars::read_stars(tif)
x_stars = x_stars %>% dplyr::slice(band, 1)
x_terra = terra::rast(tif)
x_terra = x_terra$L7__1
Polygon data
Create voronois on the raster. And create objects of the different spatial classes that are needed for the extraction.
set.seed(1331)
p = st_sample(st_as_sfc(st_bbox(x_stars)), 5) %>% st_combine()
v_sfc = st_voronoi(p) %>% st_collection_extract("POLYGON")
v_sf = st_as_sf(v_sfc)
v_sf$id = 1:nrow(v_sf)
v_vect = vect(v_sf)
v_sp <- as(v_sf, "Spatial")
Plot the overlay
plot(x_stars, reset = FALSE)
plot(v_sfc, add = TRUE, border = 'red', lwd = 2)
Extract
Benchmark
Benchmark different methods of extracting and aggregating raster values into polygons.
mbm = microbenchmark(
ext_raster = raster::extract(x = x_ras, y = v_sp, fun = mean, na.rm = TRUE),
ext_sf = st_extract(x = x_stars, at = v_sf, FUN = mean, na.rm = TRUE) %>% st_as_sf(),
ext_sf_exact = aggregate(x_stars, v_sfc, mean, exact = TRUE), # stars v 0.5.4
ext_exactextractr = exact_extract(x = x_terra, y = v_sf, fun = "mean"),
ext_terra = terra::extract(x = x_terra, y = v_vect, fun = mean, na.rm = TRUE),
times=10)
Here are the timings.
mbm
## Unit: milliseconds
## expr min lq mean median uq max
## ext_raster 4744.3957 4873.0278 4973.2116 4920.0947 5126.3314 5382.8099
## ext_sf 5718.2951 5847.0078 6149.7796 6193.7863 6413.7143 6504.5277
## ext_sf_exact 186.8516 220.1880 255.9041 241.0568 282.8818 385.0357
## ext_exactextractr 139.7931 149.3529 175.2251 161.3526 175.5298 300.5211
## ext_terra 491.6398 503.5755 569.9807 552.4099 601.8735 737.8837
## neval
## 10
## 10
## 10
## 10
## 10
And here are the timings plotted.
ggplot2::autoplot(mbm)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Apply fastest solution
Apply the exatctextractr::exact_extract(x = x_terra, y = v_sf, fun = "mean")
method.
ext_exactextractr = exact_extract(x = x_terra, y = v_sf, fun = "mean")
Plot the result
v_sf$aggregated = ext_exactextractr
v_sf$id = NULL
plot(v_sf)
Conclusion
- The
exactextractr::exact_extract()
is the most performant method in the tested case. Here it takes half the time of the second best method. stars
usesexactextractr
when setting the parameterexact = TRUE
in the functionaggregate()
. It is a little slower and currently only supportsmean
andsum
as aggregation functions. The upside is that it’s integrated within thestars
package, so no conversion of objects is needed when you are anyway usingstars
.- In a different experiment I have compared the run times for the same methods as used here on a larger scale. MODIS 250m data covering the whole alps. There
terra::extract()
performed better thanaggregate.stars(..., exact = TRUE)
.
Note: The
exactextractr
andaggregate(..., exact = TRUE)
yield slightly different results than the other methods since they take into account the area of cells that are crossed by the polygon boundary.
Note: I did this test to identify the most suitable extraction method for NUTS2 and NUTS3 regions over the Alps on 20 year MODIS time series with a 4 daily temporal resolution. In this application the pure
exactextractr::exact_extract()
method is advisable since the run time is magnitude(s) lower than the other merhods.