Combine OpenEO and SOS spatial databases in R

A tutorial on how to interact with Sensor Observation Service (SOS) and OpenEO based databases in the EURAC EDP portal

Introduction

The following tutorial demonstrates how to interact with the datasets stored in the Environmental Data Platform created within the “Data Platform and Sensing Technologies for Environmental Sensing Lab” (DPS4ESLAB) project.

This project is dedicated to developing innovative solutions for environmental monitoring and the automation of sensor readings. It is mainly a cooperation between the Center for Sensing Solutions and the Institutes for Earth Observation and Renewable Energy at Eurac Research.

The following tutorial is based on two openly available R packages: MonalisR, for access to the SOS database acquired during the Monalisa Project, and the openEO R client, currently in development, which interacts with the openEO standard.

In detail, this tutorial features the following steps:
- Get tidy NDVI station data through MonalisR
- Get corresponding raster data through openEO
- Fit continuous curves to the point measurements

Python

Before starting with the R tutorial we would like to point you to an alternative way to access raster data in openEO using the programming language Python. This Python tutorial, created by our colleague Michele Claus, describes in detail how data accessible via the openEO standard can be accessed and processed in Python.

Libraries

Both R packages can be downloaded and installed directly in R:


library(remotes)
remotes::install_github(repo="Open-EO/openeo-r-client",ref="develop", dependencies=TRUE)
remotes::install_github("https://github.com/mattia6690/MonalisR", dependencies = TRUE, build = FALSE)

In addition to these two packages we demonstrate some possibilities to tidy, analyze and visualize the data. For this, the following packages are needed.


# Load the required packages
library(openeo)
library(MonalisR)

# Load additional packages
library(vctrs)  # Use vector files
library(dplyr)  # Organize Tables
library(tidyr)  # Tidy data
library(lubridate) # Time objects organization
library(ggplot2)   # Plotting
library(mapview)   # Plotting
library(leaflet)   # Plotting
library(stars)     # Spatial raster objects used with openEO
library(sf)     # For vector objects
library(broom)  # Clean the data
library(ncmeta) # Read the Netcdf Files

Get station data via SOS

The MonalisR package

The path to the SOS is automatically set to the EDP platform SOS database and can be changed if another database is to be accessed (slight changes to the functions may apply):


setMonalisaURL()
## [1] "http://monalisasos.eurac.edu/sos/api/v1/timeseries/"

The data can be explored using the getMonalisaDB function and its parameters. If no url statement is specified, the EURAC SOS database is accessed automatically.


SOS = setMonalisaURL()

# mnls_foi=getMonalisaDB(url=SOS,subset="foi")
# mnls_props=getMonalisaDB(url=SOS,subset="property")
# mnls_proc=getMonalisaDB(url=SOS,subset="procedure")
# mnls_all=getMonalisaDB(url=SOS,subset = "all")

mnls_comb = getMonalisaDB(url=SOS,subset="combined")
as_tibble(mnls_comb)
## # A tibble: 353 x 4
##    id    foi                            proc                   prop             
##    <chr> <chr>                          <chr>                  <chr>            
##  1 2060  LaimburgProvince               MeteoSensors_Laimburg~ Air Humidity - 1~
##  2 1972  Algund2                        MeteoSensors_Algund2   Air Humidity - 5~
##  3 1772  Girlan1Lamm                    MeteoSensors_Girlan1L~ Air Humidity - 5~
##  4 1978  Gries2NeufeldGreifensteinerweg MeteoSensors_Gries2Ne~ Air Humidity - 5~
##  5 1984  Lana6                          MeteoSensors_Lana6     Air Humidity - 5~
##  6 1760  Nals2OberauCarli               MeteoSensors_Nals2Obe~ Air Humidity - 5~
##  7 1766  NeumarktStiermoos              MeteoSensors_Neumarkt~ Air Humidity - 5~
##  8 1990  Terlan3                        MeteoSensors_Terlan3   Air Humidity - 5~
##  9 1784  Terlan3a                       MeteoSensors_Terlan3a  Air Humidity - 5~
## 10 1996  Tramin13er                     MeteoSensors_Tramin13~ Air Humidity - 5~
## # ... with 343 more rows

Additionally, the response can include the spatial locations.


mnls_geom = getMonalisaDB(url = SOS, subset = "geom")

Given the information about the locations, we added the possibility to plot them directly on a Leaflet map:


plotMonalisaLeaflet()

Get NDVI Timeseries

Retrieve the spatial locations and transform them to sf objects:


mona_stations = MonalisR::getMonalisaDB(subset = "geom") %>% 
  as.data.frame(stringsAsFactors = FALSE) %>% 
  mutate(LAT = as.numeric(LAT), 
         LON = as.numeric(LON)) %>% 
  st_as_sf(coords = c("LAT", "LON"), crs = 4326)

mona_stations
## Simple feature collection with 31 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 10.57988 ymin: 46.3047 xmax: 12.27283 ymax: 47.0376
## Geodetic CRS:  WGS 84
## First 10 features:
##                               FOI                  geometry
## 1                LaimburgProvince POINT (11.28871 46.38246)
## 2                         Algund2 POINT (11.12387 46.68578)
## 3                     Girlan1Lamm POINT (11.27953 46.45589)
## 4  Gries2NeufeldGreifensteinerweg POINT (11.28028 46.50176)
## 5                           Lana6 POINT (11.15179 46.61655)
## 6                Nals2OberauCarli  POINT (11.2057 46.56013)
## 7               NeumarktStiermoos   POINT (11.2628 46.3047)
## 8                         Terlan3 POINT (11.22854 46.54105)
## 9                        Terlan3a POINT (11.22861 46.54098)
## 10                     Tramin13er POINT (11.24451 46.32918)

Now we filter for the stations that provide information on the Normalized Difference Vegetation Index (NDVI):


mona_ndvi = MonalisR::getMonalisaDB(subset = "combined") %>% 
  as.data.frame(stringsAsFactors = FALSE) %>% 
  dplyr::filter(prop == "Normalized Difference Vegetation Index - average")

mona_stations = mona_stations %>% 
  dplyr::filter(FOI %in% unique(mona_ndvi$foi))

Now we reduce the dataset to just two Features of Interest (FOI, the equivalent of a station in our setup):


mona_stations = mona_stations %>% 
  dplyr::filter(FOI %in% c("vimef2000", "domef1500"))

After the reorganization and selection of the two stations, the downloadMonalisa function is used to retrieve the required datasets directly from the database. This function generates the request, sends it to the API and transforms the response into a table of results. Here we search for all results in the summer period for the vimef2000 and domef1500 sites. The result is a nested tibble (practically a data frame with lists organized in columns) in order to provide the best possible overview of the data.


mona_ndvi.raw <- downloadMonalisa(datestart = "2018-04-01 00:00",
                              dateend = "2018-10-01 00:00",
                              foi = mona_stations$FOI,
                              procedure = "",
                              property = "Normalized Difference Vegetation Index - average")
## Warning in if (foi == "") foi1 <- x$foi %>% unique: the condition has length > 1
## and only the first element will be used
## Warning in if (!is.null(foi) & foi != "") {: the condition has length > 1 and
## only the first element will be used

mona_ndvi.raw
## # A tibble: 2 x 5
##   id    foi       proc                                prop             Data     
##   <chr> <chr>     <chr>                               <chr>            <list>   
## 1 1877  domef1500 SpectralReflectanceSensor_domef1500 Normalized Diff~ <tibble ~
## 2 1927  vimef2000 SpectralReflectanceSensor_vimef2000 Normalized Diff~ <tibble ~

Now we use the dplyr package to tidy the data and aggregate it to daily observations:


mona_ndvi = unnest(mona_ndvi.raw,cols = c(Data)) %>% 
  dplyr::select(foi, TimeStamp, Value) %>% 
  dplyr::mutate(day = lubridate::date(TimeStamp)) %>% 
  dplyr::group_by(foi, day) %>% 
  dplyr::summarise(ndvi = mean(Value, na.rm = TRUE)) %>% 
  ungroup
## `summarise()` regrouping output by 'foi' (override with `.groups` argument)

Create Test Sites

Now we create a square buffer (50 m around each station) that will delineate our regions of interest when combining them with the MODIS datasets.


test_sites = mona_stations %>% 
  st_transform(3035) %>%  # Transform to a projected (metric) CRS
  st_buffer(dist = 50, endCapStyle = "SQUARE") # 50 m square buffer around the stations

And here is the graphical representation of the Test sites.


mapview(test_sites) + mapview(mona_stations)

Tidy and Plot

In order to generate meaningful and visually useful plots of the time series, we first aggregate the data into weekly periods:


mona_ndvi_week = mona_ndvi %>%
  dplyr::mutate(year=lubridate::year(day)) %>% 
  dplyr::mutate(week = lubridate::week(day)) %>%
  dplyr::group_by(foi, year, week) %>%
  dplyr::summarise(ndvi = mean(ndvi, na.rm = TRUE))
## `summarise()` regrouping output by 'foi', 'year' (override with `.groups` argument)

Now the data can be plotted using the ggplot2 package. This can be done either for each day:


plt_daily_ndvi = ggplot(mona_ndvi, aes(x = day, y = ndvi, col = foi)) + 
  geom_point() + 
  geom_line() +
  facet_wrap(.~foi)

plt_daily_ndvi

or for each week


plt_weekly_ndvi = ggplot(mona_ndvi_week, aes(x = week, y = ndvi, col = foi)) +
  geom_point() + 
  geom_line() +
  facet_wrap(.~foi)

plt_weekly_ndvi

Get raster data via openEO

Connection to openEO API

Now we log in to the openEO back-end. For this, some information is needed, such as the back-end host and the secret credentials. These credentials can be provided to specific users that registered to the data portal. Please visit the EDP portal or the DPS4ESLAB website (linked in the introduction) for more information.


# define configuration (url, client id, secret)
eurac_host = "https://openeo.eurac.edu"

# get the config file (client secret and client id)
conf = read.csv("Credentials/openeo_eurac_conf.csv", 
                stringsAsFactors = FALSE) # adapt this path to where you store the conf file.
conf = list(client_id = conf$client_id, secret = conf$secret)

Then connect to the backend. Now data and process discovery is possible.


# connect (data discovery possible)
eurac = connect(host = eurac_host)
## Connected to service:  https://openeo.eurac.edu 
## Please check the terms of service (terms_of_service()) and the privacy policy (privacy_policy()). By further usage of this service, you acknowledge and agree to those terms and policies.
## Warning: Connected host is not production-ready. Unexpected errors might occur.

# get the oidc providers supported by the backend
prov = list_oidc_providers()
#prov$Eurac_EDP_Keycloak

Finally, log in to the back-end. Now the connection is complete and processing is also possible.


# login (data access and processing possible)
login(login_type = "oidc", 
      provider = prov$Eurac_EDP_Keycloak, 
      config = conf, 
      con = eurac)

# check if login worked
eurac$isConnected()
eurac$isLoggedIn()

Discover the backend

Discover the collections. You can also see them in the RStudio “Connections” pane.

colls = list_collections()
names(colls) %>% head()
## [1] "ADO_CORINE_100m_3035_ODC"       "ADO_NDVI_MODIS_231m_3035_ODC"  
## [3] "Backscatter_Sentinel1_Track015" "DRI2_T32TPR"                   
## [5] "DRI2_T32TPS"                    "DRI2_T32TPT"
collection_viewer("ADO_NDVI_MODIS_231m_3035")
## Warning in htmlViewer(html): Cannot show a viewer panel. 'viewer' not available,
## maybe you are using this package not in RStudio.
describe_collection(collection = "ADO_NDVI_MODIS_231m_3035")
## ADO_NDVI_MODIS_231m_3035
## Title:               4 Day Maximum Value Composite NDVI based on MODIS Reflectance
## Description:         NDVI derived from daily MODIS observations (Aqua and Terra) over vegetated surfaces in the alps. Data is provided as a 4 Day Maximum Value Composite at 231 m spatial resolution. The time series is starting from 2000. It has 4 Bands: - 1) NDVI: The maximum NDVI value of the 4d window [unitless; -1 - 1; 254 = no vegetation, 255 = not processed due to insufficient quality], - 2) DOY: The day of year corresponding to the chosen value [days; 1-365; 254 = no vegetation, 255 not processed due to insufficient quality], -3) PLATFORM: The MODIS Platform [1 = Terra, 2 = Aqua], - 4) QU: The Quality Flag containing the values [unitless; 1 = good quality, 2 = bad MODIS QA Flag, 8 = reflectance out of range, 10 = 2 and 8, 65 = bad acquisition geometry, 66 = 65 and 2, 72 = 65 and 8, 74 = 65 and 2 and 8, 128 = no vegetation].
## Deprecated:          FALSE
## Source:              Eurac EO WCS, United States Geological Survey (USGS), Peter Zellner
## Platform:            Aqua, Terra
## Constellation:           Aqua, Terra
## Instrument:          MODIS
## Spatial extent (lon,lat):    (4.21644695838283, 42.817152676055), (18.99055634337, 48.6018483599891)
## Temporal extent:     2000-02-23T00:00:00Z / 2020-11-13T00:00:00Z
## Bands:
##       name center_wavelength gsd
## 1     NDVI                 0   0
## 2      DOY                 0   0
## 3 PLATFORM                 0   0
## 4       QU                 0   0

Here’s a convenience function to view the extent of a collection without loading it.

extent_viewer = function(collection){
  extent = openeo::describe_collection(collection)$extent$spatial
  extent = sf::st_bbox(obj = c(xmin = extent[1], 
                               xmax = extent[3], 
                               ymax = extent[4], 
                               ymin = extent[2]), 
                       crs = sf::st_crs(4326))
  mapview::mapview(extent)
}

extent_viewer("ADO_NDVI_MODIS_231m_3035")

Discover the processes that are available.

processes = list_processes()
names(processes) %>% head()
## [1] "dimension_labels"         "ln"                      
## [3] "coherence"                "cos"                     
## [5] "lt"                       "aggregate_spatial_window"
describe_process(process = "load_collection")
## Process: load_collection
## Summary: Load a collection
## Description: Loads a collection from the current back-end by its id and returns it as processable data cube. The data that is added to the data cube can be restricted with the additional `spatial_extent`, `temporal_extent`, `bands` and `properties`.
## 
## **Remarks:**
## 
## * The bands (and all dimensions that specify nominal dimension labels) are expected to be ordered as specified in the metadata if the `bands` parameter is set to `null`.
## * If no additional parameter is specified this would imply that the whole data set is expected to be loaded. Due to the large size of many data sets this is not recommended and may be optimized by back-ends to only load the data that is actually required after evaluating subsequent processes such as filters. This means that the pixel values should be processed only after the data has been limited to the required extents and as a consequence also to a manageable size.
## Returns: A data cube for further processing. The dimensions and dimension properties (name, type, labels, reference system and resolution) correspond to the collection's metadata, but the dimension labels are restricted as specified in the parameters.
## 
##         Parameter
## 1              id
## 2  spatial_extent
## 3 temporal_extent
## 4           bands
## 5      properties
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Description
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The collection id.
## 2                                                                                                       Limits the data to load from the collection to the specified bounding box or polygons.\n\nThe process puts a pixel into the data cube if the point at the pixel center intersects with the bounding box or any of the polygons (as defined in the Simple Features standard by the OGC).\n\nThe GeoJSON can be one of the following GeoJSON types:\n\n* A `Polygon` geometry,\n* a `GeometryCollection` containing Polygons,\n* a `Feature` with a `Polygon` geometry or\n* a `FeatureCollection` containing `Feature`s with a `Polygon` geometry.\n\nSet this parameter to `null` to set no limit for the spatial extent. Be careful with this when loading large datasets!
## 3 Limits the data to load from the collection to the specified left-closed temporal interval. Applies to all temporal dimensions. The interval has to be specified as an array with exactly two elements:\n\n1. The first element is the start of the temporal interval. The specified instance in time is **included** in the interval.\n2. The second element is the end of the temporal interval. The specified instance in time is **excluded** from the interval.\n\nThe specified temporal strings follow [RFC 3339](https://tools.ietf.org/html/rfc3339). Also supports open intervals by setting one of the boundaries to `null`, but never both.\n\nSet this parameter to `null` to set no limit for the spatial extent. Be careful with this when loading large datasets!
## 4                                                                                                                                                                           Only adds the specified bands into the data cube so that bands that don't match the list of band names are not available. Applies to all dimensions of type `bands`.\n\nEither the unique band name (metadata field `name` in bands) or one of the common band names (metadata field `common_name` in bands) can be specified. If unique band name and common name conflict, the unique band name has higher priority.\n\nThe order of the specified array defines the order of the bands in the data cube. f multiple bands match a common name, all matched bands are included in the original order.
## 5                                                                                                                                                                                                                                                                                                                                                            Limits the data by metadata properties to include only data in the data cube which all given conditions return `true` for (AND operation).\n\nSpecify key-value-pairs with the key being the name of the metadata property, which can be retrieved with the openEO Data Discovery for Collections. The value must a condition (user-defined process) to be evaluated against the collection metadata, see the example.
##   Optional
## 1    FALSE
## 2    FALSE
## 3    FALSE
## 4     TRUE
## 5     TRUE
process_viewer("filter_bbox")
## Warning in htmlViewer(html): Cannot show a viewer panel. 'viewer' not available,
## maybe you are using this package not in RStudio.

Processing with openEO

Definitions

First, select a region of interest, the timespan and the collection.

# select a station
station_number = 1
station_name = mona_stations %>% 
  slice(station_number) %>% 
  sf::st_drop_geometry() %>% 
  dplyr::select(FOI) %>% 
  as.character

# convert to openEO extent definition
aoi = test_sites %>% 
  st_transform(4326) %>% 
  slice(station_number) %>%
  st_bbox() %>% 
  as.list()
names(aoi) = c("west", "south", "east", "north")

# select a timespan
time_range = c("2018-01-01T00:00:00.000Z",   
               "2018-12-31T00:00:00.000Z")

# select a collection
collection = "ADO_NDVI_MODIS_231m_3035"

Build a process graph

Now start a process graph.

# load the processes into an object
p = processes()

# load a collection
data = p$load_collection(id = collection, 
                         spatial_extent = aoi,
                         temporal_extent = time_range)

# save the result
result = p$save_result(data = data, format="netCDF")

No processing has happened so far. Only the workflow/process graph has been defined.

graph_info = create_user_process(result, id = "test", submit = FALSE)
print(jsonlite::toJSON(graph_info, pretty = TRUE, auto_unbox = TRUE))

Execute the processing directly

For small jobs you can get the results directly.

a = Sys.time()
compute_result(result,
               format = "netCDF",
               output_file = "Data/edp_openeo_ndvi.nc", 
               con = eurac)
b = Sys.time() - a

Execute the processing as a batch job

For larger jobs you should submit a batch job.

# create a job on the backend. the job object contains info on the job.
job = create_job(graph = result,
                 title = "edp_openeo_ndvi",
                 description = "edp_openeo_ndvi",
                 format = "netCDF", 
                 con = eurac)

# look at the job object: it contains the job id
job

Start the job

start_job(job = job$id, 
          con = eurac) # use the id of the job (job$id) to start the job
openeo::list_jobs(con = eurac) # here you can see your jobs and their status
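Since batch jobs run asynchronously, it can be convenient to poll the job status until processing has ended instead of calling list_jobs manually. A minimal sketch, assuming the job object from above; the exact status strings ("queued", "running", "finished", "error") are an assumption and may differ between back-ends:

```r
# poll the job status every 10 seconds until it leaves the queued/running state
repeat {
  job_meta = openeo::describe_job(job = job$id, con = eurac)
  cat("Job status:", job_meta$status, "\n")
  if (!job_meta$status %in% c("created", "queued", "running")) break
  Sys.sleep(10)
}
```

Once the loop exits with the status "finished", the results can be listed and downloaded as shown in the next step.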

Once your job is finished you can get the download links here. Usually you receive a data file containing the data and a process file containing the process graph as JSON.

# get the job info 
result_obj = list_results(job = job$id, 
                          con = eurac)

# download the data sets
dwnld = download_results(job = job$id, 
                         folder = "Data/", # adjust path here
                         con = eurac)

# here you can see where your data is stored. The filename is composed of the 
# folder you specify above and the filename "data.<chosenfiletype>" for the data 
# and "process.json" for the process graph. 
dwnld
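If you are curious about the process graph that was stored alongside the data, the downloaded JSON file can be parsed with jsonlite. A small sketch; the filename follows the "process.json" pattern described in the comment above, so adjust the path to your download folder:

```r
library(jsonlite)

# parse the stored process graph and list its top-level elements
proc_graph = jsonlite::fromJSON("Data/process.json")  # adjust path if needed
str(proc_graph, max.level = 1)
```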

Load the result

Load the result into R. Note: the band names of the NDVI collection are incorrectly assigned; the band called DOY actually contains the NDVI.


# load one band of the ndvi result (the ndvi band)
modis_ndvi = read_ncdf("Data/edp_openeo_ndvi.nc", var = "DOY")
## Warning in .get_nc_projection(meta$attribute, rep_var, all_coord_var): No
## projection information found in nc file.
## Warning: ignoring unrecognized unit: 10^0

# this would be the result obtained by the batch job (they're identical)
# modis_ndvi = read_ncdf(dwnld[[1]])

# assign the crs manually (make sure it's the correct one for your collection)
st_crs(modis_ndvi) = st_crs(3035) # assign projection manually

# mask out the no data value
modis_ndvi[[1]][modis_ndvi[[1]] < -1] =  NA # replace -32.768 with NA

# get the time stamps
modis_ndvi_time = st_get_dimension_values(modis_ndvi, "DATE")
modis_ndvi_time = as.POSIXct(modis_ndvi_time, origin = "1970-01-01")

# get the ndvi data 
modis_ndvi_vals = modis_ndvi %>% pull() %>% c()

# create a data.frame
modis_ndvi_df = tibble(foi = "modis", 
                       day = modis_ndvi_time,
                       ndvi = modis_ndvi_vals)

head(modis_ndvi_df)
## # A tibble: 6 x 3
##   foi   day                  ndvi
##   <chr> <dttm>              <dbl>
## 1 modis 2017-12-30 01:00:00     0
## 2 modis 2018-01-03 01:00:00    NA
## 3 modis 2018-01-07 01:00:00    NA
## 4 modis 2018-01-11 01:00:00    NA
## 5 modis 2018-01-15 01:00:00    NA
## 6 modis 2018-01-19 01:00:00    NA

Analyse the results

Now check the position of the requested MODIS pixel and the test site.

# since we only requested a pixel we have to reconstruct the bbox
bbox_modis_ndvi = modis_ndvi %>%  
  st_bbox() %>%  
  st_as_sfc() %>% 
  st_centroid() %>% 
  st_buffer(250, endCapStyle = "SQUARE") %>% 
  st_as_sf()

# plot the station and the pixel
mapview(bbox_modis_ndvi, col.reg = "red", layer.name = "modis", label = "modis") + 
  mapview(test_sites %>% slice(station_number), layer.name = "test_site") +
  mapview(mona_stations %>% slice(station_number), layer.name = "test_point")

Combine the two data sources

combi_ndvi = rbind(mona_ndvi %>% dplyr::filter(foi == station_name), 
                   modis_ndvi_df)

This representation allows us to plot the NDVI retrieved with MonalisR together with the one from openEO in a single graph. We see that the MODIS resolution is too coarse to represent only the station measurement. In the next course we will use Sentinel-2 ;).


ggplot(combi_ndvi, aes(x = day, y = ndvi, col = foi)) + 
  geom_line() + 
  geom_point() +
  facet_wrap(.~foi) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
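The last step announced in the introduction, fitting continuous curves to the point measurements, can be sketched with a loess smoother. This is only one of several options (splines or double-logistic fits are also common for NDVI time series); the snippet assumes the daily mona_ndvi tibble from above and the span value is a tuning choice, not a recommendation:

```r
# fit a loess curve per station and predict a continuous daily series
ndvi_fits = mona_ndvi %>% 
  dplyr::mutate(doy = lubridate::yday(day)) %>% 
  dplyr::group_by(foi) %>% 
  dplyr::group_modify(~{
    fit = loess(ndvi ~ doy, data = .x, span = 0.3)      # smoothing span: tuning choice
    grid = tibble(doy = seq(min(.x$doy), max(.x$doy)))  # daily prediction grid
    grid$ndvi_smooth = predict(fit, newdata = grid)
    grid
  }) %>% 
  dplyr::ungroup()

# overlay the fitted curves on the daily observations
ggplot() +
  geom_point(data = mona_ndvi %>% dplyr::mutate(doy = lubridate::yday(day)),
             aes(x = doy, y = ndvi, col = foi), alpha = 0.4) +
  geom_line(data = ndvi_fits, aes(x = doy, y = ndvi_smooth, col = foi)) +
  facet_wrap(.~foi)
```

The same approach could be applied to the coarser MODIS series from above to compare the two fitted curves directly.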

Peter Zellner
Institute for Earth Observation

Trying to provide, optimize and automate remote sensing and GIS workflows and tools.
