# ¶ Satellite data pre-processing and ancillary data preparation

## ¶ Sentinel-1 data

The Sentinel-1 data build the core dataset of the GFM service. We not only use them as primary information source on the extent of flooded areas and permanent and seasonal water bodies, but also for the (offline-) generation of parameters describing globally the C-band Synthetic Aperture Radar (SAR) backscatter signature at the 20m-pixel-level, supporting the generation of the different product layers.

### ¶ Sentinel-1 pre-processing

The GFM service ingests observations from the Sentinel-1A/B satellites that are acquired in Interferometric Wide-swath mode and Ground Range Detected at High resolution (Sentinel-1 IW GRDH). The GRDH products consist of focused SAR data that has been detected, multi-looked and projected to ground range using an Earth ellipsoid model, and phase information is lost. The resulting product has approximately square spatial resolution pixels and square pixel spacing with reduced speckle at the cost of worse spatial resolution. In case of the here used high-resolution product, the raw backscatter amplitude is sampled with a 10×10 m pixel size. For the GFM flood service, we process Sentinel-1 data in VV-polarisation (and neglect the VH-polarisation channel) due to its higher sensitivity in differentiating water from non-water surfaces.

The incoming Level-1 Sentinel-1 datasets must undergo the pre-processing routines before being forwarded to the algorithms for water and flood extent detection. The output of the pre-processing is Analysis-Ready-Data (ARD) that is formatted and gridded, and immediately forwarded to the GFM flood detection engine and added to the Sentinel-1 data cube.

Figure 1 Illustration of the Sentinel-1 pre-processing workflow, with a Sentinel-1 Level-1 GRDH scene as input and georeferenced, calibrated backscatter (Sigma Nought) data as output.

Based on single Sentinel-1 Level-1 GRDH scenes, the near real-time (NRT) Sentinel-1 pre-processing workflow (see Figure 1) is triggered. Before the actual SAR processing starts, the external, global 30m Copernicus DEM (Copernicus DEM) mosaic (see Section Satellite orbit state vectors) is cropped to the projected extents of the scene to minimize the input file size in the Sentinel Application Platform (SNAP). Then, the SNAP processing is conducted via Graph Processing Tool (GPT) commands (https://senbox.atlassian.net/wiki/spaces/SNAP/pages/70503475/Bulk+Processing+with+GPT) subsequently called from a Python package developed by TU Vienna. This routine involves the following steps, where the step 2, 7, and 8 are done independently from SNAP:

Table 2: Description of the steps in the Sentinel-1 pre-processing workflow, carried out using the SNAP toolbox via Graph Processing Tool (GPT) commands.

Following the processing pipeline depicted in Figure 1, we perform basic and fully automatic quality checks to ensure that the produced files and file contents fulfil a certain level of quality:

• Open and read file contents.
• Checking if the file only contains no data values.
• Validating the expected range of values.
• Checking if the spatial reference information is properly set in the GeoTIFF header.

An optional by-product of the SNAP geocoding workflow is the so-called projected local incidence angle (PLIA), which is the angle between the vector ground-satellite and the surface normal vector, projected into the range plane. PLIA is an essential information about the observation geometry of the satellite and varies across different orbits. Due to the Sentinel‑1’s stable orbit revisit over time, we do not produce the PLIA files in the NRT service but only during archive reprocessing (hence it is depicted by grey text in the last section in Figure 1).

### ¶ Sentinel-1 temporal parameters

Based on the Sentinel-1 data, several temporal parameters are computed for the period 1.1.2019-31.12.2019 on a global scale. These serve as an input for the flood algorithms, exclusion layers and advisory flags. The parameters differ in their temporal aggregation. Parameter computed on monthly basis are in the following called GRMAG for grouped monthly aggregated, parameters aggregated over the total time period TAG. Moreover, parameters are either computed over all orbits together or per orbit. The temporal parameters are listed and described in following table:

Table 3: Overview of Sentinel-1 temporal parameters, and availability per orbit and over all orbits

### ¶ Sentinel-1 harmonic parameters

Radar signal interacts with the Earth’s surface in many different ways. It can be absorbed, scattered, and reflected according to the surface states and characteristics of sensor. The surface state (including soil moisture content, vegetation, roughness, etc.) varies over time leading to variation of backscatter time series. Based on different periods of variation, time series of backscatter can be decomposed into trend, seasonality and short-term random variation.

GFM flood mapping algorithm 3 (see Section Flood mapping algorithm 3 (TUW) for details) utilizes the harmonic model to simulate the backscatter seasonal variation and the estimation of normal, non-flooded conditions. Figure 2 shows a time series of VV backscatter and superimposed with the modeled harmonic time series for an agricultural plot in Thessaly, Greece. An example for an estimated backscatter image for one day in summer and one day in winter over Greece is given in Figure 3. One can review the noticeable backscatter differences caused by varying seasonal vegetation state in this example. In this section, we describe the preparation performed to define the harmonic model on a global scale.

Figure 2: Comparison of actual backscatter time-series and its harmonic model

Figure 3: Example of the estimated backscatter of the harmonic model in summer and winter time over Greece

We introduce a harmonic model (see Eq. 3 1 below) following the approach of Schlaffer et al (2015), where ${\stackrel{^}{\sigma }}_{{t}_{day}}^{0}\hat\left\{\sigma \right\}_\left\{t_\left\{day\right\}\right\}^0$ the most probable radar backscatter at time ${t}_{day}t_\left\{day\right\}$ i.e. day of the year, is computed from $\stackrel{ˉ}{{\sigma }^{0}}\bar\left\{\sigma ^0\right\}$, the average backscatter for the time period, and ${C}_{i}C_i$ and ${S}_{i}S_i$, the harmonic coefficients of the cosine and sine components. The harmonic coefficients and the average backscatter are further referred to as harmonic parameters. As suggested by Schlaffer et al (2015), k is set to 3, representing processes of a time-scale of four months, sufficient to reduce the impact of long-lasting flood events.

The harmonic parameters are derived from a least-squares estimation based on the backscatter values and corresponding observation times of input Sentinel-1 time series. This approach differs from Schlaffer et al (2015) as the backscatter is used directly instead of using 10-day composites. However, backscatter coefficients are highly dependent on the acquisition geometry. While a normalisation approach could be utilised when sufficient incidence angle samples per pixel are present as with ENVISAT ASAR, this procedure is not feasible with Sentinel-1. Hence the parameter estimation is performed for each unique acquisition geometry corresponding to a relative Sentinel-1 orbit. Consequently, a unique harmonic parameter set per orbit per Equi7grid tile is required. Hence, to model the estimated backscatter ${\stackrel{^}{\sigma }}_{t,r}^{0}\hat\left\{\sigma \right\}_\left\{t,r\right\}^0$ for any given day $tt$ and relative orbit $rr$, seven (at k=3) harmonic parameters are computed. In order to get a measure of how many samples support the estimation and to exclude ill-fitted harmonic parameters, the number of valid observations (NOBS) for each pixel is written as an additional layer. In addition, the standard deviation of the harmonic model is calculated as the square root of the sum of squared errors (SSE), divided by the number of data points (${N}_{points}N_\left\{points\right\}$) adjusted for the degrees of freedom of the model (see Eq. 3 2 below). The SSE is derived from the pixel’s backscatter time-series ${\sigma }_{t,r}^{0}\sigma _\left\{t,r\right\}^0$ and its harmonic model ${\stackrel{^}{\sigma }}_{t,r}^{0}\hat\left\{\sigma \right\}_\left\{t,r\right\}^0$.

### ¶ Sentinel-1 projected local incidence angle data

The projected local incidence angle (PLIA) is the angle between the surface normal and the looking direction of the satellite (local incidence angle - LIA) which is further projected into the range plane. It provides essential information about the observation geometry of the satellite and varies across different orbits. It can be computed for each individual image within the terrain correction step (section 3.1.1 Sentinel-1 preprocessing), however, within the GFM project we exploit the stable nature of the Sentinel-1 orbit and precompute the mean PLIA values globally for all available orbits. As an input for this computation, we use the individual PLIA images aggregated to mean values from the all available data observed in the year 2020.

The mean PLIA values of the corresponding orbit are used together with the ${\sigma }^{0}\sigma ^0$ image and the parameters of the harmonic model as an input of the TUW GFM flood mapping algorithm (see Flood mapping algorithm 3 (TUW) for details).

## ¶ Ancillary data preparation

In order to enable the calculations for the flood mapping and its by-products, the GFM system relies on several auxiliary datasets. These datasets provide information on terrain, land cover, and Sentinel-1 orbit geometry and are necessary for the Sentinel-1 preprocessing, the flood- and water mapping itself, and the preparation of the Exclusion Mask and the Advisory Flags.

Like the Sentinel-1 backscatter observations and the thereof derived statistical- and local signature parameters, these auxiliary datasets are stored in the Equi7Grid-based data cube at their native sampling. Consequently, also the auxiliary datasets are more efficiently stored and are accessible via the same interfaces as the Sentinel-1 datasets.

### ¶ Satellite orbit state vectors

The Restituted Orbit (RESORB) files for the Sentinel-1 satellites contain the restituted Orbit State Vectors (OSVs) based on the orbit determination performed by ESA’s Precise Orbit Determination Service. The OSVs describe the geometry of the satellite’s centre of gravity in an Earth-fixed coordinate frame and are a necessary input to the geocoding of the incoming Sentinel-1 IW GRDH scenes when geometric precision at the 20m scale is demanded. The RESORB files are selected as input in the pre-processing component of the GFM system, as the (more precise) Precise Orbit Ephemerides (POEORB) files are unsuitable in NRT operations, with a delivery of 20 days after acquisition.

The RESORB files are accessed from Copernicus POD Data Hub at a 5 minutes interval and accessed by SNAP from within the pre-processing pipeline.

### ¶ Copernicus DEM

The Copernicus Digital Elevation Model (DEM) is a digital surface model (DSM) which (unlike a digital terrain model) includes structures above ground, e.g. buildings, bridges, or vegetation. The Copernicus DEM is mainly based on data from the WorldDEM, which is a product originating from the TanDEM-X mission, but is further enhanced using auxiliary height data from ASTER, SRTM90, SRTM30, GMTED2010 ALOS World 3D-30m and many more. It is available in three resolutions at two spatial extent windows, EEA-10 (10m, Europe), GLO-30 (30m, global) and GLO-90 (90m, global). Within the scope of the GFM project, we used the 30m sampled global version of the Copernicus DEM to achieve a trade-off between a high spatial sampling and a global coverage.

Because SNAP 7/8 only supports SRTM height data internally for geocoding, every other DEM must be provided and prepared from outside. Since SAR geocoding routines work with ellipsoid based 3D coordinates, commonly used orthometric heights H, as is the case for the Copernicus DEM, must be transformed to ellipsoidal heights ${h}_{ellips}h_\left\{ellips\right\}$. To do so, geoid undulations (${N}_{geoid}N_\left\{geoid\right\}$), which are supplied by various Earth Gravitational Models (e.g. EGM96 or EGM2008), must be taken into account. These parameters are incorporated into the following formula to derive the desired ellipsoidal heights:

SNAP 7/8 only works with one single external DEM file, so both data sets need to be combined to a global mosaic. The workflow describing the data flow from the download of both tiled input data sets to the creation of the global mosaic in ellipsoidal heights, summarized in Table 4.

Table 4: Summary of the workflow from the download of the 30m Copernicus DEM and EGM2008 tiled input data sets to the creation of the global mosaic in ellipsoidal heights.

### ¶ Sentinel-1 Global Backscatter Model

The Sentinel-1 Global Backscatter Model (S1-GBM) was generated by the Remote Sensing Group of the TU Vienna, within a dedicated project by the European Space Agency (ESA). The dataset describes Earth’s complete land surface for the period 2016-17 by the temporal mean and standard deviation of the Sentinel-1 ${\sigma }^{0}\sigma ^0$ backsctter in VV- and VH-polarization at a 10 m sampling, giving a high-quality impression on surface structures and patterns.

In GFM, we use the temporal mean VV and VH ${\sigma }^{0}\sigma ^0$ backscatter and VH standard deviation (computed jointly from data from all orbits) for the identification of densely vegetated areas, where the Sentinel-1 SAR observations are insensitive to floods.

### ¶ HAND drainage information

The Height Above Nearest drainage (HAND) index Rennó et al. (2008) contains the vertical distance between any location and the nearest location in the drainage network with respect to pre-identified flow directions and by minimizing hydrological distances. The HAND index is calculated near-globally (between 60°N and 56°S) based on elevation and drainage direction information provided by the Hydrosheds mapping product Lehner et al. (2008). This product was derived based on the digital elevation model of the Shuttle Radar Topography Mission (SRTM), which had to be hydrologically conditioned by using a sequence of automated procedures such as void-filling, filtering, stream burning, and upscaling techniques as well as manual corrections (Lehner et al., 2008).

In the GFM project, the HAND index was used to create the HAND Exclusion Mask (HAND-EM) which helps us to reduce water-lookalike classes depending on the hydrologic–topographic setting and separate the flood-prone from non-flood prone areas.

### ¶ Land cover data

To support the masking of area where the Sentinel-1 sensors as C-band radar are in general insensitive to floods, we collect land cover data sets. Furthermore, the Affected Land Cover product layers provides such information to the users of the individual GFM products.The Global Human Settlement Layer (GHSL) is a static urban mask generated by JRC from Landsat-8 optical imagery at 30 m resolution. Information about the Affected Population (i.e., number of people living within flooded area) is derived from the GHSL-POP dataset. GHS-BUILD-S2 dataset contains the information about the build up area expressed in terms of probability grid at 10 m resolution. This information is derived from a Sentinel-2 global image composite.

The World Settlement Footprint 2015 (WSF2015) is a static urban mask generated by DLR from Sentinel-1 and Landsat-8 data at 10m resolution in 2015. The GHSL and the WSF2015 will be used to highlight urban areas where flood detection is probably not possible due to the side-looking viewing geometry of SAR-satellites and complex interactions of the SAR signal with urban structures.

The Global forest change dataset (Hansen et al., (2013); version 1.8: https://storage.googleapis.com/earthenginepartners-hansen/GFC-2020-v1.8/download.html) characterises global forest extent and change from 2000 through 2020 and is based on the time-series analysis of Landsat images in characterizing global forest extent and change from 2000 through 2020. The Global forest change dataset was used to highlight the densely vegetated areas, where their identification based on Sentinel-1 Global Backscatter Model (S1-GBM) was not possible because of the irregular temporal coverage of Sentinel-1.