4.2.2.5. Gap Filling
This notebook will show some algorithms to fill gaps of vector data and raster data. The notebook is structured as follows:
Gap Filling of a vector dataset in spatial domain
Gap filling of a raster dataset in temporal and spatial domain
[1]:
from esa_climate_toolbox.core import get_op
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import scipy.interpolate as interp
import xarray as xr
from xcube.core.store import new_data_store
To get access to the data, we will initiate the ESA CCI Zarr store in the following cell.
[2]:
store = new_data_store('esa-cci-zarr')
4.2.2.5.1. 1. Gap Filling of a Vector Dataset in the Spatial Domain
In this example, we will work with trend data of the Greenland ice sheet, which is provided as a set of spatial points over a given time range. We will apply several interpolation algorithms to fill the gaps between these points, ultimately converting the data into a raster format.
First we will open the data as usual.
[3]:
ds = store.open_data("ESACCI-ICESHEETS_Greenland_GMB-2003-2016-v1.1.zarr")
ds
[3]:
<xarray.Dataset> Size: 43kB
Dimensions: (meas_ind: 901, time: 10)
Dimensions without coordinates: meas_ind, time
Data variables:
GMB_trend (meas_ind, time) float32 36kB dask.array<chunksize=(901, 10), meta=np.ndarray>
end_time (time) datetime64[ns] 80B dask.array<chunksize=(10,), meta=np.ndarray>
latitude (meas_ind) float32 4kB dask.array<chunksize=(901,), meta=np.ndarray>
longitude (meas_ind) float32 4kB dask.array<chunksize=(901,), meta=np.ndarray>
start_time (time) datetime64[ns] 80B dask.array<chunksize=(10,), meta=np.ndarray>
t (time) datetime64[ns] 80B dask.array<chunksize=(10,), meta=np.ndarray>
Attributes: (12/20)
Institution: DTU Space - Div. of Geodynamics
Latitude_max: 83.23728
Latitude_min: 60.73857
Longitude_max: -14.706720000000018
Longitude_min: -71.83566000000002
Method: GRACE GMB trend following Barletta et al. 2012
... ...
grid_projection: Geographical coordinates relative to WGS84
netCDF_version: NETCDF3_CLASSIC
product_version: 1.2
time_coverage_end: 2016-01-31
time_coverage_start: 2003-01-01
time_resolution: 5YWe define the final coordiates of the resired target raster and extract one time slice from the data set.
[4]:
lon = np.arange(-73., -12., 0.02)
lat = np.arange(83., 59., -0.02)
grid_lon, grid_lat = np.meshgrid(lon, lat)
GMB_trend_slice = ds.GMB_trend.isel(time=0)
Now we can perform a 2D linear interpolation using scipy.interpolate.griddata.
[5]:
gridded_data = interp.griddata(
(ds.longitude, ds.latitude),
GMB_trend_slice,
(grid_lon, grid_lat),
method='linear'
)
ds_linear = xr.Dataset(
{'GMB_trend_slice': (['lat', 'lon'], gridded_data)},
coords={'lon': lon, 'lat': lat}
)
ds_linear
[5]:
<xarray.Dataset> Size: 29MB
Dimensions: (lat: 1200, lon: 3050)
Coordinates:
* lon (lon) float64 24kB -73.0 -72.98 -72.96 ... -12.04 -12.02
* lat (lat) float64 10kB 83.0 82.98 82.96 ... 59.06 59.04 59.02
Data variables:
GMB_trend_slice (lat, lon) float64 29MB nan nan nan nan ... nan nan nan nanThe same approach can be applied using "cubic" as the interpolation method.
[6]:
gridded_data = interp.griddata(
(ds.longitude, ds.latitude),
GMB_trend_slice,
(grid_lon, grid_lat),
method="cubic"
)
ds_cubic = xr.Dataset(
{'GMB_trend_slice': (['lat', 'lon'], gridded_data)},
coords={'lon': lon, 'lat': lat}
)
ds_cubic
[6]:
<xarray.Dataset> Size: 29MB
Dimensions: (lat: 1200, lon: 3050)
Coordinates:
* lon (lon) float64 24kB -73.0 -72.98 -72.96 ... -12.04 -12.02
* lat (lat) float64 10kB 83.0 82.98 82.96 ... 59.06 59.04 59.02
Data variables:
GMB_trend_slice (lat, lon) float64 29MB nan nan nan nan ... nan nan nan nanNow, we can plot the results: the left plot shows the original points from the ESA CCI ice sheet trend dataset, the center plot displays the linear interpolation, and the right plot presents the cubic interpolation.
[7]:
fig, ax = plt.subplots(1, 3, figsize=(20, 6))
scatter = ax[0].scatter(
ds.longitude, ds.latitude, c=GMB_trend_slice, cmap='seismic', vmin=-500, vmax=500, s=30
)
ax[0].set_title('Scatter Plot of Values on 2D Grid')
ax[0].set_ylabel('lat')
ax[0].set_xlabel('lon')
fig.colorbar(scatter, ax=ax[0], orientation='vertical')
ds_linear.GMB_trend_slice.plot(ax=ax[1], vmin=-500, vmax=500, cmap='seismic')
ax[1].set_title('Linear interpolation')
ds_cubic.GMB_trend_slice.plot(ax=ax[2], vmin=-500, vmax=500, cmap='seismic')
ax[2].set_title('Cubic interpolation')
[7]:
Text(0.5, 1.0, 'Cubic interpolation')
4.2.2.5.2. 2. Gap filling of a raster dataset
In this example, we will work with the soil moisture dataset, which is a 3d rastered dataset with gaps in it. We will show an example how to fill the gaps along the time axis for a time series and how to fill the pags in the 2d spatial domain.
First we will open the data as usual.
[8]:
ds = store.open_data('ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-1978-2021-fv07.1.zarr')
ds
[8]:
<xarray.Dataset> Size: 654GB
Dimensions: (time: 15767, lat: 720, lon: 1440)
Coordinates:
* lat (lat) float64 6kB 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
* lon (lon) float64 12kB -179.9 -179.6 -179.4 ... 179.6 179.9
* time (time) datetime64[ns] 126kB 1978-11-01 ... 2021-12-31
Data variables:
dnflag (time, lat, lon) float32 65GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
flag (time, lat, lon) float32 65GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
freqbandID (time, lat, lon) float32 65GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
mode (time, lat, lon) float32 65GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
sensor (time, lat, lon) float64 131GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
sm (time, lat, lon) float32 65GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
sm_uncertainty (time, lat, lon) float32 65GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
t0 (time, lat, lon) float64 131GB dask.array<chunksize=(16, 720, 720), meta=np.ndarray>
Attributes: (12/44)
Conventions: CF-1.9
cdm_data_type: Grid
comment: This dataset was produced with funding of t...
contact: cci_sm_contact@eodc.eu
creator_email: cci_sm_developer@eodc.eu
creator_name: Department of Geodesy and Geoinformation, V...
... ...
time_coverage_end_product: 20211231T235959Z
time_coverage_resolution: P1D
time_coverage_start: 1978-11-01 00:00:00
time_coverage_start_product: 19781101T000000Z
title: ESA CCI Surface Soil Moisture COMBINED acti...
tracking_id: ad35798e-58e0-488f-b5b9-593874a477004.2.2.5.2.1. Gap filling along the time axis
For this, we create a small function that removes all NaN values and then performs interpolation to the original time axis using xarray.Dataset.interp().
[9]:
def fill_temporal_gaps(ds, method="linear"):
new_times = ds_ts.time
ds_nonans = ds_ts.dropna(dim="time")
return ds_nonans.interp(time=new_times, method=method)
We will apply this gap-filling method to one year of the dataset as an example. To do this, we subset the dataset and select the year 2000.
[10]:
ds_ts = ds.sel(lat=20, lon=20, method="nearest")
ds_ts = ds_ts.sel(time=slice('2000-01-01 00:00:00', '2001-01-01 00:00:00'))
ds_ts
[10]:
<xarray.Dataset> Size: 18kB
Dimensions: (time: 367)
Coordinates:
lat float64 8B 20.12
lon float64 8B 20.12
* time (time) datetime64[ns] 3kB 2000-01-01 ... 2001-01-01
Data variables:
dnflag (time) float32 1kB dask.array<chunksize=(13,), meta=np.ndarray>
flag (time) float32 1kB dask.array<chunksize=(13,), meta=np.ndarray>
freqbandID (time) float32 1kB dask.array<chunksize=(13,), meta=np.ndarray>
mode (time) float32 1kB dask.array<chunksize=(13,), meta=np.ndarray>
sensor (time) float64 3kB dask.array<chunksize=(13,), meta=np.ndarray>
sm (time) float32 1kB dask.array<chunksize=(13,), meta=np.ndarray>
sm_uncertainty (time) float32 1kB dask.array<chunksize=(13,), meta=np.ndarray>
t0 (time) float64 3kB dask.array<chunksize=(13,), meta=np.ndarray>
Attributes: (12/44)
Conventions: CF-1.9
cdm_data_type: Grid
comment: This dataset was produced with funding of t...
contact: cci_sm_contact@eodc.eu
creator_email: cci_sm_developer@eodc.eu
creator_name: Department of Geodesy and Geoinformation, V...
... ...
time_coverage_end_product: 20211231T235959Z
time_coverage_resolution: P1D
time_coverage_start: 1978-11-01 00:00:00
time_coverage_start_product: 19781101T000000Z
title: ESA CCI Surface Soil Moisture COMBINED acti...
tracking_id: ad35798e-58e0-488f-b5b9-593874a47700Next, we apply the gap-filling method to the selected sample.
[11]:
ds_ts_gap_fill = fill_temporal_gaps(ds_ts)
ds_ts_gap_fill
[11]:
<xarray.Dataset> Size: 18kB
Dimensions: (time: 367)
Coordinates:
lat float64 8B 20.12
lon float64 8B 20.12
* time (time) datetime64[ns] 3kB 2000-01-01 ... 2001-01-01
Data variables:
dnflag (time) float32 1kB dask.array<chunksize=(367,), meta=np.ndarray>
flag (time) float32 1kB dask.array<chunksize=(367,), meta=np.ndarray>
freqbandID (time) float32 1kB dask.array<chunksize=(367,), meta=np.ndarray>
mode (time) float32 1kB dask.array<chunksize=(367,), meta=np.ndarray>
sensor (time) float64 3kB dask.array<chunksize=(367,), meta=np.ndarray>
sm (time) float32 1kB dask.array<chunksize=(367,), meta=np.ndarray>
sm_uncertainty (time) float32 1kB dask.array<chunksize=(367,), meta=np.ndarray>
t0 (time) float64 3kB dask.array<chunksize=(367,), meta=np.ndarray>
Attributes: (12/44)
Conventions: CF-1.9
cdm_data_type: Grid
comment: This dataset was produced with funding of t...
contact: cci_sm_contact@eodc.eu
creator_email: cci_sm_developer@eodc.eu
creator_name: Department of Geodesy and Geoinformation, V...
... ...
time_coverage_end_product: 20211231T235959Z
time_coverage_resolution: P1D
time_coverage_start: 1978-11-01 00:00:00
time_coverage_start_product: 19781101T000000Z
title: ESA CCI Surface Soil Moisture COMBINED acti...
tracking_id: ad35798e-58e0-488f-b5b9-593874a47700We then plot the original and gap-filled soil moisture dataset, zooming in on a two-month period to better highlight the gap-filling effect.
[12]:
time_slice = slice('2000-06-01 00:00:00', '2000-08-01 00:00:00')
ds_ts.sm.sel(time=time_slice).plot(label="Original")
ds_ts_gap_fill.sm.sel(time=time_slice).plot(label="Gap filling", linestyle="dashed")
plt.legend()
[12]:
<matplotlib.legend.Legend at 0x785408414ec0>
4.2.2.5.2.2. Gap Filling in the Spatial Domain
In this example, we utilize the low-pass gap-filling algorithm from `gridtools <https://github.com/esa-esdl/gridtools>`__ (DOI 10.5281/zenodo.15062839). This method effectively applies a 2D low-pass filter to the data. As a demonstration, we select a single time slice and apply gap filling in the spatial domain using the aforementioned algorithm.
[13]:
sm_slice = ds.sm.sel(time='2000-07-01 12:00:00', method='nearest')
sm_slice
[13]:
<xarray.DataArray 'sm' (lat: 720, lon: 1440)> Size: 4MB
dask.array<getitem, shape=(720, 1440), dtype=float32, chunksize=(720, 720), chunktype=numpy.ndarray>
Coordinates:
* lat (lat) float64 6kB 89.88 89.62 89.38 89.12 ... -89.38 -89.62 -89.88
* lon (lon) float64 12kB -179.9 -179.6 -179.4 ... 179.4 179.6 179.9
time datetime64[ns] 8B 2000-07-02
Attributes:
_CoordinateAxes: time lat lon
ancillary_variables: sm_uncertainty flag t0
dtype: float32
long_name: Volumetric Soil Moisture
standard_name: soil_moisture_content
units: m3 m-3
valid_range: [0.0, 1.0][14]:
ds_sub = ds.sel(time='2000-07-01 12:00:00', method='nearest')
ds_sub = ds_sub[["sm"]]
ds_sub
[14]:
<xarray.Dataset> Size: 4MB
Dimensions: (lat: 720, lon: 1440)
Coordinates:
* lat (lat) float64 6kB 89.88 89.62 89.38 89.12 ... -89.38 -89.62 -89.88
* lon (lon) float64 12kB -179.9 -179.6 -179.4 ... 179.4 179.6 179.9
time datetime64[ns] 8B 2000-07-02
Data variables:
sm (lat, lon) float32 4MB dask.array<chunksize=(720, 720), meta=np.ndarray>
Attributes: (12/44)
Conventions: CF-1.9
cdm_data_type: Grid
comment: This dataset was produced with funding of t...
contact: cci_sm_contact@eodc.eu
creator_email: cci_sm_developer@eodc.eu
creator_name: Department of Geodesy and Geoinformation, V...
... ...
time_coverage_end_product: 20211231T235959Z
time_coverage_resolution: P1D
time_coverage_start: 1978-11-01 00:00:00
time_coverage_start_product: 19781101T000000Z
title: ESA CCI Surface Soil Moisture COMBINED acti...
tracking_id: ad35798e-58e0-488f-b5b9-593874a47700Next, we apply the toolbox’s gap-filling operation, which is highly efficient as it leverages Numba for parallel computation.
[15]:
gapfill_op = get_op('gapfill')
[16]:
ds_filled = gapfill_op(ds_sub)
ds_filled.sm
[16]:
<xarray.DataArray 'sm' (lat: 720, lon: 1440)> Size: 4MB
dask.array<fillgaps_lowpass_2d, shape=(720, 1440), dtype=float32, chunksize=(720, 720), chunktype=numpy.ndarray>
Coordinates:
* lat (lat) float64 6kB 89.88 89.62 89.38 89.12 ... -89.38 -89.62 -89.88
* lon (lon) float64 12kB -179.9 -179.6 -179.4 ... 179.4 179.6 179.9
time datetime64[ns] 8B 2000-07-02
Attributes:
_CoordinateAxes: time lat lon
ancillary_variables: sm_uncertainty flag t0
dtype: float32
long_name: Volumetric Soil Moisture
standard_name: soil_moisture_content
units: m3 m-3
valid_range: [0.0, 1.0]Finally, we plot the results for North Africa as an example to illustrate the gap-filling process.
[17]:
fig, ax = plt.subplots(1, 2, figsize=(20, 5))
extent = dict(lat=slice(30, 10), lon=slice(-15, 35))
ds_sub.sm.sel(**extent).plot(ax=ax[0])
ds_filled.sm.sel(**extent).plot(ax=ax[1])
[17]:
<matplotlib.collections.QuadMesh at 0x785408428f50>
[ ]: