4.2.6. Access to Data in NetCDF format viewed as Zarr

Some datasets offered by the Open Data Portal are provided in the references format specified by the kerchunk package. This allows the original NetCDF data files to be accessed with an I/O performance comparable to Zarr.
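To give a rough idea of what such a reference set contains, the sketch below builds a tiny, made-up example in the layout of the kerchunk version 1 references specification and resolves a Zarr chunk key to a byte range inside a NetCDF file. The URLs, offsets, and lengths are invented for illustration, and `resolve_chunk` is a hypothetical helper that is not part of kerchunk or xcube.

```python
import json

# A made-up reference set in the kerchunk "version 1" layout.
# Plain strings hold Zarr metadata inline; [url, offset, length] triples
# point at byte ranges inside the original NetCDF files.
references = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "sm/.zarray": json.dumps({
            "shape": [2, 720, 1440],
            "chunks": [1, 720, 1440],
            "dtype": "<f4",
            "compressor": None,
            "fill_value": None,
            "filters": None,
            "order": "C",
            "zarr_format": 2,
        }),
        # Hypothetical URLs and byte positions, for illustration only.
        # 720 * 1440 float32 values occupy 4147200 bytes.
        "sm/0.0.0": ["https://example.org/data/file_a.nc", 4096, 4147200],
        "sm/1.0.0": ["https://example.org/data/file_b.nc", 4096, 4147200],
    },
}


def resolve_chunk(refs, key):
    """Return (url, first_byte, last_byte) for a chunk key (hypothetical helper)."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        raise ValueError(f"{key!r} is stored inline, not as a byte range")
    url, offset, length = entry
    return url, offset, offset + length - 1


url, first, last = resolve_chunk(references, "sm/1.0.0")
# A client would request only this byte range of the NetCDF file.
range_header = f"bytes={first}-{last}"
print(url, range_header)
```

Because each chunk maps directly to a byte range, a reader can fetch individual chunks without scanning the NetCDF file, which is where the Zarr-like performance comes from.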

The data is available via the store identifier ‘esa-cci-kc’.

[1]:
from xcube.core.store import new_data_store
[2]:
kerchunk_store = new_data_store('esa-cci-kc')

Again, let’s see what datasets are available.

[3]:
datasets = kerchunk_store.list_data_ids()
datasets
[3]:
['ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2018-2017-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2019-2018-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2020-2010-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2020-2019-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-MERGED-100m-2010-2020-fv4.0-kr1.1',
 'ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1',
 'ESACCI-L3C_CLOUD-CLD_PRODUCTS-AVHRR_AM-199109-201612-fv3.0-kr1.1',
 'ESACCI-L3C_CLOUD-CLD_PRODUCTS-AVHRR_PM-198201-201612-fv3.0-kr1.1',
 'ESACCI-L3C_SNOW-SWE-MERGED-19790102-20200524-fv2.0-kr1.1',
 'ESACCI-L4_FIRE-BA-MODIS-20010101-20200120-fv5.1-kr1.2',
 'ESACCI-LC-L4-LCCS-Map-300m-P1Y-1992-2015-v2.0.7b-kr1.1',
 'ESACCI-LC-L4-PFT-Map-300m-P1Y-1992-2020-v2.0.8-kr1.1',
 'ESACCI-LST-L3C-LST-MODISA-0.01deg_1MONTHLY_DAY-200207-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3C-LST-MODISA-0.01deg_1MONTHLY_NIGHT-200207-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3C-LST-MODIST-0.01deg_1MONTHLY_DAY-200003-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3C-LST-MODIST-0.01deg_1MONTHLY_NIGHT-200003-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_DAY-199508-202012-fv2.00-kr1.1',
 'ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_NIGHT-199508-202012-fv2.00-kr1.1',
 'ESACCI-PERMAFROST-L4-ALT-ERA5_MODISLST_BIASCORRECTED-AREA4_PP-1997-2002-fv03.0-kr1.1',
 'ESACCI-PERMAFROST-L4-ALT-MODISLST_CRYOGRID-AREA4_PP-2003-2019-fv03.0-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMS-ACTIVE-19910805-20211231-fv07.1-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-19781101-20211231-fv07.1-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-19781101-20211231-fv07.1-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_ADJUSTED-19781101-20211231-fv07.1-kr1.1']
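Since the result is a plain list of strings, it can be narrowed down with ordinary Python. The following is a minimal sketch that uses a few identifiers copied from the listing above. `select_ids` and `kerchunk_revision` are hypothetical helpers, and reading the trailing `-krX.Y` part as the kerchunk revision is an assumption based on the naming pattern.

```python
# A few identifiers copied from the listing above.
datasets = [
    'ESACCI-BIOMASS-L4-AGB-MERGED-100m-2010-2020-fv4.0-kr1.1',
    'ESACCI-L4_FIRE-BA-MODIS-20010101-20200120-fv5.1-kr1.2',
    'ESACCI-SOILMOISTURE-L3S-SSMS-ACTIVE-19910805-20211231-fv07.1-kr1.1',
    'ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-19781101-20211231-fv07.1-kr1.1',
]


def select_ids(data_ids, keyword):
    """Return all identifiers containing the keyword, ignoring case."""
    return [d for d in data_ids if keyword.lower() in d.lower()]


def kerchunk_revision(data_id):
    """Return the part after the last hyphen (assumed to be the revision)."""
    return data_id.rsplit('-', 1)[-1]


soil_moisture_ids = select_ids(datasets, 'soilmoisture')
revisions = {d: kerchunk_revision(d) for d in datasets}
print(soil_moisture_ids)
print(revisions)
```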

The names are similar, but not identical, to those in the Climate Toolbox store. We can look at a dataset’s metadata to find out more about it.

[4]:
kerchunk_store.describe_data('ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-19781101-20211231-fv07.1-kr1.1')
[4]:
<xcube.core.store.descriptor.DatasetDescriptor at 0x7f18e8e72810>
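The default representation of the descriptor is not very informative. xcube data descriptors provide a `to_dict()` method, so one way to inspect the metadata is to convert the descriptor and pick the entries of interest. The sketch below assumes the key names `data_id`, `dims`, and `time_range`; `summarize_descriptor` is a hypothetical helper, and `StandInDescriptor` only mimics the real object so that the example runs without network access. Its values are copied from the cube shown further below.

```python
class StandInDescriptor:
    """Stand-in for the object returned by describe_data(); not an xcube class."""

    def to_dict(self):
        return {
            'data_id': 'ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-'
                       '19781101-20211231-fv07.1-kr1.1',
            'data_type': 'dataset',
            'dims': {'time': 15767, 'lat': 720, 'lon': 1440},
            'time_range': ('1978-11-01', '2021-12-31'),
        }


def summarize_descriptor(descriptor, keys=('data_id', 'dims', 'time_range')):
    """Return the selected entries of a descriptor as a plain dict."""
    info = descriptor.to_dict()
    # Skip keys the descriptor does not provide instead of failing.
    return {key: info[key] for key in keys if key in info}


summary = summarize_descriptor(StandInDescriptor())
for key, value in summary.items():
    print(f'{key}: {value}')

# With the real store, the call would presumably look like this:
# summarize_descriptor(kerchunk_store.describe_data('<data id>'))
```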

Cubes can easily be opened from the store like this:

[5]:
cube = kerchunk_store.open_data('ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-19781101-20211231-fv07.1-kr1.1')
cube
[5]:
<xarray.Dataset>
Dimensions:         (time: 15767, lat: 720, lon: 1440)
Coordinates:
  * lat             (lat) float64 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
  * lon             (lon) float64 -179.9 -179.6 -179.4 ... 179.4 179.6 179.9
  * time            (time) datetime64[ns] 1978-11-01 1978-11-02 ... 2021-12-31
Data variables:
    dnflag          (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    flag            (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    freqbandID      (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    mode            (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    sensor          (time, lat, lon) float64 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    sm              (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    sm_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    t0              (time, lat, lon) datetime64[ns] dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
Attributes: (12/46)
    Conventions:                  CF-1.9
    cdm_data_type:                Grid
    comment:                      This dataset was produced with funding of t...
    contact:                      cci_sm_contact@eodc.eu
    creator_email:                cci_sm_developer@eodc.eu
    creator_name:                 Department of Geodesy and Geoinformation, V...
    ...                           ...
    time_coverage_start:          19781101T000000Z
    time_coverage_start_product:  19781101T000000Z
    title:                        ESA CCI Surface Soil Moisture COMBINED acti...
    tracking_id:                  dc6ade2c-e51b-4a94-81fa-751df95a85a6
    kerchunk_revision:            kr1.1
    kerchunk_creation_date:       031023T093359

Subsets of the data may easily be created like this:

[6]:
sub_cube = cube.sel({
    'lat': slice(40.40, -40.40),
    'lon': slice(-23.40, 57.40),
    'time': slice('2000-01-01', '2000-12-31')
    }
)
sub_cube
[6]:
<xarray.Dataset>
Dimensions:         (time: 366, lat: 324, lon: 324)
Coordinates:
  * lat             (lat) float64 40.38 40.12 39.88 ... -39.88 -40.12 -40.38
  * lon             (lon) float64 -23.38 -23.12 -22.88 ... 56.88 57.12 57.38
  * time            (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-12-31
Data variables:
    dnflag          (time, lat, lon) float32 dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    flag            (time, lat, lon) float32 dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    freqbandID      (time, lat, lon) float32 dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    mode            (time, lat, lon) float32 dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    sensor          (time, lat, lon) float64 dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    sm              (time, lat, lon) float32 dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    sm_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    t0              (time, lat, lon) datetime64[ns] dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
Attributes: (12/46)
    Conventions:                  CF-1.9
    cdm_data_type:                Grid
    comment:                      This dataset was produced with funding of t...
    contact:                      cci_sm_contact@eodc.eu
    creator_email:                cci_sm_developer@eodc.eu
    creator_name:                 Department of Geodesy and Geoinformation, V...
    ...                           ...
    time_coverage_start:          19781101T000000Z
    time_coverage_start_product:  19781101T000000Z
    title:                        ESA CCI Surface Soil Moisture COMBINED acti...
    tracking_id:                  dc6ade2c-e51b-4a94-81fa-751df95a85a6
    kerchunk_revision:            kr1.1
    kerchunk_creation_date:       031023T093359
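Note that the latitude coordinate runs from north to south, which is why the latitude slice starts at the larger value. With xarray, label-based slices include both end points and must follow the order of the coordinate. The resulting sizes can be checked with a little arithmetic. The sketch below uses plain Python and assumes cell centres on a regular 0.25° grid, which is consistent with the rounded coordinate values displayed above.

```python
from datetime import date


def cell_centres(first, step, count):
    """Return the cell-centre coordinates of a regular grid."""
    return [first + i * step for i in range(count)]


# Assumed 0.25 degree global grid: 720 latitudes (descending), 1440 longitudes.
lat = cell_centres(89.875, -0.25, 720)
lon = cell_centres(-179.875, 0.25, 1440)

# Label-based selection keeps every centre within the bounds (inclusive).
lat_subset = [v for v in lat if -40.40 <= v <= 40.40]
lon_subset = [v for v in lon if -23.40 <= v <= 57.40]

# Daily time steps for the year 2000, which is a leap year.
n_days = (date(2000, 12, 31) - date(2000, 1, 1)).days + 1

print(len(lat_subset), lat_subset[0], lat_subset[-1])
print(len(lon_subset), lon_subset[0], lon_subset[-1])
print(n_days)
```

The counts match the dimensions reported for `sub_cube`: 366 time steps and 324 cells along each spatial axis.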

… and we can plot the data.

[7]:
sub_cube.sm.sel(time='2000-07-01 12:00:00', method='nearest').plot.imshow(cmap='Greys_r', figsize=(8, 8))
[7]:
<matplotlib.image.AxesImage at 0x7f18c98810d0>
[Figure: greyscale image of the sm variable over the selected region for the time step nearest to 2000-07-01 12:00.]