4.2.1.6. Access to Data in NetCDF format viewed as Zarr

Some datasets offered by the Open Data Portal are provided in the reference format specified by the kerchunk package. This gives access to the original NetCDF data files with I/O performance comparable to Zarr.
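To make the idea concrete, here is a minimal sketch of what such a reference does. It is not the kerchunk implementation: the file name, the chunk key, and the `read_chunk` helper are hypothetical, and a bytes object stands in for a remote NetCDF file. A reference maps a Zarr-style chunk key to a location (file, byte offset, length), so a reader can fetch just that byte range instead of reading the whole file.

```python
# Stand-in for a remote NetCDF file; a real reader would issue a byte-range request.
fake_files = {
    "sm_19781101.nc": b"HEADER--" + bytes(range(16)) + b"--TRAILER",
}

# Reference mapping in the spirit of kerchunk: chunk key -> [file, offset, length].
# The key and the file name are made up for illustration.
references = {
    "sm/0.0.0": ["sm_19781101.nc", 8, 16],
}


def read_chunk(key: str) -> bytes:
    """Return the raw bytes of one chunk by following its reference."""
    file_name, offset, length = references[key]
    return fake_files[file_name][offset:offset + length]


chunk = read_chunk("sm/0.0.0")
print(len(chunk))
```

Because only the small reference mapping has to be read up front, the original files can stay untouched while still being addressed chunk by chunk.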

The data is available via the store identifier ‘esa-cci-kc’.

[1]:
from xcube.core.store import new_data_store
[2]:
kerchunk_store = new_data_store('esa-cci-kc')

Again, let’s see which datasets are available.

[3]:
datasets = kerchunk_store.list_data_ids()
datasets
[3]:
['ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2018-2017-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2019-2018-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2020-2010-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-CHANGE-100m-2020-2019-fv4.0-kr1.1',
 'ESACCI-BIOMASS-L4-AGB-MERGED-100m-2010-2020-fv4.0-kr1.1',
 'ESACCI-L3C_CLOUD-CLD_PRODUCTS-ATSR2_AATSR-199506-201204-fv3.0-kr1.1',
 'ESACCI-L3C_CLOUD-CLD_PRODUCTS-AVHRR_AM-199109-201612-fv3.0-kr1.1',
 'ESACCI-L3C_CLOUD-CLD_PRODUCTS-AVHRR_PM-198201-201612-fv3.0-kr1.1',
 'ESACCI-L3C_SNOW-SWE-MERGED-19790102-20200524-fv2.0-kr1.1',
 'ESACCI-L4_FIRE-BA-MODIS-20010101-20200120-fv5.1-kr1.2',
 'ESACCI-LC-L4-LCCS-Map-300m-P1Y-1992-2015-v2.0.7b-kr1.1',
 'ESACCI-LC-L4-PFT-Map-300m-P1Y-1992-2020-v2.0.8-kr1.1',
 'ESACCI-LST-L3C-LST-MODISA-0.01deg_1MONTHLY_DAY-200207-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3C-LST-MODISA-0.01deg_1MONTHLY_NIGHT-200207-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3C-LST-MODIST-0.01deg_1MONTHLY_DAY-200003-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3C-LST-MODIST-0.01deg_1MONTHLY_NIGHT-200003-201812-fv3.00-kr1.1',
 'ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_DAY-199508-202012-fv2.00-kr1.1',
 'ESACCI-LST-L3S-LST-IRCDR_-0.01deg_1MONTHLY_NIGHT-199508-202012-fv2.00-kr1.1',
 'ESACCI-PERMAFROST-L4-ALT-ERA5_MODISLST_BIASCORRECTED-AREA4_PP-1997-2002-fv03.0-kr1.1',
 'ESACCI-PERMAFROST-L4-ALT-MODISLST_CRYOGRID-AREA4_PP-2003-2019-fv03.0-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMS-ACTIVE-19910805-20211231-fv07.1-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-19781101-20211231-fv07.1-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMV-PASSIVE-19781101-20211231-fv07.1-kr1.1',
 'ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED_ADJUSTED-19781101-20211231-fv07.1-kr1.1']

The names are similar to, but not the same as, those in the Climate Toolbox store. We can look at a dataset’s metadata to find out more about it.
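Judging from the list above, each identifier ends with a kerchunk revision tag such as `-kr1.1`. The following is a minimal sketch for separating that tag from the rest of the name. It assumes the `-kr<major>.<minor>` suffix pattern holds for all identifiers, and the helper `split_kerchunk_revision` is hypothetical, not part of xcube. It does not check whether the remaining name corresponds to an identifier in any other store.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: identifiers end in "-kr<major>.<minor>".
_KR_SUFFIX = re.compile(r"-(kr\d+\.\d+)$")


def split_kerchunk_revision(data_id: str) -> Tuple[str, Optional[str]]:
    """Split a data id into (name without revision tag, revision tag or None)."""
    match = _KR_SUFFIX.search(data_id)
    if match is None:
        return data_id, None
    return data_id[:match.start()], match.group(1)


name, revision = split_kerchunk_revision(
    "ESACCI-L4_FIRE-BA-MODIS-20010101-20200120-fv5.1-kr1.2"
)
print(name, revision)
```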

[4]:
kerchunk_store.describe_data('ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-19781101-20211231-fv07.1-kr1.1')
[4]:
<xcube.core.store.descriptor.DatasetDescriptor at 0x7f84834486e0>
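The cell output above is only the object’s default representation. Here is a minimal sketch for pulling a few readable fields out of a descriptor. The attribute names `data_id`, `dims`, `bbox`, and `time_range` are assumed to be available on xcube’s `DatasetDescriptor` and should be checked against the installed xcube version. `summarize_descriptor` is a hypothetical helper, and the stand-in object carries made-up values so that the sketch runs without network access.

```python
from types import SimpleNamespace


def summarize_descriptor(descriptor) -> dict:
    """Collect a few descriptor fields; missing ones come back as None."""
    fields = ("data_id", "dims", "bbox", "time_range")
    return {name: getattr(descriptor, name, None) for name in fields}


# Stand-in with made-up values. With a real store one would instead pass
# the object returned by describe_data(...).
fake_descriptor = SimpleNamespace(
    data_id="example-id",
    dims={"time": 2, "lat": 4, "lon": 8},
    time_range=("2000-01-01", "2000-01-02"),
)

summary = summarize_descriptor(fake_descriptor)
print(summary)
```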

Cubes can easily be opened from the store like this:

[5]:
cube = kerchunk_store.open_data('ESACCI-SOILMOISTURE-L3S-SSMV-COMBINED-19781101-20211231-fv07.1-kr1.1')
cube
[5]:
<xarray.Dataset> Size: 654GB
Dimensions:         (time: 15767, lat: 720, lon: 1440)
Coordinates:
  * lat             (lat) float64 6kB 89.88 89.62 89.38 ... -89.38 -89.62 -89.88
  * lon             (lon) float64 12kB -179.9 -179.6 -179.4 ... 179.6 179.9
  * time            (time) datetime64[ns] 126kB 1978-11-01 ... 2021-12-31
Data variables:
    dnflag          (time, lat, lon) float32 65GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    flag            (time, lat, lon) float32 65GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    freqbandID      (time, lat, lon) float32 65GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    mode            (time, lat, lon) float32 65GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    sensor          (time, lat, lon) float64 131GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    sm              (time, lat, lon) float32 65GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    sm_uncertainty  (time, lat, lon) float32 65GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
    t0              (time, lat, lon) datetime64[ns] 131GB dask.array<chunksize=(1, 720, 1440), meta=np.ndarray>
Attributes: (12/46)
    Conventions:                  CF-1.9
    cdm_data_type:                Grid
    comment:                      This dataset was produced with funding of t...
    contact:                      cci_sm_contact@eodc.eu
    creator_email:                cci_sm_developer@eodc.eu
    creator_name:                 Department of Geodesy and Geoinformation, V...
    ...                           ...
    time_coverage_start:          19781101T000000Z
    time_coverage_start_product:  19781101T000000Z
    title:                        ESA CCI Surface Soil Moisture COMBINED acti...
    tracking_id:                  dc6ade2c-e51b-4a94-81fa-751df95a85a6
    kerchunk_revision:            kr1.1
    kerchunk_creation_date:       031023T093359
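The sizes in this representation describe the uncompressed arrays. The variables are dask arrays, so opening the cube does not load them. As a worked check, the figures can be reproduced from the dimension sizes and dtypes shown above. This is plain arithmetic in which only the variable names and the decimal-unit convention (1 GB = 10^9 bytes) are taken from the output.

```python
# Dimension sizes as shown in the dataset representation.
n_time, n_lat, n_lon = 15767, 720, 1440
cells = n_time * n_lat * n_lon

# Bytes per element for each dtype that occurs in the cube.
itemsize = {"float32": 4, "float64": 8, "datetime64[ns]": 8}
variables = {
    "dnflag": "float32", "flag": "float32", "freqbandID": "float32",
    "mode": "float32", "sensor": "float64", "sm": "float32",
    "sm_uncertainty": "float32", "t0": "datetime64[ns]",
}

sizes = {name: cells * itemsize[dtype] for name, dtype in variables.items()}
total = sum(sizes.values())

# One dask chunk holds a single time step of one variable.
chunk_bytes = 1 * n_lat * n_lon * itemsize["float32"]

print(f"sm:    {sizes['sm'] / 1e9:.1f} GB")
print(f"total: {total / 1e9:.1f} GB")
print(f"chunk: {chunk_bytes / 1e6:.1f} MB")
```

Each float32 variable comes to about 65 GB and the 8-byte variables to about 131 GB, which together account for the reported 654 GB. A single chunk is only about 4 MB, which is why working on small subsets stays cheap.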

Subsets of the data may easily be created like this:

[6]:
# lat is stored from north to south, so its slice runs from high to low.
sub_cube = cube.sel(
    lat=slice(40.40, -40.40),
    lon=slice(-23.40, 57.40),
    time=slice('2000-01-01', '2000-12-31'),
)
sub_cube
[6]:
<xarray.Dataset> Size: 2GB
Dimensions:         (time: 366, lat: 324, lon: 324)
Coordinates:
  * lat             (lat) float64 3kB 40.38 40.12 39.88 ... -39.88 -40.12 -40.38
  * lon             (lon) float64 3kB -23.38 -23.12 -22.88 ... 56.88 57.12 57.38
  * time            (time) datetime64[ns] 3kB 2000-01-01 ... 2000-12-31
Data variables:
    dnflag          (time, lat, lon) float32 154MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    flag            (time, lat, lon) float32 154MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    freqbandID      (time, lat, lon) float32 154MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    mode            (time, lat, lon) float32 154MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    sensor          (time, lat, lon) float64 307MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    sm              (time, lat, lon) float32 154MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    sm_uncertainty  (time, lat, lon) float32 154MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
    t0              (time, lat, lon) datetime64[ns] 307MB dask.array<chunksize=(1, 324, 324), meta=np.ndarray>
Attributes: (12/46)
    Conventions:                  CF-1.9
    cdm_data_type:                Grid
    comment:                      This dataset was produced with funding of t...
    contact:                      cci_sm_contact@eodc.eu
    creator_email:                cci_sm_developer@eodc.eu
    creator_name:                 Department of Geodesy and Geoinformation, V...
    ...                           ...
    time_coverage_start:          19781101T000000Z
    time_coverage_start_product:  19781101T000000Z
    title:                        ESA CCI Surface Soil Moisture COMBINED acti...
    tracking_id:                  dc6ade2c-e51b-4a94-81fa-751df95a85a6
    kerchunk_revision:            kr1.1
    kerchunk_creation_date:       031023T093359
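Label-based slicing in xarray follows the order of the coordinate, which is why the latitude bounds are given from north to south. The resulting dimension sizes can be checked by hand. The sketch below assumes a regular 0.25° grid with cell centres at ±89.875 and ±179.875 (inferred from the rounded coordinate values shown above) and daily time steps. `count_between` is a hypothetical helper.

```python
from datetime import date


def count_between(first: float, step: float, n: int, lo: float, hi: float) -> int:
    """Count grid centres first + i * step (for i < n) that lie within [lo, hi]."""
    return sum(1 for i in range(n) if lo <= first + i * step <= hi)


# Latitude runs from north to south, longitude from west to east.
n_lat = count_between(89.875, -0.25, 720, -40.40, 40.40)
n_lon = count_between(-179.875, 0.25, 1440, -23.40, 57.40)

# Both end dates are included, and 2000 is a leap year.
n_time = (date(2000, 12, 31) - date(2000, 1, 1)).days + 1

bytes_per_float32_var = n_time * n_lat * n_lon * 4

print(n_time, n_lat, n_lon)
print(f"{bytes_per_float32_var / 1e6:.0f} MB per float32 variable")
```

Under these assumptions the counts agree with the dimensions reported for `sub_cube` (366 × 324 × 324) and with the 154 MB listed per float32 variable.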

The subset can then be plotted:

[7]:
sub_cube.sm.sel(time='2000-07-01 12:00:00', method='nearest').plot.imshow(cmap='Greys_r', figsize=(8, 8))
[7]:
<matplotlib.image.AxesImage at 0x7f8472243080>
[Figure: greyscale image plot of sm from sub_cube for the time step nearest to 2000-07-01 12:00:00]