2. About the ESA CCI Toolbox

The ESA CCI Toolbox is software developed to facilitate the processing and analysis of all data products generated by the ESA Climate Change Initiative (CCI) programme. It supports analysis and interactive visualisation of these data products through its Python interface. The ESA CCI Toolbox Python API makes the functions of the Toolbox available in Python programs and may also be used to build extensions.

2.1. Concepts

The ESA CCI Toolbox software is based on a few simple concepts. To get the most out of the Toolbox, it is worth becoming familiar with them before using it.

2.1.1. Data Stores

The ESA CCI Toolbox uses the concept of data stores as provided by the xcube software package. This has the advantage that users may easily define their own stores to combine their own third-party data with the CCI data. Users may read more about stores in the documentation of the xcube data store framework.

The CCI Toolbox comes with three pre-defined stores. The first is the CCI Open Data Portal store, denoted by the handle esa-cci. It provides access to any data from the Open Data Portal.

The second data store is the Zarr store (esa-cci-zarr), which provides access to datasets from the Open Data Portal that have been converted to the Zarr format. This has been done for selected datasets that are either frequently used or large in terms of data volume. Providing the data in Zarr format allows for faster data access. The number of Zarr datasets will keep growing.

The third data store is the Kerchunk store (esa-cci-kc), which accesses datasets that the Open Data Portal offers via the references format. This format allows the files to be accessed with performance similar to that of the Zarr data store.
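As a minimal sketch of how a pre-defined store might be addressed by its handle, the helper below uses the xcube data store framework (new_data_store and open_data). The helper name open_cci_dataset and the constant CCI_STORE_IDS are hypothetical and exist only for this illustration. The sketch assumes that the Toolbox is installed so that the three handles are registered with xcube.

```python
# Handles of the three pre-defined stores described above.
CCI_STORE_IDS = ("esa-cci", "esa-cci-zarr", "esa-cci-kc")


def open_cci_dataset(data_id, store_id="esa-cci"):
    """Open a dataset from one of the pre-defined CCI stores.

    Hypothetical helper for illustration; not part of the Toolbox API.
    """
    if store_id not in CCI_STORE_IDS:
        raise ValueError(f"unknown store handle: {store_id!r}")

    # Imported on demand so that the handle check above works even
    # where xcube is not installed.
    from xcube.core.store import new_data_store

    store = new_data_store(store_id)  # look up the store by its handle
    return store.open_data(data_id)   # open the dataset by its identifier
```

Calling open_cci_dataset with a dataset identifier requires network access. Passing store_id="esa-cci-zarr" or store_id="esa-cci-kc" would address the other two stores in the same way, provided the identifier exists there.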

Additionally, the CCI Toolbox allows users to define output stores, to which operation results may be written.

You can find detailed listings of the provided functionality in the API Reference.

2.1.2. Datasets

Data stores provide access to datasets. You may open a dataset from a data store by providing its identifier. The ESA CCI Toolbox reads only a dataset’s metadata and basic structure, not the actual data, until the data is explicitly requested (e.g., when an operation is applied or the dataset is saved). That way, the Toolbox can deal with datasets that don’t fit into your computer’s memory, which allows for out-of-core and multi-core processing. You can also read datasets directly from your local storage, e.g., NetCDF files or ESRI Shapefiles.

The ESA CCI Toolbox does not invent new data structures for representing datasets in memory. Instead, opened datasets are represented by data structures defined by the popular Python packages xarray, pandas, and geopandas:

  • Gridded and raster datasets (based on NetCDF/CF or Zarr) are represented by xarray.Dataset objects. Dataset variables are represented by NumPy-compatible xarray.DataArray objects.

  • Vector datasets (from ESRI Shapefiles, GeoJSON files) are represented by geopandas.GeoDataFrame objects. Dataset variables are represented by pandas-compatible geopandas.GeoSeries objects.

  • Tabular data (from CSV, Excel files) are represented by pandas.DataFrame objects.
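The mapping from file type to in-memory structure listed above can be sketched as a small dispatch table for local files. The reader functions named in the table (xarray.open_dataset, xarray.open_zarr, geopandas.read_file, pandas.read_csv, pandas.read_excel) exist in those packages. The table itself and the helpers reader_for and read_local are assumptions made for illustration and do not describe how the Toolbox selects a reader internally.

```python
import importlib
from pathlib import Path

# File suffix -> (module, reader function). Illustrative assumption,
# not taken from the Toolbox.
READERS = {
    ".nc": ("xarray", "open_dataset"),       # -> xarray.Dataset
    ".zarr": ("xarray", "open_zarr"),        # -> xarray.Dataset
    ".shp": ("geopandas", "read_file"),      # -> geopandas.GeoDataFrame
    ".geojson": ("geopandas", "read_file"),  # -> geopandas.GeoDataFrame
    ".csv": ("pandas", "read_csv"),          # -> pandas.DataFrame
    ".xlsx": ("pandas", "read_excel"),       # -> pandas.DataFrame
}


def reader_for(path):
    """Return the (module, function) pair that would read `path`."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader known for suffix {suffix!r}") from None


def read_local(path):
    """Read a local file into the matching in-memory structure."""
    module_name, func_name = reader_for(path)
    module = importlib.import_module(module_name)  # imported on demand
    return getattr(module, func_name)(path)
```

Because the packages are imported only inside read_local, reader_for can be used to inspect the mapping even where xarray, pandas, or geopandas are not installed.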

Note that all remote CCI dataset identifiers are prefixed by “esacci.”, for example esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.1-0.r1.

2.1.3. Functions and Operations

The ESA CCI Toolbox provides numerous I/O, analysis, and processing operations that address typical climate analyses. These operations are Python functions, and the ESA CCI Toolbox keeps an operation registry in which they are registered. In addition to the operations provided by the ESA CCI Toolbox, the Python packages xarray, pandas, and geopandas provide a rich and powerful low-level data processing interface for the datasets opened through the Toolbox.
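The idea of an operation registry can be illustrated with a minimal, generic sketch in plain Python. Everything in it (OP_REGISTRY, the operation decorator, get_operation, and the example operation anomaly) is hypothetical and is not the registry implementation of the Toolbox. It only shows how ordinary functions can be registered under a name and looked up later.

```python
# Hypothetical registry: operation name -> Python function.
OP_REGISTRY = {}


def operation(func):
    """Decorator that registers `func` under its own name."""
    if func.__name__ in OP_REGISTRY:
        raise ValueError(f"operation already registered: {func.__name__}")
    OP_REGISTRY[func.__name__] = func
    return func  # the function itself is returned unchanged


def get_operation(name):
    """Look up a registered operation by name."""
    return OP_REGISTRY[name]


@operation
def anomaly(values, reference):
    """Subtract a reference value from every element."""
    return [v - reference for v in values]


# A registered operation remains a plain function, so it can be called
# directly or after a lookup in the registry.
direct = anomaly([14.0, 15.5, 16.0], 15.0)
via_registry = get_operation("anomaly")([14.0, 15.5, 16.0], 15.0)
```

Since the decorator returns the function unchanged, registering an operation does not alter how it is called, which fits the statement above that operations are Python functions.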

You can find detailed listings of the provided functionality in the API Reference.