As part of its effort to continuously improve the quality of service offered to users, the Data Stores Services (DSS) at ECMWF is setting up an “ARCO Data Lake”. ARCO (Analysis-Ready Cloud-Optimised) data structures can give users much faster access, especially for access patterns that do not match the underlying native storage. For example, retrieving a long time series at a single geographical point can be slow if the underlying data is stored as a series of horizontal fields, as is the case for ERA5 datasets, because every field in the period must be read. ARCO data structures, by contrast, are chunked in both space and time, so slicing through time requires reading far fewer chunks and is correspondingly faster.
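To illustrate why the chunk layout matters, the sketch below counts how many chunks, and how many values, must be read to extract one year of hourly data at a single grid point under two layouts. The 721 × 1440 field size corresponds to the 0.25° ERA5 global grid; the `(1000, 32, 32)` space-time chunk shape and the helper names `chunks_touched` and `values_read` are hypothetical, chosen only for illustration, and are not the actual chunking used in the Data Lake. The sketch assumes that whole chunks are decoded on each read, as in Zarr-style stores, and that the series starts on a chunk boundary.

```python
import math


def chunks_touched(n_steps: int, chunk_time: int) -> int:
    """Chunks read to extract n_steps consecutive time steps at one grid point.

    Assumes the point lies inside a single spatial chunk and the series starts
    on a chunk boundary (otherwise one extra chunk may be read).
    """
    return math.ceil(n_steps / chunk_time)


def values_read(n_steps: int, chunk_shape: tuple[int, int, int]) -> int:
    """Values decoded for that request, assuming whole chunks are decoded.

    chunk_shape is (time, lat, lon); edge chunks are counted as full-size,
    so this is an upper bound for layouts that do not divide the grid evenly.
    """
    chunk_time, chunk_lat, chunk_lon = chunk_shape
    return chunks_touched(n_steps, chunk_time) * chunk_time * chunk_lat * chunk_lon


HOURS_PER_YEAR = 365 * 24  # one non-leap year of hourly steps

# Field-based layout: each chunk is one full 0.25-degree global field (721 x 1440).
field_layout = (1, 721, 1440)
# Hypothetical space-time (ARCO-style) layout: long in time, small in space.
arco_layout = (1000, 32, 32)

for name, shape in [("field", field_layout), ("arco", arco_layout)]:
    print(name, chunks_touched(HOURS_PER_YEAR, shape[0]), values_read(HOURS_PER_YEAR, shape))
```

Under these assumptions, the field-per-time-step layout reads 8760 chunks against 9 for the space-time layout, and decodes roughly a thousand times more values for the same single-point request.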
The Data Lake is being populated incrementally with the most requested datasets from the DSS portfolio. Data is hosted in Zarr format, and Data Lake assets are served to users through a variety of interfaces.
These include dedicated time-series datasets:
- ERA5 hourly time-series data on single levels from 1940 to present
- ERA5-Land hourly time-series data from 1950 to present
as well as standard visualization services (Web Map Tile Service, WMTS) and interactive applications such as the ERA Explorer and the Thermal Trace.
Datasets are also available for visualization in the WEkEO Viewer, e.g. ERA5-Land at https://wekeo.copernicus.eu/data?view=layers&dataset=EO:ECMWF:DAT:REANALYSIS_ERA5_LAND
Direct access to the data cubes using the API token is currently in alpha (test phase). Because this new access mechanism is expected to generate high demand and a heavy workload on the hosting infrastructure, datasets will be opened to the public gradually, depending on the outcome of the tests.
Looking ahead, ARCO-based capabilities will continue to grow across all layers of the DSS infrastructure, from data to software, interfaces and services, making the ARCO Data Lake and related capabilities a cornerstone of the Service's evolution.
Watch this space for further announcements!
ECMWF Support
on behalf of Data Stores Services