Good morning,
Just asking: is there any way at ECMWF to access data in a cloud-native way using Zarr? It does not need to be a production product, only something for testing as a use case.
Thanks, everyone,
Hi,
There are some redistributions of datasets available on AWS, like this one. However, the main issue is that these redistributions are not well maintained and do not provide regular updates.
At Open-Meteo, we are actively working on a redistribution of ERA5 and similar datasets on AWS, using cloud-native formats. Due to the massive size of these datasets, this is quite challenging. Operating such systems is costly, and even with open-data sponsorship on AWS, it remains difficult to maintain without external funding. Moreover, keeping the datasets up to date is a challenge, often requiring updates to large volumes of data.
To make this possible, Open-Meteo has developed a custom file format that compresses data in chunks, allowing efficient access to specific parts of the dataset. By applying an integer compression scheme with fixed precision (e.g., rounding temperature to 0.05 K), we achieve high compression ratios while maintaining fast access speeds. This method significantly outperforms existing formats.
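To illustrate the general idea of chunked, fixed-precision integer compression, here is a minimal Python sketch. It is not Open-Meteo's actual format or codec: the scale factor, the tiny chunk size, the delta encoding, the use of zlib, and all function names are assumptions chosen only for demonstration.

```python
import struct
import zlib

SCALE = 20   # assumed precision: 1 / 0.05 K, so values are stored as multiples of 0.05
CHUNK = 4    # assumed chunk length; kept tiny here so the example is easy to follow


def encode_chunk(values):
    """Quantize floats to integers, delta-encode them, and compress one chunk."""
    quantized = [round(v * SCALE) for v in values]
    # Neighbouring values are usually close, so deltas are small and compress well.
    deltas = [quantized[0]] + [b - a for a, b in zip(quantized, quantized[1:])]
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(raw)


def decode_chunk(blob):
    """Invert encode_chunk: decompress, undo the deltas, and rescale to floats."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    values, running = [], 0
    for delta in deltas:
        running += delta
        values.append(running / SCALE)
    return values


def encode(series):
    """Split a series into independently compressed chunks."""
    return [encode_chunk(series[i:i + CHUNK]) for i in range(0, len(series), CHUNK)]


def read_value(chunks, index):
    """Random access: only the chunk that holds `index` is decompressed."""
    return decode_chunk(chunks[index // CHUNK])[index % CHUNK]
```

Because each chunk is compressed independently, a reader only needs to fetch and decode the chunks that cover the requested range, and the rounding step bounds the error at half the chosen precision (0.025 K in this sketch).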
Open-Meteo’s initial goal was to provide time-series APIs for reanalysis datasets and weather forecasts with fast updates. While ERA5 APIs are available, they can be cumbersome for accessing large portions of the dataset.
Parts of Open-Meteo’s underlying database, including ERA5, ERA5-Land, CERRA, and ECMWF HRES archives, are already available in our custom format on AWS. Unfortunately, client libraries for Python, Rust, and WASM TypeScript are still in development. We are also working on additional features to make our custom format as flexible as HDF5, NetCDF, or Zarr.
We’re in the final stages of completing the underlying C code, which will enable high-level client libraries, such as one for Python. Soon, we will provide examples of how to use ERA5 with this new cloud-native format on S3, aiming to make accessing ERA5 both lightning-fast and incredibly convenient!