ECMWF APIs (FAQ, API + data documentation)

Below you will find a centralized compilation of key documentation on the new ECMWF APIs, cross-posted from ECMWF API use | BlueGreen Labs.

I will only use the CDS as a reference, but the same applies to the ADS and EWDS. The first section covers general API use and will be familiar to most, with some gotchas highlighted (e.g. switching services, accepting licenses). Further sections cover breaking changes and advanced API implementations. For now I will not formally address data conversion issues. This is a living document and I’ll update it when new information becomes available.

General API use

For those ending up here with old, non-functioning scripts: all API endpoints have been migrated to a new server. This means that you will need to update your login credentials and python packages. To get started, follow these instructions:

  1. update your login by registering a new ECMWF account
  • you need to validate the login for each service
  • the API key is the same across all services
  2. update the python cdsapi package
pip install 'cdsapi>=0.7.2'
  • “official” support is provided for this package only
  • for other implementations, such as {ecmwfr}, use the forum
  3. set your API key and API URL in your .cdsapirc file (a minimal example follows this list; the URL depends on the service, more on this below in switching services)

  4. accept the license agreement for the products you want to use

  • you find the license agreement in the right-hand bar of the dataset search panel (see below)
  • all accepted licenses are listed at the end of your profile page
  • not accepting a license will generate failed downloads
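
For reference, a minimal ~/.cdsapirc file for the CDS looks as follows (substitute the API key shown on your profile page after logging in):

url: https://cds.climate.copernicus.eu/api
key: <YOUR-API-KEY>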

The instructions assume that you will generate new scripts based on the download pages of the respective service.

Note that using virtual environments (e.g. Anaconda) might create login and path-based issues.

Data download pages for all services

Switching services

The cdsapi python package generates a conflict when you try to use it to download from multiple data services. In short, the URL used for any given service is determined (set) by a value in your .cdsapirc file. If there is a mismatch between the product requested and the URL provided, your download will fail. Downloading from both CDS and ADS would therefore necessitate altering the .cdsapirc file between requests. The workaround is to set an environment variable.

FIX: Set a URL environment variable before every switch of API (data service) in your python script using:

# Load libraries first
import os
import cdsapi

# Service URLs for reference:
# CDS:  https://cds.climate.copernicus.eu/api
# ADS:  https://ads.atmosphere.copernicus.eu/api
# EWDS: https://ewds.climate.copernicus.eu/api

# Switch to the ADS (URL), before creating the client
os.environ['CDSAPI_URL'] = 'https://ads.atmosphere.copernicus.eu/api'

client = cdsapi.Client()
dataset = "<DATASET-SHORT-NAME>"
request = {
    <SELECTION-REQUEST>
}
target = "<TARGET-FILE>"
client.retrieve(dataset, request, target)

WARNING: Never include your API key in any script. You can set the .cdsapirc file for a single service and use the above line before any change of service without exposing your key.

Formatting requests

Note that for certain datasets (and probably all where it applies) you are not allowed to mix instantaneous values and cumulative totals in a single request. Keep this in mind when querying values across hours of the day in any multi-hour product. Remember that daily and monthly summaries are provided for some variables in separate products.
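
For example, a sketch using ERA5 hourly data on single levels (the variable names are taken from that product, adapt them to yours): splitting an instantaneous and an accumulated variable into two requests avoids the issue.

import cdsapi

client = cdsapi.Client()
dataset = "reanalysis-era5-single-levels"

base = {
    "product_type": ["reanalysis"],
    "year": ["2024"],
    "month": ["03"],
    "day": ["01"],
    "time": ["00:00", "06:00", "12:00", "18:00"],
}

# instantaneous variable in its own request
client.retrieve(dataset, {**base, "variable": ["2m_temperature"]}, "t2m.nc")

# cumulative (accumulated) variable in a separate request
client.retrieve(dataset, {**base, "variable": ["total_precipitation"]}, "tp.nc")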

Scheduling

You can schedule your downloads in batches by submitting requests and keeping track of them manually.

# submit a data request
r = client.retrieve(dataset, request)

# set your target directory
target = '/your/directory/for/data'

# ask to download the data
r.download(target)

When you don’t download the data immediately, you can poll for the download later. A status message is returned if the request has not completed at that point. Doing so allows you to poll on a slower schedule and potentially submit multiple queries.

Note that this assumes you do not drop out of the current script / session, and that you cycle through all requests on a schedule that does not trigger rate limiters (don’t spam the server by looping through things unnecessarily).
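
A minimal batch sketch along these lines, using only the retrieve / download calls shown above (the request dicts and file names are hypothetical placeholders):

import cdsapi

client = cdsapi.Client()
dataset = "<DATASET-SHORT-NAME>"

# hypothetical batch of (request, target-file) pairs for one dataset
jobs = [
    (request_january, "era5_jan.nc"),   # request dicts as on the download pages
    (request_february, "era5_feb.nc"),
]

# submit all requests first, keeping track of the returned handles
results = [client.retrieve(dataset, req) for req, _ in jobs]

# then ask to download each result
for r, (_, target) in zip(results, jobs):
    r.download(target)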

slow processing

A common issue is slow request processing. Although the capacity of the services has increased, there seem to be, at times, discrepancies between requests initiated through the web portal and through the API (python package). These issues are not consistent; before submitting a forum query, check the status of the servers at:

https://status.ecmwf.int/

If the services are down (maintenance etc.), waiting a while is the best option. Normally capacity will rebound within a day or so, following maintenance and update schedules.

Some of the slow processing issues can be resolved by carefully scheduling your downloads in low-traffic moments or by using a custom scheduler.

Breaking changes

Non-functioning scripts

The interface of the python API package changed, i.e. old scripts and requests will need to be reworked.

FIX: Rephrase your old query using a data page search (see above) and alter the request part of your python query:

client.retrieve(dataset, request, target) # <- new retrieval call
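
For illustration, a reworked request might look like the sketch below (using the demo dataset from the next section); note that values are now commonly passed as lists. Verify the exact keys against the dataset's download page, as they vary per product; the target file name is a placeholder.

import cdsapi

client = cdsapi.Client()

dataset = 'reanalysis-era5-pressure-levels'
request = {
    'product_type': ['reanalysis'],  # values are given as lists
    'variable': ['geopotential'],
    'year': ['2024'],
    'month': ['03'],
    'day': ['01'],
    'time': ['13:00'],
    'pressure_level': ['1000'],
}
target = 'era5_geopotential.nc'

client.retrieve(dataset, request, target)  # <- new retrieval call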

Non-functioning netCDF files

The netCDF output of the new API (netCDF4) differs from that of the old API (netCDF3), especially in how time variables are formatted. This makes the new output incompatible with old processing workflows.

FIX: Set the netCDF output field (data_format) in your request to ‘netcdf_legacy’ to get the old output format, e.g.:

dataset = 'reanalysis-era5-pressure-levels'
request = {
    'product_type': ['reanalysis'],
    'variable': ['geopotential'],
    'year': ['2024'],
    'month': ['03'],
    'day': ['01'],
    'time': ['13:00'],
    'pressure_level': ['1000'],
    'data_format': 'netcdf_legacy' # <- SET TO THE LEGACY FORMAT
}

The above fix is not a long-term solution. However, until things consolidate, this might be the best option to ensure continuity of your products and services without expending energy on ongoing updates or fixes.
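
To verify which format a downloaded file is actually in, a quick check (assuming the netCDF4 python package is installed; the file name is a placeholder):

import netCDF4

nc = netCDF4.Dataset('era5_geopotential.nc')
print(nc.data_model)  # e.g. 'NETCDF4' for new output, a 'NETCDF3...' variant for legacy
nc.close()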

Basic API documentation (custom API implementations)

For those implementing their own scripts using curl, C, R or other languages, note that the endpoint structure and workflow have been altered. I’ll use the CDS endpoint as an example, but the structure remains the same across the other data services.

A complete download requires two API endpoint calls (plus one to validate progress). The base URL of all APIs has the form:

https://cds.climate.copernicus.eu/api/retrieve/v1/

where the first part of the URL changes depending on the service used.

A query to the service uses this base URL plus modifiers to form a valid POST call. The POST call depends not only on the request payload but also on the URL itself, e.g.:

https://cds.climate.copernicus.eu/api/retrieve/v1/processes/{dataset}/execute/

where {dataset} is the dataset name you would find in the python request (see ‘reanalysis-era5-pressure-levels’ in the demo request above).

HTTP status codes will give you an indication of the success of the call itself. If successful, the call will return a job ID. You can use this job ID to check on the progress of the request using the following structure:

https://cds.climate.copernicus.eu/api/retrieve/v1/jobs/{ID}

If the processing was successful, this call will return a URL for the location of the data, from which you can download it. This URL has the following structure:

https://cds.climate.copernicus.eu/api/retrieve/v1/jobs/{ID}/results/

Formatting requests

Note that for a (JSON) request to function properly, it needs to be wrapped in an additional {"inputs": {JSON_REQUEST}} wrapper. This is not reported by the debugger and is a hard-to-find detail. You can find the original implementation linked below. Adjust your queries to this format in whichever language you implement your custom data scheduler (see the complete example at the end of this section).

Authentication

The API uses a custom header field (not the standard Authorization header) to validate your transactions. To successfully query the API(s), you need to add the following statement to your header, including your private key. In the examples below, substitute the URLs highlighted above for URL and your private key for KEY.

In Curl this would read:

curl -i -H "PRIVATE-TOKEN: KEY" \
         -H "Content-Type: application/json" URL

In Python this would read:

import requests

requests.get(
    URL,
    headers={"PRIVATE-TOKEN": "KEY"}
)

In R this reads:

library(httr)

GET(
    URL,
    add_headers("PRIVATE-TOKEN" = "KEY")
)
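
Putting it all together, a complete Python sketch of the endpoint workflow, the "inputs" wrapper and the PRIVATE-TOKEN header might look as follows. The dataset and request are taken from the demo above; the JSON field names used when parsing the responses ('jobID', 'status') are assumptions, so verify them against the actual responses of your service.

import time
import requests

KEY = "your-api-key"  # read this from a file or environment, never share it
BASE = "https://cds.climate.copernicus.eu/api/retrieve/v1"
HEADERS = {"PRIVATE-TOKEN": KEY, "Content-Type": "application/json"}

dataset = "reanalysis-era5-pressure-levels"
request = {
    "product_type": ["reanalysis"],
    "variable": ["geopotential"],
    "year": ["2024"],
    "month": ["03"],
    "day": ["01"],
    "time": ["13:00"],
    "pressure_level": ["1000"],
}

# 1. submit the request, wrapped in the required "inputs" field
job = requests.post(
    f"{BASE}/processes/{dataset}/execute/",
    json={"inputs": request},
    headers=HEADERS,
).json()

# 2. poll the job status on a slow schedule (field names are assumptions)
while True:
    status = requests.get(f"{BASE}/jobs/{job['jobID']}", headers=HEADERS).json()
    if status.get("status") in ("successful", "failed"):
        break
    time.sleep(30)  # don't spam the server

# 3. fetch the result, which contains the URL to download the data from
results = requests.get(f"{BASE}/jobs/{job['jobID']}/results/", headers=HEADERS).json()
print(results)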

Acknowledgements

Thanks go out to all who first highlighted these issues in the forum and to those providing solutions. This resource will be updated when new information becomes available.


Thanks a lot @Koen_Hufkens !

This one was the problem for my scripts. I could not understand why my code stopped working.

Now the code works as expected!


Thank you very much Koen and Odil!
The truth is that I am working in php, not in python. But I would like to know at least the endpoint I have to use to test it in POSTMAN.
If I use the endpoints in the document you attached, they don’t work.
Thank you very much for your attention.
Regards
Rosa


Thank you so much @Koen_Hufkens ! I’ve got the new API working from Java :grinning:


@rosamaria_arnau I was able to use the URL in this post for testing with Postman.

for me (as I am getting ERA5 hourly data on single levels) I used this one:

https://cds.climate.copernicus.eu/api/retrieve/v1/processes/reanalysis-era5-single-levels/execute/