ECMWF APIs (FAQ, API + data documentation)

Below you will find a centralized compilation of key documentation on the new ECMWF APIs, cross-posted from ECMWF API use | BlueGreen Labs.

I will only use the CDS as a reference, but the same applies to the ADS and EWDS. The first section covers general API use and will be familiar to most, with some gotchas highlighted (e.g. switching services, accepting licenses). Further sections cover breaking changes and advanced API implementations. For now I will not formally address data conversion issues. This is a living document and I’ll update it when new information becomes available.

General API use

For those ending up here with old, non-functioning scripts: all API endpoints have been migrated to a new server. This means that you will need to update your login credentials and python packages. To get started, follow these instructions:

  1. update your login by registering a new ECMWF account
  • you need to validate the login for each service
  • the API key is the same across all services
  2. update the python cdsapi package
pip install 'cdsapi>=0.7.2'
  • “official” support is provided for this package only
  • for other implementations, such as {ecmwfr}, use the forum
  3. set your API key and API URL in your .cdsapirc file (a minimal example follows this list; the URL depends on the service, more on this below in switching services)

  4. accept the license agreement for the products you want to use

  • you find the license agreement in the right-hand bar of the dataset search panel (see below)
  • all accepted licenses are listed at the end of your profile page
  • not accepting a license will generate failed downloads
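
For reference, a minimal ~/.cdsapirc file for the CDS looks as follows (substitute the API key shown on your profile page after logging in):

url: https://cds.climate.copernicus.eu/api
key: <YOUR-API-KEY>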

The instructions assume that you will generate new scripts based on the download pages of the respective service.

Note that using virtual environments (e.g. Anaconda) might create login and path-based issues.

Data download pages for all services

Switching services

The cdsapi python package generates a conflict when you try to use it to download from multiple data services. In short, the URL used for any given service is determined (set) by a value in your .cdsapirc file. If there is a mismatch between the product requested and the URL provided, your download will fail. Downloading from both CDS and ADS would therefore necessitate altering the .cdsapirc file between requests. The workaround is to set an environment variable.

FIX: Set a URL environment variable before every switch of API (data service) in your python script using:

# Load libraries first
import os
import cdsapi

# Service URLs for reference:
# CDS:  https://cds.climate.copernicus.eu/api
# ADS:  https://ads.atmosphere.copernicus.eu/api
# EWDS: https://ewds.climate.copernicus.eu/api

# Switch to the ADS (URL), before creating the client
os.environ['CDSAPI_URL'] = 'https://ads.atmosphere.copernicus.eu/api'

client = cdsapi.Client()
dataset = "<DATASET-SHORT-NAME>"
request = {
    <SELECTION-REQUEST>
}
target = "<TARGET-FILE>"
client.retrieve(dataset, request, target)

WARNING: Never include your API key in any script. You can set the .cdsapirc file for a single service and use the above line before any change of service without exposing your key.

Formatting requests

Note that for certain datasets (and probably all where it applies) you are not allowed to mix instantaneous values and cumulative totals in a single request. Keep this in mind when querying values across hours of the day in any multi-hour product. Remember that daily and monthly summaries are provided for some variables in separate products.
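
For example, a sketch using ERA5 hourly data on single levels (the variable names are taken from that product, adapt them to yours): splitting an instantaneous and an accumulated variable into two requests avoids the issue.

import cdsapi

client = cdsapi.Client()
dataset = "reanalysis-era5-single-levels"

base = {
    "product_type": ["reanalysis"],
    "year": ["2024"],
    "month": ["03"],
    "day": ["01"],
    "time": ["00:00", "06:00", "12:00", "18:00"],
}

# instantaneous variable in its own request
client.retrieve(dataset, {**base, "variable": ["2m_temperature"]}, "t2m.nc")

# cumulative (accumulated) variable in a separate request
client.retrieve(dataset, {**base, "variable": ["total_precipitation"]}, "tp.nc")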

Scheduling

You can schedule your downloads in batches by submitting requests and keeping track of them manually.

# submit a data request
r = client.retrieve(dataset, request)

# set your target directory
target = '/your/directory/for/data'

# ask to download the data
r.download(target)

When you don’t download the data immediately, you can poll for the download later. A status message is returned if the request has not completed at that point. Doing so allows you to poll on a slower schedule and potentially submit multiple queries.

Note that this assumes you do not drop out of the current script / session, and that you cycle through all requests on a schedule that does not trigger rate limiters (don’t spam the server by looping through things unnecessarily).
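
A minimal batch sketch along these lines, using only the retrieve / download calls shown above (the request dicts and file names are hypothetical placeholders):

import cdsapi

client = cdsapi.Client()
dataset = "<DATASET-SHORT-NAME>"

# hypothetical batch of (request, target-file) pairs for one dataset
jobs = [
    (request_january, "era5_jan.nc"),   # request dicts as on the download pages
    (request_february, "era5_feb.nc"),
]

# submit all requests first, keeping track of the returned handles
results = [client.retrieve(dataset, req) for req, _ in jobs]

# then ask to download each result
for r, (_, target) in zip(results, jobs):
    r.download(target)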

slow processing

A common issue is slow request processing. Although the capacity of the services has increased, there seem to be, at times, discrepancies between requests initiated through the web portal and through the API (python package). These issues are not consistent; before submitting a forum query, check the status of the servers at:

https://status.ecmwf.int/

If the services are down (maintenance etc.), waiting a while is the best option. Normally capacity will rebound within a day or so, following maintenance and update schedules.

Some of the slow processing issues can be resolved by carefully scheduling your downloads in low-traffic moments or by using a custom scheduler.

Breaking changes

Non-functioning scripts

The interface of the python API package changed, i.e. old scripts and requests will need to be reworked.

FIX: Rephrase your old query using a data page search (see above) and alter the request part of your python query:

client.retrieve(dataset, request, target) # <- new retrieval call
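
For illustration, a reworked request might look like the sketch below (using the demo dataset from the next section); note that values are now commonly passed as lists. Verify the exact keys against the dataset's download page, as they vary per product; the target file name is a placeholder.

import cdsapi

client = cdsapi.Client()

dataset = 'reanalysis-era5-pressure-levels'
request = {
    'product_type': ['reanalysis'],  # values are given as lists
    'variable': ['geopotential'],
    'year': ['2024'],
    'month': ['03'],
    'day': ['01'],
    'time': ['13:00'],
    'pressure_level': ['1000'],
}
target = 'era5_geopotential.nc'

client.retrieve(dataset, request, target)  # <- new retrieval call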

Non-functioning netCDF files

The netCDF output of the new API (netCDF4) differs from that of the old API (netCDF3), especially in how time variables are formatted. This makes the new output incompatible with old processing workflows.

FIX: Set the netCDF output field (data_format) in your request to ‘netcdf_legacy’ to get the old output format, e.g.:

dataset = 'reanalysis-era5-pressure-levels'
request = {
    'product_type': ['reanalysis'],
    'variable': ['geopotential'],
    'year': ['2024'],
    'month': ['03'],
    'day': ['01'],
    'time': ['13:00'],
    'pressure_level': ['1000'],
    'data_format': 'netcdf_legacy' # <- SET TO THE LEGACY FORMAT
}

The above fix is not a long-term solution. However, until things consolidate, this might be the best option to ensure continuity of your products and services without expending energy on ongoing updates or fixes.
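
To verify which format a downloaded file is actually in, a quick check (assuming the netCDF4 python package is installed; the file name is a placeholder):

import netCDF4

nc = netCDF4.Dataset('era5_geopotential.nc')
print(nc.data_model)  # e.g. 'NETCDF4' for new output, a 'NETCDF3...' variant for legacy
nc.close()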

Basic API documentation (custom API implementations)

For those implementing their own scripts using curl, C, R or other languages, note that the endpoint structure and workflow have been altered. I’ll use the CDS endpoint as an example, but the structure remains the same across the other data services.

A complete download requires two API endpoint calls (plus one to validate progress). The base URL of all APIs has the form:

https://cds.climate.copernicus.eu/api/retrieve/v1/

where the first part of the URL changes depending on the service used.

A query to the service uses this base URL plus modifiers to form a valid POST call. The POST call depends not only on the request payload but also on the URL itself, e.g.:

https://cds.climate.copernicus.eu/api/retrieve/v1/processes/{dataset}/execute/

where {dataset} is the dataset name you would find in the python request (see ‘reanalysis-era5-pressure-levels’ in the demo request above).

HTTP status codes will give you an indication of the success of the call itself. If successful, the call will return a job ID. You can use this job ID to check on the progress of the request using the following structure:

https://cds.climate.copernicus.eu/api/retrieve/v1/jobs/{ID}

If the processing was successful, this call will return a URL for the location of the data, from which you can download it. This URL has the following structure:

https://cds.climate.copernicus.eu/api/retrieve/v1/jobs/{ID}/results/

Formatting requests

Note that for a (JSON) request to function properly, it needs to be wrapped in an additional {"inputs": {JSON_REQUEST}} wrapper. This is not reported by the debugger and is a hard-to-find detail. You can find the original implementation linked below. Adjust your queries to this format in whichever language you implement your custom data scheduler (see the complete example at the end of this section).

Authentication

The API uses a custom header field (not the standard Authorization header) to validate your transactions. To successfully query the API(s), you need to add the following statement to your header, including your private key. In the examples below, substitute the URLs highlighted above for URL and your private key for KEY.

In Curl this would read:

curl -i -H "PRIVATE-TOKEN: KEY" \
         -H "Content-Type: application/json" URL

In Python this would read:

import requests

requests.get(
    URL,
    headers={"PRIVATE-TOKEN": "KEY"}
)

In R this reads:

library(httr)

GET(
    URL,
    add_headers("PRIVATE-TOKEN" = "KEY")
)
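
Putting it all together, a complete Python sketch of the endpoint workflow, the "inputs" wrapper and the PRIVATE-TOKEN header might look as follows. The dataset and request are taken from the demo above; the JSON field names used when parsing the responses ('jobID', 'status') are assumptions, so verify them against the actual responses of your service.

import time
import requests

KEY = "your-api-key"  # read this from a file or environment, never share it
BASE = "https://cds.climate.copernicus.eu/api/retrieve/v1"
HEADERS = {"PRIVATE-TOKEN": KEY, "Content-Type": "application/json"}

dataset = "reanalysis-era5-pressure-levels"
request = {
    "product_type": ["reanalysis"],
    "variable": ["geopotential"],
    "year": ["2024"],
    "month": ["03"],
    "day": ["01"],
    "time": ["13:00"],
    "pressure_level": ["1000"],
}

# 1. submit the request, wrapped in the required "inputs" field
job = requests.post(
    f"{BASE}/processes/{dataset}/execute/",
    json={"inputs": request},
    headers=HEADERS,
).json()

# 2. poll the job status on a slow schedule (field names are assumptions)
while True:
    status = requests.get(f"{BASE}/jobs/{job['jobID']}", headers=HEADERS).json()
    if status.get("status") in ("successful", "failed"):
        break
    time.sleep(30)  # don't spam the server

# 3. fetch the result, which contains the URL to download the data from
results = requests.get(f"{BASE}/jobs/{job['jobID']}/results/", headers=HEADERS).json()
print(results)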

Acknowledgements

Thanks go out to all who first highlighted these issues in the forum and to those providing solutions. This resource will be updated when new information becomes available.


Thanks a lot @Koen_Hufkens !

This one was the problem for my scripts. I could not understand why my code stopped working.

Now the code works as expected!


Thank you very much Koen and Odil!
The truth is that I am working in php, not in python. But I would like to know at least the endpoint I have to use to test it in POSTMAN.
If I use the endpoints in the document you attached, they don’t work.
Thank you very much for your attention.
Regards
Rosa


Thank you so much @Koen_Hufkens ! I’ve got the new API working from Java :grinning:


@rosamaria_arnau I was able to use the URL in this post for testing with Postman.

for me (as I am getting ERA5 hourly data on single levels) I used this one:

https://cds.climate.copernicus.eu/api/retrieve/v1/processes/reanalysis-era5-single-levels/execute/