Below is a centralized compilation of key documentation on the new ECMWF APIs, cross-posted from ECMWF API use | BlueGreen Labs.
I will only use the CDS as a reference, but the same applies to the ADS and EWDS. The first section covers general API use and will be familiar to most, with some gotchas highlighted (e.g. switching services, accepting licenses). Further sections cover breaking changes and advanced API implementations. For now I do not formally touch on data conversion issues. This is a living document and I'll update it when new information becomes available.
General API use
For those ending up here with old, non-functioning scripts: all API endpoints have been migrated to the new server. This means you will need to update both your login credentials and your python packages. To get started, follow these instructions:
- update your login by registering a new ECMWF account
  - you need to validate the login for each service
  - the API key is the same across all services
- update the python cdsapi package:
  pip install 'cdsapi>=0.7.2'
  - "official" support is provided for this package only
  - for other implementations, such as {ecmwfr}, use the forum
- set your API key and API URL (these depend on the service; more on this below in Switching services); see the .cdsapirc sketch below
- accept the license agreement for the products you want to use
  - you find the license agreement in the right-hand bar of the dataset search panel (see below)
  - all accepted licenses are listed at the end of your profile page
  - not accepting the license will generate failed downloads
These instructions assume that you will generate new scripts based upon the download pages of each service.
Note that using virtual environments (e.g. Anaconda) might create login and path-based issues.
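To illustrate the key and URL setup, below is a minimal sketch of the ~/.cdsapirc file contents (CDS example; swap the url line for the ADS or EWDS URL where needed, and replace the placeholder with the personal access token from your profile page):

url: https://cds.climate.copernicus.eu/api
key: <YOUR-PERSONAL-ACCESS-TOKEN>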
Data download pages for all services
- CDS: https://cds.climate.copernicus.eu/datasets
- ADS: https://ads.atmosphere.copernicus.eu/datasets
- EWDS: https://ewds.climate.copernicus.eu/datasets
Switching services
The cdsapi python package generates a conflict when you try to use the software to download from multiple data services. In short, the URL used for any given service is determined (set) by a value in your .cdsapirc file. If there is a mismatch between the product requested and the URL provided, your download will fail. Downloading from both CDS and ADS would therefore necessitate altering the .cdsapirc file. The workaround is to set an environment variable.
FIX: Set the URL environment variable before every switch in API (data service) in your python script using:
# CDS
os.environ['CDSAPI_URL'] = 'https://cds.climate.copernicus.eu/api'

# ADS
os.environ['CDSAPI_URL'] = 'https://ads.atmosphere.copernicus.eu/api'

# EWDS
os.environ['CDSAPI_URL'] = 'https://ewds.climate.copernicus.eu/api'
# Load libraries
import os
import cdsapi

# Switch to ADS (URL)
os.environ['CDSAPI_URL'] = 'https://ads.atmosphere.copernicus.eu/api'

client = cdsapi.Client()

dataset = "<DATASET-SHORT-NAME>"
request = {
    <SELECTION-REQUEST>
}
target = "<TARGET-FILE>"

client.retrieve(dataset, request, target)
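If you switch services frequently, a small convenience wrapper keeps this tidy. The sketch below is a hypothetical helper (the function name service_client is mine, not part of cdsapi); it does nothing more than set the environment variable shown above before creating a client:

import os
import cdsapi

# base URLs per service, as listed above
SERVICE_URLS = {
    "cds": "https://cds.climate.copernicus.eu/api",
    "ads": "https://ads.atmosphere.copernicus.eu/api",
    "ewds": "https://ewds.climate.copernicus.eu/api",
}

def service_client(service):
    # point the cdsapi client at the requested data service (hypothetical helper)
    os.environ["CDSAPI_URL"] = SERVICE_URLS[service]
    return cdsapi.Client()

# usage: client = service_client("ads")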
WARNING: Never include your API key in any script. You can set the .cdsapirc file for a single service and use the above line before any change in service, without exposing your key.
Formatting requests
Note that for certain datasets (and probably all where it applies) it is not allowed to mix instantaneous values and accumulated totals in a single request. Keep this in mind when querying values across hours of the day in any multi-hour product. Remember that daily and monthly summaries of some variables are provided in separate overall products. A sketch of splitting such a query follows below.
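As an illustration, the sketch below splits such a query into one request for an instantaneous variable and one for an accumulated variable. The dataset and variable names (2m_temperature, total_precipitation) are ERA5 examples chosen for illustration only, so check the dataset page for your own product:

import cdsapi

client = cdsapi.Client()
dataset = "reanalysis-era5-single-levels"

# selection shared by both requests
base = {
    "product_type": ["reanalysis"],
    "year": ["2024"],
    "month": ["03"],
    "day": ["01"],
    "time": ["00:00", "06:00", "12:00", "18:00"],
}

# request 1: instantaneous variable only
client.retrieve(dataset, {**base, "variable": ["2m_temperature"]}, "t2m.grib")

# request 2: accumulated variable only
client.retrieve(dataset, {**base, "variable": ["total_precipitation"]}, "tp.grib")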
Scheduling
You can schedule your downloads in batches by submitting requests and keeping track of them manually.
# submit a data request
r = client.retrieve(dataset, request)
# set your target directory
target = '/your/directory/for/data'
# ask to download the data
r.download(target)
When you don't download the data immediately, you can poll for the download later; a status is returned if the request has not completed at that point. Doing so allows you to poll on a slower schedule and to submit multiple queries in parallel.
Note that this assumes that you do not drop out of the current script / session, and that you cycle through all requests on a schedule which does not trigger rate limiters (don't spam the server by looping through things unnecessarily). A batch sketch follows below.
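A minimal batch sketch, assuming a list of prepared requests and using only the retrieve() and download() calls shown above (the pause length is an arbitrary choice). Submitting everything first lets the server process requests in parallel while you fetch the completed ones in turn:

import time
import cdsapi

client = cdsapi.Client()

# (dataset, request, target) triplets you assembled elsewhere
jobs = [
    # ("reanalysis-era5-pressure-levels", {...}, "era5_a.nc"),
    # ("reanalysis-era5-pressure-levels", {...}, "era5_b.nc"),
]

# submit all requests first, keeping the handles
handles = []
for dataset, request, target in jobs:
    handles.append((client.retrieve(dataset, request), target))
    time.sleep(5)  # brief pause between submissions, don't spam the server

# download each result in turn (this polls for the data as needed)
for r, target in handles:
    r.download(target)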
Slow processing
A common issue is slow request processing. Although the capacity of the services has increased, there seem to be, at times, discrepancies between requests initiated through the web portal and those made through the API (python package). These issues are not consistent; before submitting a forum query, check the status of the servers at:
If the services are down (maintenance etc.), waiting a while is the best option. Normally capacity will rebound within a day or so, following maintenance and update schedules.
Some of the slow processing issues can be resolved by carefully scheduling your downloads in low-traffic moments, or by using a custom scheduler (see the batch sketch above).
Breaking changes
Non-functioning scripts
The request format of the python API package changed, i.e. old scripts and requests will need to be reworked.
FIX: Rephrase your old query using a data page search (see above) and alter the request part of your python query:
client.retrieve(dataset, request, target) # <- new retrieval call
Non-functioning netCDF files
The netCDF output of the new API (netCDF4) differs from that of the old API (netCDF3), especially in how time variables are formatted. This leaves the new output incompatible with old processing workflows.
FIX: Alter the netCDF output field (data_format) in your request to "netcdf_legacy" to get the old output format, e.g.
dataset = 'reanalysis-era5-pressure-levels'
request = {
    'product_type': ['reanalysis'],
    'variable': ['geopotential'],
    'year': ['2024'],
    'month': ['03'],
    'day': ['01'],
    'time': ['13:00'],
    'pressure_level': ['1000'],
    'data_format': 'netcdf_legacy'  # <- SET TO THE LEGACY FORMAT
}
The above fix is not a long-term solution. However, until things consolidate, this might be the best option to ensure continuity of your products and services without expending energy on ongoing updates or fixes.
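For completeness, a minimal end-to-end script under the new API might then look as follows; it simply combines the client setup with the request above (the target filename is an arbitrary choice):

import cdsapi

client = cdsapi.Client()

dataset = 'reanalysis-era5-pressure-levels'
request = {
    'product_type': ['reanalysis'],
    'variable': ['geopotential'],
    'year': ['2024'],
    'month': ['03'],
    'day': ['01'],
    'time': ['13:00'],
    'pressure_level': ['1000'],
    'data_format': 'netcdf_legacy'  # legacy output format
}
target = 'era5_geopotential.nc'  # arbitrary filename

client.retrieve(dataset, request, target)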
Basic API documentation (custom API implementations)
For those implementing their own scripts using curl, C, R or other languages: note that the endpoint structure and workflow have been altered. I'll use the CDS endpoint as an example, but the structure remains the same across the other data services.
The API requires two endpoints for a complete download (+ validation). The base URL of all APIs has the following form:
https://cds.climate.copernicus.eu/api
where the first part of the URL changes depending on the service used (see the service URLs listed under Switching services above).
A query to the service uses this base URL plus modifiers to submit a valid POST call. The POST call depends not only on its (command line) arguments but also on the URL itself, e.g.:
https://cds.climate.copernicus.eu/api/retrieve/v1/processes/{dataset}/execute/
where {dataset} is the dataset short name you would find in the python request (see "reanalysis-era5-pressure-levels" in the demo request above).
HTTP return codes will give you an indication of the success of the call itself. If successful, the call will return a job ID. You can use this job ID to check on the progress of the request using the following structure:
https://cds.climate.copernicus.eu/api/retrieve/v1/jobs/{ID}
If the processing was successful, this call will return a URL for the location of the data, from which you can download it. This URL has the following structure:
https://cds.climate.copernicus.eu/api/retrieve/v1/jobs/{ID}/results/
Formatting requests
Note that for a (JSON) request to function properly it needs to be wrapped in an additional {"inputs": {JSON_REQUEST}} wrapper. This is not reported by the debugger and is a hard-to-find detail. You can find the original implementation linked below. Adjust your queries to this format in whichever language you implement your custom data scheduler; a wrapped example follows below.
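For example, the demo request from earlier would be wrapped as follows before being posted (a sketch; the request fields are the ones shown in the netCDF example above):

import json

request = {
    'product_type': ['reanalysis'],
    'variable': ['geopotential'],
    'year': ['2024'],
    'month': ['03'],
    'day': ['01'],
    'time': ['13:00'],
    'pressure_level': ['1000'],
}

# wrap the request in the required "inputs" field before POST-ing
payload = json.dumps({"inputs": request})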
Authentication
The API uses a custom header field (not the standard Authorization header) to validate your transactions. To successfully query the API(s) you need to add the following statement to your header, including your private key. In the examples below, use the URLs highlighted above in place of URL and your private key in place of KEY.
In Curl this would read:
curl -i -H "PRIVATE-TOKEN: KEY" \
-H "Content-Type: application/json" URL
In Python this would read:
import requests
requests.get(
URL,
headers={"PRIVATE-TOKEN":"KEY"}
)
In R this reads:
library(httr)
GET(
URL,
add_headers("PRIVATE-TOKEN" = KEY)
)
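Putting the endpoints, the inputs wrapper, and the authentication header together, a hedged end-to-end sketch in Python could look as follows. The JSON field names used to read the responses (jobID, status, and the asset href in the results) are assumptions based on the endpoint structure above, so verify them against an actual response before relying on them:

import time
import requests

BASE = "https://cds.climate.copernicus.eu/api/retrieve/v1"
HEADERS = {"PRIVATE-TOKEN": "KEY", "Content-Type": "application/json"}

dataset = "reanalysis-era5-pressure-levels"
request = {
    "product_type": ["reanalysis"],
    "variable": ["geopotential"],
    "year": ["2024"],
    "month": ["03"],
    "day": ["01"],
    "time": ["13:00"],
    "pressure_level": ["1000"],
}

# 1. submit the request (note the "inputs" wrapper)
r = requests.post(
    f"{BASE}/processes/{dataset}/execute/",
    headers=HEADERS,
    json={"inputs": request},
)
r.raise_for_status()
job_id = r.json()["jobID"]  # assumed field name

# 2. poll the job status on a slow schedule (don't spam the server)
while True:
    status = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS).json()
    if status.get("status") in ("successful", "failed"):  # assumed state labels
        break
    time.sleep(30)

# 3. fetch the result location and download the data
results = requests.get(f"{BASE}/jobs/{job_id}/results/", headers=HEADERS).json()
data_url = results["asset"]["value"]["href"]  # assumed response structure
with open("output.nc", "wb") as f:
    f.write(requests.get(data_url).content)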
Acknowledgements
Thanks go out to all who first highlighted these issues in the forum and those providing solutions. This resource will be updated when new information becomes available.