Check your recent NetCDF files downloaded from the CDS and ADS: you may need to download them again

We were made aware of an issue that produced incorrect data in NetCDF files downloaded from the Climate Data Store (CDS) and the Atmosphere Data Store (ADS). It occurred for datasets whose raw data are stored in GRIB format, for example ERA5. Please consider re-downloading the data. Note that if you downloaded data in GRIB format, there is no need to re-download it.

We understand that the issue has now been fixed and the cache is in the process of being cleared.

What is affected: Any files downloaded in NetCDF format from datasets whose raw format is GRIB (see the datasets listed below) between Tuesday, March 19th and Thursday, March 21st, 2024.

On CDS:

  • ERA5 datasets
  • ERA5-Land datasets
  • CERRA datasets
  • CARRA datasets
  • Seasonal Forecast datasets
  • UERRA datasets
  • GloFAS datasets
  • EFAS datasets

On ADS:

  • CAMS global atmospheric composition forecasts
  • EAC4 datasets
  • EGG4 datasets

What to do: Download the data again.
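To find local files that may need re-downloading, something along these lines could help. It is only a minimal sketch: it assumes a file's modification time is a reasonable proxy for when it was downloaded, and it uses the CDS date window from the announcement (the time zone is not stated there, so the local clock is used). The names `in_affected_window` and `suspect_netcdf_files` are made up for this example.

```python
from datetime import date, datetime
from pathlib import Path

# Affected window from the announcement (inclusive).
# Assumption: dates are compared in the local time zone.
WINDOW_START = date(2024, 3, 19)
WINDOW_END = date(2024, 3, 21)


def in_affected_window(day, start=WINDOW_START, end=WINDOW_END):
    """Return True if `day` falls inside the affected window."""
    return start <= day <= end


def suspect_netcdf_files(root):
    """List .nc files under `root` whose modification date is in the window.

    Assumption: the modification time approximates the download time.
    """
    suspects = []
    for path in sorted(Path(root).rglob("*.nc")):
        modified = datetime.fromtimestamp(path.stat().st_mtime).date()
        if in_affected_window(modified):
            suspects.append(path)
    return suspects


if __name__ == "__main__":
    for path in suspect_netcdf_files("."):
        print(path)
```

Files that were copied or touched after download will have a different modification time, so treat the result as a hint rather than a definitive list.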

We are sorry for the inconvenience this has caused.

ECMWF Support

We currently think that this issue only affected the ADS on the Tuesday, and the symptom seems to have been that all fields in the netCDF file had exactly the same values. However, we’re still investigating.

@Luke_Jones, I believe that the problem goes significantly deeper. Could you please take a look at the ticket CUS-24672? There I reported on downloading plausible (non-constant) but incorrect data for 2m_temperature from the CDS API on Wednesday.

Hi Ben,

I took a look at your ticket. From what I can see, you downloaded a netCDF file with 8 fields but the last 7 are all identical. Only the first is different to the others. That’s consistent with the other corrupted data we saw.

Just to be clear about the date range that this problem existed for: on the ADS it was just Tuesday the 19th but on the CDS it was Tuesday the 19th to Thursday the 21st.

Luke.

Hi Luke,

Thanks for getting back to me. Apologies, I misinterpreted your earlier post as claiming that the problem was confined to ADS.

Please note that the Agrometeorological indicators from 1979 to present derived from reanalysis (AgERA5) dataset has also been affected by the issue described in our announcement of 22-Mar-2024.
The following AgERA5 dates are being reprocessed and will soon be made available again:

2024-03-13
2024-03-14
2024-03-15
2024-03-16

Apologies for the inconvenience caused.

ECMWF Support

Can you give us a timeline on the reprocessing of the affected days in March 2024?

It took a little longer than expected but the following AgERA5 dates have been reprocessed and are now available to download from the Climate Data Store (CDS):

2024-03-13
2024-03-14
2024-03-15
2024-03-16

Hello, could it be that we are experiencing a repeat of this issue?

I am observing that if I download ERA5 data in NetCDF and look at the t2m variable for 2024-01-09 at times T00, 03, 06, 09, … 21, then the data at each timeslice from T03 onwards are identical, although T00 and T03 are different.
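For anyone who wants to check their own files for this symptom, here is a minimal sketch of how identical consecutive time slices could be detected. It assumes the variable has already been loaded as an array with time as the first dimension; `repeated_time_steps` is a made-up helper name, not part of any library.

```python
import numpy as np


def repeated_time_steps(values):
    """Return indices i >= 1 where slice i is identical to slice i - 1.

    Assumption: axis 0 of `values` is the time dimension.
    """
    arr = np.asarray(values)
    # Flatten every time slice to one row so slices can be compared row-wise
    flat = arr.reshape(arr.shape[0], -1)
    equal = flat[1:] == flat[:-1]
    if np.issubdtype(flat.dtype, np.floating):
        # Treat NaN (e.g. masked points) as equal to NaN
        equal |= np.isnan(flat[1:]) & np.isnan(flat[:-1])
    same = np.all(equal, axis=1)
    return [int(i) + 1 for i in np.flatnonzero(same)]


if __name__ == "__main__":
    demo = np.arange(16.0).reshape(4, 2, 2)
    demo[2] = demo[1]  # make slices 2 and 3 copies of slice 1
    demo[3] = demo[1]
    print(repeated_time_steps(demo))
```

With xarray this could be called as `repeated_time_steps(ds["t2m"].values)`, provided time really is the leading dimension of that variable. A non-empty result is only a warning sign, since some fields can legitimately stay constant over time.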

I agree,

I see two periods of repeated values,

2024-04-13 12:00h to 2024-04-14 23:00h
2024-04-14 00:00h to 2024-04-14 10:00h

(I haven’t yet fetched any later than that)

Applies to many ERA5 elements: temperatures, wind, etc.

Hi, we are seeing the same issue too. The values are crazy in today’s file.

Hi,

I agree, again the NetCDF files are wrong.

Reproducing the error:

import os
import tempfile

import xarray as xr
import cdsapi

from common_library.secrets import DictSecretHandler

CDSAPI_URL = 'https://cds.climate.copernicus.eu/api/v2'
CDSAPI_TIMEOUT_IN_SECONDS = 600
CDSAPI_MAX_VARIABLES_SINGLE_REQUEST = 8

secret = DictSecretHandler(os.environ.get('CDSAPI_SECRET_ID'), 'uid', 'api_key')
api_key = f'{secret.username}:{secret.password}'

request = {
    'variable': ['2m_temperature', '10m_v_component_of_wind', '10m_u_component_of_wind', '100m_u_component_of_wind',
                 '100m_v_component_of_wind'],
    'product_type': 'reanalysis',
    'year': ['2024'],
    'month': ['4'],
    'day': ['13'],
    'time': ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
             '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00',
             '23:00']}

with tempfile.TemporaryDirectory() as tmpdir:
    cdsapi_client = cdsapi.Client(key=api_key, url=CDSAPI_URL, verify=0, wait_until_complete=True,
                                  timeout=CDSAPI_TIMEOUT_IN_SECONDS)

    cdsapi_client.retrieve(
        'reanalysis-era5-single-levels',
        {
            **request,
            'format': 'netcdf',
        },
        f'{tmpdir}/download.nc')

    cdsapi_client.retrieve(
        'reanalysis-era5-single-levels',
        {
            **request,
            'format': 'grib',
        },
        f'{tmpdir}/download.grib')

    print('open files')
    nc_file = xr.open_dataset(f'{tmpdir}/download.nc')
    grib_file = xr.open_dataset(f'{tmpdir}/download.grib', engine='cfgrib')

    for label, data in [('netcdf', nc_file), ('grib', grib_file)]:
        print('*'*50)
        print(f'{label}: filter for all timestamps greater than the first timestamp in the dataset')
        # Drop the first timestamp; the checks below use the filtered subset
        sub_file = data.where(data.time > data.time.values[0], drop=True)
        # Check for any variation on temperature
        print(f'{label}: min={sub_file.t2m.min().values.squeeze().tolist()}, max={sub_file.t2m.max().values.squeeze().tolist()}')
        # Check for any variation for lat/lon of London
        ldn = sub_file.where((sub_file.latitude == 51.5) & (sub_file.longitude == 0), drop=True)
        print(f'{label}: London Today={ldn.t2m.values.squeeze().tolist()}\n')

Output:

**************************************************
netcdf: filter for all timestamps greater than the first timestamp in the dataset
netcdf: min=-19.75750732421875, max=23.306658803851832
netcdf: London Today=[3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435, 3.4316706415036435]
**************************************************
grib: filter for all timestamps greater than the first timestamp in the dataset
grib: min=204.82492065429688, max=318.05517578125
grib: London Today=[284.96112060546875, 284.63232421875, 284.28533935546875, 284.16351318359375, 284.24078369140625, 284.4186706542969, 285.5191650390625, 286.7073974609375, 287.9845886230469, 289.4714050292969, 290.4076843261719, 291.31036376953125, 291.82720947265625, 291.900634765625, 291.72698974609375, 291.271484375, 290.62744140625, 289.7354736328125, 288.738037109375, 287.45587158203125, 286.1479797363281, 284.7046203613281, 284.1304931640625]
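The min/max lines in this output already hint at the problem: ERA5 2m temperature is in kelvin, and the NetCDF range is nowhere near that. A very rough sanity check of this kind could be scripted as below. This is only a sketch; the bounds of 170 K and 340 K are loose assumptions chosen for this example, not official limits, and `plausible_kelvin_range` is a made-up name.

```python
def plausible_kelvin_range(vmin, vmax, lower=170.0, upper=340.0):
    """Return True if [vmin, vmax] lies within loose bounds for
    near-surface air temperature in kelvin.

    Assumption: `lower` and `upper` are illustrative thresholds only.
    """
    return lower <= vmin <= vmax <= upper


if __name__ == "__main__":
    # min/max values copied from the output posted above
    print("netcdf:", plausible_kelvin_range(-19.75750732421875, 23.306658803851832))
    print("grib:  ", plausible_kelvin_range(204.82492065429688, 318.05517578125))
```

A check like this only catches gross errors such as the one above; corrupted values that happen to stay within a realistic range would pass unnoticed.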

It seems the issue is still not solved for me.
NetCDF-related issue with the CDS API:
When I downloaded ERA5 hourly pressure-level data using the CDS API, I found that the variable I wanted, “specific_humidity”, seems to contain the values of temperature in the retrieved NC file. Temperature is another variable I requested in the same order. Meanwhile, both variables have constant values through all the layers.
When I ordered the same data via the browser, the result was normal. So it seems the CDS API has mixed up the variables when converting GRIB to NetCDF, and this may only happen with the CDS API. The data I found to be badly affected are hourly, between 20180102 and 20180125. Here is my script. I look forward to your response.

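#%%
import os
import cdsapi

yyyymmdd = '20180102'  # date to retrieve (YYYYMMDD)
UTC = '01'             # hour of the day (UTC)
CDIR = '.'             # root directory for the output

# Create the per-date output directory if it does not exist yet
os.makedirs(os.path.join(CDIR, yyyymmdd), exist_ok=True)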

c = cdsapi.Client()
c.retrieve(
    'reanalysis-era5-pressure-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': [
            'specific_humidity', 'temperature',
        ],
        'pressure_level': [
            '50',
            '70', '100', '125',
            '150', '175', '200',
            '225', '250', '300',
            '350', '400', '450',
            '500', '550', '600',
            '650', '700', '750',
            '775', '800', '825',
            '850', '875', '900',
            '925', '950', '975',
            '1000',
        ],
        'year': yyyymmdd[0:4],
        'month': yyyymmdd[4:6],
        'day': yyyymmdd[6:8],
        'time': UTC+':00',
        'area': [70, -180, -70, 180,],
        'grid': ['0.25','0.25']
    },
    CDIR+"/"+yyyymmdd+'/ERA5-PL-GBL-25km-'+yyyymmdd+'-'+UTC+'00.nc')
    # 'ERA5-PL-GBL-'+year+month+day+time+'.nc')

The metadata of the NC file is shown below; note that the scale_factor and add_offset of q are wrong:

    short q(time=1, level=29, latitude=561, longitude=1440);
      :scale_factor = 0.0035906519685684926; // double
      :add_offset = 117.65130506603339; // double
      :_FillValue = -32767S; // short
      :missing_value = -32767S; // short
      :units = "kg kg**-1";
      :long_name = "Specific humidity";
      :standard_name = "specific_humidity";

    short t(time=1, level=29, latitude=561, longitude=1440);
      :scale_factor = 7.010125146872568E-4; // double
      :add_offset = 212.33612105135984; // double
      :_FillValue = -32767S; // short
      :missing_value = -32767S; // short
      :units = "K";
      :long_name = "Temperature";
      :standard_name = "air_temperature";
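To see why these attributes look wrong, it helps to work out the range of values they can represent. Packed NetCDF data are unpacked as packed * scale_factor + add_offset, so the attributes alone fix the possible minimum and maximum. The sketch below assumes 16-bit packing with -32767 reserved as the fill value (as in the metadata above), which leaves -32766 to 32767 for data; `unpacked_range` is a made-up helper name.

```python
def unpacked_range(scale_factor, add_offset, packed_min=-32766, packed_max=32767):
    """Return (lowest, highest) value representable with these attributes.

    Uses the usual unpacking rule: value = packed * scale_factor + add_offset.
    Assumption: 16-bit shorts with -32767 reserved as _FillValue.
    """
    low = packed_min * scale_factor + add_offset
    high = packed_max * scale_factor + add_offset
    return low, high


# Attributes copied from the metadata above
Q_ATTRS = (0.0035906519685684926, 117.65130506603339)
T_ATTRS = (7.010125146872568e-4, 212.33612105135984)

if __name__ == "__main__":
    print("q range:", unpacked_range(*Q_ATTRS))
    print("t range:", unpacked_range(*T_ATTRS))
```

With the attributes quoted above, the upper bound comes out at about 235.3 for both variables, while the lower bound for q is close to zero. A specific humidity in kg kg**-1 should never get anywhere near 235, so this would be consistent with temperature values having been packed into the q variable, although the attributes alone cannot prove that.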

It turns out that the resolution specification 'grid': ['0.25','0.25'] led to this problem. When I comment it out and use the default, the data are retrieved correctly. However, the same specification currently works fine with the ERA5-Land dataset.

So, is the keyword ‘grid’ for ERA5 hourly pressure levels no longer reliably supported in the latest updates? Or does it only misbehave for some dates, since 20180101 also works fine with grid?

Unfortunately there was another issue with GRIB to NetCDF conversion on the Climate Data Store (CDS) last week (18-19 April 2024), which has since been fixed. Please refer to the announcement posted here: You may need to re-download data from the CDS

Apologies for the inconvenience caused.

ECMWF Support