Missing data in ERA5T

Data for hours 22 and 23 have often been missing in the ERA5T dataset lately, for example on 2020-11-10, 2020-11-11 and 2020-11-13. Is this a known problem, and can it be expected to continue?

Hi,

At the moment, the time at which ERA5T data become available on the CDS varies from day to day. We do not work to a specific target schedule. The D-5 data are typically available by 12 UTC, but this is not guaranteed. We are working on reducing the variability of the time of availability, but this may take several months to achieve.


Thanks

Michela

Hi,

Thanks, but that was not what my question was about. If I download data using the code below, then all data at timestamps 2020-11-13 22:00 and 2020-11-13 23:00 are missing (NaNs) for the instantaneous variables (e.g. temperature). This pattern of missing data at hours 22 and 23 has been recurring lately. Is this data known to be missing at the source? Or could something weird be happening in the conversion to netCDF format?


import cdsapi
import xarray as xr  # needed for xr.open_dataset below

date = "2020-11-13"
c = cdsapi.Client(wait_until_complete=True)
request = {
   'product_type': 'reanalysis',
   'variable': ["10m_u_component_of_wind", 
   "10m_v_component_of_wind", 
   "2m_dewpoint_temperature", "2m_temperature", 
   "soil_temperature_level_3", 
   "soil_temperature_level_4", "surface_pressure", 
   "surface_solar_radiation_downwards", 
   "surface_thermal_radiation_downwards", 
   "total_cloud_cover", "total_precipitation", 
   "total_sky_direct_solar_radiation_at_surface"],
   "date": date,
   "time": '00/to/23/by/1',
   'area': [-56, -180, 71, 180],
   "format": "netcdf"
}
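# Download the file (the tmp/ directory must already exist)
fn = f"tmp/ERA5_{date}.nc"
r = c.retrieve('reanalysis-era5-single-levels', request, fn)
# Open the file and print the 2 m temperature series at one grid point
ds = xr.open_dataset(fn, decode_times=True, decode_cf=True)
print(ds.sel(latitude=60, longitude=15)["t2m"])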

Hi,

This could happen because the request picked up only the times that were available at that moment. As I said, we try to guarantee the availability of D-5 by 12 UTC on day D (where D is the current day); however, this may not always be possible due to operational factors.
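As a rough illustration of the "D-5 by 12 UTC" rule of thumb above, the sketch below computes the most recent date that could be expected to be complete at a given moment. The function name `latest_expected_date` is hypothetical, and the 6-day lag before 12 UTC is an assumption that follows from the rule. Since availability is not guaranteed, the result is only a hint about when it is worth trying.

```python
from datetime import date, datetime, timedelta, timezone


def latest_expected_date(now: datetime) -> date:
    """Most recent date typically expected to be complete in ERA5T.

    Assumes the rule of thumb from this thread: day D-5 is usually
    available by 12 UTC on day D. Before 12 UTC we fall back to D-6.
    This is an expectation, not a guarantee.
    """
    now_utc = now.astimezone(timezone.utc)
    lag_days = 5 if now_utc.hour >= 12 else 6
    return now_utc.date() - timedelta(days=lag_days)


if __name__ == "__main__":
    print(latest_expected_date(datetime.now(timezone.utc)))
```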

If you run the same request again, you'll continue to get the cached file with the two missing steps. To avoid this, we suggest you add the keyword 'nocache' to the CDS API request, with a random string, e.g.:

'nocache':'123'
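A minimal sketch of how the random string could be generated on each call, so that a repeated request never matches an earlier cached result. The helper name `with_nocache` is hypothetical; only the 'nocache' key itself comes from the suggestion above.

```python
import uuid


def with_nocache(request: dict) -> dict:
    """Return a copy of the request with a fresh random 'nocache' value."""
    fresh = dict(request)  # shallow copy, the caller's dict is left untouched
    fresh["nocache"] = uuid.uuid4().hex
    return fresh


if __name__ == "__main__":
    base = {"product_type": "reanalysis", "date": "2020-11-13"}
    print(with_nocache(base))
```

The returned dictionary can be passed to `c.retrieve(...)` in place of the original request.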


Thanks

Michela

Thanks Michela. Using a random string or any other change works. Just to check that I understand you correctly: a call can go through even though all 24 hours are not yet ready, and when the call is made again, it will fetch a cached file with missing data?

I currently have a cloud function that regularly (every 5 hours) looks for one variable at a single grid point at hour 23. If that call doesn't fail, I start up a larger instance to actually retrieve the data. I was assuming the call would fail if the data does not exist, but it seems like I actually have to make sure it's not a NaN as well?
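The extra NaN check described above could look something like the sketch below. It is only an illustration: the helper name `has_valid_data` is hypothetical, and it takes plain numbers so that it does not depend on how the probe file is read.

```python
import math
from typing import Iterable


def has_valid_data(values: Iterable[float]) -> bool:
    """True if at least one value is present and none of them is NaN."""
    items = [float(v) for v in values]
    return len(items) > 0 and not any(math.isnan(v) for v in items)


# Possible use with the probe file (assumes xarray and the names used in
# the download code earlier in this thread):
#   ds = xr.open_dataset(fn)
#   last_hour = ds["t2m"].sel(time=f"{date}T23:00").values.ravel()
#   if has_valid_data(last_hour):
#       ...start the full retrieval...
```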

Hi Lukas,

Yes, a call can go through even though all 24 hours are not yet ready, and when the call is made again, it will fetch the same cached file with the missing data.

If the request of data at 2300 hours works, all the data for that day should be there.

If the data are not there your request would fail:

Thanks

Michela