Providing a bit more info to make this easier to find for others with similar problems.
I had a similar problem when I recently downloaded ERA5 files as NetCDF. I requested several variables, and downloaded one file per year:
'surface_pressure', '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_dewpoint_temperature', '2m_temperature', 'total_precipitation', 'mean_surface_downward_short_wave_radiation_flux', 'geopotential'
For all years except 2024 this worked fine. For 2024 (the current year), tp and msdwswrf were missing.
If I just open the GRIB file for 2024, it loads all variables except tp and msdwswrf, and gives the following warning (just an excerpt):
In [2]: d=xr.open_dataset("scandinavia_2024.grb")
skipping variable: paramId==228 shortName='tp'
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/cfgrib/dataset.py", line 660, in build_dataset_components
dict_merge(variables, coord_vars)
File "/usr/local/lib/python3.12/site-packages/cfgrib/dataset.py", line 591, in dict_merge
raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='time' value=Variable(dimensions=('time',), data=array([1704067200, 1704070800, 1704074400, ..., 1728594000, 1728597600,
1728601200])) new_value=Variable(dimensions=('time',), data=array([1704045600, 1704088800, 1704132000, 1704175200, 1704218400,
1704261600, 1704304800, 1704348000, 1704391200, 1704434400,
[...]
1728453600, 1728496800, 1728540000, 1728583200]))
skipping variable: paramId==235035 shortName='msdwswrf'
Traceback (most recent call last):
File "/usr/local/lib/python3.12/site-packages/cfgrib/dataset.py", line 660, in build_dataset_components
dict_merge(variables, coord_vars)
File "/usr/local/lib/python3.12/site-packages/cfgrib/dataset.py", line 591, in dict_merge
raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='time' value=Variable(dimensions=('time',), data=array([1704067200, 1704070800, 1704074400, ..., 1728594000, 1728597600,
1728601200])) new_value=Variable(dimensions=('time',), data=array([1704045600, 1704088800, 1704132000, 1704175200, 1704218400,
1704261600, 1704304800, 1704348000, 1704391200, 1704434400,
[...]
1704693600, 1704736800, 1704780000, 1704823200, 1704866400,
[...]
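The time values in the traceback are seconds since the Unix epoch, and decoding a few of them shows why cfgrib refuses to merge the two coordinates: the instantaneous variables sit on an hourly grid, while tp and msdwswrf are organised by 12-hourly forecast base times (matching the step: 12 dimension seen below). A small sketch decoding the first values quoted in the traceback:

```python
from datetime import datetime, timezone

# First values of the two conflicting 'time' coordinates from the traceback,
# in seconds since the Unix epoch.
instantaneous = [1704067200, 1704070800, 1704074400]  # hourly valid times
accumulated = [1704045600, 1704088800, 1704132000]    # 12-hourly forecast base times

def decode(ts):
    # Render an epoch timestamp as a UTC date string.
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")

print([decode(t) for t in instantaneous])  # 2024-01-01 00:00, 01:00, 02:00
print([decode(t) for t in accumulated])    # 2023-12-31 18:00, 2024-01-01 06:00, 2024-01-01 18:00
```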
Solution
It worked fine to read the dataset with cfgrib.open_datasets (note the plural), e.g. d=cfgrib.open_datasets("scandinavia_2024.grb"). More information on that functionality is provided here: GitHub - ecmwf/cfgrib: A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
For me, d[0] includes the variables indexed only along time, which is the same as valid_time:
<xarray.Dataset>
Dimensions: (time: 6816, latitude: 62, longitude: 97)
Coordinates:
number int64 0
* time (time) datetime64[ns] 2024-01-01 ... 2024-10-10T23:00:00
step timedelta64[ns] 00:00:00
surface float64 0.0
* latitude (latitude) float64 72.25 72.0 71.75 71.5 ... 57.5 57.25 57.0
* longitude (longitude) float64 4.39 4.64 4.89 5.14 ... 27.89 28.14 28.39
valid_time (time) datetime64[ns] 2024-01-01 ... 2024-10-10T23:00:00
Data variables:
z (time, latitude, longitude) float32 ...
sp (time, latitude, longitude) float32 ...
u10 (time, latitude, longitude) float32 ...
...
The values from the aggregated variables are in d[1] and are indexed differently:
<xarray.Dataset>
Dimensions: (time: 569, step: 12, latitude: 62, longitude: 97)
Coordinates:
number int64 0
* time (time) datetime64[ns] 2023-12-31T18:00:00 ... 2024-10-10T18:0...
* step (step) timedelta64[ns] 01:00:00 02:00:00 ... 11:00:00 12:00:00
surface float64 0.0
* latitude (latitude) float64 72.25 72.0 71.75 71.5 ... 57.5 57.25 57.0
* longitude (longitude) float64 4.39 4.64 4.89 5.14 ... 27.89 28.14 28.39
valid_time (time, step) datetime64[ns] 2023-12-31T19:00:00 ... 2024-10-1...
Data variables:
tp (time, step, latitude, longitude) float32 ...
msdwswrf (time, step, latitude, longitude) float32 ...
It's easy, though, to convert them:
data = d[1].stack({"time_linear": ["time", "step"]})
data = data.swap_dims({"time_linear": "valid_time"})
# you _might_ need to slice the data further; data includes some extra (nan) values
# from 2023-12-31 to be able to cover 2024
# data = data.isel(valid_time=slice(5,-7))
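The same stack/swap_dims pattern can be checked without the downloaded file on a tiny synthetic dataset (the shapes and values here are made up for illustration; the real d[1] has 569 base times and 12 steps):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny stand-in for d[1]: 2 forecast base times x 3 hourly steps.
time = pd.to_datetime(["2024-01-01 06:00", "2024-01-01 18:00"])
step = pd.to_timedelta([1, 2, 3], unit="h")
ds = xr.Dataset(
    {"tp": (("time", "step"), np.arange(6.0).reshape(2, 3))},
    coords={
        "time": time,
        "step": step,
        # valid_time = base time + step, as in the real dataset
        "valid_time": (("time", "step"), time.values[:, None] + step.values[None, :]),
    },
)

# Flatten (time, step) into one linear dimension, then index by valid_time.
data = ds.stack({"time_linear": ["time", "step"]})
data = data.swap_dims({"time_linear": "valid_time"})
print(data["tp"].sizes)  # {'valid_time': 6}
```

After the swap, tp is a 1-D series along valid_time, matching the indexing of the instantaneous variables in d[0].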
API request
# original API request as NetCDF
import cdsapi

year = 2024
dataset = "reanalysis-era5-single-levels"
request = {
    'product_type': ['reanalysis'],
    'variable': ['surface_pressure', '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_dewpoint_temperature', '2m_temperature', 'total_precipitation', 'mean_surface_downward_short_wave_radiation_flux', 'geopotential'],
    'year': [str(year)],
    'month': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'],
    'day': ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31'],
    'time': ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00', '23:00'],
    'data_format': "netcdf",  # 'grib'
    'download_format': 'unarchived',
    'area': [72.29, 4.39, 57, 28.5]
}

client = cdsapi.Client()
target = f"scandinavia_{year}.nc"
client.retrieve(dataset, request, target)  # .download()
As mentioned above, to get the variables for the current year (2024) I needed to download the GRIB files instead.
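Concretely, switching the request above to GRIB only touches two places; a minimal sketch of the changed fields (the retrieve call itself stays the same):

```python
year = 2024

# Start from the NetCDF request shown above and override the output format.
request = {
    'product_type': ['reanalysis'],
    'data_format': 'grib',          # was "netcdf"
    'download_format': 'unarchived',
    # ... remaining keys (variable, year, month, day, time, area) as above ...
}
target = f"scandinavia_{year}.grb"  # was .nc
```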