Forthcoming update to the format of netCDF files produced by the conversion of GRIB data on the CDS

We are pleased to announce an update to the format of netCDF files produced by the conversion of GRIB data on the CDS. The new release will take place on Tuesday 26 November 2024.

The main improvement is that, when converting to netCDF, the GRIB variables are split based on the stepType GRIB key, so that the time dimensions are handled correctly and provided in a more user-friendly format.

The stepType describes how the variable has been aggregated over the time step, and can take values such as instant (instantaneous) and accum (accumulated).

Splitting by stepType means that valid_time is now used as the time dimension for both hourly and monthly data, and that the experiment version number coordinate is correct for all variables.

Please note: in some circumstances, the conversion from GRIB to netCDF can result in multiple files. If you select ‘Unarchived’, the data are returned unzipped when only one stepType is requested (producing a single netCDF file), and zipped into a single archive when multiple stepTypes are requested (producing multiple netCDF files).
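
As an illustration only (not an official CDS utility), a download script could check which of the two cases it has received; the file name passed to download() below is just an example:

import zipfile

path = "download.nc"  # illustrative: whatever name was passed to client.retrieve(...).download(path)

if zipfile.is_zipfile(path):
    # Multiple stepTypes were requested: the CDS returned a zip of netCDF files
    with zipfile.ZipFile(path) as zf:
        zf.extractall("unpacked")
        print("Extracted:", zf.namelist())
else:
    # A single stepType was requested: the file is already a plain netCDF file
    print("Single netCDF file:", path)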

If you have any further questions, please raise a query via the Support Portal.

Best regards
ECMWF Support


Dear users,
The change was released as expected for the ERA5 family of datasets.

Regards
ECMWF Support

Hi Michela,

Is there any way for users to revert to the old system if they wish, i.e. to combine instantaneous and accumulated variables into a single file? It is relatively straightforward to either unzip the result or split queries, but I have other downstream processes that make having everything in one file much more convenient.

Thanks

Hi @Peter_Akers
Please have a look at this page for information about the legacy converter: GRIB to netCDF conversion on new CDS and ADS systems - Copernicus Knowledge Base - ECMWF Confluence Wiki

Please keep in mind that it is not supported.
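
If you prefer to stay on the new converter, one downstream option is to merge the per-stepType files yourself, for example with xarray. This is only a sketch: the file names below simply illustrate the split output and will differ depending on your request:

import xarray as xr

# Open the per-stepType netCDF files produced by the new converter (names illustrative)
ds_instant = xr.open_dataset("data_stepType-instant.nc")
ds_accum = xr.open_dataset("data_stepType-accum.nc")

# Merge on the shared coordinates (e.g. valid_time, latitude, longitude) and save
merged = xr.merge([ds_instant, ds_accum])
merged.to_netcdf("combined.nc")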

Thanks

This is my first post to the forum, so my apologies if I am asking in the wrong place.

I have many years of old data in the previous netCDF file format (netCDF 3, 64-bit offset). I have downloaded some data in the new netCDF format (netCDF-4), but reading from the new files is approximately 3 orders of magnitude slower than the old files (5.099 s vs. 0.0084 s).

They both appear to be valid netCDF files and the data looks normal. My only guess is that perhaps the chunk size for the netCDF-4 files is particularly inefficient? I can provide more details if it would be of any help.

Thanks, Ken Bowman

I solved my problem by re-chunking the file using nccopy to better match my access pattern.
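
For anyone hitting the same issue, an equivalent re-chunk can also be done from Python. This is only a sketch: the file name, variable name, dimension names and chunk sizes are illustrative and should be adapted to your own files and access pattern:

import xarray as xr

ds = xr.open_dataset("era5_new.nc")  # illustrative file name

# Chunk the whole time axis together for time-series access; cap the spatial
# chunks so they never exceed the actual dimension sizes
chunks = (
    ds.sizes["valid_time"],
    min(32, ds.sizes["latitude"]),
    min(32, ds.sizes["longitude"]),
)
encoding = {"t2m": {"chunksizes": chunks, "zlib": True}}  # "t2m" is illustrative
ds.to_netcdf("era5_rechunked.nc", encoding=encoding)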

Thanks, Ken Bowman

Hi,
I downloaded the Seasonal forecast daily and subdaily data on single levels on 12-10-2024, and found that no explicit indication of the reference time is set inside the netCDF metadata (neither in the global nor in the dataset-specific metadata). I am using GDAL as the opening driver and QGIS as the visualization tool. Is that intended, or is it a mistake (or am I missing something)? If intended, that would be quite inappropriate in my opinion: the time interpretation should not depend on information external to the dataset itself. Thank you!

Hello Eugenio,

Welcome to the User Forum!

I have tried to reproduce the behaviour you mentioned, but I haven’t found anything unexpected: it seems to me that both the reference time (I assume you mean the initialisation date of the forecast) and the lead time are properly represented in the netCDF files you receive from the CDS.

For instance, for the following request:

import cdsapi

dataset = "seasonal-original-single-levels"
request = {
    "originating_centre": "meteo_france",
    "system": "8",
    "variable": ["2m_temperature"],
    "year": ["2024"],
    "month": ["11", "12"],
    "day": ["01"],
    "leadtime_hour": [ f"{hh}" for hh in range(6,78,6)],
    "data_format": "netcdf"
}

client = cdsapi.Client()
client.retrieve(dataset, request).download()

you will get a netCDF file where time is labelled as the "initial time of forecast" and step as "time since forecast_reference_time". I agree, though, that the latter name can sound confusing, because there is no variable called forecast_reference_time in this file; the variable containing the initialisation date is called time.

For example, inspecting the netCDF file with ncdump reveals all those elements:

dimensions:
	number = 51 ;
	time = 2 ;
	step = 12 ;
	latitude = 180 ;
	longitude = 360 ;
variables:

[...]

	int64 time(time) ;
		time:long_name = "initial time of forecast" ;
		time:standard_name = "forecast_reference_time" ;
		time:units = "seconds since 1970-01-01" ;
		time:calendar = "proleptic_gregorian" ;
	double step(step) ;
		step:_FillValue = NaN ;
		step:long_name = "time since forecast_reference_time" ;
		step:standard_name = "forecast_period" ;
		step:units = "hours" ;

[...]

data:
 time = "2024-11-01", "2024-12-01" ;
 step = 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72 ;
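
If it helps, the absolute valid times can be reconstructed from these two variables. Here is a minimal sketch (the file name is illustrative, and it assumes xarray's default decoding of the time variable):

import xarray as xr
import pandas as pd

# decode_timedelta=False keeps "step" as plain hours rather than timedelta64
ds = xr.open_dataset("seasonal_forecast.nc", decode_timedelta=False)

# Broadcast the reference time (time) against the lead time (step) -> shape (time, step)
step_td = xr.DataArray(pd.to_timedelta(ds["step"].values, unit="h"), dims="step")
valid_time = ds["time"] + step_td
print(valid_time.values)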