Inconsistency Between Files Downloaded by Versions 0.6.1 and 0.7.0 of cdsapi

Dear ECMWF Support Team,

I hope this email finds you well.

I am writing to seek clarification regarding an issue I encountered while downloading ERA5 reanalysis data using different versions of cdsapi. When I used cdsapi version 0.6.1, the program occasionally hung. The printed messages indicated that the old version would soon be deprecated and that version 0.7.0 was now available.

Following this, I created a new Python virtual environment, installed cdsapi version 0.7.0, and followed the instructions on this page [Climate Data Store]. I then downloaded the same dataset using both versions:

['reanalysis-era5-single-levels',
{
    'product_type': 'reanalysis',
    'format': 'netcdf',
    'variable': '2m_temperature',
    'year': '2023',
    'month': '12',
    'day': '30',
    'time': '23:00',
}]

I found that the files downloaded by the two versions had different file sizes, with the file downloaded by cdsapi 0.6.1 being larger. Additionally, when I checked the consistency between these two files, I discovered that they had different values. This inconsistency has made me hesitant to use the new version, as I am unsure whether my code contains hidden mistakes or if there are bugs in the new API.
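
For reference, a consistency check of this kind can be sketched as follows. This is only a minimal sketch, not the code from the screenshots: it assumes both files have already been read into flat lists of floats, and the function name `compare_fields`, the sample values, and the tolerance are all hypothetical.

```python
import math

def compare_fields(a, b, abs_tol=1e-3):
    """Return (largest absolute difference, True if all pairs agree within abs_tol)."""
    if len(a) != len(b):
        raise ValueError("fields have different lengths")
    max_diff = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    within = all(math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol)
                 for x, y in zip(a, b))
    return max_diff, within

# Hypothetical 2 m temperature values (K) standing in for the two downloads.
old_values = [271.3186, 275.0019, 283.4569]
new_values = [271.3184, 275.0021, 283.4567]

max_diff, within = compare_fields(old_values, new_values)
print(f"bitwise equal: {old_values == new_values}")
print(f"max abs difference: {max_diff:.3g} K, within tolerance: {within}")
```

A comparison with a tolerance distinguishes small representation differences from genuinely different data, which an exact equality test cannot do.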

To illustrate this issue, I have included screenshots of my download programs using cdsapi versions 0.6.1 and 0.7.0 (1.png and 2.png, respectively). These screenshots show the code used for downloading and comparing the data.

Could you please review these screenshots and provide guidance on this issue? I am particularly concerned about the inconsistency in the downloaded data and would appreciate your insights on which version I should use for reliable data retrieval.

I am looking forward to hearing from you soon.

Thank you for your assistance.

Best regards,


I am also investigating this. I think the difference comes from the packing performed by the previous GRIB-to-netCDF converter, which is no longer applied. In the past, most downloaded data would be stored “packed”:

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#packed-data

while data currently downloaded in netCDF format are stored as unpacked floating-point values in the netCDF4 file format, compressed with a (g)zip filter. Because packing is a lossy procedure, the data differ. This should not matter much, though: the data were originally stored in GRIB files, potentially already packed with a (different) lossy compression (bit shaving, JPEG2?), so the floating-point values now in the netCDF file cannot be considered to carry all the significant bits that IEEE 754 supports, and IEEE 754 is itself only an approximation of real numbers anyway…
My suggestion about the mismatch is “live with it”.
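
To show why packing alone makes the values differ, here is a minimal sketch of a CF-style pack/unpack round trip (unpacked = packed * scale_factor + add_offset). It is not the code of the old converter: the function names, the sample temperatures, and the choice of an unsigned 16-bit range are assumptions made for illustration.

```python
# Sketch of CF-style packing: floats -> 16-bit integers -> floats.
# All names and values here are illustrative, not taken from the CDS converter.

def pack_params(values, n_bits=16):
    """Choose scale_factor and add_offset so the values span the integer range."""
    vmin, vmax = min(values), max(values)
    n_levels = 2 ** n_bits - 1  # unsigned range 0..65535, for simplicity
    scale_factor = (vmax - vmin) / n_levels if vmax > vmin else 1.0
    return scale_factor, vmin

def pack(values, scale_factor, add_offset):
    """Quantize each float to the nearest integer level (the lossy step)."""
    return [round((v - add_offset) / scale_factor) for v in values]

def unpack(packed, scale_factor, add_offset):
    """Reconstruct floats the way a CF-aware reader would."""
    return [p * scale_factor + add_offset for p in packed]

temps_k = [271.3184, 275.0021, 283.4567, 290.1299, 301.7742]
scale, offset = pack_params(temps_k)
packed = pack(temps_k, scale, offset)
restored = unpack(packed, scale, offset)

max_err = max(abs(a - b) for a, b in zip(temps_k, restored))
print(f"scale_factor={scale:.6g}")
print(f"max round-trip error={max_err:.3g} K (bound: about scale_factor/2)")
```

The round-trip error is bounded by roughly half of scale_factor, so a packed file and an unpacked file of the same field agree only to within that quantization step.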