Problems Parsing .NC Files After Outage

Hi All, is anyone else having trouble processing .NC files since the service was back online?

I’m still investigating on my end, but from what I can see, something in the output has changed that prevents xarray from reading the files.

Apologies for the unformatted traceback; for some reason I can’t insert a code block.

=======

File "/databricks/python/lib/python3.8/site-packages/xarray/backends/api.py", line 531, in open_dataset
backend_ds = backend.open_dataset(
File "/databricks/python/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 555, in open_dataset
store = NetCDF4DataStore.open(
File "/databricks/python/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 384, in open
return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
File "/databricks/python/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 332, in __init__
self.format = self.ds.data_model
File "/databricks/python/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 393, in ds
return self._acquire()
File "/databricks/python/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 387, in _acquire
with self._manager.acquire_context(needs_lock) as root:
File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/databricks/python/lib/python3.8/site-packages/xarray/backends/file_manager.py", line 189, in acquire_context
file, cached = self._acquire_with_cache_info(needs_lock)
File "/databricks/python/lib/python3.8/site-packages/xarray/backends/file_manager.py", line 207, in _acquire_with_cache_info
file = self._opener(*self._args, **kwargs)
File "src/netCDF4/_netCDF4.pyx", line 2353, in netCDF4._netCDF4.Dataset.__init__
File "src/netCDF4/_netCDF4.pyx", line 1963, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'/dbfs/tmpj7uo6edn/df2216c0-9ccb-4ade-8fb6-4054ac698433.nc'


I am having trouble with the file format as well

This could be related to the announcement linked below. If so, this is a breaking change for me and likely for many other users. I’ll add notes here when I find a fix. If anyone reading this already has one, I’d be grateful if you could post it here.

Forthcoming update to the format of netCDF files produced by the conversion of GRIB data on the CDS - Announcements / C3S - Announcements - Forum

I found that, since then, combined requests for aggregated-sum and aggregated-extreme variables (like ssrd, tp, mn2t, mx2t) deliver a zipped netCDF, even when I request "download_format": "unarchived". After unzipping I was able to work with the files.
When I request ssrd, tp and mn2t, mx2t separately, the response is not zipped but a plain netCDF, as before.
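In case it helps anyone check what they actually received, here is a minimal sketch that looks at the first bytes of a downloaded file to tell a ZIP archive from a netCDF file. The function name `sniff_format` is just my own illustration, and it assumes the signature sits at the very start of the file, which is the usual case but not guaranteed for every HDF5 file.

```python
# Leading bytes ("magic numbers") of the formats relevant here.
ZIP_MAGIC = b"PK\x03\x04"            # ZIP local file header
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"    # netCDF-4 files are HDF5 containers
CLASSIC_MAGIC = b"CDF"               # netCDF classic / 64-bit offset


def sniff_format(path):
    """Return 'zip', 'netcdf4', 'netcdf3' or 'unknown' from the first bytes.

    Hypothetical helper: it only inspects the start of the file and does
    not validate the rest of the content.
    """
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(HDF5_MAGIC):
        return "netcdf4"
    if head.startswith(CLASSIC_MAGIC):
        return "netcdf3"
    return "unknown"
```

If this reports a ZIP for a file that was saved with a .nc name, that would be consistent with the "Unknown file format" error above, since the netCDF library is being handed an archive.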

This is happening to me as well. I’m giving the CDS API client a target= argument when I retrieve the dataset. It’s actually saving the file in ZIP format, containing three separate netCDF files, one for each stepType, whereas it used to save a single netCDF file containing all the requested metrics.
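As a rough workaround sketch, assuming the saved target really is a ZIP archive holding several .nc members, the standard-library `zipfile` module can unpack them. The helper name `extract_netcdf_members` and the output directory are my own placeholders, not anything from the CDS client.

```python
import zipfile
from pathlib import Path


def extract_netcdf_members(archive_path, out_dir):
    """Extract every .nc member of a ZIP archive and return the extracted paths.

    Hypothetical helper: members without a .nc suffix are skipped.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".nc"):
                # ZipFile.extract returns the path of the file it created.
                extracted.append(Path(zf.extract(name, out_dir)))
    return sorted(extracted)
```

Each extracted file can then be opened on its own, for example with `xarray.open_dataset`. Whether the per-stepType datasets can be combined back into one depends on their coordinates, so I would check that before relying on it.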

Hi all,

Similar issues here: file format errors with files downloaded after the outage.

Specifically, with files downloaded post outage, I get the following errors when trying to access those specific files in R or QGIS:

R: Error in R_nc4_open: NetCDF: Unknown file format

QGIS: Unsupported Data Source: study system\1990-2023\seperate\study_system_2008_5.nc is not a supported raster data source

Can ECMWF support please provide an update on whether there are post outage file corruption issues?

I’m guessing that this is not an expected change following the update to the format of the netCDF files since we’re having unknown file format issues across multiple different languages/tools. However, if this is an expected change, could you also let us know please so we can make plans to rectify?

Thanks in advance for your help

Hi,
please have a look at the following link: Forthcoming update to the format of netCDF files produced by the conversion of GRIB data on the CDS - #2 by Michela

Thanks

Hi Michela,

Thank you for your message. Long message so bolded the most important sections to save you time.

I’ve had a look at the post about the netCDF file changes that you and Peter shared (thank you for this). It’s still not clear whether the problems importing the .nc files across multiple programmes (Python, R, and QGIS) are an expected result of the change in conversion, or whether they are file errors caused by technical issues on the CDS. Could you confirm this?

If it is expected behaviour following the conversion, then I’m happy to invest the time to debug and figure out a solution myself, but given there are a number of issues (not your fault) that we’ve been having that have stemmed from technical problems on the CDS end, it would be good if you could confirm the bolded section explicitly so we can avoid trying to fix an issue that we can’t fix.

I appreciate that you’re super busy, but given the specific questions and that the forum link has been posted before and there are still people posting with this problem, could you give a more detailed response to the questions?

Thanks again, and all the best,

Hi all,

I have an update on this.

For me, and potentially others, the issue is that some of the helper packages we use to download the data may still be saving the API outputs with a .nc extension rather than a .zip extension.

If you are getting the unknown file format message, try changing the extension to .zip to see if it is an extension issue. If that does turn out to be the problem for you, it would be worth reaching out to the package developer(s) to let them know.
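If you have many files to check, the renaming can be scripted. This is only a sketch using the standard library: `fix_extension` is a name I made up, it relies on `zipfile.is_zipfile` to decide whether the content is an archive, and it assumes no file with the .zip name already exists next to the original.

```python
import zipfile
from pathlib import Path


def fix_extension(path):
    """Rename `path` to end in .zip if its content is a ZIP archive.

    Hypothetical helper. Returns the (possibly new) path; files that are
    not ZIP archives, or that already end in .zip, are left untouched.
    """
    path = Path(path)
    if path.suffix.lower() != ".zip" and zipfile.is_zipfile(path):
        new_path = path.with_suffix(".zip")
        # Assumption: new_path does not already exist; on some platforms
        # rename() would silently replace it.
        path.rename(new_path)
        return new_path
    return path
```

A file that comes back unchanged from this is not a ZIP archive, so in that case the extension is probably not the cause of the error.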
