Hello everyone,
I am downloading quite a few datasets via the Python API. For further processing I need to merge those files into a larger dataset, and it would be handy if I could do that directly using the information from the API call. Unfortunately, for the river discharge variable there are some mismatches between the API request and the file names. For instance, the string that defines the RCP value has an additional underscore in the API request, and quite often dashes are replaced by underscores. This makes automatically processing a lot of data quite tedious.
It would be nice if the strings in the API call matched the file names exactly, to allow for easy filtering. Ideally, this would also be enforced for future data uploads.
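Until the names match, one workaround is to normalize both sides before comparing. The sketch below is a hypothetical helper, not part of the CDS API: it lowercases strings and strips `-` and `_`, so a request value and a file name that differ only in their separators compare equal. The request value and file names in the usage are made up for illustration; check the idea against the files you actually receive.

```python
import re
from pathlib import Path


def normalize(token: str) -> str:
    """Lowercase and drop '-' and '_' so 'rcp_4_5' and 'rcp4-5' both become 'rcp45'."""
    return re.sub(r"[-_]", "", token.lower())


def matches(request_values, filename: str) -> bool:
    """True if every normalized request value occurs in the normalized file name."""
    name = normalize(Path(filename).name)
    return all(normalize(v) in name for v in request_values)


# Hypothetical file names, for illustration only.
files = ["discharge_rcp4-5_2041.nc", "discharge_rcp8-5_2041.nc"]
selected = [f for f in files if matches(["rcp_4_5"], f)]
print(selected)  # ['discharge_rcp4-5_2041.nc']
```

Because this is plain substring matching, a short token can also hit an unrelated part of a name, so treat the result as a filter to inspect rather than a guarantee.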
Depending on the source/provider of the datasets, the naming convention can be different. As far as I know this is an active choice. However, within a single dataset version the naming convention should be uniform.
The API and file names are not hard-set across the whole of the CDS, as the names would need to be gigantic to accommodate all options.
On the CDS system there is everything from reanalysis to projection model data, from climate to sector-specific impact variables, and data is available at various aggregation levels in both space (model grid vs regional aggregates) and time (sub-hourly vs decadal). With the new CDS (and the merger with the ADS), the addition of measurements makes this even more complex.
…and also, using both hyphens and underscores in parameter names makes it very hard to create good, machine-readable filenames…
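If you build your own file names from the request, giving each separator a single role avoids the ambiguity. A minimal sketch, assuming a convention of our own choosing (hypothetical, not a CDS standard): `_` only separates fields, `-` only appears inside a value, and multiple values for one key are joined with `+`. The function names and the dataset name in the usage are illustrative.

```python
import re


def canonical_token(value) -> str:
    """Map a request value to lowercase letters, digits and '-' only.

    Underscores and whitespace become '-', leaving '_' free to act as the field separator.
    """
    token = re.sub(r"[\s_]+", "-", str(value).strip().lower())
    token = re.sub(r"[^a-z0-9-]", "", token)  # drop anything else, e.g. '.' or '/'
    return re.sub(r"-{2,}", "-", token).strip("-")


def request_to_filename(dataset: str, request: dict, ext: str = "nc") -> str:
    """Hypothetical helper: one '_'-separated field per request key, keys in sorted order."""
    parts = [canonical_token(dataset)]
    for key in sorted(request):
        value = request[key]
        values = value if isinstance(value, (list, tuple)) else [value]
        parts.append("+".join(canonical_token(v) for v in values))
    return "_".join(parts) + "." + ext


request = {"experiment": "rcp_4_5", "variable": "river_discharge", "year": [2041, 2042]}
print(request_to_filename("hypothetical-dataset", request))
# hypothetical-dataset_rcp-4-5_river-discharge_2041+2042.nc
```

Sorting the keys keeps the name stable regardless of the order in which the request dictionary was written, so the same request always maps to the same file name.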