How can I tell if data for my request is coming from a previous request in the CDS Cache?


When a request is submitted to the CDS, the request and the output data are stored in the CDS cache. Sometimes, this can lead to unexpected behavior when a user submits a request to the CDS.

If you (or another user) then submit an identical request while the data are still in the cache, the cached copy is returned, i.e. the data are not re-extracted.

For most datasets this is not an issue. However, for datasets which are updated on a daily basis (such as ERA5t), the latest data may not be returned, and it may appear that data are missing.

Hence, it is useful to be able to identify where the data are being downloaded from: a 'fresh' request, or the result of a previous query.

It is possible to get this information from the API output.

You need to set quiet=False and debug=True when creating the client, e.g.

import cdsapi
c = cdsapi.Client(quiet=False, debug=True)

With these options on, you can inspect the output the first time a request is submitted. Just before "Request is completed" you will see something like:

2020-08-25 09:34:55,067 DEBUG REPLY {'state': 'completed', 'request_id': 'ddfaa12f-1474-4c04-9178-c64e6b9c9625', 'location': '', 'content_length': 4153200, 'content_type': 'application/x-grib', 'sent_to_rmq_at': '2020-08-25T08:34:52.539Z'}

Note that the request ID ("ddfaa...") is reported, and that it is repeated at the end of the cache file 'location'.

If the same request is then re-submitted, just before "Request is completed" you would see:

2020-08-25 09:37:28,301 DEBUG REPLY {'state': 'completed', 'request_id': '175b9c14-5d53-41c9-9b84-40a57502c4ef', 'location': '', 'content_length': 4153200, 'content_type': 'application/x-grib', 'result_provided_by': 'ddfaa12f-1474-4c04-9178-c64e6b9c9625'}

Again the request ID is reported, but this time it does not match the one in the 'location' field. The location carries the ID of the previous request, because that request's cache file is being re-used. To make this explicit, there is an additional 'result_provided_by' field which tells you that the result for this request is being provided by a different, cached request.
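If you save the debug output to a file or a string, the check described above can be automated. The following is a minimal sketch, not part of cdsapi: the helper names parse_reply and served_from_cache are hypothetical, and it assumes the "DEBUG REPLY" lines have the same layout as the two examples shown in this article (a Python-style dictionary after the "DEBUG REPLY" marker).

```python
import ast

# Hypothetical helpers (not part of cdsapi). They assume the log line layout
# shown in this article: "<timestamp> DEBUG REPLY {<python-style dict>}".
MARKER = "DEBUG REPLY "


def parse_reply(log_line):
    """Return the dictionary that follows 'DEBUG REPLY', or None if absent."""
    idx = log_line.find(MARKER)
    if idx == -1:
        return None
    # literal_eval only evaluates literals, so it is safe on log text.
    return ast.literal_eval(log_line[idx + len(MARKER):].strip())


def served_from_cache(reply):
    """True if the reply says the result came from an earlier request."""
    return "result_provided_by" in reply


# The two log lines quoted in this article.
fresh_line = (
    "2020-08-25 09:34:55,067 DEBUG REPLY {'state': 'completed', "
    "'request_id': 'ddfaa12f-1474-4c04-9178-c64e6b9c9625', 'location': '', "
    "'content_length': 4153200, 'content_type': 'application/x-grib', "
    "'sent_to_rmq_at': '2020-08-25T08:34:52.539Z'}"
)
repeat_line = (
    "2020-08-25 09:37:28,301 DEBUG REPLY {'state': 'completed', "
    "'request_id': '175b9c14-5d53-41c9-9b84-40a57502c4ef', 'location': '', "
    "'content_length': 4153200, 'content_type': 'application/x-grib', "
    "'result_provided_by': 'ddfaa12f-1474-4c04-9178-c64e6b9c9625'}"
)

for line in (fresh_line, repeat_line):
    reply = parse_reply(line)
    source = reply.get("result_provided_by", "fresh extraction")
    print(reply["request_id"], "->", source)
```

For a request against a daily-updated dataset, a reply for which served_from_cache returns True suggests that the file may predate the latest update.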

Thanks to Luke Jones (ECMWF) for his help with the investigation of this issue.