We need ERA-5 hourly data (profiles and single levels) along a sub-satellite orbit to support the retrieval of total column water vapour from a nadir-observing microwave radiometer.
Our current practice is to download global datasets and extract the required sub-satellite information locally on our workstations. This is obviously not very efficient. We are now wondering whether there is a best practice for downloading data only for the ERA-5 grid cells overflown by a particular satellite.
A first step in this direction could be to divide the satellite orbit into subsections and download only the data within the corresponding bounding boxes for the matching time frames.
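To illustrate, a minimal sketch of one such per-subsection request with the standard `cdsapi` Python client might look like the following; the variable, bounding box, dates and output name are placeholders, and the exact request keys should be checked against the current CDS documentation:

```python
import cdsapi

c = cdsapi.Client()

# One request per orbit subsection: a bounding box [North, West, South, East]
# and only the hour(s) during which the satellite overflies that box.
request = {
    "product_type": "reanalysis",
    "variable": ["total_column_water_vapour"],
    "year": "2023",
    "month": "06",
    "day": "15",
    "time": ["09:00", "10:00"],
    "area": [55, -10, 45, 10],  # N, W, S, E
    "format": "netcdf",
}

c.retrieve("reanalysis-era5-single-levels", request, "orbit_subsection_001.nc")
```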
Ideally, the API would accept a list of [lon, lat, time] tuples and return the requested parameters only for those.
I couldn’t find any information about such an approach in the knowledge base. Is there a way to do this, or are there any plans to extend the API to do this?
I would be curious about a good solution for this too.
In my current use case, I make single requests for each point location (often further split into many requests of one or a few months each, due to the limit on items per request). This generally works, but the queuing times become very long when I need hourly variables over many years for many point locations.
In my R implementation of the API I have a batch mode to address this issue (and my need to query many locations for mostly ornithology-related work). But I assume you are working in Python.
Anyway, the Python fix isn’t difficult, as the requests are JSON lists which can easily be wrapped. A scheduler might be needed so you don’t blow past the rate limits. Have a look at the R code; it should translate rather well. If you implement something in Python, consider doing it for the long haul and contributing it back (as a pull request).
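For illustration, a minimal Python sketch of that wrapping idea, assuming the standard `cdsapi` client and a hypothetical list of point locations (the variable, box size and file naming are just placeholders):

```python
import cdsapi

# Hypothetical point locations as (longitude, latitude) pairs.
locations = [(7.5, 51.2), (13.4, 52.5), (-3.7, 40.4)]

c = cdsapi.Client()

# Each request is just a JSON-like dict, so a whole batch can be built
# up front and submitted in a controlled loop.
for i, (lon, lat) in enumerate(locations):
    request = {
        "product_type": "reanalysis",
        "variable": ["total_column_water_vapour"],
        "year": "2020",
        "month": "01",
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
        # Tight box around the point: [N, W, S, E]
        "area": [lat + 0.25, lon - 0.25, lat - 0.25, lon + 0.25],
        "format": "netcdf",
    }
    c.retrieve("reanalysis-era5-single-levels", request, f"point_{i:03d}.nc")
```

Submitting the requests sequentially like this stays well within per-user limits; a scheduler or worker pool is only needed if you want several requests queued at once.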
Thank you for your hints! May I ask whether you somehow circumvent the bottleneck of items per request, and if so, how?
I have (Python) code that sends all the needed requests to the API. It iterates through the locations and requests chunks of one or a few months, one after the other, which is needed because otherwise the maximum number of items per request gets exceeded. (And then more code for further processing the hourly data…)
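As an aside, the month-chunking part of such a loop can be kept quite small; a hypothetical helper that splits a multi-year period into per-month pieces (so each retrieval stays under the items-per-request limit) might look like:

```python
def month_chunks(start_year, end_year):
    """Yield (year, month) strings covering start_year..end_year inclusive."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            yield str(year), f"{month:02d}"

# Example: one retrieval per location and per month.
# for lon, lat in locations:
#     for year, month in month_chunks(2015, 2020):
#         ...build and submit the request for this location and month...
```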
However, the long queuing times make the whole procedure very slow once a couple of locations over a couple of years are needed.
Is your “batch mode” similar or does it work in a different way?
In the past I noticed a rate-limited queue of roughly 10–20 downloads at a time (closer to the lower end now, I think). In batch mode in R I would queue up at most 10 items at a time, so as not to bog down the system for other users and not run afoul of user-based limits (I don’t think you get blacklisted, but downloads will just be slow). I think this was mentioned somewhere in the docs at some point; don’t ask me where, and I’ll defer to those administering the API on this.
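In Python, that kind of batch mode can be approximated with a worker pool whose size is capped at the number of requests you want sitting in the queue at once; a rough sketch (the job list, `fetch_one` and the cap of 10 are assumptions, not documented limits):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder jobs; in practice each job would be one (location, month) request.
jobs = [f"request_{i:03d}" for i in range(50)]

def fetch_one(job):
    # In a real script this would build the request dict and call
    # cdsapi.Client().retrieve(...); here it just simulates a download.
    time.sleep(1)
    return job

# Cap the pool at 10 workers so at most ~10 requests are queued at a time.
with ThreadPoolExecutor(max_workers=10) as pool:
    for done in pool.map(fetch_one, jobs):
        print("finished:", done)
```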
Thank you for this swift response! I always had the feeling that parallel requests (with the same user account) would be queued one after the other anyway, but I will explore this again. Cheers.