We need ERA-5 hourly data (profiles and single levels) along a sub-satellite orbit to support the retrieval of total column water vapour from a nadir-observing microwave radiometer.
Our current practice is to download global datasets and extract the required sub-satellite information locally on our workstations. This is obviously not very efficient. We are now wondering whether there is a best practice for downloading data only for the ERA-5 grid cells overflown by a particular satellite.
A first step in this direction could be to divide the satellite orbit into subsections and download only the data within the corresponding bounding boxes for the matching time frames.
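To illustrate, a minimal sketch of one such per-subsection request with the standard `cdsapi` Python client might look like the following; the variable, bounding box, dates and output name are placeholders, and the exact request keys should be checked against the current CDS documentation:

```python
import cdsapi

c = cdsapi.Client()

# One request per orbit subsection: a bounding box [North, West, South, East]
# and only the hour(s) during which the satellite overflies that box.
request = {
    "product_type": "reanalysis",
    "variable": ["total_column_water_vapour"],
    "year": "2023",
    "month": "06",
    "day": "15",
    "time": ["09:00", "10:00"],
    "area": [55, -10, 45, 10],  # N, W, S, E
    "format": "netcdf",
}

c.retrieve("reanalysis-era5-single-levels", request, "orbit_subsection_001.nc")
```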
Ideally, the API would accept a list of [lon, lat, time] tuples and return the requested parameters only for those.
I couldn’t find any information about such an approach in the knowledge base. Is there a way to do this, or are there any plans to extend the API to do this?
I would be curious about a good solution for this too.
In my current use case, I make single requests for each point location (often further split into many requests of one or a few months each, due to the limit on items per request). This generally works, but the queuing times become very long when I need hourly variables over many years for many point locations.
In my R implementation of the API I have a batch mode to address this issue (and my need to query many locations for mostly ornithology-related work). But I assume you are working in Python.
Anyway, the Python fix isn’t difficult, as the requests are JSON lists which can easily be wrapped. A scheduler might be needed so you don’t blow past the rate limits. Have a look at the R code; it should translate rather well. If you implement something in Python, consider doing it for the long haul and contributing it back (as a pull request).
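For illustration, a minimal Python sketch of that wrapping idea, assuming the standard `cdsapi` client and a hypothetical list of point locations (the variable, box size and file naming are just placeholders):

```python
import cdsapi

# Hypothetical point locations as (longitude, latitude) pairs.
locations = [(7.5, 51.2), (13.4, 52.5), (-3.7, 40.4)]

c = cdsapi.Client()

# Each request is just a JSON-like dict, so a whole batch can be built
# up front and submitted in a controlled loop.
for i, (lon, lat) in enumerate(locations):
    request = {
        "product_type": "reanalysis",
        "variable": ["total_column_water_vapour"],
        "year": "2020",
        "month": "01",
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": [f"{h:02d}:00" for h in range(24)],
        # Tight box around the point: [N, W, S, E]
        "area": [lat + 0.25, lon - 0.25, lat - 0.25, lon + 0.25],
        "format": "netcdf",
    }
    c.retrieve("reanalysis-era5-single-levels", request, f"point_{i:03d}.nc")
```

Submitting the requests sequentially like this stays well within per-user limits; a scheduler or worker pool is only needed if you want several requests queued at once.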
Thank you for your hints! May I ask whether you somehow circumvent the bottleneck of items per request, and if so, how?
I have (Python) code that sends all the needed requests to the API. It iterates through the locations and requests chunks of one or a few months, one after the other, which is needed because otherwise the maximum number of items per request gets exceeded. (And then more code for further processing the hourly data…)
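As an aside, the month-chunking part of such a loop can be kept quite small; a hypothetical helper that splits a multi-year period into per-month pieces (so each retrieval stays under the items-per-request limit) might look like:

```python
def month_chunks(start_year, end_year):
    """Yield (year, month) strings covering start_year..end_year inclusive."""
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            yield str(year), f"{month:02d}"

# Example: one retrieval per location and per month.
# for lon, lat in locations:
#     for year, month in month_chunks(2015, 2020):
#         ...build and submit the request for this location and month...
```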
However, the long queuing times make the whole procedure very slow once a couple of locations over a couple of years are needed.
Is your “batch mode” similar or does it work in a different way?
In the past I noticed a rate-limited queue of roughly 10–20 downloads at a time (closer to the lower end now, I think). In batch mode in R I would queue up at most 10 items at a time, so as not to bog down the system for other users and not run afoul of user-based limits (I don’t think you get blacklisted, but downloads will just be slow). I think this was mentioned somewhere in the docs at some point; don’t ask me where, and I’ll defer to those administering the API on this.
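In Python, that kind of batch mode can be approximated with a worker pool whose size is capped at the number of requests you want sitting in the queue at once; a rough sketch (the job list, `fetch_one` and the cap of 10 are assumptions, not documented limits):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder jobs; in practice each job would be one (location, month) request.
jobs = [f"request_{i:03d}" for i in range(50)]

def fetch_one(job):
    # In a real script this would build the request dict and call
    # cdsapi.Client().retrieve(...); here it just simulates a download.
    time.sleep(1)
    return job

# Cap the pool at 10 workers so at most ~10 requests are queued at a time.
with ThreadPoolExecutor(max_workers=10) as pool:
    for done in pool.map(fetch_one, jobs):
        print("finished:", done)
```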
Thank you for this swift response! I always had the feeling that parallel requests (with the same user account) would be queued one after the other anyway, but I will explore this again. Cheers.