How to efficiently download more than one year of CERA-20C daily data?

sebastiano_piccolroa · 10 October 2019 13:45

I would like to download ensemble mean, daily data from the CERA-20C reanalysis product in a fast and efficient way.

In the following page:

CERA-20C Atmospheric model, daily data (enda) retrieval efficiency, section "Requesting ensemble mean (ep), multiple years, surface (sfc)"

an efficient script for downloading this type of data is provided. However, when I run the script (for a small area, not the entire globe), only one month at a time is downloaded from the server, and the download of the next month is queued. The resulting nc files are small, because I am interested in a small area and a few parameters, so the corresponding extraction and download times are short. However, the queue time is around 1 hour. Since I need to download 110*12=1320 months, this procedure will take weeks, instead of the few minutes (or seconds) it would take if the data were downloaded all at once.

I wonder whether there is a procedure to download all the data at once.

Thank you.

Sebastiano

Michela · 10 October 2019 15:45

Hi,

maybe you could try requesting the data grouped by year, or try getting GRIB files and converting them to nc files locally.

Regards

Michela
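For readers who want to try the local conversion suggested above, here is a minimal sketch. It assumes that the ecCodes command-line tool `grib_to_netcdf` is installed and on the PATH; the helper names and the file name in the usage example are hypothetical and do not come from the thread.

```python
import subprocess
from pathlib import Path


def grib_to_netcdf_command(grib_path):
    """Build the command line that converts one GRIB file to NetCDF.

    The output file gets the same name as the input, with a .nc suffix.
    """
    grib_path = Path(grib_path)
    nc_path = grib_path.with_suffix(".nc")
    # Assumption: the ecCodes tool is invoked as
    # `grib_to_netcdf -o <output> <input>`.
    return ["grib_to_netcdf", "-o", str(nc_path), str(grib_path)]


def convert(grib_path):
    """Run the conversion; raises CalledProcessError if the tool fails."""
    subprocess.run(grib_to_netcdf_command(grib_path), check=True)
```

Calling `convert("cera20c_190101.grib")` would write `cera20c_190101.nc` next to the input file. As the rest of the thread points out, this only saves conversion time and does not shorten the queue.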

sebastiano_piccolroa · 10 October 2019 15:53

Thank you for the reply.

I suspect that CERA-20C can be downloaded only month by month. At least this is the case when using the browser to download the data, and it is also the case if I define "date": "19010101/TO/20101231" in my Python script. In that case, multiple requests are sent in series, month by month.

Do you think that defining "format": "grib" may change something?

Bests,

Sebastiano

Michela · 11 October 2019 07:44

Hi,

at least you would save the time of the conversion from GRIB to an nc file.

Regards

Michela

sebastiano_piccolroa · 11 October 2019 08:13

Thank you Michela,

the problem is not the processing time but rather the queue time. The processing (extraction, possible conversion, transfer, and downloading) takes a few seconds.

The main issue is being able to combine more months (possibly years) in the same request in order to reduce the queue time (hours).

Regards,

Sebastiano

Anabelle · 11 October 2019 09:09

CERA-20C data is stored on tapes in MARS (the ECMWF archive). Each tape contains one month's worth of data. The most efficient way of retrieving the data is to retrieve everything you need from one tape at a time. In this case this means one month at a time (you can loop through the months if you wish). If you try submitting a script to retrieve more than one month at a time, you will end up at the bottom of the queue, possibly facing days before data is retrieved. Depending on workload, your request may even get cancelled, as inefficient requests affect the overall performance of the system for all users.
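The month-by-month loop described above could be sketched as follows. This is only an illustration, not the script from the linked page: the helper names, the target file naming, and the base request contents are hypothetical. It assumes a client object with a `retrieve(request)` method (for example `ECMWFDataServer` from the `ecmwfapi` package) and a request dictionary that uses the "date" key in the form quoted earlier in the thread.

```python
import calendar


def monthly_requests(base_request, start_year, end_year):
    """Build one request dict per calendar month, in chronological order."""
    requests = []
    for year in range(start_year, end_year + 1):
        for month in range(1, 13):
            # monthrange returns (weekday of first day, number of days).
            last_day = calendar.monthrange(year, month)[1]
            request = dict(base_request)  # copy, so the base stays untouched
            request["date"] = (
                f"{year}{month:02d}01/TO/{year}{month:02d}{last_day:02d}"
            )
            # Hypothetical naming scheme: one output file per month.
            request["target"] = f"cera20c_{year}{month:02d}.nc"
            requests.append(request)
    return requests


def retrieve_all(server, requests):
    """Submit the requests one after another, one month (one tape) each."""
    for request in requests:
        server.retrieve(request)


if __name__ == "__main__":
    # Assumption: the remaining keys (dataset, stream, type, param, area,
    # and so on) would be filled in from the script on the linked page.
    base = {"format": "netcdf"}
    reqs = monthly_requests(base, 1901, 2010)
    print(len(reqs), reqs[0]["date"], reqs[-1]["date"])
```

Each request covers exactly one month, so every retrieval reads from a single tape. The requests still queue one after another, so this does not remove the waiting time; it only keeps each request in the efficient form.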

sebastiano_piccolroa · 11 October 2019 09:18

Thank you Anabelle. So I suppose that the script suggested on the web page that I linked above is the most efficient. Still, each monthly request queues for about 1 hour before the next one starts. However, if this is the most efficient procedure, I'll use it.

Bests,

Sebastiano

Anabelle · 11 October 2019 09:38

No problem, Sebastiano,

Indeed, the script on the web page is the most efficient.

Queueing for ~1 hour on MARS is very good going. It may go a little bit faster during the weekend, when activity slows down somewhat.

Anabelle

Nilanjan_Debsharma · 24 August 2023 05:13

While downloading the datasets with the given script, I get a message saying that the dataset has been phased out. How can I download CERA-20C in 2023 through a script?