Why is it much slower to download from 'reanalysis-era5-complete'?

As the title says: downloads from 'reanalysis-era5-complete' are much slower than downloads of other ERA5 datasets from the CDS.

This is because data requests for 'reanalysis-era5-complete' retrieve data from ECMWF MARS, a tape-based archiving system, whereas the other ERA5 datasets on the CDS are hosted on the CDS itself, which is disk based. Note that the volume of ERA5 is so huge that only the most popular data can be copied over from ECMWF MARS to the CDS.

My data request for 'reanalysis-era5-complete' (I need temperature and humidity on 137 model levels for 1 Jan 2015) has been in the Queued status for 46 hours. Is this acceptable?
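For reference, a request like the one described might look roughly like this in the CDS API (a sketch only; the MARS keyword values here, such as param codes 130/133 for temperature and specific humidity, are assumptions to be checked against the ERA5 documentation):

```python
# Hypothetical MARS-style request for 'reanalysis-era5-complete':
# temperature and specific humidity on all 137 model levels, 1 Jan 2015.
request = {
    "class": "ea",                 # ERA5
    "date": "2015-01-01",
    "expver": "1",
    "levtype": "ml",               # model levels
    "levelist": "/".join(str(lev) for lev in range(1, 138)),  # levels 1-137
    "param": "130/133",            # temperature / specific humidity
    "stream": "oper",
    "time": "00/06/12/18",
    "type": "an",                  # analysis
}

# With a configured client this would be submitted as, e.g.:
# import cdsapi
# c = cdsapi.Client()
# c.retrieve("reanalysis-era5-complete", request, "era5_ml_20150101.grib")
```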

Hi Mikhail,

Could you let me have your request ID or your CDS user ID so that I can have a detailed look at your request? You can contact us by sending an email to copernicus-support AT ecmwf DOT int.

Thank you,

Xiaobo

Is it safe to assume the same is true for reanalysis-era5-single-levels? It takes on the order of days to download just a month of runoff, for example.

Hi Michael,

It should not be. In fact, we have received reports from several users, and our technical team is looking into the problem now.

I'll keep you updated.

Thank you,

Xiaobo

Just to confirm: reanalysis-era5-single-levels data is stored on the CDS disks.

I'm having the same problem... For instance, downloading one day of data sometimes takes several hours. The same problem occurs with single levels. Is it some kind of internal problem? Is there any estimate of when it will be fixed?

Michell, you should now see better performance downloading single-level ERA5 data. Let me know if that's not the case. Thank you.

Hi Xiaobo,

I requested the dataset 'reanalysis-era5-complete' recently. Normally it takes about 90 minutes when I request one month of data, but my last request appears to be badly stuck. My request ID is e5ff686a-d3cc-4004-8e07-4f3287ec9881 and my UID is 19396. It has already been running for 25 hours according to my requests list. Can you check for me?

Thank you. 

Hi Xiaolong,

You can now check the status of our system by going to https://cds.climate.copernicus.eu/live/queue. If you log into the CDS, you should be able to see your requests at that URL. Let me know if this helps.

Kind regards,

Xiaobo

Hi, it is still in progress and has already been running for more than one day.

Hi Xiaolong,

This depends on how busy our system is. For example, as I type, I can see 30 requests sharing the same resources as 'reanalysis-era5-complete', with 473 requests queuing.

Whenever possible, we recommend downloading ERA5 data that is physically hosted on the CDS. This includes all ERA5 datasets you can see in the CDS catalogue. In general, if you do not need model-level data, you should avoid requesting data from 'reanalysis-era5-complete'.
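As an illustration of this advice, temperature and humidity on pressure levels can be pulled from the disk-hosted 'reanalysis-era5-pressure-levels' dataset instead of the tape-backed archive (a sketch; the exact variable and level names are assumptions and should be checked against the dataset's download form):

```python
# Hypothetical request against the CDS-hosted pressure-level dataset,
# which avoids the tape-backed 'reanalysis-era5-complete' archive.
request = {
    "product_type": "reanalysis",
    "variable": ["temperature", "relative_humidity"],
    "pressure_level": ["1000", "850", "500"],
    "year": "2015",
    "month": "01",
    "day": "01",
    "time": ["00:00", "06:00", "12:00", "18:00"],
    "format": "netcdf",
}

# With a configured client:
# import cdsapi
# c = cdsapi.Client()
# c.retrieve("reanalysis-era5-pressure-levels", request, "era5_pl_20150101.nc")
```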

I hope this helps.

Xiaobo

Hi Xiaobo,

Yeah, I know it only takes a few minutes to download the single-level data. It seems all I can do is wait. Sorry, I posted a wrong request ID.

Thanks for your response anyway.

Xiaolong

No problem and thank you for your patience.

Hi, Xiaobo

My requests have been stuck for several hours, and on this page the reason for queuing is shown as "Unable to establish". How can I make my request run again? Thank you.

Climate Data Store | Live (copernicus.eu)

Also, can I speed up my data retrieval by adding more 'time' values per request? Every time I set 'time' to more than one value, this error message appears:

Exception: the request you have submitted is not valid. Expected 1, got 137.; Request failed; Some errors reported (last error -1).
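It may be that 'reanalysis-era5-complete' expects MARS-style slash-separated value lists rather than Python lists, which would explain an "Expected 1" error when a list is passed. A sketch of what that could look like (an assumption, worth checking against the ERA5 documentation on CDS API requests):

```python
# MARS keyword syntax packs multiple values into one slash-separated
# string, e.g. "00/01/02/.../23" for all 24 hourly analysis times.
all_hours = "/".join(f"{hour:02d}" for hour in range(24))

# The string would then be used as a single request value:
# request["time"] = all_hours
```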


BTW, I checked request ID 'e5ff686a-d3cc-4004-8e07-4f3287ec9881' and noticed it was completed.

Hello.

I am still seeing that pulling hourly runoff from the CDS/Copernicus is potentially prohibitively slow, perhaps one year of data per day. Yet it varies, apparently at least in part as a function of user load/demand.

And, it occasionally hangs up, seemingly somewhat at random.

Please advise?

Here is a sample script (it sometimes runs fine for a few months of data before I get a timeout, sometimes for years). Below it is a sample error I have gotten after, e.g., a timeout.


"import cdsapi

c = cdsapi.Client()


def is_leap_year(year):

    return (year % 4 == 0) and (year % 100 != 0) or (year % 400 == 0)


def days_in_month(month):

    if month == 1 or month == 3 or month == 5 or month == 7 or month == 8 or month == 10 or month == 12:

        return 31

    elif month == 2:

        if is_leap_year(year):

            return 29

        else:

            return 28

    else:

        return 30


for year in range(1979,2020):

   for mon in range(1,13):

      for day in range(1,days_in_month(mon)+1):

        if (day < 10):

           theday="0"+str(day)

        else:

           theday=str(day)

        if (mon < 10):

           themon="0"+str(mon)

        else:

           themon=str(mon)

        for time in range(0,24):

           if (time < 10):

              thetime="0"+str(time)

           else:

              thetime=str(time)

           print("Year, month, day, time: ",str(year),themon,theday,thetime)

           c.retrieve("reanalysis-era5-single-levels", {

              "product_type":   "reanalysis",

              "format":         "netcdf",

              "variable":       "runoff",

              "year":           str(year),

              "month":          themon,

              "day":            theday,

              "time":           thetime

              }, "output."+str(year)+str(themon)+str(theday)+str(thetime)+".nc")"


Error:

“requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='cds.climate.copernicus.eu', port=443): Read timed out. (read timeout=60)”
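A common workaround for intermittent read timeouts like this is to wrap the retrieve call in a retry loop with backoff. A minimal sketch, independent of cdsapi itself (separately, `cdsapi.Client` accepts keyword arguments such as `timeout` and `retry_max` that may help, though their exact behaviour should be checked against the cdsapi documentation):

```python
import time


def retrieve_with_retries(fetch, max_attempts=5, base_delay=2.0):
    """Call fetch() and retry with exponential backoff if it raises.

    fetch is any zero-argument callable, e.g.
    lambda: c.retrieve("reanalysis-era5-single-levels", request, target)
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise
            # back off: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

This way a single ReadTimeout costs one retry instead of crashing a script that has hours of downloads behind it.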


Sorry if I'm doing something wrong!  Thanks much in advance for any advice.

Best,

Michael

This appears to be working without any issues now. Did you make a change on your end? Yesterday, when I submitted multiple instances (to get multiple temporal chunks simultaneously), each instance seemed to slow down (potentially to a timeout and crash of the scripts), whereas now all instances are running along at OK rates.

Thanks for whatever you may have done on your end...and for your responsiveness either way!

As it turns out, I'm still seeing arbitrary snags, timeouts, and crashes of long term data pulls.

Anyone have a solution to pull the whole hourly archive of ERA5 runoff which doesn't timeout, drop files, etc?
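One pattern that tends to reduce both total time and exposure to timeouts is to make one request per month (all days and hours of hourly runoff in a single file) instead of one request per hour, so there are roughly 500 requests instead of 360,000. A sketch, assuming the list syntax that the CDS download form generates:

```python
import calendar


def monthly_runoff_request(year, month):
    """Build CDS API parameters for one month of hourly runoff."""
    n_days = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": "runoff",
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
    }


# With a configured client, the whole archive would be fetched as:
# import cdsapi
# c = cdsapi.Client()
# for year in range(1979, 2020):
#     for month in range(1, 13):
#         c.retrieve("reanalysis-era5-single-levels",
#                    monthly_runoff_request(year, month),
#                    f"runoff.{year}{month:02d}.nc")
```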

Thanks very much in advance.

Hi Michael,

I'll have a look and then get back to you - unfortunately this will take a while.

Kind regards,

Xiaobo