Why is it much slower to download from 'reanalysis-era5-complete'?

As the title says: downloads from 'reanalysis-era5-complete' are much slower than downloads of other ERA5 datasets from the CDS.

This is because data requests for 'reanalysis-era5-complete' retrieve data from ECMWF MARS, a tape-based archiving system, whereas the other ERA5 datasets on the CDS are hosted on the CDS itself, which is disk based. Note that the volume of ERA5 is so huge that only the most popular data can be copied over from ECMWF MARS to the CDS.

My data request for 'reanalysis-era5-complete' (I need temperature and humidity on 137 model levels for 1 Jan 2015) has been in the Queued status for 46 hours. Is this acceptable?
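For reference, a request like the one described might look roughly like this in the CDS API (a sketch only; the MARS keyword values here, such as param codes 130/133 for temperature and specific humidity, are assumptions to be checked against the ERA5 documentation):

```python
# Hypothetical MARS-style request for 'reanalysis-era5-complete':
# temperature and specific humidity on all 137 model levels, 1 Jan 2015.
request = {
    "class": "ea",                 # ERA5
    "date": "2015-01-01",
    "expver": "1",
    "levtype": "ml",               # model levels
    "levelist": "/".join(str(lev) for lev in range(1, 138)),  # levels 1-137
    "param": "130/133",            # temperature / specific humidity
    "stream": "oper",
    "time": "00/06/12/18",
    "type": "an",                  # analysis
}

# With a configured client this would be submitted as, e.g.:
# import cdsapi
# c = cdsapi.Client()
# c.retrieve("reanalysis-era5-complete", request, "era5_ml_20150101.grib")
```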

Hi Mikhail,

Could you let me have your request ID or your CDS user ID so that I can have a detailed look at your request? You can contact us by sending an email to copernicus-support AT ecmwf DOT int.

Thank you,

Xiaobo

Is it safe to assume the same is true for reanalysis-era5-single-levels? It takes on the order of days to download just a month of runoff, for example.

Hi Michael,

It should not be. In fact, we have received reports from several users, and our technical team is looking into the problem now.

I'll keep you updated.

Thank you,

Xiaobo

Just to confirm: reanalysis-era5-single-levels data is stored on the CDS disks.

I'm having the same problem... For instance, downloading one day of data sometimes takes several hours. The same problem occurs with single levels. Is it some kind of internal problem? Is there any estimate of when it will be fixed?

Michell, you should now see better performance downloading single-level ERA5 data. Let me know if that's not the case. Thank you.

Hi Xiaobo,

I requested the dataset 'reanalysis-era5-complete' recently. Normally it takes about 90 minutes when I request one month of data, but my last request appears to be badly stuck. My request ID is e5ff686a-d3cc-4004-8e07-4f3287ec9881 and my UID is 19396. It has already been running for 25 hours according to my requests list. Can you check for me?

Thank you. 

Hi Xiaolong,

You can now check the status of our system by going to https://cds.climate.copernicus.eu/live/queue. If you log into the CDS, you should be able to see your requests at that URL. Let me know if this helps.

Kind regards,

Xiaobo

Hi, it is still in progress and has already been running for more than one day.

Hi Xiaolong,

This depends on how busy our system is. For example, as I type, I can see 30 requests sharing the same resources as 'reanalysis-era5-complete', with 473 requests queuing.

Whenever possible, we recommend downloading ERA5 data that is physically hosted on the CDS. This includes all ERA5 datasets you can see in the CDS catalogue. In general, if you do not need model-level data, you should avoid requesting data from 'reanalysis-era5-complete'.
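As an illustration of this advice, temperature and humidity on pressure levels can be pulled from the disk-hosted 'reanalysis-era5-pressure-levels' dataset instead of the tape-backed archive (a sketch; the exact variable and level names are assumptions and should be checked against the dataset's download form):

```python
# Hypothetical request against the CDS-hosted pressure-level dataset,
# which avoids the tape-backed 'reanalysis-era5-complete' archive.
request = {
    "product_type": "reanalysis",
    "variable": ["temperature", "relative_humidity"],
    "pressure_level": ["1000", "850", "500"],
    "year": "2015",
    "month": "01",
    "day": "01",
    "time": ["00:00", "06:00", "12:00", "18:00"],
    "format": "netcdf",
}

# With a configured client:
# import cdsapi
# c = cdsapi.Client()
# c.retrieve("reanalysis-era5-pressure-levels", request, "era5_pl_20150101.nc")
```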

I hope this helps.

Xiaobo

Hi Xiaobo,

Yeah, I know it only takes a few minutes to download the single-level data. It seems all I can do is wait. Sorry, I posted a wrong request ID.

Thanks for your response anyway.

Xiaolong

No problem and thank you for your patience.

Hi, Xiaobo

My requests have been stuck for several hours, and on this page the reason for queuing is shown as "Unable to establish". How can I make my request run again? Thank you.

Climate Data Store | Live (copernicus.eu)

Also, can I speed up my data retrieval by adding more 'time' values per request? Every time I set 'time' to more than one value, this error message appears:

Exception: the request you have submitted is not valid. Expected 1, got 137.; Request failed; Some errors reported (last error -1).
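It may be that 'reanalysis-era5-complete' expects MARS-style slash-separated value lists rather than Python lists, which would explain an "Expected 1" error when a list is passed. A sketch of what that could look like (an assumption, worth checking against the ERA5 documentation on CDS API requests):

```python
# MARS keyword syntax packs multiple values into one slash-separated
# string, e.g. "00/01/02/.../23" for all 24 hourly analysis times.
all_hours = "/".join(f"{hour:02d}" for hour in range(24))

# The string would then be used as a single request value:
# request["time"] = all_hours
```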


BTW, I checked request ID 'e5ff686a-d3cc-4004-8e07-4f3287ec9881' and noticed it was completed.

Hello.

I am still seeing that pulling hourly runoff from the CDS/Copernicus is potentially prohibitively slow, perhaps one year of data per day. Yet it varies, apparently at least in part as a function of user load/demand.

And, it occasionally hangs up, seemingly somewhat at random.

Please advise?

Here is a sample script (it sometimes runs fine for a few months of data before I get a timeout, sometimes for years). Below it is a sample error I have gotten after, e.g., a timeout.


"import cdsapi

c = cdsapi.Client()


def is_leap_year(year):

    return (year % 4 == 0) and (year % 100 != 0) or (year % 400 == 0)


def days_in_month(month):

    if month == 1 or month == 3 or month == 5 or month == 7 or month == 8 or month == 10 or month == 12:

        return 31

    elif month == 2:

        if is_leap_year(year):

            return 29

        else:

            return 28

    else:

        return 30


for year in range(1979,2020):

   for mon in range(1,13):

      for day in range(1,days_in_month(mon)+1):

        if (day < 10):

           theday="0"+str(day)

        else:

           theday=str(day)

        if (mon < 10):

           themon="0"+str(mon)

        else:

           themon=str(mon)

        for time in range(0,24):

           if (time < 10):

              thetime="0"+str(time)

           else:

              thetime=str(time)

           print("Year, month, day, time: ",str(year),themon,theday,thetime)

           c.retrieve("reanalysis-era5-single-levels", {

              "product_type":   "reanalysis",

              "format":         "netcdf",

              "variable":       "runoff",

              "year":           str(year),

              "month":          themon,

              "day":            theday,

              "time":           thetime

              }, "output."+str(year)+str(themon)+str(theday)+str(thetime)+".nc")"


Error:

“requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='cds.climate.copernicus.eu', port=443): Read timed out. (read timeout=60)”
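A common workaround for intermittent read timeouts like this is to wrap the retrieve call in a retry loop with backoff. A minimal sketch, independent of cdsapi itself (separately, `cdsapi.Client` accepts keyword arguments such as `timeout` and `retry_max` that may help, though their exact behaviour should be checked against the cdsapi documentation):

```python
import time


def retrieve_with_retries(fetch, max_attempts=5, base_delay=2.0):
    """Call fetch() and retry with exponential backoff if it raises.

    fetch is any zero-argument callable, e.g.
    lambda: c.retrieve("reanalysis-era5-single-levels", request, target)
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise
            # back off: base_delay, 2*base_delay, 4*base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

This way a single ReadTimeout costs one retry instead of crashing a script that has hours of downloads behind it.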


Sorry if I'm doing something wrong!  Thanks much in advance for any advice.

Best,

Michael

This appears to be working without any issues now. Did you make a change on your end? Yesterday, when I submitted multiple instances (to get multiple temporal chunks simultaneously), each instance seemed to slow down (potentially to a timeout and crash of the scripts), whereas now all instances are running along at OK rates.

Thanks for whatever you may have done on your end...and for your responsiveness either way!

As it turns out, I'm still seeing arbitrary snags, timeouts, and crashes of long term data pulls.

Anyone have a solution to pull the whole hourly archive of ERA5 runoff which doesn't timeout, drop files, etc?
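One pattern that tends to reduce both total time and exposure to timeouts is to make one request per month (all days and hours of hourly runoff in a single file) instead of one request per hour, so there are roughly 500 requests instead of 360,000. A sketch, assuming the list syntax that the CDS download form generates:

```python
import calendar


def monthly_runoff_request(year, month):
    """Build CDS API parameters for one month of hourly runoff."""
    n_days = calendar.monthrange(year, month)[1]
    return {
        "product_type": "reanalysis",
        "format": "netcdf",
        "variable": "runoff",
        "year": str(year),
        "month": f"{month:02d}",
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
    }


# With a configured client, the whole archive would be fetched as:
# import cdsapi
# c = cdsapi.Client()
# for year in range(1979, 2020):
#     for month in range(1, 13):
#         c.retrieve("reanalysis-era5-single-levels",
#                    monthly_runoff_request(year, month),
#                    f"runoff.{year}{month:02d}.nc")
```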

Thanks very much in advance.

Hi Michael,

I'll have a look and then get back to you - unfortunately this will take a while.

Kind regards,

Xiaobo