New CDS-Beta very slow

I am finding that although the new CDS-Beta works, it has been extraordinarily slow, taking many hours just to download a single variable on a single level.
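For context, the request in question looks roughly like the following (a minimal sketch using the cdsapi package; the dataset name, variable, level, dates and output file are illustrative, not my exact script):

```python
import cdsapi

# The client reads the API URL and key from ~/.cdsapirc (pointing at the CDS-Beta endpoint).
client = cdsapi.Client()

# Illustrative request: one variable on one pressure level for a single time step.
# Parameter names follow what the CDS-Beta download form generates; values are placeholders.
client.retrieve(
    "reanalysis-era5-pressure-levels",
    {
        "product_type": "reanalysis",
        "variable": "temperature",
        "pressure_level": "500",
        "year": "2024",
        "month": "01",
        "day": "01",
        "time": "00:00",
        "data_format": "grib",
    },
    "era5_t500.grib",
)
```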

9 Likes

Same issue for me when using the beta API.

Strangely (annoyingly?), I find that when I download the same data using the web download page, those jobs run pretty much instantly and jump ahead of my API requests in the queue.

1 Like

I’m also experiencing this issue and it is very annoying. Given that I tried to migrate my scripts before the shutdown of the old CDS, the slow performance is hardly acceptable. The old version of the CDS that I used until this morning was far more performant, so I will most likely switch back to the old version for as long as it remains alive.

I can understand that ECMWF might not have the computational resources to run the beta and the old version at the same time, but if the performance of the beta is this poor, then ECMWF and the CDS should not be encouraging their users to switch…

Any official reply from CDS/ ECMWF?

That’s indeed interesting - I really hope there will be an official statement soon explaining how to proceed. Downloading datasets via the web portal won’t be an option for me, as our workflows are designed to be pure M2M communication…

Hi Lukas,

Same for us - M2M, so web portal not an option.

It is a little frustrating that web portal requests are so quick yet API on beta is so slow.

Being able to run just one or two tests per working day due to jobs queuing for hours doesn’t help.

It would be useful to know why API requests are sometimes fast(ish) yet at other times so slow, and whether it’s just a transitional problem.

I am having the same issue. I am wondering how the API is so much slower than the requests submitted through the webform.

Even when I switch back to my old CDS setup, requests get automatically transferred to CDS-Beta, and it takes more than an hour to receive a file of four variables over a 5x5 grid-cell area.
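For reference, the shape of such a request is roughly this (again only a sketch; the dataset, variables, dates and bounding box are placeholders, not my actual script, and the "area" keyword simply subsets the data to a small box):

```python
import cdsapi

client = cdsapi.Client()

# Illustrative small request: four single-level variables over a tiny
# bounding box; "area" is [North, West, South, East] in degrees.
client.retrieve(
    "reanalysis-era5-single-levels",
    {
        "product_type": "reanalysis",
        "variable": [
            "2m_temperature",
            "total_precipitation",
            "10m_u_component_of_wind",
            "10m_v_component_of_wind",
        ],
        "year": "2024",
        "month": "10",
        "day": "01",
        "time": "12:00",
        "area": [51.0, 4.0, 50.0, 5.0],  # covers only a handful of grid cells
        "data_format": "netcdf",
    },
    "small_area_request.nc",
)
```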

I am wondering what the workaround for people is right now.

Any updates from ECMWF on this issue?

Today, I had to wait 2.5 hours (!) for a request of just 6 MB - compared to a few seconds on the legacy CDS… Is this gonna be the new normal??? In that case, the CDS no longer seems an option for climate data access.

1 Like

I am having the exact same issue. I submitted a help request but have not heard back yet.

Same issue here… Queuing time of more than 8h in some cases.

Hi All,

Here’s a reply I received from the help desk after some back and forth over the last couple of days. They are looking into it and, hopefully, something will be resolved soon.

~Ben

Dear Ben,
Thanks for your prompt reply and for all your input (even when unfortunately not positive).
We are working on tuning and improving the performance of the new system on multiple fronts.
There are features in the pipeline, to be released in the coming days, that should help to gradually improve this.
Your case is helping us quite a lot as an indicator and tracker.
Will keep you informed.
Regards,
Angel

1 Like

Just to add that as of mid-November, this issue is now a LOT worse… Jobs are queueing for 10 hours that would take 2-8 minutes on the first-generation platform. At the moment the server is essentially unusable.

4 Likes

The management and incompetence around this upgrade shocks me more every day. I am really shocked, even more so when I think of the huge waste of public money at ECMWF. There is no decency in this; if they had any, they would all step down.

2 Likes

I am having the same issues as Adrian_Mark_Tompkins and MArco_Bacci2, as my jobs often sit for hours without any sign of progress. I would really appreciate an update from the ECMWF team on the status of the system and when it may come back online.

If it still exists and is working, could someone point me to the web platform mentioned previously in this thread for downloading data? Thanks

Hello,

I am very sorry to hear that the quality of the service has degraded so much as to make your work problematic. I take your feedback extremely seriously, as the main goal of the service is to serve our users. That’s why I have decided to provide some information about what is happening.

The old CDS represented a giant leap forward in our way of serving the users of climate data. In my mind it was one of the highlights of the whole service and a critical component of its success. The old CDS, though, was based on the technology that was available at the beginning of the last decade, and this, jointly with the exponential increase in user demand, made parts of the system sub-optimal.

Technical debt, the obsolescence of components and the lack of backward compatibility are issues common to all technological systems, and the CDS was no different. Furthermore, the original setup had to be patched and adjusted several times to keep it performing at a top level in a rapidly changing technological landscape.

It may be hard to appreciate this as a user but the back-end of the old CDS was a complex combination of scripts, platforms, codes and solutions which was gradually becoming more and more challenging to manage and maintain efficiently. This eventually triggered the decision to modernise the CDS and migrate to the new platform.

We may have been overambitious in deciding, for efficiency reasons, to also change the hardware underpinning the CDS. This has possibly contributed to making the transition from one system to the other less smooth and seamless than we had originally thought. But we are fully aware that from the users’ point of view, performance is key, no matter which technologies are used.

I would like to offer my most sincere apologies to you and to all other users who have been affected.

Your feedback, and that of other users, is giving us a good understanding of the scale of the disruption, and we are very sorry for the inconvenience caused. We are working around the clock to enhance the performance of the system so that we can continue providing quality-controlled climate data in an open and free manner.

The good news is that things are already starting to improve, with queuing times going down and throughput going steadily up in recent days. Hopefully you will shortly be able to notice the improvement in performance yourself.

Thank you very much for your patience and understanding.

Carlo Buontempo
Director of C3S.

5 Likes

We do not question the need to change the software and perhaps the infrastructure. We (or at least I) question how you handled the process. It is unacceptable that ECMWF completely missed the most basic rules of how to plan a complex, large-scale project.

  • You did not take a phased approach via small-scale tests, or at least not enough of one, rushing instead for the new solution. We now have a full-scale system that does not work, and you cannot fix it in a reasonable amount of time because you did not take the upgrade in small steps.
  • You do not have a fallback plan: the new API has not worked properly for months now, and you cannot go back to the previous one because you never thought through a careful mitigation plan with redundancies.
  • If you did not have the budget or time to follow those basic steps, you should not have embarked on this project.

Risk-mitigation plans are required for any large-scale project, for requests for funds at the European level, etc. When one applies for a Marie Curie or ERC grant nowadays, one must have a plan to mitigate risks. This is basic practice for the management of any project, and you completely failed at it, causing a lot of distress to users and businesses, which have not been safeguarded at all.

4 Likes

Dear Marco,

I am really sorry to hear that the CDS is failing you at the moment.

Although, as you probably know, C3S made no commitment on the quality of service provided, we are very happy to see commercial entities such as yours building on the back of the data and tools we made freely available.

I can reassure you that all the available resources are being allocated to this task and we do hope we will soon reach and exceed the performance we had in the previous CDS.

Thank you for pointing out common risk-management practice to us. ECMWF, which has produced NRT global weather predictions for the last 50 years without any significant delay or interruption, is not new to it. Things may not have gone to your satisfaction on this occasion, but having maintained a throughput of ~85 TB per day through the transition is not bad either.

Don’t get me wrong, we are not at all happy with the current performance of the system, but your post portrays us in a way that is not fair either. We are working around the clock to improve performance, and we are keen to hear from you on the specifics of what is not working and what can be improved. That’s what the forum is for.

I would really appreciate it if you could be considerate and constructive in your contributions to the forum. I fear that ranting publicly about our incompetence will do little to help us improve or to help other users.

Hopefully our actions will soon convince you that we are not nearly as naïve or incapable as your recent posts seem to imply.

Best Regards,
Carlo

4 Likes

Dear Carlo,

If you were aware of good risk-management practices, then you failed to put them in place.

Unfortunately, whatever fix might await in the future, the system has not been working properly for nine months. As such, a claim of very poor management and decision making cannot be withdrawn for this project, irrespective of future fixes, which we all hope will resolve matters very soon.

Just as things did seem to be improving, we are back to excessive queue times again this morning.

3 Likes

Same issue for me. CDS performance was OK over the last few weeks. The CDS website shows a warning:

[Warning] 26 Nov 2024: System is experiencing performance issues. Please check updated status here.

I cannot even log in to the servers: I either get an error message saying “internal server error Error id: (string of numbers and letters)” or get rerouted to a page with a single button called “Keycloak”. I have started planning my requests well in advance, whereas before I used to download the data when I needed it. It has been quite a challenge lately.

3 Likes