Systems in seasonal forecasts

Imanol_Uriarte_Latorre · 26 February 2025 11:03

I’m downloading the “Seasonal forecast daily and subdaily data on single levels” dataset and I want to use the ECMWF forecasts.

Currently the only system that provides data is system 51, so I downloaded that one.

As I understand from the documentation, these systems are simply different methods of processing the data:

https://confluence.ecmwf.int/display/CKB/Detailed+list+of+parameters
https://confluence.ecmwf.int/display/CKB/Description+of+the+C3S+seasonal+multi-system
https://confluence.ecmwf.int/display/CKB/Description+of+SEAS5-v20171101+C3S+contribution
Announcements - Copernicus Knowledge Base - ECMWF Confluence Wiki (Check 12th October 2022)

But when I download the data, the system is a dimension, and for a single pixel and timestep, I have 51 different values.

Why is that? Are they 51 different simulations? Should I simply average over that dimension?

I can’t find any information about this.

EDIT: For the sake of testing I downloaded data for system 4, and I just realized that it also includes 51 values. Turns out this is another bug: no matter which system you select on the download page, it downloads all of them.

Eduardo_Penabad · 26 February 2025 17:08

Hello Imanol!

A quick read about what “ensemble forecasting” is might help you disentangle some of the elements that are puzzling you.

For instance, you can read:

- a quick fact sheet here: https://www.ecmwf.int/sites/default/files/medialibrary/2017-03/ecmwf-fact-sheet-ensemble-forecasting.pdf
- a more detailed explanation of the rationale for ensemble forecasting across timescales in the ECMWF systems here: Section 5 Forecast Ensemble (ENS) - Rationale and Construction - Forecast User Guide - ECMWF Confluence Wiki
- a short introduction to the specifics of seasonal forecasting and C3S here: Seasonal forecasts and the Copernicus Climate Change Service - Copernicus Knowledge Base - ECMWF Confluence Wiki

But going to your questions and comments: there is absolutely no bug in the behaviour you described, and the data you are looking at is the expected outcome of your data requests.

Regarding “system”: it identifies a “version” of a forecast system, which covers not only the set of models (atmosphere, ocean, sea-ice, etc.) but also the way initial conditions are created and the way uncertainty is sampled with ensemble members. Usually, at any given time, only one system is run routinely for a given producing centre. The “real-time” table of the “Summary of available data” gives details of when a particular system started real-time forecast production.
The keyword “system” as used here is simply a label to identify the different versions. For instance, as of February 2025, the operational forecasting system from ECMWF is SEAS5 (system=51), from the Met Office GloSea6 (system=603), from CMCC SPS3.5 (system=35), from Météo-France System8 (system=8), and from Environment and Climate Change Canada CanESM5.1p1bc (system=4) and GEM5.2-NEMO (system=5), etc.
The case of ECMWF system=51 is a special one: it is not a different forecast system from system=5, but simply a different label for ECMWF SEAS5 (system=5) data, used for the current provision of ECMWF SEAS5 data to C3S. You can find more details in footnote (5) of the table mentioned above.
As you will have realised by now, after reading about ensemble forecasting and what “system” means, the data you retrieved from the CDS were all the available ensemble members for that specific forecast system and start date. The start date matters here because, for a given forecast system, start dates in the hindcast (or reforecast) period typically have fewer members than real-time forecasts.
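A minimal sketch of what “averaging over that dimension” looks like in practice, using NumPy on a synthetic array. It assumes the ensemble members sit along axis 0 (in the files discussed in this thread that dimension is called `number`); the function name and the threshold are hypothetical. Besides the ensemble mean, it also returns the spread and the fraction of members above a threshold, since the mean alone discards the probabilistic information the members carry.

```python
import numpy as np

def ensemble_summary(values, threshold, member_axis=0):
    """Summarise an ensemble along its member axis.

    Returns the ensemble mean, the spread (sample standard deviation)
    and the fraction of members above `threshold`, a simple probability.
    """
    mean = values.mean(axis=member_axis)
    spread = values.std(axis=member_axis, ddof=1)
    prob_exceed = (values > threshold).mean(axis=member_axis)
    return mean, spread, prob_exceed

# Synthetic example: 51 member values for one pixel and one time step.
members = np.arange(51, dtype=float)
mean, spread, prob = ensemble_summary(members, threshold=40.0)
print(mean, spread, prob)
```

With xarray, the ensemble mean would be something along the lines of `ds[var].mean(dim="number")`, where `var` stands for whatever variable name your file uses.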

I hope all that is helpful.

Regards,
Edu

Ignacio_Saldivia_Gonzatti · 13 May 2025 15:43

Hi @Eduardo_Penabad

Could you clarify whether the 25 ensemble members used in the hindcasts are exactly the first 25 members from the operational forecasts, or if each member has its own unique ID? I’m planning an analysis covering 1995–2024 and need to ensure I’m tracking the same members throughout. Thanks

Eduardo_Penabad · 13 May 2025 16:02

Hi @Ignacio_Saldivia_Gonzatti

what do you mean by “the same members”?
In other words, what features do you need to check to “ensure you are tracking the same members”?

Regards!

Ignacio_Saldivia_Gonzatti · 14 May 2025 10:01

Thank you, @Eduardo_Penabad

As mentioned in another reply, “start dates in the hindcast (or reforecast) period have fewer members than for real-time forecasts.”

If I understand correctly, each member starts with a small perturbation and evolves through the forecast period in slightly different ways than the unperturbed control member.

Since the hindcasts have fewer members, I am wondering how I can combine data from the hindcast runs and the real-time forecasts to conduct probabilistic analyses, calculate an ensemble mean, etc.

In the seas5 files I downloaded, there is a number variable that runs from 1 to N. Can I be 100% sure that number = 1…25 in the hindcast correspond to the same perturbation seeds as number = 1…25 in the real forecasts?

Or are these seeds completely different? Could you point me to any other metadata I should use to match up member n between hindcast and forecast?

That’ll let me merge the two datasets over 1995–2024.

I hope this is clearer.

PS: I see that in the CDS form, steps only go up to 5160 hours (7 months). How can we access the 13-month forecasts for 1st February, 1st May, 1st August and 1st November then? I am sorry if I missed something obvious.
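A quick check of the arithmetic: 5160 hours is 215 days, roughly 7 months, so the 13-month range is simply not offered in this dataset. The sketch below builds the list of lead times such a request would cover, assuming 6-hourly steps starting at hour 6 (an assumption; check the steps actually offered for your variable in the CDS form). The helper name is hypothetical.

```python
def leadtime_hours(max_hour=5160, step=6):
    """Lead times (as strings) from `step` to `max_hour` hours, every `step` hours."""
    return [str(h) for h in range(step, max_hour + 1, step)]

hours = leadtime_hours()
print(len(hours), hours[0], hours[-1])  # 860 6 5160
print(5160 / 24, "days")                # 215.0 days
```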

Many thanks,
Ignacio

Eduardo_Penabad · 14 May 2025 11:00

Hi @Ignacio_Saldivia_Gonzatti

I should first mention that each forecast system takes a different approach to representing the different types of uncertainty, both in the initial conditions and in the model run itself. So some of the specific elements I describe below relate to ECMWF SEAS5 and may or may not apply to other forecast systems available on the CDS.

As I see it, at least a few elements make it hard to give a straightforward answer to whether member N is “the same” for different start dates:

The first is the distinction between the hindcasts (or reforecasts) and the real-time forecasts. Even though they are run with the same version of the model(s), the initial conditions are created differently, as described in the SEAS5 documentation (C3S contribution here, ECMWF SEAS5 User Guide here).
Additionally, in SEAS5, stochastic perturbations (SPPT and SPBS) are applied to all members throughout the model run. So even though the “unperturbed” (control) member has unperturbed initial conditions, during the model run it is perturbed in the same way as any other member.
Finally, part of the perturbations applied to the initial conditions (singular vectors) is situation dependent: they are designed to target the perturbations that would grow the most for that particular start date.

Ensemble members are expected to be equally likely (and thus interchangeable) versions of the forecast fields, and therefore there are very few guarantees of any relation between member N for one start date and the member with the same label N for a different start date. In summary, to what extent they can be considered “the same” is highly dependent on what “the same” means for your specific application.
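One practical consequence of members being interchangeable: statistics that treat the ensemble as an unordered set (mean, percentiles, exceedance probabilities) do not depend on the member labels at all. The sketch below illustrates this on a synthetic 25-member ensemble (random numbers, not SEAS5 data) by shuffling the labels and recomputing; the function name is hypothetical.

```python
import numpy as np

def ensemble_stats(members):
    """Mean, 10th and 90th percentiles of a 1-D ensemble."""
    return np.array([members.mean(),
                     np.percentile(members, 10),
                     np.percentile(members, 90)])

rng = np.random.default_rng(seed=0)
members = rng.normal(loc=15.0, scale=2.0, size=25)  # synthetic 25-member ensemble
shuffled = rng.permutation(members)                  # same values, labels shuffled

# Prints True: relabelling the members does not change the ensemble statistics.
print(np.allclose(ensemble_stats(members), ensemble_stats(shuffled)))
```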

I hope that information is useful.

Ignacio_Saldivia_Gonzatti · 20 May 2025 17:08

Many thanks for your helpful answer, @Eduardo_Penabad.

This is fine as I am using SEAS5 only.

I now understand better why it is not possible to match the N members for different start dates.

My two follow-up questions then are:

I’m planning to force a crop model with SEAS5 and then evaluate its probabilistic skill. Because the ensemble size changes from 25 members (hindcasts, up to 2016) to 51 members (real-time forecasts, from 2017), do you have any advice on handling this? I can either:
- randomly subsample 25 members from each 51-member forecast from 2017 onwards, so that I always work with 25 members and keep a consistent sample size, or
- use all available members (25 before 2017, 51 thereafter) and then apply any necessary corrections or normalisation for the differing ensemble sizes.
I see that in the CDS form, steps only go up to 5160 hours (7 months). How can we access the 13-month forecasts for 1st February, 1st May, 1st August and 1st November then? I am sorry if I missed something obvious.
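For the second option in the first question (correcting for different ensemble sizes), one commonly used approach, not something suggested in this thread, is to score with the “fair” CRPS of Ferro (2014). For exchangeable members it removes the dependence of the expected ensemble CRPS on the number of members, so 25- and 51-member ensembles are scored on a comparable footing. A minimal NumPy sketch for a single forecast/observation pair, with a hypothetical function name:

```python
import numpy as np

def crps_ensemble(members, obs, fair=False):
    """CRPS of an ensemble forecast for a single observation.

    Standard estimator: mean|x_i - y| - sum_ij |x_i - x_j| / (2 m^2)
    Fair estimator:     mean|x_i - y| - sum_ij |x_i - x_j| / (2 m (m - 1))
    """
    x = np.asarray(members, dtype=float)
    m = x.size
    if fair and m < 2:
        raise ValueError("the fair CRPS needs at least two members")
    error_term = np.abs(x - obs).mean()
    # Sum of absolute differences over all ordered member pairs (i, j).
    pair_sum = np.abs(x[:, None] - x[None, :]).sum()
    denom = 2 * m * (m - 1) if fair else 2 * m * m
    return float(error_term - pair_sum / denom)

print(crps_ensemble([0.0, 2.0], 1.0))             # 0.5
print(crps_ensemble([0.0, 2.0], 1.0, fair=True))  # 0.0
```

Averaging this score over start dates would let both periods be evaluated without discarding members, at the cost of a slightly less familiar metric.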

Many thanks.
Kind regards,
Ignacio

Ignacio_Saldivia_Gonzatti · 11 June 2025 09:25

Dear @Eduardo_Penabad,

I am following up on this since I haven’t found an answer to my second question. For the first one, I’ve concluded that it’s better to subsample.

So my only remaining question is: how can we access the 13-month forecasts for 1st February, 1st May, 1st August and 1st November?

Many thanks,
Ignacio

Eduardo_Penabad · 11 June 2025 10:02

Hi again @Ignacio_Saldivia_Gonzatti !

For your first question, subsampling 25 members seems the appropriate thing to do for your use case. I would simply add that you may not need the additional complication of selecting them at random: simply taking the first 25 members should work fine for you.
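A minimal sketch of the suggested approach: keep the first 25 members of each 51-member forecast so that every start date has the same ensemble size as the hindcasts, with random selection available as an optional alternative. It assumes NumPy arrays with members along axis 0; the function name and the synthetic arrays are hypothetical. With xarray, taking the first 25 members would be along the lines of `ds.isel(number=slice(0, 25))`.

```python
import numpy as np

def subsample_members(ens, n=25, member_axis=0, rng=None):
    """Return n members of `ens` along `member_axis`.

    rng is None -> take the first n members (the simpler option).
    rng given   -> draw n distinct members at random, kept in original order.
    """
    size = ens.shape[member_axis]
    if n > size:
        raise ValueError(f"cannot take {n} members from an ensemble of {size}")
    if rng is None:
        idx = np.arange(n)
    else:
        idx = np.sort(rng.choice(size, size=n, replace=False))
    return np.take(ens, idx, axis=member_axis)

# Synthetic example: a 25-member hindcast and a 51-member real-time forecast,
# each with 3 lead times, combined into one (start_date, member, step) array.
hindcast = np.zeros((25, 3))
forecast = np.arange(51 * 3, dtype=float).reshape(51, 3)
combined = np.stack([hindcast, subsample_members(forecast)])
print(combined.shape)  # (2, 25, 3)
```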

Regarding the second one, the longer (13-month) forecasts from SEAS5 are not part of the ECMWF contribution to C3S and are therefore not available via the CDS.

Eduardo_Penabad · 11 June 2025 16:34

Just a follow-up on the last point, as today I have been collecting some information to complement what I said earlier about the availability of the SEAS5 13-month integrations.

The non-availability via C3S/CDS still holds; what follows is additional information about other potential ways to access those longer integrations directly from ECMWF.

For a long time, for somewhat convoluted technical reasons, it was not feasible to make that data available to users without direct access to MARS. The good news is that, thanks to other developments at ECMWF, those issues are expected to be solved soon (within 1-2 months). So, in the best-case scenario, at some point around or after the Northern Hemisphere summer, that data could be accessed from the ECMWF datasets catalogue
(Operational archive | ECMWF).
The worst-case scenario, where the above-mentioned fixes are not implemented as quickly as currently expected, would mean waiting until ECMWF’s next seasonal forecast system, SEAS6, is in production. This is expected to happen in the coming months (end of 2025/beginning of 2026), but exact dates are still subject to change. Once SEAS6 is the operational seasonal forecast system, it will change the setup of forecast lengths, with a set of 6-month, 13-month and 24-month runs, but these should all be available in the corresponding datasets from the start of SEAS6 operations.
- NOTE: Availability of those longer forecasts from SEAS6 via the CDS is yet to be discussed

For more details on these upcoming changes, please keep an eye on the usual ECMWF channels.