I’m downloading the “Seasonal forecast daily and subdaily data on single levels” dataset and I want to use the ECMWF forecasts.
Currently the only system that provides data is system 51, so I downloaded that one.
As I understand from the documentation, these systems are simply different methods of processing the data:
https://confluence.ecmwf.int/display/CKB/Detailed+list+of+parameters
https://confluence.ecmwf.int/display/CKB/Description+of+the+C3S+seasonal+multi-system
https://confluence.ecmwf.int/display/CKB/Description+of+SEAS5-v20171101+C3S+contribution
Announcements - Copernicus Knowledge Base - ECMWF Confluence Wiki (Check 12th October 2022)
But when I download the data, the system is a dimension, and for a single pixel and timestep, I have 51 different values.
Why is that? Are they 51 different simulations? Should I simply average over that dimension?
I can’t find any information about this.
EDIT: For the sake of testing I downloaded data for system 4, and I just realized that it also includes 51 values. It turns out this is another bug: no matter which system you select on the download page, it downloads all of them.
Hello Imanol!
A quick read about what “ensemble forecasting” is might help to disentangle some of the elements that are puzzling you.
For instance you can read here:
But going to your questions/comments, there is absolutely no bug in the behaviour you described, and the data you are looking at are the expected outcome of your data requests.
- Regarding “system”: it is the way used to identify a “version” of a forecast system (that is, not only the set of models, such as atmosphere, ocean, sea-ice, etc., but also the way initial conditions are created and the way uncertainty is sampled with ensemble members). Usually, at a given moment in time and for a given producing centre, only one system is run routinely. You can find in the “real-time” table of the “Summary of available data” details about when a particular system started real-time forecast production.
  The keyword “system” as used here is simply a label to identify the different versions. For instance, as of February 2025, the operational forecasting system from ECMWF is SEAS5 (system=51), from the Met Office is GloSea6 (system=603), from CMCC is SPS3.5 (system=35), from Météo-France is System8 (system=8), from Environment and Climate Change Canada are CanESM5.1p1bc (system=4) and GEM5.2-NEMO (system=5), etc.
- The case of ECMWF system=51 is a special one: it is not a different forecast system from system=5, but simply a different label for the ECMWF SEAS5 (system=5) data, used for the current provision of ECMWF SEAS5 data to the C3S activity. You can find more details about this in footnote (5) of the table mentioned above.
- As you would have realised by now, after having read the information about ensemble forecasting, and the description of what “system” means, the data you retrieved from the CDS were all the available ensemble members for that specific forecast system and start date. And yes, the bit about start date is relevant here because typically, for a given forecast system, start dates in the hindcast (or reforecast) period have fewer members than for real-time forecasts.
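To make the ensemble idea concrete, here is a minimal sketch (not an official C3S recipe) of how the values along the member dimension for one grid point and time step might be summarised. The function name `summarize_members`, the threshold and the five temperature values are made up for illustration; a real SEAS5 request would give 25 or 51 values per point.

```python
from statistics import mean, stdev


def summarize_members(values, threshold):
    """Summarise one grid point and time step across ensemble members."""
    n = len(values)
    return {
        "n_members": n,
        "ensemble_mean": mean(values),
        "ensemble_spread": stdev(values),  # sample standard deviation
        # fraction of members above the threshold, read as a forecast probability
        "prob_above": sum(v > threshold for v in values) / n,
    }


# Hypothetical 2 m temperatures (K) for five members, illustration only
members = [288.1, 289.4, 287.6, 290.2, 288.7]
summary = summarize_members(members, threshold=289.0)
print(summary)
```

The ensemble mean is only one possible summary; the spread and the fraction of members above a threshold keep some of the uncertainty information that averaging alone discards.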
I hope that helps.
Regards,
Edu
Hi @Eduardo_Penabad
Could you clarify whether the 25 ensemble members used in the hindcasts are exactly the first 25 members from the operational forecasts, or if each member has its own unique ID? I’m planning an analysis covering 1995–2024 and need to ensure I’m tracking the same members throughout. Thanks
Hi @Ignacio_Saldivia_Gonzatti
What do you mean by “the same members”?
In other words, what features do you need to check to “ensure you are tracking the same members”?
Regards!
Thank you, @Eduardo_Penabad
As mentioned in another reply, “start dates in the hindcast (or reforecast) period have fewer members than for real-time forecasts.”
If I understand correctly, each member starts with a small perturbation and evolves through the forecast period in slightly different ways than the unperturbed control member.
Since the hindcasts have fewer members, I am wondering how I can combine data from the hindcast runs and the real-time forecasts to conduct probabilistic analyses, calculate an ensemble mean, etc.
In the SEAS5 files I downloaded, there is a `number` variable that runs from 1 to N. Can I be 100% sure that `number` = 1…25 in the hindcast corresponds to the same perturbation seeds as `number` = 1…25 in the real-time forecasts?
Or are these seeds completely different? Could you point me to any other metadata I should use to match up member n between hindcast and forecast?
That’ll let me merge the two datasets over 1995–2024.
I hope this is clearer.
PS: I see that in the CDS form, steps only go up to 5160 hours (7 months). How can we access the 13-month forecasts for 1st February, 1st May, 1st August and 1st November then? I am sorry if I missed something obvious.
Many thanks,
Ignacio
Hi @Ignacio_Saldivia_Gonzatti
I first need to mention that each forecast system has a different approach to representing the different types of uncertainty, both in the initial conditions and in the model run itself. That is to say, some of the specific elements I describe below relate to ECMWF SEAS5 and may or may not apply to other forecast systems available on the CDS.
As I see it, there are at least a few elements that make it hard to give a straightforward answer to the question of whether member N is “the same” for different start dates:
- The first one is the distinction between the hindcasts (or reforecasts) and the real-time forecasts. Even though they are run with the same version of the model(s) there are differences in the way initial conditions are created as described, in this case, in the SEAS5 documentation (C3S contribution here, ECMWF SEAS5 User Guide here)
- Additionally, for SEAS5, stochastic perturbations (SPPT and SPBS) are applied throughout the model run to all members. So even though the “unperturbed” member is unperturbed in its initial conditions, during the model run it is perturbed in the same way as any other member.
- And finally, part of the perturbations applied to the initial conditions (singular vectors) are situation dependent, meaning that they are designed to produce the perturbations that would lead to their maximum growth.
Ensemble members are expected to be equally likely (and thus interchangeable) versions of the forecast fields, and therefore there is very little guarantee of any relation between member N for one start date and the member carrying the same label N for a different start date. In summary, the extent to which they can be considered “the same” depends heavily on what “the same” means for your specific application.
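As a small illustration of what “interchangeable” implies in practice, the sketch below (synthetic numbers, hypothetical helper `ensemble_stats`) checks that order-independent statistics are unchanged when the member labels are shuffled. It assumes members are treated as exchangeable, which is the expectation described above rather than a guarantee for every application.

```python
import random
from statistics import mean, median


def ensemble_stats(values):
    """Statistics that depend only on the set of values, not on member labels."""
    return (mean(values), median(values), min(values), max(values))


rng = random.Random(0)
# Synthetic 51-member forecast for one grid point, illustration only
forecast = [rng.gauss(15.0, 2.0) for _ in range(51)]

# Same values, but assigned to different member labels
relabelled = forecast[:]
rng.shuffle(relabelled)

same_stats = ensemble_stats(forecast) == ensemble_stats(relabelled)
print(same_stats)
```

Any analysis that follows “member 7” across start dates, by contrast, relies on the labels carrying meaning, which is exactly what cannot be assumed here.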
I hope that information is useful.
Many thanks for your helpful answer, @Eduardo_Penabad.
This is fine as I am using SEAS5 only.
I now understand better why it is not possible to match the N members for different start dates.
My two follow-up questions then are:
- I’m planning to force a crop model with SEAS5 and then evaluate its probabilistic skill. Because the hindcasts switch from 25 members (pre-2017) to 51 members (post-2016), do you have any advice on handling the change in ensemble size? I can either:
  - randomly subsample 25 members from each 51-member forecast after 2016, so that I always work with 25 members and keep a consistent sample size, or
  - use all available members (25 before 2017, 51 thereafter) and then apply any necessary corrections or normalisation for the differing ensemble sizes.
- I see that in the CDS form, steps only go up to 5160 hours (7 months). How can we access the 13-month forecasts for 1st February, 1st May, 1st August and 1st November then? I am sorry if I missed something obvious.
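For the first option, a minimal sketch of a reproducible random subsample could look like this. The helper `subsample_members`, the seed value and the assumption that members are labelled 1 to 51 are illustrative only; the actual labels should be checked against the `number` coordinate in the downloaded files.

```python
import random


def subsample_members(member_ids, size, seed):
    """Draw `size` member labels without replacement, reproducibly."""
    ids = list(member_ids)
    if size > len(ids):
        raise ValueError("cannot draw more members than are available")
    rng = random.Random(seed)  # fixed seed so the same subset is drawn every run
    return sorted(rng.sample(ids, size))


HINDCAST_SIZE = 25
realtime_ids = range(1, 52)  # assumed labels 1..51 for a 51-member forecast
kept = subsample_members(realtime_ids, HINDCAST_SIZE, seed=2017)
print(kept)
```

Fixing the seed (for example one seed per start date) keeps the selection reproducible; whether subsampling or using all members is preferable depends on the verification scores used.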
Many thanks.
Kind regards,
Ignacio