Missing values in ECMWF-51 and DWD-21 hindcasts (Seasonal forecast daily and subdaily data on single levels)

I would like to report an issue identified in the dataset “Seasonal forecast daily and subdaily data on single levels”.

A problem has been detected in the hindcasts of the European model (ECMWF-51) and the German model (DWD-21), which contain a large number of missing values (NA) in several ensemble members. I am currently working with the 1993–2022 hindcast period. This issue prevents the execution of verification procedures for any forecast.

This raises concerns about the robustness of the 2024 forecasts, since hindcasts are fundamental for forecast construction and calibration. Upon examining the hindcast dimensions, I found that the ECMWF-51 and DWD-21 models have 51 ensemble members, of which 25 and 20 members, respectively, contain missing data. In contrast, the ECCC-5 and MF-9 models have 21 and 31 ensemble members, respectively, and do not show this issue.

The problem occurs systematically in these models, affecting all variables and multiple months within the hindcast period. I also tested a different hindcast period (1994–2023), and the same issue persists. No alternative hindcast periods are available, as data for DWD, ECCC, and MF are only provided from 1993 onwards.

I look forward to your response!

I might be wrong, but this is possibly because the hindcast period for the ECMWF model is 1981-2016, and data after that are forecast data from older model runs. The forecasts have more ensemble members (51) than the hindcasts (25).

If you now read this in xarray, you will get a lot of missing data because xarray tries to be helpful: it lets you open the combined data, but the ensemble members after 25 are then filled with missing values during the hindcast period. I have had this happen to me.
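
A minimal sketch of how one could confirm this, assuming the ensemble dimension is called "number" and the start-date dimension "time" (these are the names cfgrib usually gives, but check your own dataset). The array below is synthetic stand-in data, not a real CDS download, and the variable name t2m is just a placeholder:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a merged hindcast + forecast file (NOT real CDS data).
# Dimension names "number" and "time" are assumptions; adapt them to your file.
starts = pd.date_range("2015-01-01", "2018-01-01", freq="YS")
members = np.arange(51)
values = np.random.default_rng(0).normal(size=(members.size, starts.size))
t2m = xr.DataArray(
    values,
    coords={"number": members, "time": starts},
    dims=("number", "time"),
    name="t2m",
)

# Mimic the padding: members 25-50 do not exist for start dates up to 2016.
is_hindcast = t2m["time"].dt.year <= 2016
t2m = t2m.where(~(is_hindcast & (t2m["number"] >= 25)))

# Count how many members actually hold data for each start date.
members_per_start = t2m.notnull().sum("number")
print(members_per_start.to_series())
```

If the count jumps from 25 to 51 at some start date, the missing values are padding rather than corrupted data. With real gridded fields you would probably want to reduce over the spatial and lead-time dimensions first, for example with .notnull().any(...) over those dimensions, before summing over "number".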

You have two choices: use only the real hindcast period, OR retrieve only ensemble members 0-24 for the whole period.
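
Roughly, both choices could look like this in xarray. This is only a sketch on synthetic data: the dimension names "number" and "time", the variable name t2m, and the shortened 2015-2018 period are all assumptions made for illustration, so replace them with whatever your dataset uses:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in data (NOT real CDS data): 51 members, 4 yearly start dates,
# with members 25-50 empty for the "hindcast" start dates 2015-2016.
starts = pd.date_range("2015-01-01", "2018-01-01", freq="YS")
members = np.arange(51)
values = np.random.default_rng(0).normal(size=(members.size, starts.size))
t2m = xr.DataArray(
    values,
    coords={"number": members, "time": starts},
    dims=("number", "time"),
    name="t2m",
)
is_hindcast = t2m["time"].dt.year <= 2016
t2m = t2m.where(~(is_hindcast & (t2m["number"] >= 25)))

# Choice 1: keep only the real hindcast start dates, then drop the members
# that are entirely empty over that period.
hindcast_only = t2m.sel(time=slice("2015", "2016")).dropna("number", how="all")

# Choice 2: keep members 0-24 for the whole period (label slices are inclusive).
common_members = t2m.sel(number=slice(0, 24))

print(hindcast_only.sizes, common_members.sizes)
```

Either way, the resulting array should have no padded members left, which you can check with .isnull().any(). For the second choice it is probably cheaper to request only those members from the CDS in the first place, if the request form allows it, rather than subsetting after download.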

The same is the case for DWD, but its hindcast is 1993-2023 and the member counts are 50 for the forecasts (FC) and 30 for the hindcasts (HC).

Description of the C3S seasonal multi-system - Copernicus Knowledge Base - ECMWF Confluence Wiki

Thank you very much for your explanation.

Before reading your reply, I reviewed other posts related to this dataset and found a response from an ECMWF staff member stating:

“The data you retrieved from the CDS were all the available ensemble members for that specific forecast system and start date. And yes, the bit about start date is relevant here because typically, for a given forecast system, start dates in the hindcast (or reforecast) period have fewer members than for real-time forecasts.”

This seems consistent with what you mentioned regarding the different number of ensemble members between hindcasts and forecasts. It may indeed be related to the fact that the real hindcast period uses fewer members, while later start dates correspond to forecasts with larger ensemble sizes.

I will therefore test restricting the ensemble dimension to the members that are consistently available during the actual hindcast period and check whether this resolves the issue for verification purposes.

Thanks again for your answer; this was very helpful.