Seasonal Forecast and Hindcast Use

I would like to ask about the correct approach for comparing a seasonal forecast model run with its relevant hindcast run(s).

Let's consider the ECMWF forecast monthly data on a single level starting on 1st February 2020. I want to calculate, for instance, the monthly anomaly of air temperature for May 2020. I know it is already available as a final product in CDS, but I'd like to learn how this is done. Which hindcast runs are used to calculate the May 2020 temperature anomaly, and what other steps follow?

1) All hindcast runs starting on 1st February (1.2.1993, 1.2.1994, 1.2.1995, ..., 1.2.2016) are considered. The individual ensemble members' data for May 1993, 1994, ..., 2016 are taken, the mean value over all these ensemble members and years is calculated, and this is then used as the "hindcast climatology" for the month of May.

or

2) All hindcast runs starting on 1st May (1.5.1993, 1.5.1994, 1.5.1995, ..., 1.5.2016) are considered, and all other steps are the same as above.

or

3) Neither of these two options; it's done in a different way... (how?)

A related question: could I also download an ensemble mean value calculated over all hindcast members starting on the same date (e.g., the mean over all ensemble members of the hindcast starting on 1.1.1993, and so on)?

Thank you for any suggestions and advice.

Petr

Hi Petr,

In seasonal forecasting, the biases are usually leadtime dependent, i.e. the models don't have the same biases at the very beginning of the forecast as at the end. The hindcasts are produced in the same way as the real-time forecasts but for past dates, so you can use them to bias-correct the forecasts and to assess the skill of the forecasting system. Given the dependency on leadtime, you are expected to remove those biases by comparing like with like.

When you calculate anomalies by subtracting the model climate (calculated as the average over the hindcast period) from a given real-time forecast, you are, by construction, removing the mean bias. As mentioned above, those biases are expected to be leadtime dependent, so you should subtract from a forecast with a leadtime of N (in months) the hindcast climate for the same leadtime.

How does this work in your example?

  • for a given start date, i.e. 1st February, you will have:
    • fcst(ensemble_member, leadtime_month, gridpoint)
    • hcst(year, ensemble_member, leadtime_month, gridpoint)
  • and you will calculate (shown here for the ensemble-mean anomaly, but it can be easily extended to per-member anomalies):
    • anom_ensemblemean(leadtime_month, gridpoint) = fcst_ensemblemean(leadtime_month, gridpoint) - hcst_mean(leadtime_month, gridpoint)
    • anom_per_member(ensemble_member, leadtime_month, gridpoint) = fcst(ensemble_member, leadtime_month, gridpoint) - hcst_mean(leadtime_month, gridpoint)

where

fcst_ensemblemean is the average over the ensemble members

hcst_mean is the average over both the ensemble members and the years
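The calculation above can be sketched in a few lines of numpy. This is a minimal illustration with synthetic random data standing in for the real CDS fields; the array names follow the notation above, and the dimension sizes (24 hindcast years, 25 members, 6 lead months, 4 grid points) are illustrative assumptions, not the actual product dimensions:

```python
import numpy as np

# Synthetic stand-ins for the real CDS data (illustrative sizes):
# 24 hindcast years (1993-2016), 25 ensemble members, 6 lead months, 4 grid points
rng = np.random.default_rng(0)
n_years, n_members, n_lead, n_grid = 24, 25, 6, 4

fcst = rng.normal(288.0, 1.0, size=(n_members, n_lead, n_grid))           # fcst(ensemble_member, leadtime_month, gridpoint)
hcst = rng.normal(287.0, 1.0, size=(n_years, n_members, n_lead, n_grid))  # hcst(year, ensemble_member, leadtime_month, gridpoint)

# Model climate: average over both years and ensemble members, per leadtime
hcst_mean = hcst.mean(axis=(0, 1))        # (leadtime_month, gridpoint)

# Ensemble-mean anomaly
fcst_ensemblemean = fcst.mean(axis=0)     # (leadtime_month, gridpoint)
anom_ensemblemean = fcst_ensemblemean - hcst_mean

# Per-member anomaly (hcst_mean broadcasts over the member axis)
anom_per_member = fcst - hcst_mean

# May 2020 from a 1 February 2020 start date is leadtime month 4,
# i.e. index 3 with 0-based indexing
may_anomaly = anom_ensemblemean[3]
```

Note that the mean over members of the per-member anomalies equals the ensemble-mean anomaly, since both subtract the same hcst_mean.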


In your example, option 2 would be trying to bias-correct a forecast with leadtime=4 (May from a February start date) using hindcasts with leadtime=1 (May from a May start date); option 1 is the correct approach.


Regarding your second question: the C3S dataset offering includes no ensemble means calculated per hindcast start date (YYYYMM); only the hindcast mean (hcst_mean, as introduced above), calculated over both years and members, is available.
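That said, if you need per-start-date ensemble means, you can derive them yourself from the downloaded member data by averaging over the member axis only. A minimal numpy sketch, again with synthetic data and illustrative dimension sizes:

```python
import numpy as np

# Synthetic hindcast data: hcst(year, ensemble_member, leadtime_month, gridpoint)
rng = np.random.default_rng(1)
n_years, n_members, n_lead, n_grid = 24, 25, 6, 4
hcst = rng.normal(287.0, 1.0, size=(n_years, n_members, n_lead, n_grid))

# Ensemble mean per hindcast start year (e.g. the 1.2.1993 run alone):
# not available as a precomputed product, but trivial to derive
hcst_mean_per_year = hcst.mean(axis=1)    # (year, leadtime_month, gridpoint)

# The precomputed product described above: mean over both years and members
hcst_mean = hcst.mean(axis=(0, 1))        # (leadtime_month, gridpoint)
```

Since every hindcast year has the same number of members, averaging the per-year ensemble means over the years recovers hcst_mean.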


I hope that clarifies things.

Best regards,


Eduardo Penabad
C3S Seasonal Forecasts


Hi Eduardo,

Thank you very much for the detailed explanation. Just to make sure (and to avoid a methodological mistake when working with the seasonal forecast data): does the same approach apply when working with seasonal forecast data at daily resolution? Of course, with the small difference that it makes no sense to study the anomaly for a particular day (e.g., 17th May), but rather for a group of days (a week or so).

Thanks once again

Petr

Yes Petr, you are right... independently of the specifics of the methodology you'd like to apply to correct biases in daily data, the assumption that biases may have a leadtime dependency still holds, and its potential impact should be taken into account.