Inference with AIFS-ENS v1.0 on S2S Timescales

Aodhan_Sweeney · 30 March 2026 09:28

Hello!

Thanks for making AIFS-ENS v1.0 available! I was quite impressed by the skill of AIFS-ENS v1.0 compared to the S2S runs of IFS shown in the recent AIFS-CRPS paper (AIFS-CRPS: Ensemble forecasting using a model trained with a loss function based on the Continuous Ranked Probability Score), as well as by the results of AIFShera in the AI Weather Quest.

To investigate the utility and skill of AIFS-ENS for S2S forecasting further, I tried running AIFS-ENS (following the instructions from this notebook) on an A10 GPU out to 6 weeks, as done in the referenced paper. However, the results I am getting seem unrealistic and are difficult to square with the results in the paper. For example, see the attached week-six predictions from a 7-member AIFS-ENS ensemble initialized on March 26, 2026 at 00z. Large biases appear at these later lead times, particularly over the Tibetan Plateau and the Andes. While the problem worsens with lead time, the biases are already quite large at weeks 3 and 4. A similar error seems to have been flagged on the AIFS-ENS issues page, but to a much smaller degree (see this issue). The only difference I can identify between the paper and what I am running is the paper's use of the N096 1x1 grid versus the public version's N320 0.25x0.25 grid.

My questions are:
1.) Could the difference between the model I am running at 0.25x0.25 degrees and the paper's/AIFShera's 1x1 degree N096 grid be responsible for the very large biases? Are there any other obvious contributors to these differences?

2.) Is creating a hindcast climatology necessary to get useful results on S2S timescales? I noticed that the AIFShera quintile probabilities in the AI Weather Quest are debiased using a hindcast climatology over the previous 20 years. Is this data publicly available anywhere?

Thanks in advance for any help!
Aodhan

Meghan · 29 April 2026 15:50

This question was also submitted via another channel. For completeness, I’m posting the response here in case it is useful to others:

The cold biases over mountainous terrain at extended lead times are consistent with our own findings. AIFS-ENS was trained to minimise errors on medium-range timescales (up to ~10 days) and does not generalise well beyond that — local biases and artefacts grow with lead time, particularly over complex orography. The resolution difference is an important factor: the O96 version used in the paper is coarser, which tends to produce more stable long-range rollouts.
The primary motivation for using a model climatology is consistency with IFS, but it does provide a modest skill benefit at S2S lead times. The hindcast climatology used in the AI Weather Quest is not currently publicly available, unfortunately.
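For anyone who wants to build their own hindcast set, here is a minimal sketch of how quintile probabilities could be computed relative to a model climatology. It is not the procedure used for the AI Weather Quest submissions: the function name, the array shapes, and the synthetic data (a hypothetical 20-year, 4-member hindcast with a cold bias of 3 units) are all assumptions for illustration.

```python
import numpy as np


def quintile_probabilities(forecast, hindcast):
    """Fraction of ensemble members falling in each climatological quintile.

    forecast: (n_members, n_points) weekly means from the real-time ensemble
    hindcast: (n_samples, n_points) weekly means at the SAME lead time and
              time of year, pooled over hindcast years and members
    returns:  (5, n_points); each column sums to 1
    """
    # Quintile edges of the model's own climatology (20th to 80th percentiles).
    edges = np.quantile(hindcast, [0.2, 0.4, 0.6, 0.8], axis=0)  # (4, n_points)
    # Category 0..4 = number of edges that each member exceeds.
    category = (forecast[:, None, :] > edges[None, :, :]).sum(axis=1)
    return np.stack([(category == k).mean(axis=0) for k in range(5)])


# Synthetic example: hindcast and forecast share the same cold bias, so the
# bias cancels when the forecast is ranked against the model climatology.
rng = np.random.default_rng(0)
n_points = 4
hindcast = rng.normal(-3.0, 1.0, size=(20 * 4, n_points))  # assumed 20 years x 4 members
forecast = rng.normal(-3.0, 1.0, size=(7, n_points))       # 7-member real-time ensemble
probs = quintile_probabilities(forecast, hindcast)
print(probs.round(2))
```

Because the forecast is ranked against the model's own distribution at the matching lead time, a systematic lead-dependent bias shared by forecast and hindcast does not shift the probabilities. Ranking the same forecast against an observed climatology would instead push most members into the lowest quintile.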

On the broader question of S2S capability: we are actively working on an operational subseasonal version of AIFS (tentatively AIFS-SUBS). It will initially be available in experimental mode via ECMWF charts, with model weights and data access planned for a later stage. We unfortunately can’t give a firm timeline yet.