Seeking architecture and training configuration advice for 1° model

Julian_Schmitt · 12 February 2026 23:27

We are training a 1° model (O96) from scratch on the provided 40-year ERA5 dataset, using a slightly modified default configuration for MSE training. We changed the configuration arguments below and trained an initial, smaller model (following the AIFSv1.1 paper) for just 5000 steps with a batch size of 16:

```
dataloader:
  batch_size:
    training: 16
    validation: 16

training:
  max_steps: 5000

model:
  num_channels: 128 # make the model smaller - default is 512 for N320 model

rollout:
  start: 3
  epoch_increment: 2
  max: 12

lr:
  warmup: 1000 # number of warmup iterations
  rate: 0.625e-4 # local_lr
  iterations: ${training.max_steps} # NOTE: When max_epochs < max_steps, scheduler will run for max_steps
  min: 3e-7 # Not scaled by #GPU
```
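For reference, here is a rough sketch of the learning-rate curve these `lr` settings imply over the 5000 steps. It assumes a linear warmup followed by cosine decay down to `min`, which may not match the scheduler the training code actually uses, and it ignores any scaling of the local rate by GPU count. `lr_at_step` is a hypothetical helper written for this post, not part of the training code.

```python
import math


def lr_at_step(step, rate=0.625e-4, warmup=1000, iterations=5000, lr_min=3e-7):
    """Learning rate at a given step (0-based).

    Assumed shape: linear warmup to `rate` over `warmup` steps, then cosine
    decay to `lr_min` at `iterations`. This is an illustration only.
    """
    if step < warmup:
        # Linear ramp that reaches `rate` on the last warmup step.
        return rate * (step + 1) / warmup
    # Fraction of the post-warmup phase completed, clipped to [0, 1].
    progress = min(1.0, (step - warmup) / max(1, iterations - warmup))
    return lr_min + 0.5 * (rate - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 500, 999, 1000, 3000, 5000):
        print(s, lr_at_step(s))
```

Under these assumptions, one fifth of the run (1000 of 5000 steps) is spent in warmup, and the rate has decayed to `min` by the final step.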

We’re looking for intuition on how to improve our forecast skill, particularly for precipitation. Currently our forecasts beat climatology on RMSE for wind speed and temperature out to about 3 days, but they never beat climatology for precipitation. A 12-hour precipitation forecast looks washed out and blurry compared with reanalysis. Do you have intuition for what needs to change to improve skill for all variables, and for precipitation in particular? Our current ideas are to train longer (increase max_steps), since the validation loss is still decreasing, and to make the model larger (increase num_channels). We are testing both hypotheses but would welcome advice from anyone who has tried something similar.
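To make the comparison against climatology concrete, here is a minimal sketch of the kind of skill score we mean: one minus the ratio of forecast RMSE to climatology RMSE, so positive values mean the forecast beats climatology. It assumes fields flattened to one value per grid point and a per-point area weight (for example, proportional to cell area on the O96 grid). The function names and the weighting are illustrative assumptions, not our exact evaluation code.

```python
import numpy as np


def weighted_rmse(pred, truth, weights):
    """Area-weighted RMSE over grid points (all inputs shaped (n_points,))."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise so the weights sum to 1
    sq_err = (np.asarray(pred, dtype=float) - np.asarray(truth, dtype=float)) ** 2
    return float(np.sqrt(np.sum(w * sq_err)))


def skill_vs_climatology(forecast, truth, climatology, weights):
    """1 - RMSE_forecast / RMSE_climatology.

    > 0: forecast beats climatology; 0: equal; < 0: worse than climatology.
    """
    rmse_forecast = weighted_rmse(forecast, truth, weights)
    rmse_clim = weighted_rmse(climatology, truth, weights)
    return 1.0 - rmse_forecast / rmse_clim


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.gamma(shape=0.5, scale=2.0, size=1000)  # synthetic, precipitation-like
    clim = np.full_like(truth, truth.mean())
    weights = np.ones_like(truth)
    print(skill_vs_climatology(truth, truth, clim, weights))  # perfect forecast
    print(skill_vs_climatology(clim, truth, clim, weights))   # climatology as forecast
```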

Here are figures at a 12-hour forecast lead time showing that precipitation is very blurry and faded compared with reanalysis: