Hi all
I've got a coupled model, FOCI-OpenIFS, which consists of OpenIFS 40R1 + NEMO 3.6 + OASIS3-MCT3 + river routing (similar to EC-Earth).
I'm running it on HLRN-IV in Goettingen, Germany, which has Intel Skylake chips and Intel compilers.
I can compile and run the model with Intel Fortran 18.0.5 and Intel MPI 2018.4 without any problems. One year for a T159/ORCA05 configuration takes less than 45 minutes using 279 CPUs for OpenIFS and 480 CPUs for NEMO.
However, after upgrading to Intel Fortran 19.0.3 and Intel MPI 2019.3, the model became around 100 times slower!
Digging into the details with various performance tools, the support team found that NEMO runs fine with the new compiler versions, but sending the ocean state (SST, sea ice etc.) to OpenIFS takes up to 20 minutes. Basically, OpenIFS gets stuck at an OASIS_GET call even though NEMO completed the corresponding OASIS_PUT several minutes earlier.
The river routing scheme, which runs as a single MPI task, also freezes while waiting for runoff from OpenIFS.
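For context, the exchange that hangs follows the usual OASIS3-MCT put/get pattern; a minimal sketch (with hypothetical field names and variable IDs, not the actual FOCI code) looks like this:

```fortran
! Ocean side (NEMO): send SST at the current coupling time.
! var_id_sst would come from an earlier oasis_def_var call;
! isec is the model time in seconds since the start of the run.
CALL oasis_put(var_id_sst, isec, sst_field, info)

! Atmosphere side (OpenIFS): receive the same field.
! oasis_get blocks until the data is available. With Intel MPI 2018
! it returns almost immediately after the put; with Intel MPI 2019
! it is here that OpenIFS sits for many minutes.
CALL oasis_get(var_id_sst, isec, sst_field_recv, info)
```

Under the hood, OASIS3-MCT implements these calls with non-blocking MPI point-to-point messages, one per coupled field per exchange, which is why the number of small asynchronous messages per coupling step can be large.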
I can run with Intel Fortran 19.0.3 + Intel MPI 2018.4 without problems, so the issue is with Intel MPI 2019 rather than the compiler. I don't know whether the problem is specific to my hardware.
The support team suspects that Intel MPI 2019, for some reason, cannot handle the many asynchronous MPI messages that are sent at each coupling step.
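One thing worth checking: Intel MPI 2019 moved to libfabric (OFI) as its transport layer, and the provider it auto-selects can drastically affect many-small-message patterns. The environment variables below exist in Intel MPI 2019, but the values shown are only an assumption; the right provider depends on the interconnect, so this is something to experiment with rather than a known fix:

```shell
# Intel MPI 2019 uses libfabric underneath; make the fabric choice
# explicit and visible rather than relying on auto-detection.
export I_MPI_FABRICS=shm:ofi   # shared memory intra-node, OFI inter-node
export FI_PROVIDER=verbs       # or e.g. psm2 on Omni-Path; machine dependent
export I_MPI_DEBUG=5           # prints the selected fabric/provider at startup
```

Running a short coupled job with I_MPI_DEBUG set would at least confirm which provider 2019 is actually using compared to what 2018.4 did.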
OpenIFS standalone, i.e. without coupling, runs maybe 20% slower with Intel MPI 2019 than with 2018, which is bad, but nowhere near as bad as the coupled model.
I'm wondering if anyone else has tried running OpenIFS with the 2019 Intel compilers and MPI and noticed anything like this.
In particular, this would cause problems for coupled models like EC-Earth.
Cheers
Joakim