Bugs in ERA5 Complete Reanalysis CDS Conversion Python Scripts

Tristan_Schuler · 9 January 2025 15:30

I have downloaded Complete ERA5 Reanalysis and tried working through the tutorials for determining geopotential and pressure levels on the ECMWF Site:

https://confluence.ecmwf.int/display/CKB/ERA5%3A+compute+pressure+and+geopotential+on+model+levels%2C+geopotential+height+and+geometric+height#heading-Pressureonmodellevels

My desired output is to generate a new ERA5 netcdf file with geopotential variables and interpolated to user-defined pressure levels (If I use the same pressure levels as ERA5 on pressure levels, will I end up with the exact same output, or are there differences in the calculations/interpolations?)

I believe to have found some bugs in the scripts.

In compute_geopotential_on_ml.py the output file loses the time dimension. This can be fixed by modifying the following code chunks:

for step in codes_index_get(idx, 'step'):
                codes_index_select(idx, 'step', step)
                if not values:
                    values = get_initial_values(idx, keep_sample=True)

                values['time'] = int(time)  # NEW LINE
                values['date'] = int(date)  # NEW LINE

for lev in sorted(values['levelist'], reverse=True):
        try:
            z_h, z_f = compute_z_level(idx, lev, values, z_h)
            # store the result (z_f) in a field and add to the output
            codes_set(values['sample'], 'level', lev)

            codes_set(values['sample'], 'time', int(values['time']))  # NEW LINE
            codes_set(values['sample'], 'date', int(values['date']))  # NEW LINE

This adds the time dimension back in if multiple hours are selected within one day, however, results in a segmentation fault if I try to run the script on input data that has more than one day’s worth of data. I’m not sure what to do from here to fix the segmentation fault issue.

There is a bug in the conversion_from_ml_to_pl.py as well. It works for single pressure levels, but not multiple pressure levels. I was able to fix it with this change:

def calculate_interpolated_pressure_field(data_var_on_ml, data_p_on_ml,plevs):
    nlevs = len(data_var_on_ml.hybrid)
    p_array = np.stack(data_p_on_ml, axis=2).flatten()
    # Flatten the data array to enable faster processing
    var_array = np.stack(data_var_on_ml, axis=2).flatten()
    no_grid_points =  int(len(var_array)/nlevs)
    interpolated_var = np.empty((len(plevs), no_grid_points))
    ds_shape = data_var_on_ml.shape
    nlats_values = data_var_on_ml.coords['latitude']
    nlons_values = data_var_on_ml.coords['longitude']
    nlats = len(nlats_values)
    nlons = len(nlons_values)

   # Iterate over the data, selecting one vertical profile at a time
    count = 0
    profile_count = 0
    interpolated_values=[]
    for point in range(no_grid_points):
        offset =  count*nlevs
        var_profile = var_array[offset:offset+nlevs]
        p_profile = p_array[offset:offset+nlevs]
        interpolated_values.append(vertical_interpolate(p_profile, var_profile, plevs))
        profile_count += len(p_profile)
        count = count + 1
    interpolated_field=np.asarray(interpolated_values).reshape(len(plevs),nlats,nlons)
    return interpolated_field

Tristan_Schuler · 16 January 2025 13:58

As an update, my fix of the bug for compute_geopotential_on_ml.py on multiple levels seems to be working well, and lines up well when compared with hourly pressure level data for the same data region.

conversion_from_ml_to_pl.py is not giving good results, and is extremely slow. Instead, I have been using the CDO command line interface. Specifically the ml2pl and ml2hl functions. lines up well when compared with hourly pressure level data for the same data region.