I have downloaded Complete ERA5 Reanalysis and tried working through the tutorials for determining geopotential and pressure levels on the ECMWF Site:
My desired output is to generate a new ERA5 netcdf file with geopotential variables and interpolated to user-defined pressure levels (If I use the same pressure levels as ERA5 on pressure levels, will I end up with the exact same output, or are there differences in the calculations/interpolations?)
I believe to have found some bugs in the scripts.
In compute_geopotential_on_ml.py the output file loses the time dimension. This can be fixed by modifying the following code chunks:
for step in codes_index_get(idx, 'step'):
codes_index_select(idx, 'step', step)
if not values:
values = get_initial_values(idx, keep_sample=True)
values['time'] = int(time) # NEW LINE
values['date'] = int(date) # NEW LINE
for lev in sorted(values['levelist'], reverse=True):
try:
z_h, z_f = compute_z_level(idx, lev, values, z_h)
# store the result (z_f) in a field and add to the output
codes_set(values['sample'], 'level', lev)
codes_set(values['sample'], 'time', int(values['time'])) # NEW LINE
codes_set(values['sample'], 'date', int(values['date'])) # NEW LINE
This adds the time dimension back in if multiple hours are selected within one day, however, results in a segmentation fault if I try to run the script on input data that has more than one day’s worth of data. I’m not sure what to do from here to fix the segmentation fault issue.
There is a bug in the conversion_from_ml_to_pl.py as well. It works for single pressure levels, but not multiple pressure levels. I was able to fix it with this change:
def calculate_interpolated_pressure_field(data_var_on_ml, data_p_on_ml,plevs):
nlevs = len(data_var_on_ml.hybrid)
p_array = np.stack(data_p_on_ml, axis=2).flatten()
# Flatten the data array to enable faster processing
var_array = np.stack(data_var_on_ml, axis=2).flatten()
no_grid_points = int(len(var_array)/nlevs)
interpolated_var = np.empty((len(plevs), no_grid_points))
ds_shape = data_var_on_ml.shape
nlats_values = data_var_on_ml.coords['latitude']
nlons_values = data_var_on_ml.coords['longitude']
nlats = len(nlats_values)
nlons = len(nlons_values)
# Iterate over the data, selecting one vertical profile at a time
count = 0
profile_count = 0
interpolated_values=[]
for point in range(no_grid_points):
offset = count*nlevs
var_profile = var_array[offset:offset+nlevs]
p_profile = p_array[offset:offset+nlevs]
interpolated_values.append(vertical_interpolate(p_profile, var_profile, plevs))
profile_count += len(p_profile)
count = count + 1
interpolated_field=np.asarray(interpolated_values).reshape(len(plevs),nlats,nlons)
return interpolated_field