I’ve noticed that in the “ERA5 hourly data on pressure levels from 1940 to present” dataset, specific humidity occasionally takes on negative values. Here is an example from one sample field recently downloaded from the CDS:
In [9]: print(grib.data_vars)
Data variables:
    q        (isobaricInhPa, latitude, longitude) float32 ...

In [10]: print(grib.q.min())
<xarray.DataArray 'q' ()>
array(-5.2340856e-06, dtype=float32)
Coordinates:
    number      int64 ...
    time        datetime64[ns] 2021-01-01T18:00:00
    step        timedelta64[ns] ...
    valid_time  datetime64[ns] ...
In [19]: qlevel = grib.q.sel(isobaricInhPa=200)

In [20]: min_idx = np.argmin(qlevel.data); print(np.unravel_index(min_idx, qlevel.shape))
(319, 720)

In [22]: print(qlevel.data[317:322, 718:722])
[[ 3.2670268e-06 5.3010353e-06 1.1883623e-05 1.2811221e-05]
[ 2.2276708e-06 1.3522276e-06 6.1504015e-06 1.4536030e-05]
[ 5.9551894e-08 -3.9004317e-06 -5.2340856e-06 8.8735887e-06]
[ 1.3298759e-06 -1.8328956e-06 2.1941432e-06 5.7033667e-06]
[ 2.9727289e-06 1.1436114e-06 8.7166518e-07 1.1216796e-05]]
For this particular date and level, the minimum is about -4% of the maximum value, which is too large in magnitude to be explained by data quantization or floating-point round-off error.
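For reference, the -4% figure is just the min/max ratio of the slice; a quick sketch of that check, assuming qlevel is the 200 hPa slice selected above:

# Quantify the negatives relative to the field maximum
# (sketch; `qlevel` is the 200 hPa DataArray from the session above)
neg_fraction = float((qlevel < 0).mean())            # fraction of grid points below zero
min_max_ratio = float(qlevel.min() / qlevel.max())   # about -0.04 for this slice
print(f"negative fraction: {neg_fraction:.4%}  min/max: {min_max_ratio:.4f}")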
Instead, I suspect this arises as a Gibbs-type ringing artifact from either the horizontal interpolation (spherical harmonic / reduced Gaussian grid to regular lat/lon) or the vertical interpolation (model levels to pressure levels).
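One pattern consistent with that hypothesis: Gibbs undershoots should sit next to unusually steep horizontal gradients, as the minimum here does (note the neighbouring values an order of magnitude larger in the slice above). A rough diagnostic sketch, again assuming qlevel from above and using simple finite differences as a proxy for local gradient strength:

import numpy as np

q2d = qlevel.data
gy, gx = np.gradient(q2d)   # finite-difference gradients (grid units)
gmag = np.hypot(gy, gx)     # local gradient magnitude
neg = q2d < 0
# If the negatives are ringing artifacts, gradients at those points should be
# much steeper than is typical for the field as a whole.
print("median |grad| at negative points:", np.median(gmag[neg]))
print("median |grad| overall:           ", np.median(gmag))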
To avoid reinventing the data processing wheel, is there an established set of best practices on how to deal with these values?
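The obvious stopgap is to clip at zero, something like the sketch below, but that silently adds a small moisture bias and does nothing about the cause, so I'd rather follow an accepted recipe if one exists:

# Naive stopgap (sketch, not a recommendation): clamp negatives to zero.
# Note this is not conservative -- it adds a small positive bias to q.
q_clipped = grib.q.clip(min=0)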