ERA5/lat/lon/pressure -- negative specific humidity?

I’ve noticed that in the “ERA5 hourly data on pressure levels from 1940 to present” dataset, specific humidity occasionally takes on negative values. From one sample field recently downloaded from CDS:

In [9]: print(grib.data_vars)
Data variables:
    q        (isobaricInhPa, latitude, longitude) float32 ...

In [10]: print(grib.q.min())
<xarray.DataArray 'q' ()>
array(-5.2340856e-06, dtype=float32)
Coordinates:
    number      int64 ...
    time        datetime64[ns] 2021-01-01T18:00:00
    step        timedelta64[ns] ...
    valid_time  datetime64[ns] ...

In [19]: qlevel = grib.q.sel(isobaricInhPa=200)

In [20]: min_idx = np.argmin(qlevel.data); print(np.unravel_index(min_idx,qlevel.shape))
(319, 720)

In [22]: print(qlevel.data[317:322,718:722])
[[ 3.2670268e-06  5.3010353e-06  1.1883623e-05  1.2811221e-05]
 [ 2.2276708e-06  1.3522276e-06  6.1504015e-06  1.4536030e-05]
 [ 5.9551894e-08 -3.9004317e-06 -5.2340856e-06  8.8735887e-06]
 [ 1.3298759e-06 -1.8328956e-06  2.1941432e-06  5.7033667e-06]
 [ 2.9727289e-06  1.1436114e-06  8.7166518e-07  1.1216796e-05]]

For this particular date and level, the minimum value is about -4% of the maximum value, too large to be explained by data quantization or floating-point (roundoff) errors.

Instead, I imagine this is arising as a Gibbs-type artifact from either the horizontal interpolation (spherical harmonic / reduced Gaussian to lat/lon) or the vertical interpolation (model level to pressure level).

To avoid reinventing the data processing wheel, is there an established set of best practices on how to deal with these values?

After downloading the native representation, I’m even more perplexed:

In [28]:    import cdsapi
    ...:    c = cdsapi.Client()
    ...:    c.retrieve('reanalysis-era5-complete', { 
    ...:        'date'    : '2021-01-01',  
    ...:        'levelist': '/'.join((str(i) for i in range(1,138))),        ...:
    ...:        'levtype' : 'ml',
    ...:        'param'   : '133',  
    ...:        'stream'  : 'oper',        
    ...:        'time'    : '18',       
    ...:        'type'    : 'an',
    ...:        'format'  : 'grib',       
    ...:    }, 'era5_1_jan_2021_18_model.grib')     # Output file. Adapt as you wish.

Out[28]: Result(content_length=148897080,content_type=application/x-grib,location=https://download-0011-clone.copernicus-climate.eu/cache-compute-0011/cache/data6/adaptor.mars.external-1709133154.2637272-30125-17-f83e293e-1533-4147-88c2-a0273da30dbc.grib)

In [29]: grib2 = xr.open_dataset('era5_1_jan_2021_18_model.grib')

In [32]: np.amin(grib2.q.data)

Out[32]: -1.204683e-05

Per table 12 of the documentation, the native grid for specific humidity is the reduced Gaussian grid, so the above negative value should be a gridpoint value rather than a spectral harmonic.

I don’t see any other obvious moisture-carrying variable, so I’m even more perplexed by the negative value.

Hi - we have noticed the same weird problem. We are thinking about going through the subsets of the data that we downloaded which contains this issue and setting the negative values to just “0” but I also would like to know if there is a recommended set of best practices.