Missing/Wrong Metadata in ERA5-Land GRIB files

Many of the GRIB files in the ERA5-Land dataset seem to carry the wrong metadata and/or completely mismatch the Parameter database.

As a simple example, let’s examine the Glacier Mask invariant parameter, shortName glm. The GRIB (v2) file is called cicecap.grib.

An inspection with:

gdalinfo cicecap.grib

shows the correct geographic information and the expected value range and statistics for a glacier cover mask, but the wrong parameter metadata:

Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: cicecap.grib
       cicecap.grib.aux.xml
Size is 3600, 1801
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
    DATUM["unnamed",
        ELLIPSOID["Sphere",6371229,0,
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433,
            ID["EPSG",9122]]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.050000000000011,90.049999999999997)
Pixel Size = (0.100000000000000,-0.100000000000000)
Corner Coordinates:
Upper Left  (-180.0500000,  90.0500000) (180d 3' 0.00"W, 90d 3' 0.00"N)
Lower Left  (-180.0500000, -90.0500000) (180d 3' 0.00"W, 90d 3' 0.00"S)
Upper Right ( 179.9500000,  90.0500000) (179d57' 0.00"E, 90d 3' 0.00"N)
Lower Right ( 179.9500000, -90.0500000) (179d57' 0.00"E, 90d 3' 0.00"S)
Center      (  -0.0500000,  -0.0000000) (  0d 3' 0.00"W,  0d 0' 0.00"S)
Band 1 Block=3600x1 Type=Float64, ColorInterp=Undefined
  Description = 10[m] HTGL="Specified height level above ground"
  Min=0.000 Max=1.000
  Minimum=0.000, Maximum=1.000, Mean=0.113, StdDev=0.312
  Metadata:
    GRIB_UNIT=[m/s]
    GRIB_COMMENT=Wind speed [m/s]
    GRIB_ELEMENT=WIND
    GRIB_SHORT_NAME=10-HTGL
    GRIB_REF_TIME=1359072000
    GRIB_VALID_TIME=1359072000
    GRIB_FORECAST_SECONDS=0
    GRIB_DISCIPLINE=0(Meteorological)
    GRIB_IDS=CENTER=98(ECMWF) SUBCENTER=0 MASTER_TABLE=4 LOCAL_TABLE=0 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2013-01-25T00:00:00Z PROD_STATUS=0(Operational) TYPE=2(Analysis_and_forecast)
    GRIB_PDS_PDTN=0
    GRIB_PDS_TEMPLATE_NUMBERS=2 1 0 255 128 0 0 0 1 0 0 0 0 103 0 0 0 0 10 255 255 255 255 255 255
    GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=2 1 0 255 128 0 0 1 0 103 0 10 255 -127 -2147483647
    STATISTICS_MINIMUM=0
    STATISTICS_MAXIMUM=1
    STATISTICS_MEAN=0.11263618958284
    STATISTICS_STDDEV=0.31178133476451
    STATISTICS_VALID_PERCENT=100

Of particular note: GRIB_SHORT_NAME is reported as 10-HTGL instead of GLM, GRIB_ELEMENT is reported as WIND, and GRIB_UNIT is reported as [m/s].

Ok so what’s really in this GRIB? Let’s export to a GeoTIFF with the following command:

gdal_translate -of GTiff -b 1 cicecap.grib cicecap.tif

And we see that it definitely looks like glacial cover!

So let’s dive deeper into the GRIB contents with ecCodes. I ran a Python extraction of the WMO triple (discipline, parameterCategory, and parameterNumber) to see if it matches the data documentation:

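# Extracts the WMO triple for the first message in the GRIB file
import logging
from eccodes import codes_get, codes_grib_new_from_file, codes_release

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_wmo_triple(path):
    with open(path, 'rb') as f:
        gid = codes_grib_new_from_file(f)
        if gid is None:
            logger.info("No GRIB message found in file.")
            return
        try:
            disc = codes_get(gid, 'discipline')
            cat  = codes_get(gid, 'parameterCategory')
            num  = codes_get(gid, 'parameterNumber')
            logger.info(f"GRIB keys -> discipline: {disc}, category: {cat}, number: {num}")
        finally:
            codes_release(gid)  # free the message handle even if a lookup fails

log_wmo_triple('cicecap.grib')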

It reports: GRIB keys -> discipline: 0, category: 2, number: 1
This does not match the triple [2,5,0] listed on the parameter detail webpage!
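
As a cross-check, the same triple can be read off the gdalinfo output above without ecCodes. The sketch below is only that, a sketch: it assumes GRIB_PDS_TEMPLATE_NUMBERS lists the raw octets of the product definition template starting at octet 10, so that for template 4.0 (GRIB_PDS_PDTN=0) the first two values are parameterCategory and parameterNumber. The function name triple_from_gdal_metadata is made up for illustration and is not part of GDAL.

```python
import re

def triple_from_gdal_metadata(metadata):
    """Return (discipline, category, number) from GDAL GRIB band metadata."""
    # GRIB_DISCIPLINE looks like "0(Meteorological)"; keep the leading integer.
    match = re.match(r"\s*(\d+)", metadata["GRIB_DISCIPLINE"])
    if match is None:
        raise ValueError("cannot parse GRIB_DISCIPLINE")
    discipline = int(match.group(1))

    # Assumption: for product definition template 4.0 the list starts at
    # octet 10, so the first two entries are category and number.
    octets = [int(v) for v in metadata["GRIB_PDS_TEMPLATE_NUMBERS"].split()]
    category, number = octets[0], octets[1]
    return discipline, category, number

# Values copied from the gdalinfo output for cicecap.grib.
band_metadata = {
    "GRIB_DISCIPLINE": "0(Meteorological)",
    "GRIB_PDS_TEMPLATE_NUMBERS": (
        "2 1 0 255 128 0 0 0 1 0 0 0 0 103 0 0 0 0 10 "
        "255 255 255 255 255 255"
    ),
}

print(triple_from_gdal_metadata(band_metadata))  # (0, 2, 1)
```

If that reading of the octets is right, GDAL and ecCodes agree with each other: the file itself is encoded as [0,2,1], so the mismatch with [2,5,0] is in the file and not an artifact of one tool.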

Further inspection by opening it with xarray reveals that it contains a single variable:

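import xarray as xr  # the cfgrib engine must be installed separately

path = 'cicecap.grib'
ds = xr.open_dataset(
    path,
    engine='cfgrib',
    backend_kwargs={'filter_by_keys': {}},  # empty filter: keep every message
)
logger.info(f"Available variables: {list(ds.data_vars)}")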

This produces the output: Available variables: ['si10']. I can’t find any documentation on a parameter referenced as si10, although it is perilously close to 10si, which is the shortName for 10m Wind Speed and would align with some of the GRIB metadata, including the extracted WMO triple…

So how did we get a GRIB file for Glacier Mask that:

  1. Has a filename that is not associated with any of the official property names
  2. Has the wrong WMO triple (discipline, cat, num)
  3. Has the wrong GRIB_SHORT_NAME, GRIB_UNIT, and other metadata
  4. Has the wrong/invalid variable/index name
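
Points 2 and 3 boil down to comparing what the file claims against what the Parameter database says. Here is a minimal sketch of such a check, using only values quoted in this post. The function name find_mismatches and the dictionary layout are assumptions for illustration, not part of ecCodes or GDAL.

```python
def find_mismatches(expected, observed):
    """Return {key: (expected, observed)} for every key that disagrees."""
    return {
        key: (want, observed.get(key))
        for key, want in expected.items()
        if observed.get(key) != want
    }

# Expected values for Glacier Mask, as quoted from the parameter detail page.
expected = {"short_name": "GLM", "triple": (2, 5, 0)}
# Values actually reported for cicecap.grib by gdalinfo and ecCodes.
observed = {"short_name": "10-HTGL", "triple": (0, 2, 1)}

for key, (want, got) in find_mismatches(expected, observed).items():
    print(f"{key}: expected {want!r}, file reports {got!r}")
```

Run over every file in a download, a check like this would flag mislabeled parameters without anyone having to eyeball a GeoTIFF first.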

If we hadn’t chosen such an obvious parameter to look at, how could we tell what data we were really looking at? In this example, the best guess would have been 10 meter wind speed if the data values were even close to believable.