Many of the GRIB files in the ERA5-Land dataset seem to have the wrong metadata and/or completely mismatch the Parameter database.
As a simple example, let’s examine the Glacier Mask invariant parameter, shortName glm. The GRIB (v2) file is called cicecap.grib.
An inspection with:
gdalinfo cicecap.grib
shows the correct geographic information and the expected value range and statistics for a glacier cover mask, but the wrong parameter metadata:
Driver: GRIB/GRIdded Binary (.grb, .grb2)
Files: cicecap.grib
cicecap.grib.aux.xml
Size is 3600, 1801
Coordinate System is:
GEOGCRS["Coordinate System imported from GRIB file",
DATUM["unnamed",
ELLIPSOID["Sphere",6371229,0,
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433,
ID["EPSG",9122]]]]
Data axis to CRS axis mapping: 2,1
Origin = (-180.050000000000011,90.049999999999997)
Pixel Size = (0.100000000000000,-0.100000000000000)
Corner Coordinates:
Upper Left (-180.0500000, 90.0500000) (180d 3' 0.00"W, 90d 3' 0.00"N)
Lower Left (-180.0500000, -90.0500000) (180d 3' 0.00"W, 90d 3' 0.00"S)
Upper Right ( 179.9500000, 90.0500000) (179d57' 0.00"E, 90d 3' 0.00"N)
Lower Right ( 179.9500000, -90.0500000) (179d57' 0.00"E, 90d 3' 0.00"S)
Center ( -0.0500000, -0.0000000) ( 0d 3' 0.00"W, 0d 0' 0.00"S)
Band 1 Block=3600x1 Type=Float64, ColorInterp=Undefined
Description = 10[m] HTGL="Specified height level above ground"
Min=0.000 Max=1.000
Minimum=0.000, Maximum=1.000, Mean=0.113, StdDev=0.312
Metadata:
GRIB_UNIT=[m/s]
GRIB_COMMENT=Wind speed [m/s]
GRIB_ELEMENT=WIND
GRIB_SHORT_NAME=10-HTGL
GRIB_REF_TIME=1359072000
GRIB_VALID_TIME=1359072000
GRIB_FORECAST_SECONDS=0
GRIB_DISCIPLINE=0(Meteorological)
GRIB_IDS=CENTER=98(ECMWF) SUBCENTER=0 MASTER_TABLE=4 LOCAL_TABLE=0 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2013-01-25T00:00:00Z PROD_STATUS=0(Operational) TYPE=2(Analysis_and_forecast)
GRIB_PDS_PDTN=0
GRIB_PDS_TEMPLATE_NUMBERS=2 1 0 255 128 0 0 0 1 0 0 0 0 103 0 0 0 0 10 255 255 255 255 255 255
GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=2 1 0 255 128 0 0 1 0 103 0 10 255 -127 -2147483647
STATISTICS_MINIMUM=0
STATISTICS_MAXIMUM=1
STATISTICS_MEAN=0.11263618958284
STATISTICS_STDDEV=0.31178133476451
STATISTICS_VALID_PERCENT=100
Of particular note is that GRIB_SHORT_NAME is reported as 10-HTGL instead of GLM, along with the wrong GRIB_ELEMENT (reported as WIND) and a GRIB_UNIT of [m/s].
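The gdalinfo output already carries enough information to recover the WMO triple without any other tool. A minimal sketch, assuming the band uses product definition template 4.0 (GRIB_PDS_PDTN=0), whose first two octets are the parameter category and parameter number; the function name `triple_from_gdal_metadata` is made up for illustration and the dictionary simply copies the values printed above:

```python
def triple_from_gdal_metadata(metadata):
    """Return (discipline, parameterCategory, parameterNumber)
    from the GRIB metadata items that gdalinfo prints for a band."""
    # GRIB_DISCIPLINE looks like "0(Meteorological)"; keep the leading integer.
    discipline = int(metadata["GRIB_DISCIPLINE"].split("(")[0])
    # Assumption: product definition template 4.0, where the first two
    # template octets are parameterCategory and parameterNumber.
    octets = [int(tok) for tok in metadata["GRIB_PDS_TEMPLATE_NUMBERS"].split()]
    return discipline, octets[0], octets[1]


# Values copied from the gdalinfo output for cicecap.grib.
band_metadata = {
    "GRIB_DISCIPLINE": "0(Meteorological)",
    "GRIB_PDS_PDTN": "0",
    "GRIB_PDS_TEMPLATE_NUMBERS": (
        "2 1 0 255 128 0 0 0 1 0 0 0 0 103 0 0 0 0 10 "
        "255 255 255 255 255 255"
    ),
}

print(triple_from_gdal_metadata(band_metadata))  # (0, 2, 1)
```

So the metadata GDAL shows is at least internally consistent: it decodes to discipline 0, category 2, number 1.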
Ok so what’s really in this GRIB? Let’s export to a GeoTIFF with the following command:
gdal_translate -of GTiff -b 1 cicecap.grib cicecap.tif
And we see that it definitely looks like glacial cover!
So let’s dive deeper into the GRIB contents with ecCodes. I ran a Python extraction of the WMO triple (discipline, parameterCategory, and parameterNumber) to see if it matches the data documentation:
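# Extracts the WMO triple for the first message in the GRIB file
import logging
from eccodes import codes_grib_new_from_file, codes_get, codes_release

logger = logging.getLogger(__name__)

def log_wmo_triple(path):  # wrapper name is illustrative only
    with open(path, 'rb') as f:
        gid = codes_grib_new_from_file(f)
        if gid is None:
            logger.info("No GRIB message found in file.")
            return
        try:
            disc = codes_get(gid, 'discipline')
            cat = codes_get(gid, 'parameterCategory')
            num = codes_get(gid, 'parameterNumber')
            logger.info(f"GRIB keys -> discipline: {disc}, category: {cat}, number: {num}")
        finally:
            # Always free the message handle, even if a key lookup fails
            codes_release(gid)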
And it reports: GRIB keys -> discipline: 0, category: 2, number: 1
This does not match the triple given on the parameter detail webpage, which is [2,5,0]!
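To run the same check over every message in a file rather than just the first, something along these lines could work. It is only a sketch: `read_triples` and `describe_mismatch` are names invented here, the expected triple (2, 5, 0) is the one quoted from the parameter detail page, and the ecCodes keys `shortName` and `paramId` are read alongside the triple for context.

```python
KEYS = ("discipline", "parameterCategory", "parameterNumber")


def read_triples(path):
    """Yield (shortName, paramId, triple) for each GRIB message in path."""
    import eccodes  # imported lazily so describe_mismatch works without it

    with open(path, "rb") as f:
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:  # no more messages in the file
                break
            try:
                triple = tuple(eccodes.codes_get(gid, key) for key in KEYS)
                yield (
                    eccodes.codes_get(gid, "shortName"),
                    eccodes.codes_get(gid, "paramId"),
                    triple,
                )
            finally:
                eccodes.codes_release(gid)


def describe_mismatch(observed, expected):
    """List the components of the triple that differ from the expected ones."""
    return [
        f"{key}: got {got}, expected {want}"
        for key, got, want in zip(KEYS, observed, expected)
        if got != want
    ]


# Triple reported for cicecap.grib versus the documented Glacier Mask triple.
for line in describe_mismatch((0, 2, 1), (2, 5, 0)):
    print(line)
```

With a file on disk, `for name, param_id, triple in read_triples("cicecap.grib")` would feed each message into `describe_mismatch`; here all three components differ.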
Further inspection by opening it with xarray reveals that it contains a single variable:
import xarray as xr

ds = xr.open_dataset(
    path,
    engine='cfgrib',
    backend_kwargs={'filter_by_keys': {}},
)
logger.info(f"Available variables: {list(ds.data_vars)}")
This produces an output of: Available variables: ['si10']. I can’t find any documentation on a parameter referenced as si10, although it is perilously close to 10si, which is the shortName for 10m Wind Speed and would align with some of the GRIB metadata, including the extracted WMO triple…
So how did we get a GRIB file for Glacier Mask that:
- Has a filename that is not associated with any of the official property names
- Has the wrong WMO triple (discipline, cat, num)
- Has the wrong GRIB_SHORT_NAME, GRIB_UNIT, and other metadata
- Has the wrong/invalid variable/index name
If we hadn’t chosen such an obvious parameter to look at, how could we tell what data we were really looking at? In this example, the best guess would have been 10 meter wind speed if the data values were even close to believable.