New dataset published in CDS: ERA5 and ERA5-Land post-processed daily statistics

We are pleased to announce that the much-awaited ERA5 and ERA5-Land post-processed daily statistics have been published as datasets in the new Climate Data Store (CDS), replacing the legacy ERA5 daily statistics application.

These user-friendly catalogue entries provide post-processed data aggregated to daily time steps, with options including:

  • The daily aggregation statistic (daily mean, daily max, daily min)
  • The sub-daily frequency sampling of the original data (1 hour, 3 hours, 6 hours)
  • The option to shift to any local time zone, expressed as an offset from UTC (no shift means the statistic is computed from UTC+00:00)

Users should be aware that the daily aggregations are calculated during the retrieval process and are not part of a permanently archived dataset.

For any enquiries regarding this dataset, please contact us.

ECMWF Support


Hello and thank you very much for your work!

I could not find information on how the daily mean calculation of hourly values deals with the two time points DAY at 00:00 and DAY + 1 at 00:00.

When I am interested in the daily mean (e.g. temperature) for the period from DAY at 00:00 to DAY + 1 at 00:00, both time points at the start and end of this period seem equally relevant. Therefore, I assume the correct mean should consider both of these values, but with half the weight of all other 23 values (values from DAY at 01:00 to DAY at 23:00), to get the best estimate of the daily mean.

If omitting the value from DAY + 1 at 00:00, and just taking the mean over the 24 values from DAY at 00:00 to DAY at 23:00, one could say this represents the mean of the period from DAY-1 at 23:30 to DAY at 23:30 (i.e. a slight shift of the covered time period by half an hour).
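To make the two conventions concrete, here is a minimal sketch using synthetic hourly values (not ERA5 data). The function names are hypothetical, and the sketch makes no claim about which convention the CDS actually applies; it only shows how the two candidate means differ.

```python
def plain_daily_mean(hourly):
    """Unweighted mean of the 24 values from DAY 00:00 to DAY 23:00."""
    day = hourly[:24]
    return sum(day) / len(day)


def trapezoid_daily_mean(hourly):
    """Mean over DAY 00:00 .. DAY+1 00:00 with half weight on both end points."""
    window = hourly[:25]
    if len(window) != 25:
        raise ValueError("need 25 hourly values (DAY 00:00 to DAY+1 00:00)")
    interior = sum(window[1:-1])  # the 23 values from 01:00 to 23:00
    # Total weight is 23 + 0.5 + 0.5 = 24.
    return (0.5 * window[0] + interior + 0.5 * window[-1]) / 24.0


# Synthetic example: values rising linearly by 1 per hour, from 0 to 24.
hourly = [float(h) for h in range(25)]
print(plain_daily_mean(hourly))      # 11.5
print(trapezoid_daily_mean(hourly))  # 12.0
```

For a linear rise the plain mean lands half an hour's worth of change below the weighted mean, which is the half-hour shift described above.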

Can you comment or provide any source to find out how exactly these daily means for hourly data from discrete time points are calculated?

Thank you and cheers!

Are there any plans to provide daily data from ERA5-Land also for accumulated variables (e.g. precipitation, radiation, …), since they are omitted in ERA5-Land post-processed daily statistics from 1950 to present?


Hello ECMWF Support team,
Thank you very much for your work!
I have downloaded regional data and compared t2m, tmin and tmax. I found tmax < t2m in some areas, and tmax = tmin across almost the whole region.
Below is my code for downloading the January 2023 data:

  1. download t2m:

import cdsapi

dataset = "derived-era5-land-daily-statistics"
target = "t2m_202301.nc"
request = {
    "variable": ["2m_temperature"],
    "year": "2023",
    "month": "01",
    # All days of January, zero-padded: "01" .. "31"
    "day": [f"{d:02d}" for d in range(1, 32)],
    "daily_statistic": "daily_mean",
    "time_zone": "utc-05:00",
    "frequency": "1_hourly",
    "area": [57, -96, 41, -74],  # North, West, South, East
}
client = cdsapi.Client()
client.retrieve(dataset, request, target)

  2. download tmin:

import cdsapi

dataset = "derived-era5-land-daily-statistics"
target = "tmin_202301.nc"
request = {
    "variable": ["2m_temperature"],
    "year": "2023",
    "month": "01",
    # All days of January, zero-padded: "01" .. "31"
    "day": [f"{d:02d}" for d in range(1, 32)],
    "daily_statistic": "daily_minimum",
    "time_zone": "utc-05:00",
    "frequency": "1_hourly",
    "area": [57, -96, 41, -74],  # North, West, South, East
}
client = cdsapi.Client()
client.retrieve(dataset, request, target)

  3. download tmax:

import cdsapi

dataset = "derived-era5-land-daily-statistics"
target = "tmax_202301.nc"
request = {
    "variable": ["2m_temperature"],
    "year": "2023",
    "month": "01",
    # All days of January, zero-padded: "01" .. "31"
    "day": [f"{d:02d}" for d in range(1, 32)],
    "daily_statistic": "daily_maximum",
    "time_zone": "utc-05:00",
    "frequency": "1_hourly",
    "area": [57, -96, 41, -74],  # North, West, South, East
}
client = cdsapi.Client()
client.retrieve(dataset, request, target)

I cannot figure out what is wrong with the code. Can you help me find the problem?
Thanks,
Ziwang

I’d also be interested in these.

I have now figured out the reason: the daily maximum and daily minimum values returned are not true daily extremes; they are identical to the hourly data. Comparing hourly values with the daily mean is not meaningful, and it explains why the maximum temperature (Tmax) is less than the mean temperature (Tmean) in some areas.

Thanks to the support team for their hard work. All the problems have now been fixed. That’s great!

I’d still be interested in a reply here. Thank you!

(I think the same logic would also apply to daily min or max values - both hourly values from DAY at 00:00 and DAY+1 at 00:00 seem relevant.)

I found the solution to this in the new documentation (below). The solution to get the accumulated data for day i is to make a separate API request to the hourly product at time 00:00 of day i+1.

https://confluence.ecmwf.int/display/CKB/ERA5+family+post-processed+daily+statistics+documentation

The data time-stamped YYYY/MM/DD 00:00 represents the total daily accumulation for the date YYYY/MM/DD-1. Therefore:

  1. To calculate the daily accumulation for UTC, you just need to sample the ERA5-Land data at 00:00 and be aware that the data is representative of the day before the time stamp in the data

  2. Note, this is why we do not include this, as it would in effect be the same data but with a different time stamp, leading to confusion
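The time-stamp convention quoted above can be sketched as a small relabelling step. This is an illustration only: the helper name and the synthetic values are hypothetical, and it assumes you already hold accumulated values keyed by their UTC time stamps.

```python
from datetime import datetime, timedelta


def daily_totals_from_midnight_samples(samples):
    """Relabel 00:00 UTC accumulated values with the day they cover.

    samples: dict mapping a UTC time stamp to an accumulated value.
    Only the 00:00 stamps are kept; each one is assigned to the previous
    calendar day, following the convention quoted from the documentation.
    """
    totals = {}
    for stamp, value in samples.items():
        if stamp.hour != 0 or stamp.minute != 0:
            continue  # not a 00:00 sample, so not a full-day total
        totals[(stamp - timedelta(days=1)).date()] = value
    return totals


# Synthetic values, for illustration only.
samples = {
    datetime(2023, 1, 2, 0, 0): 3.5,   # total for 2023-01-01
    datetime(2023, 1, 2, 12, 0): 1.0,  # partial accumulation, ignored
    datetime(2023, 1, 3, 0, 0): 0.2,   # total for 2023-01-02
}
for day, total in sorted(daily_totals_from_midnight_samples(samples).items()):
    print(day, total)
```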

Thank you very much! This confirms exactly what I thought : ) For daily data from accumulated ERA5-Land data in different time zones (not UTC), one needs to do some careful calculations with the right hourly values…
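As a rough sketch of what those calculations could look like, the code below first de-accumulates the hourly values and then sums the 24 increments that fall inside the local day. It rests on an assumption that should be checked against the documentation: that the accumulation restarts after each 00:00 UTC stamp, so the 01:00 value holds the first hour of the new day. The function names and the synthetic data are hypothetical.

```python
from datetime import date, datetime, timedelta


def hourly_increments(acc):
    """Amount accumulated in the hour ending at each time stamp.

    Assumption: the accumulation restarts after every 00:00 UTC stamp,
    so the 01:00 value is already a one-hour amount.
    """
    inc = {}
    for stamp, value in acc.items():
        prev = stamp - timedelta(hours=1)
        if stamp.hour == 1:
            inc[stamp] = value
        elif prev in acc:
            inc[stamp] = value - acc[prev]
    return inc


def local_daily_total(acc, local_day, utc_offset_hours):
    """Sum the 24 hourly increments covering one local calendar day."""
    inc = hourly_increments(acc)
    local_midnight = datetime(local_day.year, local_day.month, local_day.day)
    start_utc = local_midnight - timedelta(hours=utc_offset_hours)
    # Raises KeyError if any of the 24 hours is missing from the input.
    return sum(inc[start_utc + timedelta(hours=h)] for h in range(1, 25))


# Synthetic accumulations: 1 unit/hour on 1 Jan (UTC), 2 units/hour on 2 Jan.
acc = {}
for day, rate in [(1, 1), (2, 2)]:
    for h in range(1, 25):
        acc[datetime(2023, 1, day) + timedelta(hours=h)] = rate * h

print(local_daily_total(acc, date(2023, 1, 1), -5))
```

With a UTC-05:00 offset, the local day of 1 January runs from 05:00 UTC on 1 January to 05:00 UTC on 2 January, so the total mixes 19 hours from the first UTC day with 5 hours from the second.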

How can I download the daily accumulated total precipitation? Currently, it seems that only the statistical aggregations for mean, maximum, and minimum are available, and the values do not correspond to the daily precipitation totals.

I tried setting daily_statistic to daily_sum in the API request, but it didn’t work and gave me precipitation values for every second, which doesn’t make sense.

Python code:

import cdsapi

dataset = "derived-era5-single-levels-daily-statistics"
request = {
    "product_type": "reanalysis",
    "variable": ["total_precipitation"],
    "year": "2022",
    # All months and days, zero-padded: "01" .. "12" and "01" .. "31"
    "month": [f"{m:02d}" for m in range(1, 13)],
    "day": [f"{d:02d}" for d in range(1, 32)],
    "daily_statistic": "daily_sum",
    "time_zone": "utc-05:00",
    "frequency": "1_hourly",
    "area": [2, -100, -2, -95],  # North, West, South, East
}

client = cdsapi.Client()
client.retrieve(dataset, request).download()
