Calculating daily precipitation sum from ERA5-Land hourly data

Wade_Wall · 4 June 2024 21:20

Hi all,

I am using the cdsapi in Python to download ERA5-Land climate variables. However, I am a little confused about how to calculate the precipitation sum. The only available options appear to be “daily mean”, “daily minimum”, “daily maximum” and “daily mid range”. How do I sum the hourly values using the cdsapi?

I have been following this basic script that has previously been posted:

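import cdsapi
import requests

# The client reads the CDS API URL and key from ~/.cdsapirc
c = cdsapi.Client()

# Uncomment years as required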
 
years =  [
            '1979'
#           ,'1980', '1981',
#            '1982', '1983', '1984',
#            '1985', '1986', '1987',
#            '1988', '1989', '1990',
#            '1991', '1992', '1993',
#            '1994', '1995', '1996',
#            '1997', '1998', '1999',
#            '2000', '2001', '2002',
#            '2003', '2004', '2005',
#            '2006', '2007', '2008',
#            '2009', '2010', '2011',
#            '2012', '2013', '2014',
#            '2015', '2016', '2017',
#            '2018', '2019', '2020',
#            '2021'
]
 
 
# Retrieve all months for a given year.
 
months = ['01', '02', '03',
            '04', '05', '06',
            '07', '08', '09',
            '10', '11', '12']
 
# For valid keywords, see Table 2 of:
# https://datastore.copernicus-climate.eu/documents/app-c3s-daily-era5-statistics/C3S_Application-Documentation_ERA5-daily-statistics-v2.pdf
 
# select your variable; name must be a valid ERA5 CDS API name.
var = "2m_temperature"
 
# Select the required statistic, valid names given in link above
stat = "daily_mean"
 
# Loop over years and months
 
for yr in years:
    for mn in months:
        result = c.service(
        "tool.toolbox.orchestrator.workflow",
        params={
             "realm": "user-apps",
             "project": "app-c3s-daily-era5-statistics",
             "version": "master",
             "kwargs": {
                 "dataset": "reanalysis-era5-single-levels",
                 "product_type": "reanalysis",
                 "variable": var,
                 "statistic": stat,
                 "year": yr,
                 "month": mn,
                 "time_zone": "UTC+00:0",
                 "frequency": "1-hourly",
                 "grid": "1.0/1.0",
                 "area":{"lat": [10, 60], "lon": [65, 140]}
 
                 },
        "workflow_name": "application"
        })
         
        # Set the output file name for each month (statistic, variable, year, month)
 
        file_name = "D:\\path\\to\\location\\" + stat + "_" + var + "_" + yr + "_" + mn + ".nc"
         
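        location = result[0]['location']
        res = requests.get(location, stream=True)
        res.raise_for_status()  # stop on a failed download instead of writing an error page to disk
        print("Writing data to " + file_name)
        # The with-block closes the file automatically
        with open(file_name, 'wb') as fh:
            for r in res.iter_content(chunk_size=1024):
                fh.write(r)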

Thanks for any information.

Kevin_Marsh · 5 June 2024 11:13

Hi, your script is calling the CDS daily statistics application, which does not support accumulated fields (such as precipitation) from ERA5-Land hourly data. This is because in ERA5-Land hourly data, accumulated values are stored as running totals since 00:00 UTC of a given day.
This means that the daily total for a given day is simply the value at the 00:00 timestep of the following day (e.g. 00:00 on 2 January 2024 contains the total precipitation for 1 January 2024). So for this dataset you do not need to calculate the daily sum, as it already exists.
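To make the convention concrete, here is a minimal sketch in plain Python (no CDS download involved) using hypothetical hourly amounts for 1 January 2024. It builds running totals that reset at 00:00 UTC, as described above, keyed by valid time, and reads the daily total from the 00:00 step of the next day. The helper name daily_total and the numbers are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical hourly precipitation amounts (m) for the hours ending
# 01:00 ... 24:00 on 1 January 2024 (the last one is valid at 00:00 on 2 January).
hourly_amounts = [0.0] * 6 + [0.001, 0.002, 0.0005] + [0.0] * 15

# Build ERA5-Land-style running totals: each value is the accumulation since
# 00:00 UTC of the day, keyed by its valid time.
day_start = datetime(2024, 1, 1)
accumulated = {}
running_total = 0.0
for hour, amount in enumerate(hourly_amounts, start=1):
    running_total += amount
    accumulated[day_start + timedelta(hours=hour)] = running_total


def daily_total(day, accumulated):
    """Daily total for `day` = the accumulated value valid at 00:00 of the next day."""
    return accumulated[day + timedelta(days=1)]


print(daily_total(day_start, accumulated))
```

No summing is needed: the single 00:00 value already equals the sum of the 24 hourly amounts.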
See the ERA5-Land documentation for details.
Hope that helps,
Kevin

Ignacio_Saldivia_Gonzatti · 11 June 2025 09:22

For anyone encountering the same confusion when working with ERA5-Land GRIB files in xarray:

When you open a file from the reanalysis-era5-land dataset using xarray and cfgrib, you’ll see two time-related coordinates: time and valid_time.

time refers to the initial time of the forecast. For accumulated variables such as precipitation, it is 00:00 UTC on the day the accumulation covers (i.e. the day you’re interested in).
valid_time is the end of the accumulation period. For a full-day accumulation this is 00:00 UTC on the following day (D+1 00:00), which is the validity time recorded in the GRIB metadata.

At first, I was puzzled to see data labelled December 31 of the previous year, when I expected it to be stored under January 1 at 00:00. The key insight is that xarray uses the time coordinate by default, which is actually more convenient for daily analysis, because it assigns each value to the day over which the accumulation occurred.
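A minimal sketch of how the two coordinates relate, assuming (as cfgrib does for forecast-type fields) that the file also carries a step coordinate with valid_time = time + step. Plain datetime objects stand in for the xarray coordinates, and the dates are hypothetical, chosen to reproduce the December 31 case above.

```python
from datetime import datetime, timedelta

# Hypothetical coordinates for one accumulated field: `time` is the 00:00 UTC
# start of the accumulation, `step` the hours elapsed since then (assumed layout).
time = datetime(2023, 12, 31)
steps = [timedelta(hours=h) for h in range(1, 25)]

# valid_time marks the end of each accumulation window.
valid_times = [time + step for step in steps]

# The full-day total is the 24 h step: it starts on 31 December (time)
# but is valid at 00:00 on 1 January (valid_time).
full_day = valid_times[steps.index(timedelta(hours=24))]
print(time.date(), "->", full_day)  # 2023-12-31 -> 2024-01-01 00:00:00
```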

So there are two valid approaches when working with daily accumulations:

Use time (initial forecast time) as is, which correctly aligns to the actual day of accumulation.
Or switch to valid_time and subtract one day to realign it with the correct day.
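The two approaches can be sketched in plain Python as follows, assuming each daily total is available as a (time, valid_time, value) record with valid_time = time + 24 h. The records, values and function names are hypothetical; in xarray the same idea amounts to either keeping the time coordinate or shifting valid_time back by one day before grouping by date.

```python
from datetime import datetime, timedelta

# Hypothetical daily-total records: (time, valid_time, value in m),
# where time is the accumulation start and valid_time = time + 24 h.
records = [
    (datetime(2023, 12, 31), datetime(2024, 1, 1), 0.0042),
    (datetime(2024, 1, 1), datetime(2024, 1, 2), 0.0035),
]


def label_by_time(records):
    """Approach 1: use `time` as is; its date is the day of accumulation."""
    return {t.date(): v for t, _, v in records}


def label_by_valid_time(records):
    """Approach 2: shift `valid_time` back one day to recover the same date."""
    return {(vt - timedelta(days=1)).date(): v for _, vt, v in records}


print(label_by_time(records) == label_by_valid_time(records))  # True
```

Subtracting one day is only correct for the full-day values valid at 00:00; for intermediate hourly steps (e.g. valid at 05:00 on the same day), the time coordinate already gives the right date.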