Mismatch between ERA5 hourly data and ERA5 daily statistics data before 1980

I am using the ERA5 daily statistics application to get daily statistics for 2m_temperature from 1950 onwards. After that, I keep the daily statistics up to date by calculating them from the ERA5 hourly dataset.

I use the CDS API in both cases. For the statistics I follow the instructions for the app-c3s-daily-era5-statistics project, and for ERA5 I use the reanalysis-era5-single-levels dataset with the 'reanalysis' product type.
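For reference, here is a minimal sketch of the hourly request through the cdsapi Python client. The helper names `era5_hourly_request` and `download` are hypothetical. `cdsapi.Client().retrieve(dataset, request, target)` is the client's standard call and needs a configured API key. The output-format key is left out on purpose because its name has changed between CDS versions, so check the dataset's download form for the current one. The daily statistics application request is not shown, since its parameters come from the app-c3s-daily-era5-statistics workflow.

```python
def era5_hourly_request(year, month, day):
    """Build a request for all 24 hourly 2m_temperature steps of one UTC day.

    Hypothetical helper; the keys mirror the reanalysis-era5-single-levels
    download form (output format key intentionally omitted).
    """
    return {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": f"{year:04d}",
        "month": f"{month:02d}",
        "day": f"{day:02d}",
        "time": [f"{h:02d}:00" for h in range(24)],  # 00:00 .. 23:00
    }


def download(year, month, day, target):
    """Fetch one day from the CDS (needs network access and ~/.cdsapirc)."""
    import cdsapi  # imported lazily so building requests works without it

    client = cdsapi.Client()
    client.retrieve(
        "reanalysis-era5-single-levels",
        era5_hourly_request(year, month, day),
        target,
    )
```

Building the request separately from the download makes it easy to check exactly which hours were asked for before comparing against the statistics application.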

However, there seems to be a gap between the two for older years, and the gap is larger for 1950 than for 1970. After 1980, the two sets of values agree much more closely.

I am not sure where the difference comes from, but it seems I cannot combine the statistics application data with statistics calculated from ERA5 hourly data in the same time series.

If anyone knows a solution to this, please let me know. Otherwise, it seems I need to calculate the statistics from ERA5 hourly data starting in 1950, which I assume will take much more time.
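Calculating the daily statistics from hourly data comes down to a min, max and mean over the hours of each day for every grid cell. A minimal plain-Python sketch, assuming a day is the 24 hourly steps 00:00 to 23:00 UTC; if the statistics application was configured differently (for example another time zone or a coarser sampling of the hours), its output will not be directly comparable. The function names are hypothetical.

```python
from statistics import fmean


def daily_stats(hourly):
    """Return (daily_minimum, daily_maximum, daily_mean) for one grid cell.

    Assumes `hourly` holds the 24 hourly 2m_temperature values (K) of one
    UTC day, 00:00 to 23:00.
    """
    if len(hourly) != 24:
        raise ValueError(f"expected 24 hourly values, got {len(hourly)}")
    return min(hourly), max(hourly), fmean(hourly)


def daily_stats_grid(hourly_grid):
    """Apply daily_stats to every cell; hourly_grid maps cell -> 24 values."""
    return {cell: daily_stats(values) for cell, values in hourly_grid.items()}
```

For example, a cell warming by 0.5 K every hour from 270.0 K gives `daily_stats([270.0 + h * 0.5 for h in range(24)])` equal to `(270.0, 281.5, 275.75)`.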

  • 1980: The differences for various regions look like normal variation due to floating-point calculation
    • Stats application

    • ERA5 hourly data

  • 1970: The differences seem to be much larger
    • Stats application

    • ERA5 hourly data

  • 1950: Same here; the differences seem much larger than floating-point variation would explain
    • Stats application

    • ERA5 hourly data

I did a further check for every decade, using January 1 of each year (yyyy) as a sample from the statistics application for daily_maximum, daily_minimum and daily_mean. The results are below. The columns and rows mean the following:

  • "stats" (columns): mean of the statistics application data over the whole grid
  • "era5" (columns): mean of the data calculated from ERA5 hourly over the whole grid
  • "eps" (rows): threshold used to test whether a grid cell value differs between the stats data and the era5 data
  • num_cell: number of cells that differ for the given eps
  • pct_cell: percentage of cells that differ for the given eps
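The comparison described by these columns can be sketched as a small helper. This is a minimal plain-Python version, assuming both grids are already loaded as flat lists of equal length in the same cell order, and assuming a cell counts as different when |stats - era5| > eps (the strict comparison is an assumption). `compare_grids` is a hypothetical name.

```python
from statistics import fmean


def compare_grids(stats_grid, era5_grid, eps_values):
    """Compare two grids given as flat lists of cell values.

    Returns the whole-grid means ("stats", "era5") and, for each eps, the
    number (num_cell) and percentage (pct_cell) of cells where
    |stats - era5| > eps.
    """
    if len(stats_grid) != len(era5_grid):
        raise ValueError("grids must have the same number of cells")
    n = len(stats_grid)
    diffs = [abs(s - e) for s, e in zip(stats_grid, era5_grid)]
    rows = []
    for eps in eps_values:
        num_cell = sum(1 for d in diffs if d > eps)
        rows.append({"eps": eps, "num_cell": num_cell,
                     "pct_cell": 100.0 * num_cell / n})
    return {"stats": fmean(stats_grid), "era5": fmean(era5_grid), "rows": rows}
```

Running it once per sampled day and per statistic (daily_maximum, daily_minimum, daily_mean) produces one table block per decade.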

Looking at the table below, it seems clear that before 1980 there is a definite difference in how these two sources provide the data to users.

I have now changed my scripts and regenerated the historical data from ERA5 hourly data so that the series is consistent. The same applies to the ocean temperature data.