ERA5 versus ERA-Interim Total Precipitation

Good day

I have downloaded the monthly total (accumulated) precipitation for both ERA5 and ERA-Interim. When I plot the total annual precipitation, averaged over the time period 1981-2017, the values for ERA5 are extremely high (as seen below). Why is there such a large discrepancy? Do I need to remove some outliers from the ERA5 data? 

In both data sets, I have multiplied by 1000 and by the number of days in each month to get the units to mm/month, before summing these values for annual totals.
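The conversion described above can be sketched as follows. This is only an illustration: it assumes the monthly values are mean rates in metres per day, and the function names are hypothetical, not part of any ECMWF or CDS tool.

```python
import calendar


def monthly_mean_to_mm_per_month(value_m_per_day, year, month):
    """Convert a monthly-mean rate (assumed m/day) to a monthly total in mm."""
    days_in_month = calendar.monthrange(year, month)[1]
    return value_m_per_day * 1000.0 * days_in_month


def annual_total_mm(monthly_means_m_per_day, year):
    """Sum twelve monthly totals (January first) into an annual total in mm."""
    if len(monthly_means_m_per_day) != 12:
        raise ValueError("expected exactly 12 monthly values")
    return sum(
        monthly_mean_to_mm_per_month(value, year, month)
        for month, value in enumerate(monthly_means_m_per_day, start=1)
    )
```

If the two data sets do not share the same input units, the same factors would give totals that are not comparable, so the units in each file are worth checking first.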

The ERA5 data came from: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels-monthly-means?tab=overview

The ERA-Interim data: https://apps.ecmwf.int/datasets/data/interim-mdfa/levtype=sfc/

Many thanks

Jessica


Dear Jessica,

ERA5 does suffer from "rain bombs", mainly over Africa in regions of high orography, which usually occur a few times per year. Rain bombs are concentrated areas of high rainfall. This is probably why the range of values is larger for ERA5 than ERA-Interim. You might be interested in reading about comparisons of ERA5 and ERA-Interim precipitation with various observational products, see Sections 9.2 and 9.3 in Hersbach et al. 2020:

https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.3803

The rain bombs are also mentioned in this reference, in Section 10.2.

For the ERA5 data, I presume you downloaded the Product type "Monthly averaged reanalysis".

Best wishes,

Paul

Dear Paul

Thanks for this explanation. I had come across that paper in my earlier readings and forgot about the "rain bombs". Is it best to simply remove these extreme values and treat as missing? I am new to working with ERA data and apologise if this is an obvious question.

Yes, for ERA5 data I downloaded the "Monthly averaged reanalysis".

Kind regards

Jessica

Dear Jessica,

One of my colleagues has looked at these rain bombs in ERA5, but he is on holiday at the moment.

What I can say now is that 250 mm in 6 h would be a high limit. You could start by using this limit, or the equivalent hourly rate, to detect the rain bombs. Then, you could experiment with lowering this limit, to try and find the optimum limit. Rather than setting such points to missing, I would set them to the chosen limit.
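A minimal sketch of this capping, assuming the precipitation is already in mm per accumulation period and held in a NumPy array. The constant and function names are hypothetical, and the hourly limit is simply the 6-hour limit divided by six.

```python
import numpy as np

LIMIT_MM_PER_6H = 250.0                   # starting limit suggested above
LIMIT_MM_PER_1H = LIMIT_MM_PER_6H / 6.0   # equivalent hourly rate, about 41.7 mm


def cap_precipitation(precip_mm, limit_mm):
    """Set values above limit_mm to limit_mm instead of marking them missing.

    Returns the capped array and a boolean mask of the points that were changed.
    """
    precip_mm = np.asarray(precip_mm, dtype=float)
    flagged = precip_mm > limit_mm
    capped = np.minimum(precip_mm, limit_mm)
    return capped, flagged
```

Returning the mask makes it easy to count how many points are affected as the limit is lowered.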

If we can refine this answer when my colleague returns from holiday, I will let you know.

Best wishes,

Paul

Dear Paul

That makes sense, although the data I am using is monthly ("ERA5 monthly averaged data on single levels from 1979 to present"). But I suppose I could apply the same logic with a monthly threshold/limit...

Thank you for your assistance and I'd appreciate you letting me know if there is a better/more refined method.

Kind regards

Jessica

Dear Jessica,

Sorry, yes, it's a bit more difficult for monthly data. You can apply similar logic, but a suitable monthly limit is harder to define, with the danger that some contaminated points will not be detected while some uncontaminated points might be labelled as contaminated.

I'll let you know if we can provide a more refined method.

Best wishes,

Paul

Hi Jessica,

I am Ervin, the colleague who looked into these rain bombs.

I have tried to create a simple filter that would cap these extreme single grid point values, but unfortunately these rain bombs are not at all easy to remove 'correctly'.

They are single grid point spikes in the data. As mentioned above, they tend to appear over high/extensive orographic areas, maybe up to a few times a year (in the most extreme cases/points even 10% of the year).

But there is no exact rule on what is a 'too high' single point outlier and what is still a physically reasonable one.

I have analysed the extended range ENS precipitation fields (24-hour totals), which have a similar resolution and do not suffer from rain bombs, in order to see how large a single grid point outlier can be while still being physically possible. In fact, there are some surprisingly high single-value cases.

As a good assumption, though, one can use 50 mm as a generic threshold. That is, take all the neighbouring grid points (on the original ERA5 grid) and, if the value is higher than all the neighbours by at least 50 mm, simply cut this outlier to the maximum value amongst the neighbours. This would be done on the 24-hour fields, for all the days and for all the grid points, as those are what I have been working on.

I have found this 50 mm to work quite well for all value ranges. For small values, like outlier points where all neighbours are below 0.1 mm (so basically single-point precipitation), the highest 'real' value of this kind I saw in the extended range ENS was ~30 mm. For the <1 mm case, the highest I found was ~60 mm; for <10 mm it was ~100 mm; for <50 mm it was 120-150 mm; for <200 mm it was 250-300 mm; and so on. So, as you can see, 50 mm is a good assumption for the 24-hour totals.
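The neighbour comparison could be sketched as below. This is not the code used for the analysis above, only an illustration under stated assumptions: a single 2-D field of 24-hour totals in mm, the 8 surrounding grid points taken as "neighbours", no wrap-around in longitude, and no special handling of missing values. The function name is hypothetical.

```python
import numpy as np


def cap_single_point_outliers(daily_mm, threshold_mm=50.0):
    """Cap points that exceed all 8 neighbours by at least threshold_mm.

    daily_mm: 2-D array (lat, lon) of 24-hour totals in mm.
    Returns the corrected field and a boolean mask of the capped points.
    """
    field = np.asarray(daily_mm, dtype=float)
    nrows, ncols = field.shape
    # Pad with -inf so that points on the edge only see their real neighbours.
    padded = np.pad(field, 1, mode="constant", constant_values=-np.inf)
    # All 3x3 window offsets except the centre point itself.
    shifts = [(di, dj) for di in (0, 1, 2) for dj in (0, 1, 2) if (di, dj) != (1, 1)]
    neighbour_max = np.max(
        [padded[di:di + nrows, dj:dj + ncols] for di, dj in shifts], axis=0
    )
    # Outlier: higher than every neighbour by at least the threshold.
    is_outlier = np.isfinite(neighbour_max) & (field - neighbour_max >= threshold_mm)
    corrected = np.where(is_outlier, neighbour_max, field)
    return corrected, is_outlier
```

The comparison always uses the original neighbour values, so the result does not depend on the order in which grid points are visited. On a global grid one would probably want the first and last longitude columns to be treated as neighbours as well, which this sketch does not do.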

The next question is how to correct the original hourly values based on the 24-hour values, in case you want to use 1-, 3-, 6- or 12-hourly values, not just 24-hourly. I have not actually done this, but as a simple approach, for an outlier point, one could replace all its hourly values by the values at the neighbour which had the maximum in the 24-hour data. Alternatively, one would need to find the equivalent cut-off threshold for the other accumulation periods (1-, 3-, 6-, 12-hour). I do not know how the 50 mm / 24-hour threshold translates to these.

So, in summary, I suggest you do a correction on the 24-hour values by cutting off outliers above 50 mm and then apply the correction to the hourly data (if needed) as above. This will remove most of the rain bombs (although not all, as some outliers smaller than 50 mm will of course still be rain bombs) and will probably remove only very few real outlier cases which are not rain bombs.
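The first option (copying the hourly series from the neighbour with the highest 24-hour total) might look like the sketch below. It has not been tested on ERA5 itself. It assumes one day of hourly data with shape (24, lat, lon), the original uncorrected 24-hour totals, and a boolean mask of outlier points obtained beforehand; all names are hypothetical.

```python
import numpy as np


def replace_hourly_at_outliers(hourly_mm, daily_mm, is_outlier):
    """Copy hourly values from the neighbour with the largest 24-hour total.

    hourly_mm: array (24, lat, lon); daily_mm: array (lat, lon) of the
    original daily totals; is_outlier: boolean array (lat, lon).
    """
    original = np.asarray(hourly_mm, dtype=float)
    daily = np.asarray(daily_mm, dtype=float)
    result = original.copy()
    nrows, ncols = daily.shape
    for i, j in zip(*np.nonzero(is_outlier)):
        best, best_total = None, -np.inf
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                inside = 0 <= ni < nrows and 0 <= nj < ncols
                if (di, dj) == (0, 0) or not inside:
                    continue
                if daily[ni, nj] > best_total:
                    best, best_total = (ni, nj), daily[ni, nj]
        if best is not None:
            # Always read from the untouched copy, never from corrected points.
            result[:, i, j] = original[:, best[0], best[1]]
    return result
```

With this choice the corrected hourly values at an outlier point sum to the same 24-hour total as the capped daily value, since both come from the same neighbour.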

Let me know if this makes sense to you and how you are getting on with it and also what the corrected ERA5 result looks like in your original comparison.

Cheers, Ervin

Hi Jessica,

So we don't have a method to correct the monthly data. Ervin developed his method on 24-hour data, so that would be the best data to apply it to. There is some documentation on calculating 24-hour totals:

ERA5: How to calculate daily total precipitation

Let us know if you have any questions/problems and let us know how you get on.

Best wishes,

Paul

Hi Ervin and Paul

Thank you for this detailed methodology. I will give it a try and let you know how I get on.

Kind regards

Jessica

Hi Jessica,

There's a CDS application that calculates daily values of ERA5 and ERA5-Land data, which you might find useful:

https://cds.climate.copernicus.eu/cdsapp#!/software/app-c3s-daily-era5-statistics?tab=app

You can use this web page to calculate the daily data or you can use the CDS API, which is particularly useful to produce the daily data for longer time periods. There is a forum discussion of this:

Retrieve daily ERA5/ERA5-Land data using the CDS API

The advantage of using this CDS application, over the scripts I mentioned before, is that you only need to download the results, the daily data, rather than the hourly data.

Hope this is useful.

Best wishes,

Paul

Hi Paul

Thank you for sending this through. It would certainly be much better to download daily data, rather than hourly data, for my purposes.

If I were to use the CDS application to download the daily data, in terms of applying Ervin's method, is it correct to select the Frequency as 1-hourly (instead of 3- or 6-hourly)?

Many thanks

Jessica

Hi Jessica,

Ervin developed his method on 24-hour data, i.e. on 24-hour accumulations. In ERA5, to produce 24-hour accumulations you have to sum up the hourly accumulations. Therefore, you should choose the 1-hourly frequency.

Once you have the daily data, take a close look at the units. Ervin's method is based on mm/day, so if the daily data has different units, you would need to convert them (or convert Ervin's method).
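As an illustration of the summation and the unit check, here is a minimal sketch. It assumes the hourly accumulations are in metres, that the time axis comes first, and that the first hour already lines up with the start of the day you want; that alignment is left to the caller. The function name is hypothetical.

```python
import numpy as np


def hourly_m_to_daily_mm(hourly_m):
    """Sum hourly accumulations (assumed metres) into daily totals in mm.

    hourly_m: array with time as the first axis, length a multiple of 24.
    Returns an array with one entry per day along the first axis.
    """
    hourly = np.asarray(hourly_m, dtype=float)
    if hourly.shape[0] % 24 != 0:
        raise ValueError("time axis length must be a multiple of 24")
    n_days = hourly.shape[0] // 24
    # Group the time axis into (day, hour) and sum over the 24 hours.
    daily_m = hourly.reshape(n_days, 24, *hourly.shape[1:]).sum(axis=1)
    return daily_m * 1000.0  # metres to millimetres
```

If the daily data from the application are already in mm, the final factor of 1000 should of course be dropped.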

Best wishes,

Paul

Hi Paul

Great, thank you again for all your assistance.

Kind regards

Jessica