I have downloaded ERA5 hourly data on single levels, and my question concerns total precipitation (tp) and, similarly, evaporation (e).

So, according to the Documentation:

At the same time, when I look at this Documentation, it is mentioned that the data we download are point values. So, let us say I download the total precipitation for the lat-lon 89.75, 0.25. What does this value represent? Is it a point value? And if someone would like to estimate the total precipitation for the highlighted yellow grid box, what would be the best practice? To take the average of the values at the 4 corners? Or their sum?

The ERA5 variables are point values, valid at the points that you have sampled (these are in fact interpolated server-side from the original native reduced Gaussian grid). If you want to estimate tp over an area (grid box), you could either: 1) sample tp at the centre-point of the box and multiply by the box area, or 2) sample tp at the four corners of the box and do a bilinear interpolation yourself to integrate over the area. I believe the results ought to be the same, because with method (1) the ECMWF server does a bilinear interpolation from the native grid anyway.
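To illustrate why the two methods should agree, and why the answer to "average or sum?" is the average, here is a minimal sketch. It assumes a flat box (no change of area with latitude) and uses made-up corner values in mm; the function names `bilinear` and `box_mean` are hypothetical and not part of any ECMWF tool.

```python
def bilinear(f00, f10, f01, f11, x, y):
    """Bilinear interpolation on the unit square, with x and y in [0, 1]."""
    return (f00 * (1 - x) * (1 - y) + f10 * x * (1 - y)
            + f01 * (1 - x) * y + f11 * x * y)


def box_mean(f00, f10, f01, f11, n=50):
    """Average of the bilinear surface over the box (midpoint rule, n x n cells)."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += bilinear(f00, f10, f01, f11, (i + 0.5) / n, (j + 0.5) / n)
    return total / (n * n)


# Hypothetical tp values (mm) at the four corners of one grid box.
corners = (3.0, 5.0, 4.0, 4.0)

centre_value = bilinear(*corners, 0.5, 0.5)   # method 1: value at the centre
corner_average = sum(corners) / 4             # plain average of the corners
area_average = box_mean(*corners)             # method 2: integrate, then divide by area

print(centre_value, corner_average, area_average)
```

Under these assumptions all three numbers coincide, so the box total is the average depth multiplied by the box area. Summing the four corner values would overestimate the depth by a factor of four.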

Coming back to this one, as I am not sure I have found the solution yet. So, for example, a grid box in mid-latitudes would have an area of about 25*25 = 625 km^2 (square kilometers). If the tp at the centre-point of the box is 4 mm/day (which is defined per square meter), would that mean that for the whole area we get 625*10^6 m^2 * 4 mm/day? That would be a huge amount of water to consider, no? Thank you for your time.

Yes, it will be a 4 mm depth of water over an area of 25*25 km^2, i.e. 0.004 m * 625*10^6 m^2 = 2.5 million m^3 of water. That sounds like a lot, but it is the equivalent of about 1000 Olympic-size swimming pools. For comparison, the Thames has an average discharge of about 70 m^3/s, so 2.5 million m^3 is discharged within roughly 10 hours...
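The arithmetic can be checked with a short sketch. The pool volume of 2500 m^3 (50 m x 25 m x 2 m) is an assumption made here to reproduce the comparison; the 70 m^3/s discharge is the figure quoted above, and the helper name `water_volume_m3` is hypothetical.

```python
def water_volume_m3(depth_mm, side_km):
    """Volume of water (m^3) for a given depth (mm) over a square box (side in km)."""
    depth_m = depth_mm / 1000.0            # mm -> m
    area_m2 = (side_km * 1000.0) ** 2      # km -> m, then square
    return depth_m * area_m2


volume = water_volume_m3(4.0, 25.0)        # 4 mm over a 25 km x 25 km box

POOL_M3 = 50.0 * 25.0 * 2.0                # assumed Olympic pool volume, 2500 m^3
THAMES_M3_PER_S = 70.0                     # average discharge quoted in the thread

pools = volume / POOL_M3
hours = volume / THAMES_M3_PER_S / 3600.0

print(f"{volume:.3g} m^3, about {pools:.0f} pools, about {hours:.1f} h of Thames flow")
```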