Calculating anomalies as percentages

Hi

We can use the function climate.anomaly(...) to calculate anomalies:

data_hist_clim = ct.climate.climatology_mean(data_hist)
data_anomalies = ct.climate.anomaly(data_scen, climatology=data_hist_clim)


From looking at the source code I gather it calculates the anomalies by subtracting a reference from the data.

But sometimes (e.g. in the case of precipitation) relative anomalies (i.e. a ratio), or anomalies as a percentage of the original value, make more sense. What is the best way of doing this? I can of course calculate these myself:

# first mask out very small values in the reference data
data_hist_clim = ct.cube.where(data_hist_clim > 0.001, data_hist_clim)
data_anomalies = 100*(data_scen - data_hist_clim) / data_hist_clim 


Although this seems to work and gives me sensible values (I can even visualise the result offline), my workflow then fails when I try to visualise these data in the Toolbox, e.g. in a LiveMap. Presumably this is because climate.anomaly(...) does something with the dimensions or attributes of the result (data_anomalies) that my simple calculation doesn't, but I cannot figure out exactly what.

So, what is the best way to calculate relative anomalies?

Many thanks!

Rutger


Hi Rutger,

Thank you for your question.

I think the method you're using for calculating anomalies as a percentage is the best way to do it at the moment. Could you give some more information about the error you get when trying to visualise these data? If you have an example workflow I could play with that would be ideal!

Many thanks,
James

Hi James


The error I get is copied below. Although the actual workflow I am working on is more complicated than this, this workflow reproduces the issue: https://cds.climate.copernicus.eu/toolbox-editor/13784/precip_anomalies_example The calculation of anomalies is done in the function calc_anom(...). If I simply switch the calculation from climate.anomaly(...) to the percentage calculation outlined above, the script fails in the visualisation step.


Error message:

Traceback (most recent call last):
  File "/opt/cdstoolbox/cdscompute/cdscompute/cdshandlers/services/handler.py", line 49, in handle_request
    result = cached(context.method, proc, context, context.args, context.kwargs)
  File "/opt/cdstoolbox/cdscompute/cdscompute/caching.py", line 108, in cached
    result = proc(context, *context.args, **context.kwargs)
  File "/opt/cdstoolbox/cdscompute/cdscompute/services.py", line 118, in __call__
    return p(*args, **kwargs)
  File "/opt/cdstoolbox/cdscompute/cdscompute/services.py", line 59, in __call__
    return self.proc(context, *args, **kwargs)
  File "/home/cds/cdsservices/services/wms/livemap_legacy.py", line 440, in plot
    land_style, coastline_style)
  File "/home/cds/cdsservices/services/wms/livemap.py", line 196, in plot
    crs=crs, **kwargs)
  File "/opt/cdstoolbox/cdscompute/cdscompute/download.py", line 154, in to_wms
    self._wms = self._to_wms(*args, **kwargs)
  File "/opt/cdstoolbox/cdscompute/cdscompute/download.py", line 221, in _to_wms
    engine=self._engine, **kwargs)
  File "/opt/cdstoolbox/cdscdm/cdscdm/wms/layers.py", line 335, in from_wms_related_data
    return _animated_geojson_from_data(context, dataarray, shape, style=style, **kwargs)
  File "/opt/cdstoolbox/cdscdm/cdscdm/wms/layers.py", line 410, in _animated_geojson_from_data
    dataarray.sel(**{dim_label: fid}).values.astype(float)]
  File "/opt/cdstoolbox/cdscdm/cdscdm/wms/layers.py", line 409, in <listcomp>
    values = [i if np.isfinite(i) else None for i in
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()


Thanks!

Rutger


Hi Rutger,

It looks like you've got both a time and a month dimension in your data, which is confusing the livemap (sorry for the poor error message - this needs to be caught more usefully!).

I've 'fixed' your workflow by selecting just the first month index, but I'll leave it up to you how best to handle the month dimension. Here's a working version of your workflow (the only lines I've added are lines 145 and 146): https://cds.climate.copernicus.eu/toolbox-editor/20413/precip_anomalies_example.
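
Purely as an illustration of that quick fix (this is plain xarray, not the exact Toolbox lines I added to the linked workflow), dropping the extra dimension amounts to keeping only its first index:

# Illustrative sketch only: keep the first index of the spurious 'month'
# dimension so the array is (time, lat, lon) again and the livemap can plot it.
data_anomalies = data_anomalies.isel(month=0)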

Hope that helps!

James

Hi James


Thanks. But this is exactly my point: why is the extra month dimension problematic when I do the calculation myself, but not when I use the standard function climate.anomaly(...)?

As far as I can see, the month dimension is created when calculating the climatology over the reference period. This is what my arrays look like (using the names in my example script):

##### data_scen
<xarray.DataArray 'tp' (time: 348, lat: 74, lon: 140)>
dask.array<xarray-tp, shape=(348, 74, 140), dtype=float32, chunksize=(48, 74, 140), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2071-01-01 2071-02-01 ... 2099-12-01
  * lat      (lat) float64 34.25 34.75 35.25 35.75 ... 69.25 69.75 70.25 70.75
  * lon      (lon) float64 -24.75 -24.25 -23.75 -23.25 ... 43.75 44.25 44.75

##### data_hist
<xarray.DataArray 'tp' (time: 360, lat: 74, lon: 140)>
dask.array<xarray-tp, shape=(360, 74, 140), dtype=float32, chunksize=(48, 74, 140), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 1981-01-01 1981-02-01 ... 2010-12-01
  * lat      (lat) float64 34.25 34.75 35.25 35.75 ... 69.25 69.75 70.25 70.75
  * lon      (lon) float64 -24.75 -24.25 -23.75 -23.25 ... 43.75 44.25 44.75

##### data_hist_clim
<xarray.DataArray 'tp_climatology_mean' (month: 12, lat: 74, lon: 140)>
[124320 values with dtype=float32]
Coordinates:
  * lat      (lat) float64 34.25 34.75 35.25 35.75 ... 69.25 69.75 70.25 70.75
  * lon      (lon) float64 -24.75 -24.25 -23.75 -23.25 ... 43.75 44.25 44.75
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12

##### data_anomalies  # calculated with climate.anomaly(...)
<xarray.DataArray 'tp_anomaly' (time: 348, lat: 74, lon: 140)>
dask.array<xarray-tp_anomaly, shape=(348, 74, 140), dtype=float32, chunksize=(48, 74, 140), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 34.25 34.75 35.25 35.75 ... 69.25 69.75 70.25 70.75
  * lon      (lon) float64 -24.75 -24.25 -23.75 -23.25 ... 43.75 44.25 44.75
  * time     (time) datetime64[ns] 2071-01-01 2071-02-01 ... 2099-12-01
    month    (time) int64 dask.array<chunksize=(48,), meta=np.ndarray>
Attributes:
    long_name:      Precipitation sum anomaly
    units:          m
    cell_methods:   time: sum
    standard_name:  lwe_thickness_of_precipitation_amount_anomaly
    type:           real

##### data_anomalies  # calculated manually as a percentage
<xarray.DataArray 'data' (time: 348, lat: 74, lon: 140, month: 12)>
dask.array<xarray-data, shape=(348, 74, 140, 12), dtype=float32, chunksize=(48, 74, 140, 12), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) datetime64[ns] 2071-01-01 2071-02-01 ... 2099-12-01
  * lat      (lat) float64 34.25 34.75 35.25 35.75 ... 69.25 69.75 70.25 70.75
  * lon      (lon) float64 -24.75 -24.25 -23.75 -23.25 ... 43.75 44.25 44.75
  * month    (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Attributes:
    units:  1

So what exactly does climate.anomaly(...) do to end up with a non-starred month coordinate, that my manual percentage calculation doesn't do?


Thanks

Rutger


Hi Rutger Dankers,

Here's a quick breakdown of what's happening:

  • ct.climate.climatology_mean() takes some data with a time dimension and produces a climatology with a month dimension;
  • ct.climate.anomaly() takes some data with a time dimension and a climatology with just a month dimension, and uses some handy xarray functionality behind the scenes to compare the monthly climatology to each month in your data in order to calculate an anomaly;
  • It's not possible to directly subtract the climatology from your data, because your data still has a time dimension while your climatology only has a month dimension - the dimensions need to match (or be aligned) for proper arithmetic, which is why your manual calculation ends up broadcasting the two arrays against each other and picking up an extra month dimension (sketched below).
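
To make that last point concrete, here's a minimal plain-xarray sketch with made-up toy arrays (not your actual data, and not the Toolbox internals) showing the broadcasting behaviour versus the groupby-style alignment:

import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-ins: 2 years of monthly data and a 12-month climatology
time = pd.date_range('2071-01-01', periods=24, freq='MS')
lat = [34.25, 34.75, 35.25]
data = xr.DataArray(np.random.rand(24, 3), coords={'time': time, 'lat': lat}, dims=['time', 'lat'])
clim = xr.DataArray(np.random.rand(12, 3), coords={'month': np.arange(1, 13), 'lat': lat}, dims=['month', 'lat'])

# Direct subtraction: 'time' and 'month' are different dimensions, so xarray
# broadcasts them against each other and the result gains an extra month dimension.
print((data - clim).dims)                         # ('time', 'lat', 'month')

# Grouping by calendar month first aligns each time step with the matching
# climatology month, so 'month' ends up as a plain (non-starred) coordinate on time.
print((data.groupby('time.month') - clim).dims)   # ('time', 'lat')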

So while I think it's possible to put together a messy workaround for this issue within your workflow, I think what we really need is a relative or percentage argument to ct.climate.anomaly, which simply produces anomalies as percentages straight out of the service. I've raised a ticket to add this argument and it should be included in next week's release (11th Nov) - does that timescale work for you? If you need something quicker, I'm happy to put together a temporary workaround.

Apologies that my earlier advice was wrong on this!

Many thanks,
James

Hi James Varndell 


Many thanks for your response. I'm happy to wait for a proper solution, if it can indeed be implemented by next week.


Cheers

Rutger


Hi Rutger,

I've just realised that there is a way to do this in the Toolbox at the moment that's not too fiddly of a workaround - you can use the ct.cube.groupby_operator and ct.cube.groupby_reduce services to directly compare single month values to multi-year monthly time dimensions. I've had a go at implementing this in this workflow: https://cds.climate.copernicus.eu/toolbox-editor/20413/precip_anomalies_example-1 - it seems to work, but the anomaly values are very high - is that to be expected? I might've done something slightly wrong in my calculation but hopefully this example demonstrates how you might be able to make use of these groupby services.
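
In case it's useful for comparison, the same idea in plain xarray terms (not the Toolbox groupby services, and assuming the data_scen and data_hist_clim arrays from your printouts above) would look roughly like this:

# Rough plain-xarray equivalent of the percentage-anomaly workaround:
# divide each time step by the climatology of the matching calendar month,
# then convert the ratio to a percentage change. As in your earlier snippet,
# you may still want to mask near-zero climatology values first.
ratio = data_scen.groupby('time.month') / data_hist_clim
percentage_anomaly = (ratio - 1) * 100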

In the meantime, we will still implement a dedicated service for calculating percentage anomalies in the Toolbox.

Thanks again,
James

Great to see there's now a function relative_anomaly(...) to calculate relative anomalies:


data_anomalies = ct.climate.relative_anomaly(data_scen, climatology=data_hist_clim)


James Varndell what are the units coming out of this?


Thanks

Rutger


Hi Rutger Dankers,

The return value from ct.climate.relative_anomaly is the (unitless) ratio of the anomaly to the absolute (i.e. climatological) value - so, for example, an increase of 50% will be returned as 0.5. You can convert these values to percentages by simply multiplying the result by 100 in the Toolbox, i.e.:

percentage_anomaly = ct.climate.relative_anomaly(data) * 100

Hope that helps!

Kind regards,
James