Slicing, changing attributes, and computing trends on the retrieved datasets

Dear CDS team,


Where can I find more information about the class cdsworkflows.remote.Remote? Retrieving data with the CDS Toolbox returned a variable of type <class 'cdsworkflows.remote.Remote'>.


Some specific questions:

1) Suppose I want to calculate the mean climate value for all the Januaries, all the Februaries, and so on in the dataset. This is easy with slicing, but slicing is not defined for the Remote class.

I could, of course, set up my retrieval to download only the Januaries for all my years, then only the Februaries, and so on, compute the statistics, and merge the resulting variables. But that is slower.


2) After some mathematical operations, the "unit" attribute has changed. The plotting routine cdstoolbox.chart.line uses it later to label the y-axis. How can I change the "unit" of a Remote object, or how can I specify the y-axis label in chart.line?


3) I would like to calculate a trend over the time dimension for my dataset (time=24, lat=720, lon=1440). After an hour I had to stop the program because it was still running. Does the trend function have a limit on the input data size?

This is related to the first question about slicing. I tried specifying "area" in the data request, but it doesn't crop my dataset.

data_trend = cdstoolbox.stats.trend(detrend, dim='time')


Kind regards,

Alex

Dear Alex,

The approach within the Toolbox is to manipulate remote objects with the tools it provides, so it is more important to understand the tools than the Remote class itself.

To address your specific questions:

1) ct.climate.climatology_mean should perform what you want to do. It is documented here: https://cds.climate.copernicus.eu/toolbox/doc/climate.html#cdstoolbox.climate.climatology_mean

By default it returns the monthly climatology over all the years you provide; the output has a 'month' coordinate instead of the 'time' coordinate. This example may help you further: https://cds.climate.copernicus.eu/toolbox-editor/examples/12-calculate-climatologies
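To make the idea concrete: a monthly climatology groups every time step by calendar month and averages each group, which is why the output carries a month coordinate instead of time. The plain-Python sketch below illustrates only that grouping, on made-up values; monthly_climatology is a hypothetical helper, not part of the Toolbox, and climatology_mean may treat details such as missing data differently.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_climatology(dates, values):
    """Average all values that fall in the same calendar month, across years."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        groups[d.month].append(v)
    # One entry per month present in the input, keyed by month number
    return {month: mean(vals) for month, vals in sorted(groups.items())}


# Two years of made-up monthly values, January and February only
dates = [date(2015, 1, 1), date(2015, 2, 1), date(2016, 1, 1), date(2016, 2, 1)]
values = [270.0, 272.0, 274.0, 276.0]

climatology = monthly_climatology(dates, values)
print(climatology)  # {1: 272.0, 2: 274.0}
```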


2) Dropping the units after a mathematical operation is normal behaviour, as the operation does not necessarily preserve the unit. However, you can use ct.cdm.update_attributes to set the unit after the operation. Here is a demonstration of update_attributes:

import cdstoolbox as ct

@ct.application(title='Hello World!')
@ct.output.download()
def application():

    data = ct.catalogue.retrieve(
        'reanalysis-era5-single-levels',
        {
            'variable': '2m_temperature',
            'product_type': 'reanalysis',
            'year': '2017',
            'month': '01',
            'day': '01',
            'time': '12:00'
        }
    )

    # Arithmetic may drop the 'units' attribute...
    data = data * 2

    # ...so set it explicitly after the operation
    data = ct.cdm.update_attributes(data, attrs={'units': 'K'})

    return data


3) Regarding the 'area' keyword, it might not work with all datasets. Which data are you downloading? There should not be any limitation specific to the trend tool; I suggest testing with less data first. If 'area' does not work, you could try 'grid', which sets the resolution of the data you download. Here is an example of a retrieve using 'grid':

# 'grid' regrids the data to a coarser 3 x 3 degree resolution, reducing the volume
data = ct.catalogue.retrieve(
    'reanalysis-era5-single-levels',
    {
        'variable': '2m_temperature',
        'grid': ['3', '3'],
        'product_type': 'reanalysis',
        'year': ['2008'],
        'month': ['01'],
        'day': ['01'],
        'time': ['00:00', '06:00', '12:00', '18:00'],
    }
)
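A rough way to see why 'grid' helps is to count grid points. The sketch below assumes a regular global latitude/longitude grid that includes both poles (ERA5 at 0.25 degrees is usually 721 x 1440; your dataset reports 720 latitudes, so treat the numbers as approximate). grid_points is a hypothetical helper.

```python
def grid_points(resolution_deg):
    """Points on a regular global lat/lon grid, assuming both poles are included."""
    n_lat = round(180 / resolution_deg) + 1
    n_lon = round(360 / resolution_deg)
    return n_lat * n_lon


native = grid_points(0.25)  # 721 * 1440
coarse = grid_points(3)     # 61 * 120
print(native, coarse, round(native / coarse))  # 1038240 7320 142
```

So a 3 degree grid carries roughly 140 times fewer points per time step, which is usually enough to test a slow computation quickly.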

Try running your trend on less data and let me know how it goes.
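If 'area' is accepted but the result is not cropped as expected, one thing worth checking is the ordering of the bounding box: for ERA5 requests, the CDS 'area' keyword is conventionally given as [North, West, South, East], which differs from a [min_lon, max_lon, min_lat, max_lat] box. The hypothetical helper below only reorders such a box; it cannot make 'area' work for a dataset that does not support it.

```python
def extent_to_area(extent):
    """Reorder [min_lon, max_lon, min_lat, max_lat] into [North, West, South, East]."""
    min_lon, max_lon, min_lat, max_lat = extent
    if min_lat > max_lat:
        raise ValueError("min_lat must not exceed max_lat")
    return [max_lat, min_lon, min_lat, max_lon]


# The extent used with cube.select later in this thread
area = extent_to_area([-11, 35, 30, 50])
print(area)  # [50, -11, 30, 35]
```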

Dear Vivien,


Thank you for the swift reply.

I had missed the climate section; thanks for the pointer. It has the right functionality. Changing the attributes works fine, and I have also figured out how to change the y-axis label using the "layout_kwargs" dictionary, so that is covered.

Example 12 (calculate_climatologies) shows the spread using error_y=clima_std. Is there a way to mimic the matplotlib.pyplot.fill_between function? I found that the plotly module has an "add_trace" function, but I haven't found a way to use it.


The trend calculation works after I cropped my input data with the cube.select function. The resulting file has 4 variables (intercept, intercept_std, slope, slope_std) plus the latitude and longitude coordinates. Can you please clarify the units of the slope parameter, which are given as 1000000000 kg.s-4?


import cdstoolbox as ct

@ct.application(title='Hello World!')
@ct.output.download()
@ct.output.figure()
def application():

    request = [
        'reanalysis-era5-single-levels-monthly-means',
        {
            'product_type': 'monthly_averaged_reanalysis',
            'variable': 'surface_thermal_radiation_downwards',
            'time': '00:00',
            'month': '01',
            'year': ['2015', '2016']
        }
    ]

    dd = ct.catalogue.retrieve(*request)

    extent = {'Europe': [-11, 35, 30, 50]}  # min_lon, max_lon, min_lat, max_lat
    dd1 = ct.cube.select(dd, extent=extent['Europe'])

    data = dd1 / 86400
    data = ct.cdm.update_attributes(data, attrs={'units': 'W*m^-2'})

    amean = ct.cube.average(data, dim='time')
    detrend = data - amean

    trend_test = ct.stats.trend(detrend, dim='time')

    fig = ct.cdsplot.geomap(trend_test[1])

    return detrend, fig
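For intuition about the four output variables, here is a plain-Python ordinary least-squares fit that returns an intercept, a slope, and their standard errors. It assumes ct.stats.trend performs an OLS fit at each grid point, which this thread does not confirm, and linear_trend is a hypothetical helper. It also shows that subtracting the time mean (as detrend = data-amean does above) shifts only the intercept: the slope is unchanged, so that step produces anomalies rather than removing a trend.

```python
import math


def linear_trend(t, y):
    """OLS fit of y = intercept + slope * t; returns (intercept, intercept_std, slope, slope_std)."""
    n = len(t)
    if n < 3:
        raise ValueError("need at least 3 points for standard errors")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    # Residual variance with n - 2 degrees of freedom
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    slope_std = math.sqrt(sigma2 / s_tt)
    intercept_std = math.sqrt(sigma2 * (1 / n + t_mean ** 2 / s_tt))
    return intercept, intercept_std, slope, slope_std


t = [0, 1, 2, 3]
y = [1.0, 3.0, 5.0, 7.0]
anomaly = [yi - sum(y) / len(y) for yi in y]  # subtract the time mean

print(linear_trend(t, y))        # intercept 1.0, slope 2.0, zero standard errors
print(linear_trend(t, anomaly))  # intercept -3.0, slope 2.0
```

The slope is expressed in data units per unit of t, which is why the time coordinate's units end up in the slope units.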




Dear Alex,

I am glad the trend calculation worked in the end.

Regarding the unit of the trend slope, you can specify the unit you expect with the slope_units keyword argument, as in ct.stats.trend(data, slope_units='K year-1').

I can't tell you off the top of my head how the default slope unit is defined for your specific variable.
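One plausible reading of 1000000000 kg.s-4, assuming the slope is computed per unit of a time coordinate stored in nanoseconds (the datetime64[ns] convention used by numpy and xarray, not confirmed here for the Toolbox): W m-2 reduces to kg s-3 in SI base units, and dividing by nanoseconds instead of seconds multiplies by 10^9, giving 10^9 kg s-4, i.e. W m-2 per nanosecond. Under that assumption, the hypothetical helper below rescales such a slope to W m-2 per year; passing slope_units as suggested above should make this manual step unnecessary.

```python
NS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 365.25 * 86400  # Julian year of 365.25 days


def per_ns_to_per_year(slope_per_ns):
    """Rescale a trend from (units) per nanosecond to (units) per year."""
    return slope_per_ns * NS_PER_SECOND * SECONDS_PER_YEAR


# A made-up slope of 1e-16 W m-2 per nanosecond
print(per_ns_to_per_year(1e-16))  # roughly 3.16 W m-2 per year
```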

Let me know if specifying the slope unit works for you.

Dear Vivien,


Example 12 (calculate_climatologies) shows the spread using error_y=clima_std. I found that the plotly module has an "add_trace" function, but I haven't found a way to use it in the Toolbox.

Is there a way to mimic the matplotlib.pyplot.fill_between function (as in the example)?


Best regards,

Alex


Hi Alex,

The code below should let you make a plot similar to what you want. The key parameter here is 'fill': 'tonexty'.

Note that I am not using the "Start" input here, and sorry for the horrendous colours.

You can find more information on setting the parameters of your Plotly scatter plot at https://plot.ly/python/reference/#scatter.


import cdstoolbox as ct

layout = {
    'input_ncols': 3,
    'output_align': 'bottom'
}

variables = {
    'Near-Surface Air Temperature': '2m_temperature',
    'Eastward Near-Surface Wind': '10m_u_component_of_wind',
    'Northward Near-Surface Wind': '10m_v_component_of_wind',
    'Sea Level Pressure': 'mean_sea_level_pressure',
    'Sea Surface Temperature': 'sea_surface_temperature',
}

@ct.application(title='Calculate climatologies', layout=layout)
@ct.input.dropdown('var', label='Variable', values=variables.keys())
@ct.input.dropdown('freq', label='Frequency', default='month', values=['dayofyear', 'weekofyear', 'month'], link=True,
                   help='Start values will change accordingly.')
@ct.input.dropdown('start', label='Start', when='dayofyear', default=270, values=range(1, 367))
@ct.input.dropdown('start', label='Start', when='weekofyear', default=46, values=range(1, 54))
@ct.input.dropdown('start', label='Start', when='month', default=10, values=range(1, 13))
@ct.output.livefigure()
def compute_climatology(var, freq, start):
    """
    Application main steps:

    - retrieve a variable over a defined time range
    - select a location
    - compute the monthly/daily/weekly climatology and standard deviation
    - show the result as a timeseries on an interactive chart

    """

    data = ct.catalogue.retrieve(
        'reanalysis-era5-single-levels',
        {
            'variable': variables[var],
            'grid': ['3', '3'],
            'product_type': 'reanalysis',
            'year': [
                '2008', '2009', '2010',
                '2011', '2012', '2013',
                '2014', '2015', '2016',
                '2017'
            ],
            'month': [
                '01', '02', '03', '04', '05', '06',
                '07', '08', '09', '10', '11', '12'
            ],
            'day': [
                '01', '02', '03', '04', '05', '06',
                '07', '08', '09', '10', '11', '12',
                '13', '14', '15', '16', '17', '18',
                '19', '20', '21', '22', '23', '24',
                '25', '26', '27', '28', '29', '30',
                '31'
            ],
            'time': ['00:00', '06:00', '12:00', '18:00'],
        }
    )

    data_location = ct.geo.extract_point(data, lon=-1, lat=51.5)
    clima_mean = ct.climate.climatology_mean(data_location, frequency=freq)
    clima_std = ct.climate.climatology_std(data_location, frequency=freq)

    print(clima_std)
    # Compute the bounds in K, then convert them to degrees Celsius
    clima_minus_std = ct.units.convert_units(ct.operator.sub(clima_mean, clima_std), target_units="K @ 273.15")
    clima_plus_std = ct.units.convert_units(ct.operator.add(clima_mean, clima_std), target_units="K @ 273.15")
    # Convert the mean as well, so all three traces share the °C y-axis
    clima_mean = ct.units.convert_units(clima_mean, target_units="K @ 273.15")

    # Define overall plot layout
    layout_dict = {
        'title': 'Climatology mean +/- standard deviation',
        'titlefont': {'size': 15},
        'yaxis': {
            'title': 'Near-Surface Temperature Climatology (°C)',
            'hoverformat': '.1f',
            'titlefont': {'size': 13},
        },
        'xaxis': {'title': 'Month'},
    }

    # Lower bound: drawn first, without fill
    minus_scatter_dict = {
        'name': 'Climatology minus std',
        'marker': {'color': 'orange'},
    }
    fig = ct.chart.line(
        clima_minus_std,
        scatter_dict=minus_scatter_dict
    )

    # Mean: 'tonexty' fills the area down to the previous (lower) trace
    mean_scatter_dict = {
        'name': 'Climatology mean',
        'fill': 'tonexty',
        'fillcolor': 'aquamarine',
        'marker': {'color': 'green'}
    }
    fig = ct.chart.line(
        clima_mean,
        fig=fig,
        scatter_dict=mean_scatter_dict
    )

    # Upper bound: 'tonexty' fills the area down to the mean trace
    plus_scatter_dict = {
        'name': 'Climatology plus std',
        'fill': 'tonexty',
        'fillcolor': 'aquamarine',
        'marker': {'color': 'orange'}
    }
    fig = ct.chart.line(
        clima_plus_std,
        fig=fig,
        scatter_dict=plus_scatter_dict,
        layout_dict=layout_dict
    )

    return fig
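The mechanism behind 'fill': 'tonexty' is that Plotly shades the area between a trace and the trace added just before it, so the lower bound must come first, then the mean, then the upper bound. The sketch below builds the same three-trace structure as plain Plotly figure dictionaries, independent of the Toolbox; band_traces is a hypothetical helper and the climatology values are made up. Outside the Toolbox, a dictionary like this can be passed to plotly.graph_objects.Figure.

```python
def band_traces(x, lower, centre, upper, fillcolor="aquamarine"):
    """Return Plotly scatter traces that shade lower..centre..upper.

    Each trace with 'fill': 'tonexty' fills the gap to the previous trace,
    so the order of this list matters.
    """
    return [
        {"type": "scatter", "mode": "lines", "name": "Climatology minus std",
         "x": x, "y": lower},
        {"type": "scatter", "mode": "lines", "name": "Climatology mean",
         "x": x, "y": centre, "fill": "tonexty", "fillcolor": fillcolor},
        {"type": "scatter", "mode": "lines", "name": "Climatology plus std",
         "x": x, "y": upper, "fill": "tonexty", "fillcolor": fillcolor},
    ]


months = list(range(1, 13))
clim_mean = [4.0, 4.5, 6.5, 9.0, 12.0, 15.0, 17.0, 17.0, 14.5, 11.0, 7.0, 5.0]  # made-up °C
clim_std = [2.0] * 12

figure = {
    "data": band_traces(
        months,
        [m - s for m, s in zip(clim_mean, clim_std)],
        clim_mean,
        [m + s for m, s in zip(clim_mean, clim_std)],
    ),
    "layout": {"xaxis": {"title": "Month"}, "yaxis": {"title": "Temperature (°C)"}},
}
```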