Dear ECMWF,
I find Copernicus data very important; however, the NetCDF files are very difficult to work with. I've been trying different scripts in Python and R to extract climate data from NetCDF files into CSV for further analysis, but it has proved to be a very difficult task.
I would like to ask for your help.
I need to extract climate data (precipitation, average temperature, etc.) for European countries using the data provided by Copernicus: "Temperature and precipitation climate impact indicators from 1970 to 2100 derived from European climate projections". Ideally I would like to have a panel data set with a column for the year, a column for the country, and a separate column for each climate variable.
I've tried the Python script recommended on this page: "How to convert NetCDF to CSV".
This is the code used:
# this is for reading the .nc files in the working folder
import glob
# this is required to read the netCDF4 data
from netCDF4 import Dataset
# required to read and write the csv files
import pandas as pd
# required for using the array functions
import numpy as np
from matplotlib.dates import num2date
data = Dataset('prAdjust_tmean.nc')
This is what the data looks like (contents):
print(data)
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    CDI: Climate Data Interface version 1.8.2 (http://mpimet.mpg.de/cdi)
    frequency: year
    CDO: Climate Data Operators version 1.8.2 (http://mpimet.mpg.de/cdo)
    creation_date: 2020-02-12T15:00:49ZCET+0100
    Conventions: CF-1.6
    institution_url: www.smhi.se
    invar_platform_id: -
    invar_rcm_model_driver: MPI-M-MPI-ESM-LR
    time_coverage_start: 1971
    time_coverage_end: 2000
    domain: EUR-11
    geospatial_lat_min: 23.942343
    geospatial_lat_max: 72.641624
    geospatial_lat_resolution: 0.04268074 degree
    geospatial_lon_min: -35.034023
    geospatial_lon_max: 73.937675
    geospatial_lon_resolution: 0.009246826 degree
    geospatial_bounds: -
    NCO: netCDF Operators version 4.7.7 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
    acknowledgements: This work was performed within Copernicus Climate Change Service - C3S_424_SMHI, https://climate.copernicus.eu/operational-service-water-sector, on behalf of ECMWF and EU.
    contact: Hydro.fou@smhi.se
    keywords: precipitation
    license: Copernicus License V1.2
    output_frequency: 30 year average value
    summary: Calculated as the mean annual values of daily precipitation averaged over a 30 year period.
    comment: The Climate Data Operators (CDO) software was used for the calculation of climate impact indicators (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf, https://code.mpimet.mpg.de/projects/cdo/embedded/cdo_eca.pdf).
    history: CDO commands (last cdo command first and separated with ;): timmean; yearmean
    invar_bc_institution: Swedish Meteorological and Hydrological Institute
    invar_bc_method: TimescaleBC, Description in deliverable C3S_D424.SMHI.1.3b
    invar_bc_method_id: TimescaleBC v1.02
    invar_bc_observation: EFAS-Meteo, https://ec.europa.eu/jrc/en/publication/eur-scientific-and-technical-research-reports/efas-meteo-european-daily-high-resolution-gridded-meteorological-data-set-1990-2011
    invar_bc_observation_id: EFAS-Meteo
    invar_bc_period: 1990-2018
    data_quality: Testing of EURO-CORDEX data performed by ESGF nodes. Additional tests were performed when producing CII and ECVs in C3S_424_SMHI.
    institution: SMHI
    project_id: C3S_424_SMHI
    references:
    source: The RCM data originate from EURO-CORDEX (Coordinated Downscaling Experiment - European Domain, EUR-11) https://euro-cordex.net/.
    invar_experiment_id: rcp45
    invar_realisation_id: r1i1p1
    invar_rcm_model_id: MPI-CSC-REMO2009-v1
    variable_name: prAdjust_tmean
    dimensions(sizes): x(1000), y(950), time(1), bnds(2)
    variables(dimensions): float32 lon(y,x), float32 lat(y,x), float64 time(time), float64 time_bnds(time,bnds), float32 prAdjust_tmean(time,y,x)
    groups:
After that I extract the needed variable:
t2m = data.variables['prAdjust_tmean']
# Get dimensions assuming 3D: time, latitude, longitude
time_dim, lat_dim, lon_dim = t2m.get_dims()
time_var = data.variables[time_dim.name]
times = num2date(time_var[:], time_var.units)
latitudes = data.variables[lat_dim.name][:]
longitudes = data.variables[lon_dim.name][:]
output_dir = './'
And this is the error:
OverflowError                             Traceback (most recent call last)
<ipython-input-9-69e10e41e621> in <module>
      2 time_dim, lat_dim, lon_dim = t2m.get_dims()
      3 time_var = data.variables[time_dim.name]
----> 4 times = num2date(time_var[:], time_var.units)
      5 latitudes = data.variables[lat_dim.name][:]
      6 longitudes = data.variables[lon_dim.name][:]

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py in num2date(x, tz)
    509     if tz is None:
    510         tz = _get_rc_timezone()
--> 511     return _from_ordinalf_np_vectorized(x, tz).tolist()
    512
    513

C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in __call__(self, *args, **kwargs)
   2106             vargs.extend([kwargs[_n] for _n in names])
   2107
-> 2108         return self._vectorize_call(func=func, args=vargs)
   2109
   2110     def _get_ufunc_and_otypes(self, func, args):

C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in _vectorize_call(self, func, args)
   2190                       for a in args]
   2191
-> 2192             outputs = ufunc(*inputs)
   2193
   2194             if ufunc.nout == 1:

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\dates.py in _from_ordinalf(x, tz)
    329
    330     dt = (np.datetime64(get_epoch()) +
--> 331           np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))
    332     if dt < np.datetime64('0001-01-01') or dt >= np.datetime64('10000-01-01'):
    333         raise ValueError(f'Date ordinal {x} converts to {dt} (using '

OverflowError: int too big to convert
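A note on the traceback: the num2date that ran is the one from matplotlib.dates, whose signature is num2date(x, tz). It treats the numbers as days since matplotlib's own epoch and expects a time zone as the second argument, not a units string. That could explain the overflow if the file's time values are counted in another unit or from another reference date. The netCDF4 package has its own num2date(times, units, calendar=...) for CF-style time values. Below is a minimal sketch of what such decoding does, using only the standard library. It assumes a units string of the form "days since YYYY-MM-DD" (the actual units of this file are not shown above), and decode_cf_time is a hypothetical helper name, not part of any library.

```python
from datetime import datetime, timedelta

# Hypothetical helper for illustration only. For real files, netCDF4.num2date
# performs this decoding and also handles non-standard calendars.
_SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def decode_cf_time(values, units):
    """Convert numbers such as 365.5 with units 'days since 1971-01-01' to datetimes."""
    unit, since, reference = units.split(maxsplit=2)
    if since != "since" or unit not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported units string: {units!r}")
    # Accepts 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM:SS' as the reference date.
    origin = datetime.fromisoformat(reference.strip())
    scale = _SECONDS_PER_UNIT[unit]
    return [origin + timedelta(seconds=float(v) * scale) for v in values]


# Assumed example values, not taken from the file above.
times = decode_cf_time([0.0, 365.5], "days since 1971-01-01")
print([t.isoformat() for t in times])
```

In the script itself, the corresponding change would be to import num2date from netCDF4 instead of matplotlib.dates, leaving the call times = num2date(time_var[:], time_var.units) as it is.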
And this is the last part of the script:
import os
# Path
path = "/home"
# Join various path components
print(os.path.join(path, "User/Desktop", "file.txt"))
# Path
path = "User/Documents"
# Join various path components
print(os.path.join(path, "/home", "file.txt"))
filename = os.path.join(output_dir, 'table.csv')
print(f'Writing data in tabular form to {filename} (this may take some time)...')
times_grid, latitudes_grid, longitudes_grid = [
    x.flatten() for x in np.meshgrid(times, latitudes, longitudes, indexing='ij')]
df = pd.DataFrame({
    'time': [t.isoformat() for t in times_grid],
    'latitude': latitudes_grid,
    'longitude': longitudes_grid,
    't2m': t2m[:].flatten()})
df.to_csv(filename, index=False)
print('Done')
Once again, I would like to stress that I need to extract these specific columns from the NetCDF file: year (time period), longitude, latitude and the climate variables (temperature, precipitation, etc.). I hope you can help me with this task.
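To illustrate the target layout: the print(data) output lists lon(y,x) and lat(y,x) as 2-D variables and prAdjust_tmean(time,y,x) as 3-D. The table could therefore be built by repeating the flattened 2-D coordinates once per time step, rather than by a meshgrid of 1-D axes. The following is only a sketch under that assumption. Small made-up arrays stand in for the file's variables, and to_long_table is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def to_long_table(years, lat2d, lon2d, values3d, name):
    """Flatten a (time, y, x) array with 2-D lat/lon grids into one row per grid cell and year."""
    n_time = len(years)
    n_cells = lat2d.size
    if lon2d.shape != lat2d.shape or values3d.shape != (n_time,) + lat2d.shape:
        raise ValueError("shape mismatch between coordinates and values")
    return pd.DataFrame({
        # Each year is repeated for every grid cell ...
        "year": np.repeat(np.asarray(years), n_cells),
        # ... and the flattened grid is repeated for every year.
        "latitude": np.tile(lat2d.ravel(), n_time),
        "longitude": np.tile(lon2d.ravel(), n_time),
        name: values3d.reshape(-1),
    })


# Made-up 2 x 3 grid with two time steps (not values from the file).
lat2d = np.array([[50.0, 50.0, 50.0], [51.0, 51.0, 51.0]])
lon2d = np.array([[10.0, 11.0, 12.0], [10.0, 11.0, 12.0]])
values3d = np.arange(12, dtype=float).reshape(2, 2, 3)

table = to_long_table([1985, 2015], lat2d, lon2d, values3d, "prAdjust_tmean")
print(table.head())
```

With the real file, the inputs would presumably be data.variables['lat'][:], data.variables['lon'][:] and data.variables['prAdjust_tmean'][:], and table.to_csv(filename, index=False) would write the result. Assigning each grid cell to a country would additionally need country boundary data, which this sketch does not cover.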
Many thanks for your time and help.
Best regards, Marian