Download CDS data directly to an AWS S3 file using fsspec in Python


Hello, I need to download a NetCDF file from the CDS servers directly to an AWS S3 storage location.

To do that I would usually have the following code:
import fsspec
import cdsapi

ct = cdsapi.Client(quiet=True)
file_to_download_path = f"simplecache::s3://{aws_bucket_name}/{global_climate_monthly_aws_s3_prefix_nc}lat{lat}_{year}_{m}.nc"

with fsspec.open(file_to_download_path, mode='wb') as f:
    ...  # do something that writes to file f


When I try to use the following command:

ct.retrieve(dataset, kwargs, f)

it doesn't work, because ct.retrieve expects a file name, not a file object. I need to open my file with fsspec.open, otherwise I can't write directly to AWS S3.


Any idea how I can get the ct.retrieve command to write directly to an AWS S3 file?

I am desperately trying to avoid having to write the file locally and then upload it.

Thank you



         


It’s not that easy.
You have to rewrite the parts of CDSAPI that interact with the filesystem to add the option to write to S3 with boto3.

I tried to do it over the last few weeks, because I thought it would be as easy as with the ECMWF Web API Python client, where I got it working, but the reality is different.
While digging through the cdsapi code I discovered that it calls classes from other modules (datapi and others), together with awful deprecation patches, so I would also need to patch those libraries. I decided it was not worth the effort.

To be fair, the current state of the Python clients that ECMWF “maintains” (web api, cdsapi, opendata) is so fragmented and inconsistent that they should consider reorganizing them. For the moment it’s not worth patching cdsapi to write to S3.
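
For anyone who wants to experiment without patching the library, here is a rough, untested sketch of a possible workaround. It assumes that calling ct.retrieve(dataset, request) without a target returns a result object that exposes the download URL as .location, and that this URL can be fetched with plain requests without extra credentials; both may differ between cdsapi versions. The helper names write_chunks and retrieve_to_s3 are made up for illustration. As far as I understand, the simplecache:: prefix itself stages the file in a local temporary directory before uploading, so the sketch expects a plain s3:// URL.

```python
def write_chunks(chunks, fileobj):
    """Write an iterable of byte chunks to an open binary file object.

    Returns the total number of bytes written.
    """
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            fileobj.write(chunk)
            total += len(chunk)
    return total


def retrieve_to_s3(client, dataset, request, s3_url, chunk_size=1024 * 1024):
    """Hypothetical helper: stream a CDS result into an fsspec URL.

    Assumption: client.retrieve(dataset, request) with no target returns an
    object whose .location attribute is the download URL.
    """
    # Imported here so write_chunks can be used without these packages.
    import fsspec
    import requests

    result = client.retrieve(dataset, request)  # no target: nothing saved locally
    with requests.get(result.location, stream=True, timeout=60) as response:
        response.raise_for_status()
        # s3_url is expected to look like "s3://bucket/prefix/file.nc"
        with fsspec.open(s3_url, mode="wb") as f:
            return write_chunks(response.iter_content(chunk_size=chunk_size), f)
```

With the variables from the question it would be called as retrieve_to_s3(ct, dataset, kwargs, "s3://" + aws_bucket_name + "/..."), with the key filled in as needed. The request still has to finish processing on the CDS side before the download starts; only the local copy is avoided.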