Ingest missing acquisitions for an AOI

 

Related GitHub repos and tickets

 

https://aria.atlassian.net/browse/ARIA-29

Job Runtime

The job-spec below sets a soft time limit of 3300 seconds (55 minutes) and a hard time limit of 3600 seconds (1 hour) per run.

Objective

Scrape the ESA SciHub/apihub OpenSearch endpoint for acquisitions that fall within an AOI's spatial extent and temporal span, and ingest any acquisitions that are missing from the system.



How to set up the inputs

 



Job Inputs:

  • AOI Name: the name of the area of interest.

  • Datasets config file: the datasets.json file available in the work directory.

  • Start Time: the start time of the scraping temporal span.

  • End Time: the end time of the scraping temporal span.

  • Polygon: the spatial extent of the area of interest.
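
For illustration, a hypothetical set of input values (the AOI name, dates, and coordinates below are made up; the polygon is the AOI's GeoJSON geometry):

    AOI Name:             AOI_sample_site
    Datasets config file: datasets.json
    Start Time:           2019-01-01T00:00:00Z
    End Time:             2019-02-01T00:00:00Z
    Polygon:              {"type":"Polygon","coordinates":[[[-122.5,37.5],[-121.5,37.5],[-121.5,38.5],[-122.5,38.5],[-122.5,37.5]]]}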

CI Integration (Jenkins)

  • scihub_acquisition_scraper

  • WARNING: If you rebuild on the same branch (master) and the job will run on a worker that is already up, remove the old Docker image on that worker so the rebuilt image is pulled when the job restarts: docker rmi <acquisition scraper docker image id>. If the job will run on a worker that gets scaled up fresh (i.e. is not already running), this step is not needed. A sketch of the cleanup follows this list.

  • If you need to port this container over to another cluster, e.g. from the B cluster to the C cluster.
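
A sketch of the image cleanup described in the warning above, assuming the image name contains scihub_acquisition_scraper (check docker images on the worker for the actual name and ID):

    # on the already-running worker, find the old acquisition scraper image
    docker images | grep scihub_acquisition_scraper
    # remove it so the freshly built image is pulled when the job restarts
    docker rmi <acquisition scraper docker image id>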

HySDS-io and Jobspec-io

hysds-io.json.acquisition_ingest-aoi

{ "submission_type":"iteration", "params" : [ { "name": "aoi_name", "from": "dataset_jpath:_id" }, { "name": "ds_cfg", "from": "value", "value": "datasets.json" }, { "name": "starttime", "from": "dataset_jpath:_source.starttime" }, { "name": "endtime", "from": "dataset_jpath:_source.endtime" }, { "name": "polygon_flag", "from": "value", "value": "--polygon" }, { "name": "polygon", "from": "dataset_jpath:_source.location", "lambda": "lambda x: __import__('json').dumps(x).replace(' ','')" }, { "name": "ingest_flag", "from": "value", "value": "--ingest" }, { "name": "purpose_flag", "from": "value", "value": "--purpose" }, { "name": "purpose", "from": "value", "value": "aoi_scrape" }, { "name": "report_flag", "from": "value", "value": "--report" } ] }

job-spec.json.acquisition_ingest-aoi

{ "command": "/home/ops/verdi/ops/scihub_acquisition_scraper/acquisition_ingest/scrape_apihub_opensearch.py", "imported_worker_files": { "/home/ops/.netrc": "/home/ops/.netrc" }, "required-queues": [ "factotum-job_worker-apihub_scraper_throttled" ], "disk_usage":"10GB", "soft_time_limit": 3300, "time_limit": 3600, "params" : [ { "name": "aoi_name", "destination": "context" }, { "name": "ds_cfg", "destination": "positional" }, { "name": "starttime", "destination": "positional" }, { "name": "endtime", "destinations": "positional" }, { "name": "polygon_flag", "destination": "positional" }, { "name": "polygon", "destination": "positional" }, { "name": "ingest_flag", "destination": "positional" }, { "name": "purpose_flag", "destination": "positional" }, { "name": "purpose", "destination": "positional" }, { "name": "report_flag", "destination": "positional" } ] }



Job Outputs

The main file that gets executed is acquisition_ingest/scrape_apihub_opensearch.py.

  • The script is invoked with positional arguments, starting with the datasets config file (the full form is sketched below):
    python acquisition_ingest/scrape_apihub_opensearch.py datasets.json ...
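
Putting the job-spec's positional parameter order together with the flag values from the hysds-io file (aoi_name has destination "context", so it goes into the job context rather than onto the command line), a full invocation would look roughly like this; the times and polygon are placeholders:

    python acquisition_ingest/scrape_apihub_opensearch.py \
        datasets.json 2019-01-01T00:00:00Z 2019-02-01T00:00:00Z \
        --polygon '{"type":"Polygon","coordinates":[...]}' \
        --ingest --purpose aoi_scrape --report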

Output directory structure

<screen shot>

 

Output structure of merged/



STILL TODO: