AOI based Submitter for Acquisition Scraper


Related GitHub Repos and Tickets

https://github.com/aria-jpl/scihub_acquisition_scraper/blob/master/acquisition_ingest/AOI_based_acq_submitter.py

https://aria.atlassian.net/browse/ARIA-29

Job Runtime

Objective

This job, `job-aoi-based-acq-submitter`, submits `job-acquisition-ingest-scihub` jobs that are time-segmented (by data-month).

It passes the following params to each iteration of the `job-acquisition-ingest-scihub` job:

  • start and end timestamps: 1 data-month at a time

  • polygon: the AOI-specific bbox

  • publish report: enabled

Coverage must be 100% consistent, not best-effort like the hourly and daily scrapers.
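
A minimal sketch of the data-month segmentation described above, assuming a hypothetical submit_scraper_job() helper (the actual submission goes through HySDS/Mozart; all names below are illustrative, not the real API):

import datetime

from dateutil.relativedelta import relativedelta


def iterate_data_months(start_time, end_time):
    """Yield (start, end) timestamp pairs covering one data-month each."""
    fmt = "%Y-%m-%dT%H:%M:%S"
    current = datetime.datetime.strptime(start_time, fmt)
    end = datetime.datetime.strptime(end_time, fmt)
    while current < end:
        # Advance one calendar month, but never past the AOI's end time.
        nxt = min(current + relativedelta(months=1), end)
        yield current.strftime(fmt), nxt.strftime(fmt)
        current = nxt


def submit_for_aoi(aoi_name, spatial_extent, start_time, end_time,
                   dataset_version="v2.0"):
    for seg_start, seg_end in iterate_data_months(start_time, end_time):
        # submit_scraper_job is a hypothetical stand-in for the real
        # HySDS/Mozart job-submission call.
        submit_scraper_job(
            job_type="job-acquisition-ingest-scihub",
            params={
                "start_time": seg_start,       # 1 data-month at a time
                "end_time": seg_end,
                "polygon": spatial_extent,     # AOI-specific bbox
                "report": True,                # publish report: enabled
                "dataset_version": dataset_version,
            },
            tag=aoi_name,
        )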

How to set up the inputs




Job Inputs:

  • AOI_name

  • Spatial Extent

  • Start time

  • End time

  • Dataset version for acquisitions (default v2.0)
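
For illustration, a hypothetical set of input values (the spatial extent is a GeoJSON polygon taken from the AOI dataset; all values below are made up):

{
  "AOI_name": "AOI_sample",
  "spatial_extent": {"type": "Polygon", "coordinates": [[[-122.0, 36.0], [-118.0, 36.0], [-118.0, 39.0], [-122.0, 39.0], [-122.0, 36.0]]]},
  "start_time": "2019-01-01T00:00:00",
  "end_time": "2019-07-01T00:00:00",
  "dataset_version": "v2.0"
}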

CI Integration (Jenkins)

  • scihub_acquisition_scraper

  • WARNING: If rebuilding on the same branch (master), make sure to remove the Docker image so that it is reloaded when the job restarts on an already-running worker: docker rmi <acquisition scraper docker image id>. If your job will run on a worker that will be scaled up (i.e., is not already running), you don't need to worry about this. See the example after this list.

  • If you need to port this container over to another cluster, e.g., from the B cluster to the C cluster
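
For the image removal mentioned in the warning above, the sequence on the worker looks like the following (the image name here is an assumption based on the repo name; substitute whatever docker images actually reports, and keep the placeholder id as-is until you look it up):

docker images | grep scihub_acquisition_scraper   # find the image id on the worker
docker rmi <acquisition scraper docker image id>  # remove it so the rebuilt image is pulled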

HySDS-io and Jobspec-io

hysds-io.json.aoi_based_acq_submitter

{ "submission_type":"iteration", "label": "AOI based submission of acq scraper jobs", "params" : [ { "name": "AOI_name", "from": "dataset_jpath:_id" }, { "name": "spatial_extent", "from": "dataset_jpath:_source.location", "lambda": "lambda x: __import__('json').dumps(x).replace(' ','')" }, { "name": "start_time", "from": "dataset_jpath:_source.starttime" }, { "name": "end_time", "from": "dataset_jpath:_source.endtime" }, { "name": "dataset_version", "from": "value", "value": "v2.0" } ] }

job-spec.json.aoi_based_acq_submitter

{ "command": "python /home/ops/verdi/ops/scihub_acquisition_scraper/acquisition_ingest/AOI_based_acq_submitter.py", "imported_worker_files": { "/home/ops/.netrc": "/home/ops/.netrc" }, "required-queues": [ "factotum-job_worker-small" ], "disk_usage":"4GB", "soft_time_limit": 7200, "time_limit": 7800, "params" : [ { "name": "AOI_name", "destination": "context" }, { "name": "spatial_extent", "destination": "context" }, { "name": "start_time", "destination": "context" }, { "name": "end_time", "destination": "context" }, { "name": "dataset_version", "destination": "context" } ] }



Job Outputs

Main file that gets executed is acquisition_ingest/AOI_based_acq_submitter.py

  • <steps>

Output directory structure

<screen shot>



Output structure of merged/



STILL TODO: