AOI based Submitter for Acquisition Scraper
Related Github Repos and tickets
https://aria.atlassian.net/browse/ARIA-29
Job Runtime
Objective
The job-aoi-based-acq-submitter job submits job-acquisition-ingest-scihub jobs that are time-segmented by data-month.
It passes the following params to each iteration of the job-acquisition-ingest-scihub job (see the sketch after this list):
start and end timestamps: 1 data-month at a time
polygon: the AOI-specific bbox
publish report: enabled
Submission must be 100% consistent, not best-effort like the hourly and daily scrapes.
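The submitter walks the AOI's time range one calendar month at a time and builds one parameter set per month. The sketch below illustrates that segmentation, assuming ISO-8601 timestamps without timezone suffixes; submit_ingest_job() is a hypothetical stand-in for the real submission call and is not code from the repository.

# Minimal sketch of month-by-month segmentation (illustrative, not the actual script).
from datetime import datetime

def month_windows(start_iso, end_iso):
    """Yield (window_start, window_end) ISO strings, one calendar month each."""
    start = datetime.strptime(start_iso, "%Y-%m-%dT%H:%M:%S")
    end = datetime.strptime(end_iso, "%Y-%m-%dT%H:%M:%S")
    cur = start
    while cur < end:
        # first instant of the next calendar month
        if cur.month == 12:
            nxt = cur.replace(year=cur.year + 1, month=1, day=1,
                              hour=0, minute=0, second=0)
        else:
            nxt = cur.replace(month=cur.month + 1, day=1,
                              hour=0, minute=0, second=0)
        window_end = min(nxt, end)
        yield cur.isoformat(), window_end.isoformat()
        cur = window_end

def submit_ingest_job(params):
    """Hypothetical placeholder for submitting one job-acquisition-ingest-scihub job."""
    print("would submit job-acquisition-ingest-scihub with", params)

def submit_for_aoi(aoi_name, spatial_extent, start_time, end_time, dataset_version="v2.0"):
    for win_start, win_end in month_windows(start_time, end_time):
        submit_ingest_job({
            "aoi_name": aoi_name,
            "starttime": win_start,
            "endtime": win_end,
            "polygon": spatial_extent,   # AOI-specific bbox
            "report": True,              # publish report: enabled
            "dataset_version": dataset_version,
        })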
How to set up the inputs
Job Inputs (illustrative example values follow the list):
AOI_name
Spatial Extent
Start time
End time
Dataset version for acquisitions (default v2.0)
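For illustration, a hypothetical set of input values; the AOI name, polygon, and dates below are made up and do not come from a real run.

# Hypothetical example inputs (values are illustrative only):
example_inputs = {
    "AOI_name": "AOI_example_site",        # hypothetical AOI dataset _id
    "spatial_extent": {                    # GeoJSON polygon for the AOI bbox
        "type": "Polygon",
        "coordinates": [[[-118.0, 34.0], [-117.0, 34.0],
                         [-117.0, 35.0], [-118.0, 35.0],
                         [-118.0, 34.0]]],
    },
    "start_time": "2019-01-01T00:00:00",
    "end_time": "2019-06-30T23:59:59",
    "dataset_version": "v2.0",
}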
CI Integration (Jenkins)
WARNING: If rebuilding on the same branch (master), make sure to remove the Docker image so that it is re-pulled when restarting the job on an already-running worker:
docker rmi <acquisition scraper docker image id>
If your job will run on a worker that will be scaled up (i.e., is not already running), you do not need to worry about this. If you need to port this container over to another cluster (e.g., from the B cluster to the C cluster)
HySDS-io and Jobspec-io
hysds-io.json.aoi_based_acq_submitter
{
  "submission_type": "iteration",
  "label": "AOI based submission of acq scraper jobs",
  "params": [
    {
      "name": "AOI_name",
      "from": "dataset_jpath:_id"
    },
    {
      "name": "spatial_extent",
      "from": "dataset_jpath:_source.location",
      "lambda": "lambda x: __import__('json').dumps(x).replace(' ','')"
    },
    {
      "name": "start_time",
      "from": "dataset_jpath:_source.starttime"
    },
    {
      "name": "end_time",
      "from": "dataset_jpath:_source.endtime"
    },
    {
      "name": "dataset_version",
      "from": "value",
      "value": "v2.0"
    }
  ]
}
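In the hysds-io above, the params are filled from the AOI dataset document via dataset_jpath, and the spatial_extent lambda serializes the AOI's GeoJSON location into a compact, whitespace-free JSON string. A minimal demonstration of what that lambda produces, using a made-up location:

# Demonstrates the spatial_extent lambda from the hysds-io above.
# The sample location is illustrative, not a real AOI.
location = {"type": "Polygon",
            "coordinates": [[[-118.0, 34.0], [-117.0, 34.0],
                             [-117.0, 35.0], [-118.0, 34.0]]]}

compact = (lambda x: __import__('json').dumps(x).replace(' ', ''))(location)
print(compact)
# {"type":"Polygon","coordinates":[[[-118.0,34.0],[-117.0,34.0],[-117.0,35.0],[-118.0,34.0]]]}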
job-spec.json.aoi_based_acq_submitter
{
  "command": "python /home/ops/verdi/ops/scihub_acquisition_scraper/acquisition_ingest/AOI_based_acq_submitter.py",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc"
  },
  "required-queues": [
    "factotum-job_worker-small"
  ],
  "disk_usage": "4GB",
  "soft_time_limit": 7200,
  "time_limit": 7800,
  "params": [
    {
      "name": "AOI_name",
      "destination": "context"
    },
    {
      "name": "spatial_extent",
      "destination": "context"
    },
    {
      "name": "start_time",
      "destination": "context"
    },
    {
      "name": "end_time",
      "destination": "context"
    },
    {
      "name": "dataset_version",
      "destination": "context"
    }
  ]
}
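Because every param above has destination "context", the values land in the job's _context.json under the standard HySDS convention. The snippet below is an illustrative sketch of how a script could read them; it is not the actual AOI_based_acq_submitter.py code.

# Illustrative only: read params delivered with "destination": "context"
# from _context.json in the job work directory.
import json

with open("_context.json") as f:
    ctx = json.load(f)

aoi_name = ctx["AOI_name"]
spatial_extent = json.loads(ctx["spatial_extent"])  # compact GeoJSON string from the hysds-io lambda
start_time = ctx["start_time"]
end_time = ctx["end_time"]
dataset_version = ctx.get("dataset_version", "v2.0")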
Job Outputs
Main file that gets executed is acquisition_ingest/AOI_based_acq_submitter.py (see the command in the job-spec above)
<steps>
Output directory structure
<screen shot>
Output structure of merged/
STILL TODO: