Ingest missing acquisitions for an AOI
Related GitHub repos and tickets
https://aria.atlassian.net/browse/ARIA-29
Job Runtime
Objective
How to set up the inputs
Job Inputs:
AOI Name: The name of the area of interest.
Datasets config file: The datasets.json file available in the work directory.
Start Time: The start time of the scraping temporal span.
End Time: The end time of the scraping temporal span.
Polygon: The spatial extent of the area of interest, as a GeoJSON polygon.
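For illustration, a hypothetical set of inputs might look like the following (all values are made up, not from this job; times are ISO 8601, the polygon is GeoJSON):

# Hypothetical example inputs for an AOI scrape; every value here is illustrative.
job_inputs = {
    "aoi_name": "AOI_california_test",    # name of the area of interest
    "ds_cfg": "datasets.json",            # datasets config file in the work directory
    "starttime": "2019-01-01T00:00:00Z",  # start of the scraping temporal span
    "endtime": "2019-02-01T00:00:00Z",    # end of the scraping temporal span
    "polygon": {                          # spatial extent of the AOI (GeoJSON)
        "type": "Polygon",
        "coordinates": [[[-118.0, 34.0], [-117.0, 34.0],
                         [-117.0, 35.0], [-118.0, 35.0], [-118.0, 34.0]]],
    },
}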
CI Integration (Jenkins)
WARNING: If rebuilding on the same branch (master), make sure to remove the Docker image so that it is reloaded when the job restarts on an already-running worker:
docker rmi <acquisition scraper docker image id>
If your job will run on a worker that will be scaled up (i.e., is not already running), then you don't need to worry about this.
If you need to port this container over to another cluster, e.g. from the B cluster to the C cluster:
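For the image-cleanup step above, here is a minimal sketch of scripting it via the Docker CLI (the repository-name filter "scihub_acquisition_scraper" is an assumption; adjust it to your image's actual name):

import subprocess

# List local images as "<id> <repository>" pairs using the Docker CLI.
out = subprocess.run(
    ["docker", "images", "--format", "{{.ID}} {{.Repository}}"],
    capture_output=True, text=True, check=True,
).stdout

# Remove any image whose repository name matches the scraper.
for line in out.splitlines():
    image_id, repo = line.split(maxsplit=1)
    if "scihub_acquisition_scraper" in repo:
        subprocess.run(["docker", "rmi", image_id], check=True)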
HySDS-io and Jobspec-io
hysds-io.json.acquisition_ingest-aoi
{
  "submission_type": "iteration",
  "params": [
    {
      "name": "aoi_name",
      "from": "dataset_jpath:_id"
    },
    {
      "name": "ds_cfg",
      "from": "value",
      "value": "datasets.json"
    },
    {
      "name": "starttime",
      "from": "dataset_jpath:_source.starttime"
    },
    {
      "name": "endtime",
      "from": "dataset_jpath:_source.endtime"
    },
    {
      "name": "polygon_flag",
      "from": "value",
      "value": "--polygon"
    },
    {
      "name": "polygon",
      "from": "dataset_jpath:_source.location",
      "lambda": "lambda x: __import__('json').dumps(x).replace(' ','')"
    },
    {
      "name": "ingest_flag",
      "from": "value",
      "value": "--ingest"
    },
    {
      "name": "purpose_flag",
      "from": "value",
      "value": "--purpose"
    },
    {
      "name": "purpose",
      "from": "value",
      "value": "aoi_scrape"
    },
    {
      "name": "report_flag",
      "from": "value",
      "value": "--report"
    }
  ]
}
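The polygon param's lambda compacts the AOI's GeoJSON location into a single whitespace-free string so it can be passed as one command-line argument. A quick demonstration of what that lambda does (the AOI location shown is illustrative):

import json

# Same transform as the "polygon" param's lambda in the hysds-io above.
compact = lambda x: json.dumps(x).replace(' ', '')

# Illustrative AOI location (GeoJSON polygon); not from this job.
location = {
    "type": "Polygon",
    "coordinates": [[[-118.0, 34.0], [-117.0, 34.0],
                     [-117.0, 35.0], [-118.0, 35.0], [-118.0, 34.0]]],
}

print(compact(location))
# {"type":"Polygon","coordinates":[[[-118.0,34.0],[-117.0,34.0],[-117.0,35.0],[-118.0,35.0],[-118.0,34.0]]]}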
job-spec.json.acquisition_ingest-aoi
{
  "command": "/home/ops/verdi/ops/scihub_acquisition_scraper/acquisition_ingest/scrape_apihub_opensearch.py",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc"
  },
  "required-queues": [
    "factotum-job_worker-apihub_scraper_throttled"
  ],
  "disk_usage": "10GB",
  "soft_time_limit": 3300,
  "time_limit": 3600,
  "params": [
    {
      "name": "aoi_name",
      "destination": "context"
    },
    {
      "name": "ds_cfg",
      "destination": "positional"
    },
    {
      "name": "starttime",
      "destination": "positional"
    },
    {
      "name": "endtime",
      "destination": "positional"
    },
    {
      "name": "polygon_flag",
      "destination": "positional"
    },
    {
      "name": "polygon",
      "destination": "positional"
    },
    {
      "name": "ingest_flag",
      "destination": "positional"
    },
    {
      "name": "purpose_flag",
      "destination": "positional"
    },
    {
      "name": "purpose",
      "destination": "positional"
    },
    {
      "name": "report_flag",
      "destination": "positional"
    }
  ]
}
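Params with destination "positional" are appended to the command in the order they are listed above. A minimal sketch (not HySDS internals; all values illustrative) of how the final invocation lines up:

# Illustrative values only; HySDS fills these in from the hysds-io params.
params = {
    "ds_cfg": "datasets.json",
    "starttime": "2019-01-01T00:00:00",
    "endtime": "2019-02-01T00:00:00",
    "polygon_flag": "--polygon",
    "polygon": '{"type":"Polygon","coordinates":[[[-118.0,34.0],[-117.0,34.0],[-117.0,35.0],[-118.0,35.0],[-118.0,34.0]]]}',
    "ingest_flag": "--ingest",
    "purpose_flag": "--purpose",
    "purpose": "aoi_scrape",
    "report_flag": "--report",
}

# Positional params, in job-spec order, appended after the command.
order = ["ds_cfg", "starttime", "endtime", "polygon_flag", "polygon",
         "ingest_flag", "purpose_flag", "purpose", "report_flag"]
command = ["scrape_apihub_opensearch.py"] + [params[name] for name in order]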
Job Outputs
The main file that gets executed is acquisition_ingest/scrape_apihub_opensearch.py.
The script is invoked with its arguments in the positional order defined in the job-spec above, e.g.:
python acquisition_ingest/scrape_apihub_opensearch.py datasets.json <starttime> <endtime> --polygon <polygon geojson> --ingest --purpose aoi_scrape --report
Output directory structure
<screen shot>
Output structure of merged/
STILL TODO: