IPF Scraper ASF
Related Github Repos and tickets
Job Runtime
Objective
Finds the IPF (Instrument Processing Facility) version for acquisitions that are missing it by scraping ASF, and fills it into the acquisition metadata.
How to set up the inputs
Job Inputs:
Acquisition ID
Acquisition metadata
Acquisition ES index name
Acquisition ES dataset type
Endpoint: for this use case, it is set to ASF
Path to the datasets.json file
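The inputs above reach the job through the standard HySDS convention of serializing params into a _context.json file in the job's work directory. A minimal sketch of reading them, assuming that convention; the function name load_job_inputs is hypothetical, not part of the scraper code:

```python
import json

def load_job_inputs(ctx_path="_context.json"):
    """Read the job inputs that HySDS writes into the work directory."""
    with open(ctx_path) as f:
        ctx = json.load(f)
    return {
        "acq_id": ctx["acq_id"],            # ES _id of the acquisition
        "acq_met": ctx["acq_met"],          # acquisition metadata dict
        "index": ctx["index"],              # ES index name
        "dataset_type": ctx["dataset_type"],# ES dataset type
        "endpoint": ctx["endpoint"],        # "asf" for this use case
    }
```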
CI Integration (Jenkins)
WARNING: If rebuilding on the same branch (master), make sure to remove the docker image so that it is reloaded when the job restarts on an already existing worker:
docker rmi <acquisition scraper docker image id>
If your job will run on a worker that will be scaled up (i.e., one that isn't already running), you don't need to worry about this.
If you need to port this container over to another cluster, e.g. from the B cluster to the C cluster:
HySDS-io and Jobspec-io
hysds-io.json.ipf-scraper-asf
{
"submission_type":"iteration",
"label": "IPF ASF Scraper for given acquisition",
"params" : [
{
"name": "acq_id",
"from": "dataset_jpath:_id"
},
{
"name": "acq_met",
"from": "dataset_jpath:_source.metadata"
},
{
"name": "index",
"from": "dataset_jpath:_index"
},
{
"name": "dataset_type",
"from": "dataset_jpath:_type"
},
{
"name": "endpoint",
"from": "value",
"value": "asf"
},
{
"name": "ds_cfg",
"from": "value",
"value": "datasets.json"
}
]
}
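The "dataset_jpath:" expressions above pull each param out of the Elasticsearch document that triggered the iteration, by walking the dotted path through the hit. A minimal sketch of that resolution, assuming a standard ES hit shape (_id, _index, _source); resolve_jpath and the sample hit are illustrative, not the actual HySDS implementation:

```python
def resolve_jpath(es_doc, jpath):
    """Resolve a 'dataset_jpath:' expression against an Elasticsearch hit,
    e.g. '_source.metadata' -> es_doc['_source']['metadata']."""
    node = es_doc
    for key in jpath.split("."):
        node = node[key]
    return node

# Hypothetical acquisition hit, trimmed to the fields the params use.
hit = {
    "_id": "acq-123",
    "_index": "grq_v2.0_acquisition-s1-iw_slc",
    "_type": "acquisition",
    "_source": {"metadata": {"platform": "Sentinel-1A"}},
}
resolve_jpath(hit, "_source.metadata")  # -> {'platform': 'Sentinel-1A'}
```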
job-spec.json.ipf-scraper-asf
{
"command": "python /home/ops/verdi/ops/scihub_acquisition_scraper/ipf_scrape/ipf_version.py",
"imported_worker_files": {
"/home/ops/.netrc": "/home/ops/.netrc"
},
"recommended-queues": [
"ipf-scraper-asf"
],
"disk_usage":"10GB",
"soft_time_limit": 480,
"time_limit": 960,
"params" : [
{
"name": "acq_id",
"destination": "context"
},
{
"name": "acq_met",
"destination": "context"
},
{
"name": "index",
"destination": "context"
},
{
"name": "dataset_type",
"destination": "context"
},
{
"name": "endpoint",
"destination": "context"
},
{
"name": "ds_cfg",
"destination": "positional"
}
]
}
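Note the two destinations in the job-spec: params with "destination": "context" are serialized into _context.json, while "destination": "positional" params (here ds_cfg) are appended to the command line, so the effective invocation is roughly `python .../ipf_version.py datasets.json`. A sketch of how the entrypoint sees both, assuming that convention; parse_invocation is a hypothetical name:

```python
import json

def parse_invocation(argv, ctx_path="_context.json"):
    """Show how job-spec destinations surface at runtime:
    'positional' params arrive on the command line, 'context'
    params in the _context.json HySDS writes to the work dir."""
    ds_cfg = argv[1]  # positional param, e.g. "datasets.json"
    with open(ctx_path) as f:
        ctx = json.load(f)  # context params: acq_id, endpoint, ...
    return ds_cfg, ctx
```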
Job Outputs
The main file that gets executed is ipf_scrape/ipf_version.py
<steps>
Output directory structure
<screen shot>
Output structure of merged/
STILL TODO: