Ingest missing acquisitions for an AOI
Related GitHub repos and tickets
https://aria.atlassian.net/browse/ARIA-29
Job Runtime
Objective
How to set up the inputs
Job Inputs:
AOI Name: The name of the area of interest.
Datasets config file: The datasets.json file available in the work directory.
Start Time: The start time of the scraping temporal span.
End Time: The end time of the scraping temporal span.
Polygon: The spatial extent of the area of interest, as a GeoJSON polygon.
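For illustration, a hypothetical set of inputs might look like the following (all values are made up, not from this job; times are ISO 8601, the polygon is GeoJSON):

# Hypothetical example inputs for an AOI scrape; every value here is illustrative.
job_inputs = {
    "aoi_name": "AOI_california_test",    # name of the area of interest
    "ds_cfg": "datasets.json",            # datasets config file in the work directory
    "starttime": "2019-01-01T00:00:00Z",  # start of the scraping temporal span
    "endtime": "2019-02-01T00:00:00Z",    # end of the scraping temporal span
    "polygon": {                          # spatial extent of the AOI (GeoJSON)
        "type": "Polygon",
        "coordinates": [[[-118.0, 34.0], [-117.0, 34.0],
                         [-117.0, 35.0], [-118.0, 35.0], [-118.0, 34.0]]],
    },
}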
CI Integration (Jenkins)
WARNING: If rebuilding on the same branch (master), make sure to remove the Docker image so that it is reloaded when the job restarts on an already-running worker:
docker rmi <acquisition scraper docker image id>
If your job will run on a worker that will be scaled up (i.e., is not already running), then you don't need to worry about this.
If you need to port this container over to another cluster, e.g. from the B cluster to the C cluster:
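For the image-cleanup step above, here is a minimal sketch of scripting it via the Docker CLI (the repository-name filter "scihub_acquisition_scraper" is an assumption; adjust it to your image's actual name):

import subprocess

# List local images as "<id> <repository>" pairs using the Docker CLI.
out = subprocess.run(
    ["docker", "images", "--format", "{{.ID}} {{.Repository}}"],
    capture_output=True, text=True, check=True,
).stdout

# Remove any image whose repository name matches the scraper.
for line in out.splitlines():
    image_id, repo = line.split(maxsplit=1)
    if "scihub_acquisition_scraper" in repo:
        subprocess.run(["docker", "rmi", image_id], check=True)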
HySDS-io and Jobspec-io
hysds-io.json.acquisition_ingest-aoi
{
  "submission_type": "iteration",
  "params": [
    {
      "name": "aoi_name",
      "from": "dataset_jpath:_id"
    },
    {
      "name": "ds_cfg",
      "from": "value",
      "value": "datasets.json"
    },
    {
      "name": "starttime",
      "from": "dataset_jpath:_source.starttime"
    },
    {
      "name": "endtime",
      "from": "dataset_jpath:_source.endtime"
    },
    {
      "name": "polygon_flag",
      "from": "value",
      "value": "--polygon"
    },
    {
      "name": "polygon",
      "from": "dataset_jpath:_source.location",
      "lambda": "lambda x: __import__('json').dumps(x).replace(' ','')"
    },
    {
      "name": "ingest_flag",
      "from": "value",
      "value": "--ingest"
    },
    {
      "name": "purpose_flag",
      "from": "value",
      "value": "--purpose"
    },
    {
      "name": "purpose",
      "from": "value",
      "value": "aoi_scrape"
    },
    {
      "name": "report_flag",
      "from": "value",
      "value": "--report"
    }
  ]
}
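The polygon param's lambda compacts the AOI's GeoJSON location into a single whitespace-free string so it can be passed as one command-line argument. A quick demonstration of what that lambda does (the AOI location shown is illustrative):

import json

# Same transform as the "polygon" param's lambda in the hysds-io above.
compact = lambda x: json.dumps(x).replace(' ', '')

# Illustrative AOI location (GeoJSON polygon); not from this job.
location = {
    "type": "Polygon",
    "coordinates": [[[-118.0, 34.0], [-117.0, 34.0],
                     [-117.0, 35.0], [-118.0, 35.0], [-118.0, 34.0]]],
}

print(compact(location))
# {"type":"Polygon","coordinates":[[[-118.0,34.0],[-117.0,34.0],[-117.0,35.0],[-118.0,35.0],[-118.0,34.0]]]}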
job-spec.json.acquisition_ingest-aoi
{
  "command": "/home/ops/verdi/ops/scihub_acquisition_scraper/acquisition_ingest/scrape_apihub_opensearch.py",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc"
  },
  "required-queues": [
    "factotum-job_worker-apihub_scraper_throttled"
  ],
  "disk_usage": "10GB",
  "soft_time_limit": 3300,
  "time_limit": 3600,
  "params": [
    {
      "name": "aoi_name",
      "destination": "context"
    },
    {
      "name": "ds_cfg",
      "destination": "positional"
    },
    {
      "name": "starttime",
      "destination": "positional"
    },
    {
      "name": "endtime",
      "destination": "positional"
    },
    {
      "name": "polygon_flag",
      "destination": "positional"
    },
    {
      "name": "polygon",
      "destination": "positional"
    },
    {
      "name": "ingest_flag",
      "destination": "positional"
    },
    {
      "name": "purpose_flag",
      "destination": "positional"
    },
    {
      "name": "purpose",
      "destination": "positional"
    },
    {
      "name": "report_flag",
      "destination": "positional"
    }
  ]
}
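Params with destination "positional" are appended to the command in the order they are listed above. A minimal sketch (not HySDS internals; all values illustrative) of how the final invocation lines up:

# Illustrative values only; HySDS fills these in from the hysds-io params.
params = {
    "ds_cfg": "datasets.json",
    "starttime": "2019-01-01T00:00:00",
    "endtime": "2019-02-01T00:00:00",
    "polygon_flag": "--polygon",
    "polygon": '{"type":"Polygon","coordinates":[[[-118.0,34.0],[-117.0,34.0],[-117.0,35.0],[-118.0,35.0],[-118.0,34.0]]]}',
    "ingest_flag": "--ingest",
    "purpose_flag": "--purpose",
    "purpose": "aoi_scrape",
    "report_flag": "--report",
}

# Positional params, in job-spec order, appended after the command.
order = ["ds_cfg", "starttime", "endtime", "polygon_flag", "polygon",
         "ingest_flag", "purpose_flag", "purpose", "report_flag"]
command = ["scrape_apihub_opensearch.py"] + [params[name] for name in order]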
Job Outputs
The main file that gets executed is acquisition_ingest/scrape_apihub_opensearch.py.
The script is invoked with its arguments in the positional order defined in the job-spec above, e.g.:
python acquisition_ingest/scrape_apihub_opensearch.py datasets.json <starttime> <endtime> --polygon <polygon geojson> --ingest --purpose aoi_scrape --report
Output directory structure
<screen shot>
Output structure of merged/
STILL TODO: