SAR-Availability Developers Guide (BOS Sarcat)
Backend BOS Sarcat scraper (stand-alone CLI tool)
https://github.com/aria-jpl/bos_sarcat_scraper
https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/bos_sarcat_scraper/bosart_scrape.py
This script queries BOS and outputs the result set as JSON
Inputs
start/end time
OR
Since the last ingest time recorded on BOS
BOS expects WKT format for the spatial query (a minimal query sketch follows)
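A minimal sketch of the query flow, assuming a hypothetical BOS_API_URL endpoint and hypothetical parameter names (the real endpoint and query arguments live in bosart_scrape.py):

import json
import requests

# Hypothetical endpoint; see bosart_scrape.py for the real one.
BOS_API_URL = "https://bos.example.com/api/search"

def scrape(start_time=None, end_time=None, last_ingest_time=None,
           aoi_wkt="POLYGON((-180 -90, 180 -90, 180 90, -180 90, -180 -90))"):
    """Query BOS by start/end time OR since the last ingest time; AOI is WKT."""
    params = {"spatial": aoi_wkt}  # BOS expects WKT for the spatial query
    if last_ingest_time:
        params["ingested_since"] = last_ingest_time
    else:
        params["start"] = start_time
        params["end"] = end_time
    resp = requests.get(BOS_API_URL, params=params, timeout=60)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    results = scrape(start_time="2019-01-01T00:00:00.000Z",
                     end_time="2019-01-02T00:00:00.000Z")
    print(json.dumps(results, indent=2))  # JSON of the result set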
Front end GUI
In the facetview/ directory of the repo: https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/facetview/facetview-saravail.html
The front-end faceted search is hosted on aria-puccini
To edit the template:
$ ssh aria-puccini #use LDAP creds
$ ssh ariaops@localhost
$ vim /usr/local/ariaops/anaconda2/ops/tosca-sar_avail/tosca/templates/facetview-saravail.html
$ supervisorctl restart tosca-sar_avail
For the acquisitions to show up on the GUI, we need to add aliases to GRQ.
Instructions: https://github.com/aria-jpl/bos_sarcat_scraper/tree/master/dataset
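For reference, an alias can be added with the standard Elasticsearch _aliases API. A minimal sketch, assuming a placeholder GRQ ES host and index name (the linked instructions are authoritative):

import requests

GRQ_ES_URL = "http://localhost:9200"  # placeholder for the GRQ ES host

# The index name below is a placeholder; "acquisition" is the alias used
# for sar-availability (see the ES notes at the end of this guide).
actions = {
    "actions": [
        {"add": {"index": "grq_v1.0_acquisition-bos_sarcat", "alias": "acquisition"}}
    ]
}
resp = requests.post(GRQ_ES_URL + "/_aliases", json=actions, timeout=30)
resp.raise_for_status()
print(resp.json())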
PGE: bos_ingest
https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/docker/job-spec.json.bos_ingest
Runs hourly on a 2-hour window
This is the HySDS PGE wrapper for https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/bos_sarcat_scraper/bosart_scrape.py
Avoids ingesting PLANNED and PREDICTED acquisitions whose times are already in the past
Calls ingest from inside the PGE
If the PGE fails to delete the dataset folder, Verdi will try to ingest it again since the dataset dir is still there
Known bug: the directory sometimes cannot be removed, and Verdi then re-ingests it
Currently there are no retries: if the job fails for any reason (e.g., an ES timeout), it simply fails and we rely on the next scraper run. Since each run only covers a 3-hour sliding window, that gives about 3 tries
A daily back-filler that runs bos_ingest over a 5-day window is still needed
If manual publishing succeeds, or the dataset already exists, the PGE deletes the directory (named by acq_id)
If publishing fails, it leaves the directory so Verdi post-processing can take another crack at it (see the sketch below)
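A sketch of that publish-then-cleanup behavior; publish() and DatasetExistsError are hypothetical stand-ins for the actual ingest call, not the PGE's real code:

import shutil

class DatasetExistsError(Exception):
    """Hypothetical: raised when the dataset already exists in GRQ."""

def publish(dataset_dir):
    """Hypothetical stand-in for the ingest call made inside the PGE."""

def publish_acquisition(acq_id, dataset_dir):
    try:
        publish(dataset_dir)
    except DatasetExistsError:
        pass  # already in GRQ: treat like success and clean up below
    except Exception:
        # Publishing failed: leave the directory in place so Verdi
        # post-processing can take another crack at ingesting it.
        raise
    # Success or already-existing: delete the acq_id directory so
    # Verdi does not try to ingest the leftover dataset dir again.
    shutil.rmtree(dataset_dir)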
job-spec
{ "required_queues": ["factotum-job_worker-small"], "container": "container-aria-jpl_bos_sarcat_scraper:master", "command": "/home/ops/verdi/ops/bos_sarcat_scraper/create_acquisitions.sh", "imported_worker_files": { "/home/ops/verdi/etc/datasets.json": "/home/ops/verdi/etc/datasets.json" }, "disk_usage": "1GB", "params": [ { "name": "bos_ingest_time", "destination": "context" }, { "name": "from_time", "destination": "context" }, { "name": "end_time", "destination": "context" } ] }
hysds_io
{ "submission_type": "individual", "component": "tosca", "label": "Ingest acquisitions from Bos SARCAT", "allowed_accounts": [ "ops" ], "params" : [ { "name": "bos_ingest_time", "from": "submitter", "type": "text", "optional": true, "placeholder":"start of bos_ingest_timestamp in format yyyy-mm-ddThh:mm:ss.sssZ" }, { "name": "from_time", "from": "submitter", "type": "text", "optional": true, "placeholder":"start of acquisition time in format yyyy-mm-ddThh:mm:ss.sssZ" }, { "name": "end_time", "from": "submitter", "type": "text", "optional": true, "placeholder":"end of acquisition time in format yyyy-mm-ddThh:mm:ss.sssZ" } ] }
datasets.json
PGE: scrub_outdated_bos_acqs
Runs daily
Scrubs outdated PLANNED and PREDICTED acquisitions older than 2 days (a query sketch follows)
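A query sketch for the scrub, assuming the status lives in metadata.status, the end time in endtime, and an ES version with the _delete_by_query API; the real field names are in the scrubber code:

from datetime import datetime, timedelta
import requests

GRQ_ES_URL = "http://localhost:9200"  # placeholder for the GRQ ES host
cutoff = (datetime.utcnow() - timedelta(days=2)).strftime("%Y-%m-%dT%H:%M:%S.000Z")

query = {
    "query": {
        "bool": {
            "must": [
                {"terms": {"metadata.status": ["PLANNED", "PREDICTED"]}},
                {"range": {"endtime": {"lt": cutoff}}}
            ]
        }
    }
}
resp = requests.post(GRQ_ES_URL + "/acquisition/_delete_by_query",
                     json=query, timeout=120)
resp.raise_for_status()
print(resp.json().get("deleted"), "outdated acquisitions removed")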
Cron scripts
Crontab settings: https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/crons/crontab-setting.txt
Python scripts:
https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/crons/ingest-cron.py
https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/crons/scrub-cron.py
These cron scripts currently run on the b-cluster factotum (a submission sketch follows)
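A sketch of what the hourly ingest cron amounts to, assuming submission goes through the Mozart REST job-submission endpoint (MOZART_URL is a placeholder; see ingest-cron.py for the real logic):

import json
import requests

MOZART_URL = "https://mozart.example.com/mozart/api/v0.1"  # placeholder

payload = {
    "type": "job-bos_ingest:master",       # job type from the job-spec
    "queue": "factotum-job_worker-small",  # queue from the job-spec
    "priority": 0,
    "params": json.dumps({"bos_ingest_time": "", "from_time": "", "end_time": ""}),
    "tags": json.dumps(["bos_ingest_cron"]),
}
resp = requests.post(MOZART_URL + "/job/submit", data=payload, timeout=30)
resp.raise_for_status()
print("submitted:", resp.json())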
On-demand ops scripts to catch up
https://github.com/aria-jpl/bos_sarcat_scraper/blob/develop/mass_catchup_script.py
Script to split a long catch-up window into temporally segmented scrape jobs (sketched below)
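A sketch of the segmentation, assuming a 6-hour chunk size and a hypothetical submit_scrape_job() helper:

from datetime import datetime, timedelta

def segment(start, end, hours=6):
    """Yield (from_time, end_time) pairs covering [start, end)."""
    step = timedelta(hours=hours)
    t = start
    while t < end:
        yield t, min(t + step, end)
        t += step

for from_time, end_time in segment(datetime(2019, 1, 1), datetime(2019, 1, 6)):
    # submit_scrape_job(from_time, end_time)  # hypothetical helper
    print(from_time.isoformat() + "Z", "->", end_time.isoformat() + "Z")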
Flask App Services
ICS
KML
CSV
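A minimal sketch of what one export route might look like; the route path, field names, and inline data are assumptions, not the actual service code:

import csv
import io

from flask import Flask, Response

app = Flask(__name__)

@app.route("/sar-availability/export.csv")
def export_csv():
    # In the real service the rows would come from the GRQ "acquisition" alias.
    acquisitions = [
        {"id": "acq-001", "starttime": "2019-01-01T00:00:00Z", "status": "ACQUIRED"},
    ]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "starttime", "status"])
    writer.writeheader()
    writer.writerows(acquisitions)
    return Response(buf.getvalue(), mimetype="text/csv")

if __name__ == "__main__":
    app.run()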
Other
Location
Log files
Debugging process
Deployment
Watchdogs that check on the hourly scraper are already in place; the current check watches bos_ingest:master
ES on b-cluster
Alias for sar-availability: acquisition
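To spot-check what the GUI will show, query the alias directly (the host is a placeholder and the starttime sort field is an assumption):

import requests

GRQ_ES_URL = "http://localhost:9200"  # placeholder for the b-cluster GRQ ES

query = {"query": {"match_all": {}}, "size": 1,
         "sort": [{"starttime": {"order": "desc"}}]}
resp = requests.post(GRQ_ES_URL + "/acquisition/_search", json=query, timeout=30)
resp.raise_for_status()
hits = resp.json()["hits"]["hits"]
print(hits[0]["_id"] if hits else "no acquisitions indexed")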