Backend BOS Sarcat scraper (stand-alone CLI tool)
https://github.com/aria-jpl/bos_sarcat_scraper
https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/bos_sarcat_scraper/bosart_scrape.py
This script queries BOS and outputs the result set as JSON
Inputs
start/end time
OR
Since last ingest time on BOS
BOS expects WKT format for the query
...
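As a rough illustration of the two input modes listed above, the sketch below builds a query from either a start/end time pair or a last-ingest timestamp, and renders a bounding box as a WKT polygon. The function names and the `geometry_wkt` key are hypothetical placeholders, and treating the two modes as mutually exclusive is an assumption; this is not the actual BOS API or the scraper's real argument handling.

```python
from typing import Dict, Optional


def bbox_to_wkt(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> str:
    """Render a bounding box as a closed WKT POLYGON ring (lon lat order)."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # repeat the first corner to close the ring
    ]
    ring = ", ".join(f"{lon:g} {lat:g}" for lon, lat in corners)
    return f"POLYGON(({ring}))"


def build_query(
    from_time: Optional[str] = None,
    end_time: Optional[str] = None,
    bos_ingest_time: Optional[str] = None,
    geometry_wkt: Optional[str] = None,
) -> Dict[str, str]:
    """Build a query dict from one of the two input modes (hypothetical keys)."""
    if bos_ingest_time is not None:
        # Mode 2: everything ingested on BOS since the given time.
        query = {"bos_ingest_time": bos_ingest_time}
    elif from_time is not None and end_time is not None:
        # Mode 1: explicit acquisition start/end time range.
        query = {"from_time": from_time, "end_time": end_time}
    else:
        raise ValueError("need from_time and end_time, or bos_ingest_time")
    if geometry_wkt is not None:
        query["geometry_wkt"] = geometry_wkt
    return query
```

For example, `build_query(bos_ingest_time="2020-01-01T00:00:00.000Z", geometry_wkt=bbox_to_wkt(0, 0, 1, 1))` would produce a dict carrying the ingest time and a unit-square polygon.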
Instructions: https://github.com/aria-jpl/bos_sarcat_scraper/tree/master/dataset
PGE: bos_ingest
https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/docker/job-spec.json.bos_ingest
Runs hourly on 2-hour
This is the HySDS PGE wrapper for https://github.com/aria-jpl/bos_sarcat_scraper/blob/master/bos_sarcat_scraper/bosart_scrape.py
Avoids ingesting past PLANNED and PREDICTED acquisitions
Calls ingest from inside the PGE
If the PGE forgets to delete the dataset directory, Verdi will try to ingest it because the directory is still there.
Known bug: if the directory cannot be removed, Verdi will attempt the ingest again.
Currently there are no retries. If a job fails for any reason (e.g. an ES timeout), it stays failed. We currently rely on the next scheduled run to pick up what was missed, but the sliding window is only 3 hours, so there are about 3 tries.
Still need a daily back-filler that runs bos_ingest over a 5-day window.
If manual publishing succeeds, or the dataset already exists, the PGE deletes the directory (acq_id).
If publishing fails, the directory is left in place so that Verdi post-processing can take another crack at it.
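The publish-then-clean-up behavior described above might be sketched as follows. This is only an illustration: `publish_and_cleanup`, the `publish` callable, and the `AlreadyExists` exception are hypothetical stand-ins, not the PGE's actual code or the HySDS ingest API.

```python
import shutil
from pathlib import Path
from typing import Callable


class AlreadyExists(Exception):
    """Hypothetical signal that the dataset is already in the catalog."""


def publish_and_cleanup(dataset_dir: Path, publish: Callable[[Path], None]) -> bool:
    """Publish a dataset directory and remove it when nothing is left to do.

    Returns True if the directory is gone afterwards, False if it is still
    on disk (and would therefore be picked up by post-processing).
    """
    try:
        publish(dataset_dir)
    except AlreadyExists:
        pass  # already ingested earlier: safe to remove the directory
    except Exception:
        return False  # publish failed: leave the directory for a later attempt
    # ignore_errors=True means a failed removal does not raise; the return
    # value below reports whether the directory is actually gone.
    shutil.rmtree(dataset_dir, ignore_errors=True)
    return not dataset_dir.exists()
```

Returning whether the directory is really gone makes the "cannot remove dir" case visible to the caller instead of silently leaving a directory behind.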
...
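To see why an hourly schedule with a 3-hour sliding window gives roughly three tries, the sketch below counts the runs whose look-back window contains a given timestamp. It assumes each run looks back from its own start time and that the window is half-open; `runs_covering` is a hypothetical helper, not part of the scraper.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def runs_covering(
    event_time: datetime,
    first_run: datetime,
    interval: timedelta,
    window: timedelta,
    n_runs: int,
) -> List[datetime]:
    """Return the run times whose window [run - window, run) contains event_time."""
    covering = []
    for i in range(n_runs):
        run = first_run + i * interval
        if run - window <= event_time < run:
            covering.append(run)
    return covering


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
    event = start + timedelta(minutes=30)
    hits = runs_covering(event, start, timedelta(hours=1), timedelta(hours=3), 10)
    print(len(hits))  # number of hourly runs that would see this record
```

Under these assumptions a record is seen by three consecutive runs, so a single failed job still leaves two more chances before the record falls out of the window.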
{
  "submission_type": "individual",
  "component": "tosca",
  "label": "Ingest acquisitions from Bos SARCAT",
  "allowed_accounts": [ "ops" ],
  "params" : [
    {
      "name": "bos_ingest_time",
      "from": "submitter",
      "type": "text",
      "optional": true,
      "placeholder":"start of bos_ingest_timestamp in format yyyy-mm-ddThh:mm:ss.sssZ"
    },
    {
      "name": "from_time",
      "from": "submitter",
      "type": "text",
      "optional": true,
      "placeholder":"start of acquisition time in format yyyy-mm-ddThh:mm:ss.sssZ"
    },
    {
      "name": "end_time",
      "from": "submitter",
      "type": "text",
      "optional": true,
      "placeholder":"end of acquisition time in format yyyy-mm-ddThh:mm:ss.sssZ"
    }
  ]
}
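All three parameters use the `yyyy-mm-ddThh:mm:ss.sssZ` timestamp format. A minimal sketch of producing and parsing that format in Python is shown below; the helper names are hypothetical, and the sketch assumes the trailing `Z` always means UTC.

```python
from datetime import datetime, timezone


def to_bos_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as yyyy-mm-ddThh:mm:ss.sssZ in UTC."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    dt_utc = dt.astimezone(timezone.utc)
    millis = dt_utc.microsecond // 1000  # truncate microseconds to milliseconds
    return dt_utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def parse_bos_timestamp(text: str) -> datetime:
    """Parse yyyy-mm-ddThh:mm:ss.sssZ into a UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
```

Formatting with an explicit millisecond field avoids the six-digit microseconds that `isoformat()` would emit.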
datasets.json
{
"ipath": "ariamh::data/acquisition-SARCAT",
"match_pattern": "/(?P<id>acquisition-(?P<spacecraft>.+?)_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})T(?P<timestamp>.+?)Z_(?P<track>\\d?.+)_(?P<mode>.+?)-bos_sarcat)$",
"alt_match_pattern": null,
"extractor": null,
"level": "l1",
"type": "bos-acquisition",
"version": "2.0"
},
{
"ipath": "ariamh::data/acquisition-SARCAT",
"match_pattern": "/(?P<id>acquisition-(?P<spacecraft>.+?)_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})T(?P<timestamp>.+?)Z_(?P<track>\\d?.+)_(?P<mode>.+?)-bos_sarcat-planned)$",
"alt_match_pattern": null,
"extractor": null,
"level": "l1",
"type": "bos-acquisition",
"version": "2.0"
},
{
"ipath": "ariamh::data/acquisition-SARCAT",
"match_pattern": "/(?P<id>acquisition-(?P<spacecraft>.+?)_(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})T(?P<timestamp>.+?)Z_(?P<track>\\d?.+)_(?P<mode>.+?)-bos_sarcat-predicted)$",
"alt_match_pattern": null,
"extractor": null,
"level": "l1",
"type": "bos-acquisition",
"version": "2.0"
},
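To check how these three `match_pattern` expressions behave, the sketch below compiles them (with the JSON double backslashes unescaped) and applies them to a path. The example dataset ID used in the note after the code is made up for illustration, and `match_dataset` is a hypothetical helper rather than how HySDS itself evaluates datasets.json.

```python
import re
from typing import Dict, Optional

# Shared prefix of the three match_pattern values from datasets.json.
_PREFIX = (
    r"/(?P<id>acquisition-(?P<spacecraft>.+?)_"
    r"(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})T(?P<timestamp>.+?)Z_"
    r"(?P<track>\d?.+)_(?P<mode>.+?)-bos_sarcat"
)

# One compiled pattern per entry, in the same order as in datasets.json.
PATTERNS = [
    re.compile(_PREFIX + suffix + r")$")
    for suffix in ("", "-planned", "-predicted")
]


def match_dataset(path: str) -> Optional[Dict[str, str]]:
    """Return the named groups of the first pattern matching path, else None."""
    for pattern in PATTERNS:
        found = pattern.search(path)
        if found:
            return found.groupdict()
    return None
```

With a made-up path such as `/tmp/acquisition-S1A_20200102T030405Z_64_IW-bos_sarcat-planned`, only the second pattern matches, because each pattern is anchored with `$` and so requires its own suffix at the very end of the path.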
PGE: scrub_outdated_bos_acqs
Runs daily
Scrubs outdated PLANNED and PREDICTED acquisitions older than 2 days
...
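The scrubbing rule above could be sketched as a simple filter. The record fields (`id`, `status`, `starttime`) are hypothetical, and measuring age from the acquisition start time is an assumption; the actual PGE may use a different timestamp or query the catalog directly.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def select_outdated(
    acqs: List[Dict],
    now: datetime,
    max_age: timedelta = timedelta(days=2),
) -> List[str]:
    """Return IDs of PLANNED/PREDICTED records whose start time is older than max_age."""
    cutoff = now - max_age
    return [
        acq["id"]
        for acq in acqs
        if acq["status"] in ("PLANNED", "PREDICTED") and acq["starttime"] < cutoff
    ]
```

Records with any other status are never selected, regardless of age.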