TopsStack Processor (PGE)
Table of Contents
- 1 Table of Contents
- 2 Related Github Repos and Tickets
- 3 Objective
- 4 How to Set Up the Inputs
- 4.1 Job Inputs
- 5 CI Integration (Jenkins)
- 6 HySDS-IO and Jobspec-IO
- 7 Job Outputs
- 8 Datasets.json Entry
- 9 ASG (Auto Scaling Group) Configurations
- 10 Assessment of Steps
- 11 Job Runtime
- 11.1 GNU Parallel
- 11.2 Multiprocessing
Related Github Repos and Tickets
GitHub repo: https://github.com/aria-jpl/topsstack
Versions
Tickets:
Objective
- Create a stack of coregistered SLCs
- Use SLCs from a single track only, acquired over a period of time
- The stack is a prerequisite to StaMPS processing
How to Set Up the Inputs
Facets used to select the SLC inputs:
- region (e.g. Hawaii) (optional)
- track_number or trackNumber (depending on the dataset)
- datatype: SLC
If the track numbers of the SLC inputs do not match, the job fails with this error:
raise Exception('Could not determine a suitable burst offset')
Your SLC inputs must therefore all come from one track.
Figures: correct facet SLC inputs; incorrect facet SLC inputs
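Because a mixed-track selection is only caught at run time by the error above, it can be worth checking the selection before submitting. The Python sketch below assumes each SLC's metadata is available as a dict that stores the track under track_number or trackNumber (the two facet names above); unique_track is a hypothetical helper, not part of the topsstack repo.

```python
from typing import Iterable, Mapping


def unique_track(slc_metadata: Iterable[Mapping]) -> int:
    """Return the one track shared by every SLC input, else raise ValueError.

    Hypothetical helper: each record is assumed to hold its track under
    either 'track_number' or 'trackNumber', mirroring the two facet names.
    """
    tracks = set()
    for met in slc_metadata:
        # Prefer 'track_number'; fall back to 'trackNumber'.
        track = met.get("track_number", met.get("trackNumber"))
        if track is None:
            raise ValueError(f"no track field in SLC metadata: {met!r}")
        tracks.add(int(track))
    if len(tracks) != 1:
        raise ValueError(f"SLC inputs cover tracks {sorted(tracks)}; expected exactly one")
    return tracks.pop()


if __name__ == "__main__":
    print(unique_track([{"track_number": 124}, {"trackNumber": "124"}]))  # prints 124
```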
Job Inputs
Bbox (required for the Multiprocessing version; optional for the GNU Parallel version, which can use the MBR (minimum bounding rectangle) instead)
min_lat
max_lat
min_lon
max_lon
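The four values can be sanity-checked before submission. A minimal sketch, assuming decimal-degree inputs; Bbox and as_params are hypothetical names, and the sketch deliberately rejects boxes that cross the antimeridian (min_lon greater than max_lon) rather than guessing how the PGE would handle them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bbox:
    """Job bbox in decimal degrees (hypothetical pre-submission helper)."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def __post_init__(self) -> None:
        # Latitudes must lie in [-90, 90] and longitudes in [-180, 180].
        for name in ("min_lat", "max_lat"):
            if not -90.0 <= getattr(self, name) <= 90.0:
                raise ValueError(f"{name} out of range: {getattr(self, name)}")
        for name in ("min_lon", "max_lon"):
            if not -180.0 <= getattr(self, name) <= 180.0:
                raise ValueError(f"{name} out of range: {getattr(self, name)}")
        # Each minimum must be strictly below its maximum.
        if self.min_lat >= self.max_lat or self.min_lon >= self.max_lon:
            raise ValueError("each min value must be less than its max value")

    def as_params(self) -> dict:
        """Return the four values keyed by the job's parameter names."""
        return {
            "min_lat": self.min_lat,
            "max_lat": self.max_lat,
            "min_lon": self.min_lon,
            "max_lon": self.max_lon,
        }
```

For example, Bbox(19.0, 19.5, -155.5, -155.0).as_params() returns the four job parameters, while swapping min_lat and max_lat raises ValueError.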
CI Integration (Jenkins)
Link: http://b-ci.grfn.hysds.io:8080/job/ops-bcluster_container-builder_aria-jpl_topsstack_master/
WARNING: If you rebuild on the same branch (master), remove the cached Docker image so that the rebuilt image is loaded when the job is restarted:
docker rmi <topsStack docker image id>
HySDS-IO and Jobspec-IO
hysds-io.json.topsstack
{
  "label": "topsStack Processor",
  "submission_type": "individual",
  "allowed_accounts": ["ops"],
  "action-type": "both",
  "params": [
    {
      "name": "min_lat",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "max_lat",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "min_lon",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "max_lon",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "localize_products",
      "from": "dataset_jpath:",
      "type": "text",
      "lambda": "lambda met: get_partial_products(met['_id'],get_best_url(met['_source']['urls']),[met['_id']+'.zip'])"
    }
  ]
}
job-spec.json.topsstack
{
  "recommended_queues": ["jjacob_stack"],
  "command": "/home/ops/verdi/ops/topsstack/run_stack.sh",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws"
  },
  "soft_time_limit": 10800,
  "time_limit": 18000,
  "disk_usage": "100GB",
  "params": [
    {
      "name": "min_lat",
      "destination": "context"
    },
    {
      "name": "max_lat",
      "destination": "context"
    },
    {
      "name": "min_lon",
      "destination": "context"
    },
    {
      "name": "max_lon",
      "destination": "context"
    },
    {
      "name": "localize_products",
      "destination": "localize"
    }
  ]
}
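Each parameter collected by hysds-io.json.topsstack is expected to have a matching entry in job-spec.json.topsstack that says where it goes (context or localize). A small Python sketch of a consistency check over the two params lists; the helper names are hypothetical, and only the name fields are compared.

```python
import json
from typing import Any, Dict


def check_param_names(hysds_io: Dict[str, Any], job_spec: Dict[str, Any]) -> None:
    """Raise ValueError unless both specs declare the same parameter names."""
    io_names = {p["name"] for p in hysds_io["params"]}
    spec_names = {p["name"] for p in job_spec["params"]}
    only_io = sorted(io_names - spec_names)      # collected but never routed
    only_spec = sorted(spec_names - io_names)    # routed but never collected
    if only_io or only_spec:
        raise ValueError(f"only in hysds-io: {only_io}; only in job-spec: {only_spec}")


def check_files(hysds_io_path: str, job_spec_path: str) -> None:
    """Load both JSON files and compare their parameter names."""
    with open(hysds_io_path) as f_io, open(job_spec_path) as f_spec:
        check_param_names(json.load(f_io), json.load(f_spec))
```

For example, check_files("docker/hysds-io.json.topsstack", "docker/job-spec.json.topsstack"), where the docker/ paths are an assumption about the repo layout.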
Job Outputs
The main script executed by the job is run_stack.sh. It:
- Copies all SLC .zip files to the zip/ sub-directory
- Runs get_bbox.py and exports 8 coordinates as inputs for the science code:
  read MINLAT MAXLAT MINLON MAXLON MINLAT_LO MAXLAT_HI MINLON_LO MAXLON_HI <<< $TOKENS
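The hand-off from get_bbox.py to the shell can be illustrated with a Python sketch that prints the 8 tokens in the order the read line expects. This is not the actual get_bbox.py: the function name is hypothetical, and it assumes the _LO/_HI values are the bbox rounded outward to whole degrees, which should be checked against the real script.

```python
import math


def bbox_tokens(min_lat: float, max_lat: float, min_lon: float, max_lon: float) -> str:
    """Build the 8 space-separated values consumed by
    read MINLAT MAXLAT MINLON MAXLON MINLAT_LO MAXLAT_HI MINLON_LO MAXLON_HI <<< $TOKENS

    ASSUMPTION: the *_LO/*_HI values are the bbox rounded outward to whole
    degrees; check get_bbox.py for the real definition.
    """
    exact = [min_lat, max_lat, min_lon, max_lon]
    # Round minimums down and maximums up so the integer box contains the exact one.
    rounded = [math.floor(min_lat), math.ceil(max_lat),
               math.floor(min_lon), math.ceil(max_lon)]
    return " ".join(str(v) for v in exact + rounded)


if __name__ == "__main__":
    print(bbox_tokens(19.0, 19.5, -155.5, -155.0))  # prints 19.0 19.5 -155.5 -155.0 19 20 -156 -155
```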
run_stack.sh then runs the stack processor in 10 steps:
run.py -i ./run_files/run_1_unpack_slc_topo_master -p 8
run.py -i ./run_files/run_2_average_baseline -p 8
run.py -i ./run_files/run_3_extract_burst_overlaps -p 8
run.py -i ./run_files/run_4_overlap_geo2rdr_resample -p 8
run.py -i ./run_files/run_5_pairs_misreg -p 8
run.py -i ./run_files/run_6_timeseries_misreg -p 8
run.py -i ./run_files/run_7_geo2rdr_resample -p 8
run.py -i ./run_files/run_8_extract_stack_valid_region -p 8
run.py -i ./run_files/run_9_merge -p 8
run.py -i ./run_files/run_10_grid_baseline -p 8
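The same sequence can be driven from Python. A minimal sketch, assuming run.py is on the PATH exactly as in the commands above; stopping at the first failing step is a design choice of this sketch and is not necessarily how run_stack.sh behaves.

```python
import subprocess
from typing import List

# The ten run files listed above, in execution order.
STEPS = [
    "run_1_unpack_slc_topo_master",
    "run_2_average_baseline",
    "run_3_extract_burst_overlaps",
    "run_4_overlap_geo2rdr_resample",
    "run_5_pairs_misreg",
    "run_6_timeseries_misreg",
    "run_7_geo2rdr_resample",
    "run_8_extract_stack_valid_region",
    "run_9_merge",
    "run_10_grid_baseline",
]


def step_commands(run_dir: str = "./run_files", processes: int = 8) -> List[List[str]]:
    """Return the run.py argument lists for all ten steps, in order."""
    return [["run.py", "-i", f"{run_dir}/{step}", "-p", str(processes)] for step in STEPS]


def run_stack_steps(run_dir: str = "./run_files", processes: int = 8) -> None:
    """Run each step in order; check=True raises CalledProcessError on the
    first non-zero exit status, so later steps never start."""
    for cmd in step_commands(run_dir, processes):
        subprocess.run(cmd, check=True)
```

Since later steps build on the outputs of earlier ones, failing fast avoids spending hours on steps whose inputs are incomplete.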
Figure: output directory structure
Figure: output structure of merged/
Datasets.json Entry
The dataset entry is located in the file .sds/files/datasets.json.
ASG (Auto Scaling Group) Configurations
EC2 Instance Type | c5d.9xlarge | c5.24xlarge |
Block Devices | /dev/sdc, /dev/sda1, /dev/sdb | /dev/sdc, /dev/sda1, /dev/sdb |
Spot Price (USD/hour) | 1.728 | 2.57 |
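To compare the two instance types on price, the spot price can be divided by the vCPU count listed in the Job Runtime tables. A sketch, assuming the Spot Price column is the maximum price in USD per instance-hour, so the results are upper bounds; INSTANCES and price_per_vcpu_hour are hypothetical names.

```python
# Spot prices from the ASG table; vCPU counts from the Job Runtime tables.
INSTANCES = {
    "c5d.9xlarge": {"vcpus": 36, "spot_price": 1.728},
    "c5.24xlarge": {"vcpus": 96, "spot_price": 2.57},
}


def price_per_vcpu_hour(instance: str) -> float:
    """Spot price divided by vCPU count (assumes USD per instance-hour)."""
    spec = INSTANCES[instance]
    return spec["spot_price"] / spec["vcpus"]


if __name__ == "__main__":
    for name in INSTANCES:
        print(f"{name}: {price_per_vcpu_hour(name):.4f} USD per vCPU-hour")
```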
Assessment of Steps
https://docs.google.com/spreadsheets/d/1W2KzAWm8VjceB77jfc9kuQUrXtYDiwSPmqxcHuEoQIg/edit#gid=0
Job Runtime
Runtime depends on the number of SLCs being processed and the number of bursts covered by the bbox.
GNU Parallel
Stack size | c5d.9xlarge (36 vCPU, 72 GiB) | c5.24xlarge (96 vCPU, 192 GiB) |
1 year (~30 SLCs, 4 bursts) | 7 hrs, 24 mins, 46 secs | 4 hrs, 38 mins, 33 secs |
2 years (~60 SLCs, 4 bursts) | 13 hrs, 37 mins, 39 secs | 8 hrs, 16 mins, 46 secs |
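The runtimes above can be converted to seconds to compare instance types. A Python sketch; parse_duration and speedup are hypothetical helpers that accept the "N hrs, N mins, N secs" format used in these tables.

```python
import re

_UNIT_SECONDS = {"hr": 3600, "min": 60, "sec": 1}


def parse_duration(text: str) -> int:
    """Convert strings such as '7 hrs, 24 mins, 46 secs' to seconds."""
    total = 0
    # Each match is a number followed by hr/min/sec, with an optional plural 's'.
    for value, unit in re.findall(r"(\d+)\s*(hr|min|sec)s?", text):
        total += int(value) * _UNIT_SECONDS[unit]
    return total


def speedup(slow: str, fast: str) -> float:
    """Ratio of two wall-clock times (greater than 1 means 'fast' is faster)."""
    return parse_duration(slow) / parse_duration(fast)


if __name__ == "__main__":
    # 1-year GNU Parallel row: c5d.9xlarge vs c5.24xlarge.
    print(f"{speedup('7 hrs, 24 mins, 46 secs', '4 hrs, 38 mins, 33 secs'):.2f}x")  # prints 1.60x
```

For the 1-year GNU Parallel row, the c5.24xlarge is about 1.6 times faster while having 96/36 (about 2.7) times as many vCPUs.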
Multiprocessing
Stack size | c5d.9xlarge (36 vCPU, 72 GiB) | c5.24xlarge (96 vCPU, 192 GiB) |
1 year (~30 SLCs, 4 bursts) | 6 hrs, 43 mins, 27 secs | 4 hrs, 19 mins, 30 secs |
2 years (~60 SLCs, 4 bursts) | 12 hrs, 58 mins, 32 secs | 8 hrs, 30 mins, 43 secs |
3 years (~90 SLCs, 4 bursts) | 18 hrs, 14 mins, 44 secs | 10 hrs, 56 mins, 5 secs |
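The runtimes can also be checked against the time_limit of 18000 s (5 hours) set in job-spec.json.topsstack, and multiplied by the ASG spot price for a rough cost estimate. A sketch, assuming the spot price is in USD per hour and was charged for the full runtime, so the cost is an upper bound; assess and to_seconds are hypothetical helpers.

```python
from typing import Tuple

TIME_LIMIT_S = 18000  # "time_limit" from job-spec.json.topsstack
# Spot prices from the ASG table (assumed to be USD per instance-hour).
SPOT_PRICE_USD_PER_HR = {"c5d.9xlarge": 1.728, "c5.24xlarge": 2.57}


def to_seconds(hms: Tuple[int, int, int]) -> int:
    """Convert an (hours, minutes, seconds) tuple to seconds."""
    hours, minutes, seconds = hms
    return hours * 3600 + minutes * 60 + seconds


def assess(instance: str, hms: Tuple[int, int, int]) -> Tuple[bool, float]:
    """Return (exceeds_time_limit, estimated_cost_usd) for one table cell."""
    secs = to_seconds(hms)
    cost = secs / 3600 * SPOT_PRICE_USD_PER_HR[instance]
    return secs > TIME_LIMIT_S, cost


if __name__ == "__main__":
    # 3-year Multiprocessing row from the table above.
    for instance, hms in [("c5d.9xlarge", (18, 14, 44)), ("c5.24xlarge", (10, 56, 5))]:
        over, cost = assess(instance, hms)
        print(f"{instance}: exceeds time_limit={over}, about ${cost:.2f}")
```

By this check, only the two 1-year c5.24xlarge runs stay under the 18000 s limit shown in the job-spec.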