Related GitHub repos and tickets
Tickets:
Job Runtime
Depends on how many SLCs are being processed
1+ hours for 8 SLCs
GNU Parallel
Stack size | c5d.9xlarge (36 vCPU, 72 GiB) | c5.24xlarge (96 vCPU, 192 GiB)
1 year (~30 SLCs, 4 bursts) | 7 hrs, 24 mins, 46 secs | 4 hrs, 38 mins, 33 secs
2 years (~60 SLCs, 4 bursts) | 13 hrs, 37 mins, 39 secs | 8 hrs, 16 mins, 46 secs
Multiprocessing
Stack size | c5d.9xlarge (36 vCPU, 72 GiB) | c5.24xlarge (96 vCPU, 192 GiB)
1 year (~30 SLCs, 4 bursts) | 6 hrs, 43 mins, 27 secs | 4 hrs, 19 mins, 30 secs
2 years (~60 SLCs, 4 bursts) | 12 hrs, 58 mins, 32 secs | 8 hrs, 30 mins, 43 secs
3 years (~90 SLCs, 4 bursts) | 18 hrs, 14 mins, 44 secs | 10 hrs, 56 mins, 5 secs
Objective
Creating a stack of SLCs
Only within the same track (over a time period)
Prerequisite to STAMPS processing
How to set up the inputs
Facets to get SLC inputs
region (e.g., Hawaii) (optional)
track_number or trackNumber (which field name depends on the dataset)
datatype: SLC
If the SLC track numbers do not match, the job will throw this error:
raise Exception('Could not determine a suitable burst offset')
There must be only one track in your SLC inputs.
(Screenshots: correct facet SLC inputs vs. incorrect facet SLC inputs)
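For reference, the facets above amount to filtering on a single track and the SLC dataset type. A minimal query sketch, assuming Elasticsearch-style field names such as dataset.raw and metadata.track_number (not verified against the actual index mapping):

# Hypothetical Elasticsearch-style query mirroring the facets above.
# Field names (dataset.raw, metadata.track_number) are assumptions and
# may differ from the real index mapping.
slc_stack_query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"dataset.raw": "SLC"}},           # datatype: SLC
                {"term": {"metadata.track_number": 124}},   # a single track only
            ]
        }
    }
}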
Job Inputs:
Bbox (required)
min_lat
max_lat
min_lon
max_lon
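These submitter parameters are written into the job's _context.json (see "destination": "context" in the job-spec below). A minimal sketch of reading them on the worker side, assuming they land as top-level keys:

import json

# Minimal sketch: pull the bbox parameters out of the job's _context.json.
# Assumes the params land as top-level keys (per "destination": "context").
with open("_context.json") as f:
    ctx = json.load(f)

min_lat, max_lat = ctx["min_lat"], ctx["max_lat"]
min_lon, max_lon = ctx["min_lon"], ctx["max_lon"]
assert min_lat < max_lat and min_lon < max_lon, "bbox bounds are inverted"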
CI Integration (Jenkins)
Link: http://b-ci.grfn.hysds.io:8080/job/ops-bcluster_container-builder_aria-jpl_topsstack_master/
WARNING: If rebuilding on the same branch (master), make sure to remove the Docker image so that it is reloaded when the job restarts:
docker rmi <topsStack docker image id>
HySDS-io and Jobspec-io
hysds-io.json.topsstack
{ "label": "topsStack Processor", "submission_type": "individual", "allowed_accounts": [ "ops" ], "action-type": "both", "params": [ { "name": "min_lat", "from": "submitter", "type": "number", "optional": false }, { "name": "max_lat", "from": "submitter", "type": "number", "optional": false }, { "name": "min_lon", "from": "submitter", "type": "number", "optional": false }, { "name": "max_lon", "from": "submitter", "type": "number", "optional": false }, { "name":"localize_products", "from":"dataset_jpath:", "type":"text", "lambda" : "lambda met: get_partial_products(met['_id'],get_best_url(met['_source']['urls']),[met['_id']+'.zip'])" } ] }
job-spec.json.topsstack
{ "recommended_queues": ["jjacob_stack"], "command": "/home/ops/verdi/ops/topsstack/run_stack.sh", "imported_worker_files": { "/home/ops/.netrc": "/home/ops/.netrc", "/home/ops/.aws": "/home/ops/.aws" }, "soft_time_limit": 10800, "time_limit": 18000, "disk_usage": "100GB", "params": [ { "name": "min_lat", "destination": "context" }, { "name": "max_lat", "destination": "context" }, { "name": "min_lon", "destination": "context" }, { "name": "max_lon", "destination": "context" }, { "name":"localize_products", "destination": "localize" } ] }
Job Outputs
Main file that gets executed is run_stack.sh
Copies all SLC .zip files to the zip/ sub-directory
Runs get_bbox.py and exports 8 coordinates as inputs for the science code: read MINLAT MAXLAT MINLON MAXLON MINLAT_LO MAXLAT_HI MINLON_LO MAXLON_HI <<< $TOKENS
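The exact interface of get_bbox.py is not shown on this page; the sketch below only illustrates the assumed contract: print the user bbox plus a padded bbox as eight space-separated values so run_stack.sh can read them into the variables above. The 0.25-degree pad and the sample coordinates are made up.

# Illustrative sketch of the assumed get_bbox.py contract: emit
# MINLAT MAXLAT MINLON MAXLON MINLAT_LO MAXLAT_HI MINLON_LO MAXLON_HI
# as one space-separated line for run_stack.sh to `read`.
PAD = 0.25  # hypothetical padding in degrees


def emit_bbox(min_lat, max_lat, min_lon, max_lon, pad=PAD):
    print(min_lat, max_lat, min_lon, max_lon,
          min_lat - pad, max_lat + pad, min_lon - pad, max_lon + pad)


emit_bbox(19.0, 20.3, -156.1, -154.7)  # hypothetical Hawaii bbox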
Runs the 10 steps of the stack processor (a wrapper sketch follows the list of steps below):
run.py -i ./run_files/run_1_unpack_slc_topo_master -p 8
run.py -i ./run_files/run_2_average_baseline -p 8
run.py -i ./run_files/run_3_extract_burst_overlaps -p 8
run.py -i ./run_files/run_4_overlap_geo2rdr_resample -p 8
run.py -i ./run_files/run_5_pairs_misreg -p 8
run.py -i ./run_files/run_6_timeseries_misreg -p 8
run.py -i ./run_files/run_7_geo2rdr_resample -p 8
run.py -i ./run_files/run_8_extract_stack_valid_region -p 8
run.py -i ./run_files/run_9_merge -p 8
run.py -i ./run_files/run_10_grid_baseline -p 8
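A minimal wrapper sketch for running these steps in order (the run_files names are taken from the listing above; invoking run.py directly and stopping on the first failure are assumptions, not the script's documented behavior):

import subprocess

# Minimal sketch: execute the ten topsStack steps in order, 8-way parallel
# within each step, stopping on the first failure.
STEPS = [
    "run_1_unpack_slc_topo_master",
    "run_2_average_baseline",
    "run_3_extract_burst_overlaps",
    "run_4_overlap_geo2rdr_resample",
    "run_5_pairs_misreg",
    "run_6_timeseries_misreg",
    "run_7_geo2rdr_resample",
    "run_8_extract_stack_valid_region",
    "run_9_merge",
    "run_10_grid_baseline",
]

for step in STEPS:
    subprocess.run(["run.py", "-i", f"./run_files/{step}", "-p", "8"], check=True)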
Output directory structure
Output structure of merged/
merged/
    baselines/
        20190506/
        20190518/
        20190530/
            20190530
            20190530.full.vrt
            20190530.vrt
            20190530.xml
    geom_master/
        *.rdr.aux.xml
        *.rdr.full
        *.rdr.full.aux.xml
        *.rdr.full.vrt
        *.rdr.full.xml
    SLC/
        20190506/
        20190518/
        20190530/
            20190530.slc.full
            20190530.slc.full.aux.xml
            20190530.slc.full.vrt
            20190530.slc.full.xml
            20190530.slc.hdr
Datasets.json entry
File is located in .sds/files/datasets.json
{ "ipath": "ariamh::data/STACK", "match_pattern": "/(?P<id>coregistered_slcs-(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})(?P<time>\\d{6}).+)$", "alt_match_pattern": null, "extractor": null, "level": "NA", "type": "stack", "publish": { "s3-profile-name": "default", "location": "s3://s3-us-west-2.amazonaws.com:80/##BUCKET##/datasets/{type}/{version}/{year}/{month}/{day}/{id}", "urls": [ "http://##WEBDAV_URL##/datasets/{type}/{version}/{year}/{month}/{day}/{id}", "s3://##S3_URL##:80/##BUCKET##/datasets/{type}/{version}/{year}/{month}/{day}/{id}" ] }, "browse": { "location": "davs://##WEBDAV_USER##@##WEBDAV USER##/browse/{type}/{version}/{year}/{month}/{day}/{id}", "urls": [ "https://##WEBDAV##/browse/{type}/{version}/{year}/{month}/{day}/{id}" ], } }
Running on ASG (Auto Scaling Group)
Currently using c5d.9xlarge
Not enough CPU
Takes 11.5 hrs to run 30 scenes
May need to upgrade to c5d.18xlarge or i-instances
STILL TODO:
Integrate Sang-Ho/Jungkyo GNU parallel