...
...
...
...
...
...
...
...
...
...
...
Related GitHub Repos and Tickets
Tickets:
Jira: ARIA-47
Job Runtime
Depends on how many SLCs are being processed: 1+ hours for 8 SLCs.
GNU Parallel
Stack size | c5d.9xlarge (36 vCPU, 72 GiB) | c5.24xlarge (96 vCPU, 192 GiB)
---|---|---
1 year (~30 SLCs, 4 bursts) | 7 hrs, 24 mins, 46 secs | 4 hrs, 38 mins, 33 secs
2 years (~60 SLCs, 4 bursts) | 13 hrs, 37 mins, 39 secs | 8 hrs, 16 mins, 46 secs
Multiprocessing
Stack size | c5d.9xlarge (36 vCPU, 72 GiB) | c5.24xlarge (96 vCPU, 192 GiB)
---|---|---
1 year (~30 SLCs, 4 bursts) | 6 hrs, 43 mins, 27 secs | 4 hrs, 19 mins, 30 secs
2 years (~60 SLCs, 4 bursts) | 12 hrs, 58 mins, 32 secs | 8 hrs, 30 mins, 43 secs
3 years (~90 SLCs, 4 bursts) | 18 hrs, 14 mins, 44 secs | 10 hrs, 56 mins, 5 secs
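Both tables time the same end-to-end stack run; the difference is only how each step's command list (a run_files text file with one command per line, see Job Outputs below) is executed. A minimal sketch of the two modes, using step 1 as an example:
Code Block
# GNU Parallel: run each line of the step's command file as its own job, 8 at a time
parallel -j 8 < ./run_files/run_1_unpack_slc_topo_master

# Multiprocessing: let run.py fan the same command list out over 8 worker processes
run.py -i ./run_files/run_1_unpack_slc_topo_master -p 8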
Objective
Creating a stack of SLCs
Only within the same track (over a time period)
Prerequisite to StaMPS processing
How to set up the inputs
Facets to get SLC inputs:
region (e.g. Hawaii) (optional)
track_number or trackNumber (which field name depends on the dataset)
dataset type: SLC
If the SLC track numbers do not match, the job throws this error:
Code Block
raise Exception('Could not determine a suitable burst offset')
There must be only one track in your SLC inputs.
(Screenshots: correct facet SLC inputs vs. incorrect facet SLC inputs)
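To double-check the track before submitting, one option is to derive each SLC's relative orbit (track) from the absolute orbit encoded in its filename; the standard Sentinel-1 formula is (absolute_orbit - 73) mod 175 + 1 for S1A and (absolute_orbit - 27) mod 175 + 1 for S1B. A sketch, assuming the SLC zips are on local disk (e.g. in the zip/ sub-directory that run_stack.sh creates):
Code Block
# Print the track of every S1A SLC zip; all printed values must be identical
for f in zip/S1A_*.zip; do
  abs=$((10#$(basename "$f" | cut -d_ -f8)))   # 8th "_" field = absolute orbit
  echo "$f  track=$(( (abs - 73) % 175 + 1 ))"
done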
...
Job Inputs:
Bbox (*required)
min_lat
max_lat
min_lon
max_lon
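For illustration only, approximate bbox values covering the Big Island of Hawaii (hypothetical numbers, not taken from a real job):
Code Block
# Illustrative bbox for a Hawaii stack (lat/lon in decimal degrees)
MIN_LAT=18.9; MAX_LAT=20.3; MIN_LON=-156.1; MAX_LON=-154.8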
...
CI Integration (Jenkins)
Link: http://b-ci.grfn.hysds.io:8080/job/ops-bcluster_container-builder_aria-jpl_topsstack_master/
WARNING: If rebuilding on the same branch (master), make sure to remove the existing Docker image so that it is rebuilt when the job restarts:
docker rmi <topsStack docker image id>
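A one-liner variant, assuming the image's repository name contains "topsstack":
Code Block
# Remove every image whose repository name matches *topsstack*
docker rmi $(docker images --filter=reference='*topsstack*' -q)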
HySDS-io and Jobspec-io
hysds-io.json.topsstack
Code Block
{
  "label": "topsStack Processor",
  "submission_type": "individual",
  "allowed_accounts": ["ops"],
  "action-type": "both",
  "params": [
    {
      "name": "min_lat",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "max_lat",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "min_lon",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "max_lon",
      "from": "submitter",
      "type": "number",
      "optional": false
    },
    {
      "name": "localize_products",
      "from": "dataset_jpath:",
      "type": "text",
      "lambda": "lambda met: get_partial_products(met['_id'], get_best_url(met['_source']['urls']), [met['_id']+'.zip'])"
    }
  ]
}
job-spec.json.topsstack
Code Block
{
  "recommended_queues": ["jjacob_stack"],
  "command": "/home/ops/verdi/ops/topsstack/run_stack.sh",
  "imported_worker_files": {
    "/home/ops/.netrc": "/home/ops/.netrc",
    "/home/ops/.aws": "/home/ops/.aws"
  },
  "soft_time_limit": 10800,
  "time_limit": 18000,
  "disk_usage": "100GB",
  "params": [
    {
      "name": "min_lat",
      "destination": "context"
    },
    {
      "name": "max_lat",
      "destination": "context"
    },
    {
      "name": "min_lon",
      "destination": "context"
    },
    {
      "name": "max_lon",
      "destination": "context"
    },
    {
      "name": "localize_products",
      "destination": "localize"
    }
  ]
}
Job Outputs
The main file that gets executed is run_stack.sh. It:
Copies all SLC .zip files to the zip/ sub-directory
Runs get_bbox.py and exports 8 coordinates as inputs for the science code:
Code Block
read MINLAT MAXLAT MINLON MAXLON MINLAT_LO MAXLAT_HI MINLON_LO MAXLON_HI <<< $TOKENS
Runs 10 steps to complete the stack processor (a sequential driver sketch follows this list):
run.py -i ./run_files/run_1_unpack_slc_topo_master -p 8
run.py -i ./run_files/run_2_average_baseline -p 8
run.py -i ./run_files/run_3_extract_burst_overlaps -p 8
run.py -i ./run_files/run_4_overlap_geo2rdr_resample -p 8
run.py -i ./run_files/run_5_pairs_misreg -p 8
run.py -i ./run_files/run_6_timeseries_misreg -p 8
run.py -i ./run_files/run_7_geo2rdr_resample -p 8
run.py -i ./run_files/run_8_extract_stack_valid_region -p 8
run.py -i ./run_files/run_9_merge -p 8
run.py -i ./run_files/run_10_grid_baseline -p 8
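A minimal sketch of driving the 10 steps in order (note that a plain run_files/run_* glob would sort run_10 before run_2, so iterate by step number; each step must finish before the next begins):
Code Block
# Run the 10 topsStack steps sequentially, 8-way parallel within each step
for i in $(seq 1 10); do
  run.py -i ./run_files/run_${i}_* -p 8 || { echo "step $i failed" >&2; exit 1; }
done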
Output directory structure
...
Output structure of merged/
Code Block
merged/
  baselines/
    20190506/
    20190518/
    20190530/
      20190530
      20190530.full.vrt
      20190530.vrt
      20190530.xml
  geom_master/
    *.rdr.aux.xml
    *.rdr.full
    *.rdr.full.aux.xml
    *.rdr.full.vrt
    *.rdr.full.xml
  SLC/
    20190506/
    20190518/
    20190530/
      20190530.slc.full
      20190530.slc.full.aux.xml
      20190530.slc.full.vrt
      20190530.slc.full.xml
      20190530.slc.hdr
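The .vrt files under merged/ are GDAL-readable, so any of them can be inspected directly, e.g.:
Code Block
# Inspect raster metadata of one coregistered SLC (requires GDAL)
gdalinfo merged/SLC/20190530/20190530.slc.full.vrt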
Datasets.json entry
File located in .sds/files/datasets.json
Code Block
{
  "ipath": "ariamh::data/STACK",
  "match_pattern": "/(?P<id>coregistered_slcs-(?P<year>\\d{4})(?P<month>\\d{2})(?P<day>\\d{2})(?P<time>\\d{6}).+)$",
  "alt_match_pattern": null,
  "extractor": null,
  "level": "NA",
  "type": "stack",
  "publish": {
    "s3-profile-name": "default",
    "location": "s3://s3-us-west-2.amazonaws.com:80/##BUCKET##/datasets/{type}/{version}/{year}/{month}/{day}/{id}",
    "urls": [
      "http://##WEBDAV_URL##/datasets/{type}/{version}/{year}/{month}/{day}/{id}",
      "s3://##S3_URL##:80/##BUCKET##/datasets/{type}/{version}/{year}/{month}/{day}/{id}"
    ]
  },
  "browse": {
    "location": "davs://##WEBDAV_USER##@##WEBDAV_URL##/browse/{type}/{version}/{year}/{month}/{day}/{id}",
    "urls": [
      "https://##WEBDAV##/browse/{type}/{version}/{year}/{month}/{day}/{id}"
    ]
  }
}
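To sanity-check the match_pattern, you can test it against an assumed example dataset id (the "-v1.0" suffix below is illustrative):
Code Block
# The pattern requires coregistered_slcs-YYYYMMDDhhmmss plus a trailing suffix
echo "/datasets/coregistered_slcs-20190530123456-v1.0" | \
  grep -P '/coregistered_slcs-\d{4}\d{2}\d{2}\d{6}.+$' && echo "matches"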
Running on ASG (Auto Scaling Group)
Currently using c5d.9xlarge
Not enough CPU: takes ~11.5 hrs to run 30 scenes
May need to upgrade to c5d.18xlarge or i-family instances
...
STILL TODO:
Integrate Sang-Ho/Jungkyo's GNU Parallel work