
AOI Processing Plan

https://docs.google.com/spreadsheets/d/1PH9bOU0jE6bUWqkuf2wCJ_o3Chh-cMu49GKPlNl5M14/edit#gid=0

Group ID allocations and their RabbitMQ queues

  • ESI2017-owen-HEC_s2037: standard_product-s1gunw-topsapp-pleiades_s2037

  • NISARST-bekaert-HEC_s2252: standard_product-s1gunw-topsapp-pleiades_s2252

  • CA-HEC_s2310: standard_product-s1gunw-topsapp-pleiades_s2310

PGEs that run on the Pleiades job worker (Singularity)

Job Metrics for pipeline

Repo of utils for Pleiades

https://github.com/hysds/hysds-hec-utils

SSH Tunnel from mamba cluster to Pleiades head node

From mamba-factotum, run the screen command; then, inside the screen session, ssh with a tunnel to the tpfe2 head node.

screen

  • screen -ls # list existing sessions

  • screen -U -R -D pleiades # create/reattach the pleiades session (-U: UTF-8; -D -R: reattach, detaching it elsewhere first if needed)

  • screen -x pleiades # shared terminal

  • to split screen: ctrl-a and then shift-s

  • to detach screen: ctrl-a and then d
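
Inside the screen session, the tunnel itself is a plain ssh with a local port forward. A minimal sketch, assuming the tpfe2 host alias from this page resolves from mamba-factotum; the forwarded port (8888) is a hypothetical example, substitute whatever service you need:

```shell
# Forward local port 8888 to port 8888 on the tpfe2 head node,
# so services on tpfe2 are reachable via localhost:8888 on mamba-factotum.
ssh -L 8888:localhost:8888 esi_sar@tpfe2
```

Because this runs inside screen, the tunnel survives your terminal disconnecting; reattach with screen -x pleiades to get it back.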

Auto-scaling job-workers singularity via PBS scripts

Run autoscaling for each group ID in the background with nohup (no hangup), with a maximum of 140 nodes in total across all group IDs:
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2037 140 > pbs_auto_scale_up-s2037.log 2>&1 &
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2310 140 > pbs_auto_scale_up-s2310.log 2>&1 &
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2252 140 > pbs_auto_scale_up-s2252.log 2>&1 &
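
The three invocations above can be expressed as a single loop; this is a sketch that assumes pbs_auto_scale_up.sh is on PATH (on tpfe2 it lives in ~/github/hysds-hec-utils):

```shell
# Launch one autoscaler per group ID in the background with nohup,
# writing each one's output to its own log file.
for gid in s2037 s2310 s2252; do
  nohup pbs_auto_scale_up.sh "$gid" 140 > "pbs_auto_scale_up-${gid}.log" 2>&1 &
done
```

Note that 140 is the shared cap across all group IDs, not 140 nodes per group, so each autoscaler is passed the same total.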

How to stop, flush, and restart production on Pleiades

  1. stop auto-scaling scripts

    1. https://github.com/hysds/hysds-hec-utils/blob/master/pbs_auto_scale_up.sh

  2. revoke jobs of type job-request-s1gunw-topsapp-local-singularity:ARIA-446_singularity in mozart-figaro that are in the running or queued states.

  3. qdel all jobs

    1. https://github.com/hysds/hysds-hec-utils/blob/master/qdel_all.sh

      1. qstat -u esi_sar | awk '{ if ($8 == "R" || $8 == "Q") print "qdel "$1;}'|sh

  4. then nuke all of the work dirs for the three group ids:

    1. /nobackupp12/esi_sar/s2037/worker/2020/11/**

    2. /nobackupp12/esi_sar/s2252/worker/2020/11/**

    3. /nobackupp12/esi_sar/s2310/worker/2020/11/**

  5. retry all failed topsapp jobs, or on-demand submit from runconfig-topsapp

  6. restart the auto-scaling scripts
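
The command-line parts of the sequence (steps 1, 3, and 4; step 2 is done in the Mozart/Figaro UI) can be sketched as one script. Assumptions: the autoscalers are still running as pbs_auto_scale_up.sh processes, and the work-dir paths match the ones listed above. The rm -rf is destructive, so verify the paths before running:

```shell
#!/bin/bash
set -u

# 1. Stop the auto-scaling scripts (pkill matches on the command line;
#    || true so an already-stopped autoscaler doesn't abort the script).
pkill -f pbs_auto_scale_up.sh || true

# 3. qdel every running (R) or queued (Q) PBS job owned by esi_sar.
#    Field 8 of qstat -u output is the job state.
qstat -u esi_sar | awk '{ if ($8 == "R" || $8 == "Q") print "qdel "$1 }' | sh

# 4. Nuke the work dirs for the three group IDs (destructive!).
for gid in s2037 s2252 s2310; do
  rm -rf /nobackupp12/esi_sar/"$gid"/worker/2020/11/*
done
```

After this, retry the failed topsapp jobs (step 5) and relaunch the autoscalers (step 6) as shown in the auto-scaling section above.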
