AOI Processing Plan
https://docs.google.com/spreadsheets/d/1PH9bOU0jE6bUWqkuf2wCJ_o3Chh-cMu49GKPlNl5M14/edit#gid=0
Group ID allocations and their RabbitMQ queues
ESI2017-owen-HEC_s2037: standard_product-s1gunw-topsapp-pleiades_s2037
NISARST-bekaert-HEC_s2252: standard_product-s1gunw-topsapp-pleiades_s2252
CA-HEC_s2310: standard_product-s1gunw-topsapp-pleiades_s2310
PGEs that run on Pleiades job worker singularity
job type: job-request-s1gunw-topsapp-local-singularity:ARIA-446_singularity
job type: job-spyddder-sling-extract-local-asf-singularity:ARIA-446_singularity
job type: job-spyddder-sling-extract-local-scihub-singularity:ARIA-446_singularity
Repo of utils for Pleiades
https://github.com/hysds/hysds-hec-utils
SSH Tunnel from mamba cluster to Pleiades head node
From mamba-factotum, run the screen command; then, inside the screen session, ssh with the tunnel to the tpfe2 head node.
screen
screen -ls
screen -U -R -D pleiades
screen -x pleiades # shared terminal
to split screen: ctrl-a and then shift-s
to detach screen: ctrl-a and then d
Auto-scaling job-workers singularity via PBS scripts
Run autoscaling for each group id in the background with nohup (no hangup), with a maximum of 140 nodes in total across all group ids:
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2037 140 > pbs_auto_scale_up-s2037.log 2>&1 &
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2310 140 > pbs_auto_scale_up-s2310.log 2>&1 &
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2252 140 > pbs_auto_scale_up-s2252.log 2>&1 &
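The three launch commands above could also be written as a loop. This is only a sketch: the DRY_RUN switch and the start_autoscalers function name are hypothetical additions for illustration and are not part of hysds-hec-utils. With DRY_RUN left at its default of 1, the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch: start one autoscaler per group id, as in the commands above.
# DRY_RUN and start_autoscalers are hypothetical names, not part of hysds-hec-utils.
GROUP_IDS="s2037 s2310 s2252"
MAX_NODES=140
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually launch

start_autoscalers() {
    for gid in $GROUP_IDS; do
        log="pbs_auto_scale_up-${gid}.log"
        if [ "$DRY_RUN" = "1" ]; then
            # Show the command that would be run for this group id.
            echo "nohup pbs_auto_scale_up.sh ${gid} ${MAX_NODES} > ${log} 2>&1 &"
        else
            # Launch in the background, immune to hangup, logging per group id.
            nohup pbs_auto_scale_up.sh "${gid}" "${MAX_NODES}" > "${log}" 2>&1 &
        fi
    done
}

start_autoscalers
```

Setting DRY_RUN=0 before running the script from ~/github/hysds-hec-utils would launch the autoscalers for real.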
How to stop, flush, and restart production on Pleiades
stop auto-scaling scripts
in mozart-figaro, revoke jobs of type job-request-s1gunw-topsapp-local-singularity:ARIA-446_singularity that are in the running or queued state.
qdel all jobs
https://github.com/hysds/hysds-hec-utils/blob/master/qdel_all.sh
qstat -u esi_sar | awk '{ if ($8 == "R" || $8 == "Q") print "qdel "$1;}'|sh
then nuke all of the work dirs for the three group ids:
/nobackupp12/esi_sar/s2037/worker/2020/11/**
/nobackupp12/esi_sar/s2252/worker/2020/11/**
/nobackupp12/esi_sar/s2310/worker/2020/11/**
retry all failed topsapp jobs / on-demand submit from runconfig-topsapp
start up auto scaling scripts
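Because the qdel and work-directory steps above are destructive, it may help to preview them first. The sketch below only prints what would be affected. The function names preview_qdel and worker_dirs are hypothetical, and the column position of the job state ($8) is taken from the awk one-liner above.

```shell
#!/bin/sh
# Sketch: preview helpers for the flush procedure. Nothing here deletes anything.
# preview_qdel and worker_dirs are hypothetical names used for illustration.

# Read qstat output on stdin and print a qdel command for every job whose
# state column ($8, as in the one-liner above) is R (running) or Q (queued).
preview_qdel() {
    awk '$8 == "R" || $8 == "Q" { print "qdel " $1 }'
}

# Print the worker directory of each group id for a given year and month.
worker_dirs() {
    year="$1"
    month="$2"
    for gid in s2037 s2252 s2310; do
        echo "/nobackupp12/esi_sar/${gid}/worker/${year}/${month}"
    done
}

worker_dirs 2020 11
```

On the head node, `qstat -u esi_sar | preview_qdel` would list the qdel commands for review; piping that output to sh runs them, as in the original one-liner.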