...

Run autoscaling for each group id in the background with nohup (no hangup), with a maximum of 140 nodes in total across all group ids:
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2037 140 > pbs_auto_scale_up-s2037.log 2>&1 &
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2310 140 > pbs_auto_scale_up-s2310.log 2>&1 &
esi_sar@tpfe2:~/github/hysds-hec-utils> nohup pbs_auto_scale_up.sh s2252 140 > pbs_auto_scale_up-s2252.log 2>&1 &

Note: these commands are wrapped in the following shell script:

esi_sar@tpfe2:~/github/hysds-hec-utils> ./all_pbs_auto_scale_up.sh <num_workers> 
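The contents of all_pbs_auto_scale_up.sh are not shown on this page. As a rough sketch only, assuming the wrapper simply loops over the three group ids above and passes its single argument through as the max node count, it could look something like the following. The function name print_scale_up_cmds is hypothetical, and the sketch prints the launch commands instead of running them so they can be reviewed first.

```shell
# Hypothetical sketch; the real all_pbs_auto_scale_up.sh may differ.
# Prints one launch command per group id (dry run), nothing is started.

GROUP_IDS="s2037 s2310 s2252"   # group ids taken from the commands above

print_scale_up_cmds() {
    max_nodes="$1"   # e.g. 140
    for gid in $GROUP_IDS; do
        echo "nohup pbs_auto_scale_up.sh ${gid} ${max_nodes} > pbs_auto_scale_up-${gid}.log 2>&1 &"
    done
}
```

Calling print_scale_up_cmds 140 prints the same three nohup lines listed above, one per group id, each with its own log file.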

Daily purge of older job work dirs

...

  1. Stop the auto-scaling scripts

    1. https://github.com/hysds/hysds-hec-utils/blob/master/pbs_auto_scale_up.sh

  2. Revoke jobs of type job-request-s1gunw-topsapp-local-singularity:ARIA-446_singularity in mozart-figaro that are in the running or queued state.

  3. Delete all PBS jobs with qdel

    1. https://github.com/hysds/hysds-hec-utils/blob/master/qdel_all.sh

      1. qstat -u esi_sar | awk '{ if ($8 == "R" || $8 == "Q") print "qdel "$1; }' | sh

  4. Delete all of the work dirs for the three group ids:

    1. /nobackupp12/esi_sar/s2037/worker/2020/11/**

    2. /nobackupp12/esi_sar/s2252/worker/2020/11/**

    3. /nobackupp12/esi_sar/s2310/worker/2020/11/**

  5. Retry all failed topsapp jobs / submit on-demand from runconfig-topsapp

  6. Restart the auto-scaling scripts
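Steps 3 and 4 are destructive, so it can help to preview them before running anything. The sketch below is a dry run under stated assumptions: the function names are hypothetical, the state filter is the same 8th-column check used in the qstat one-liner above, and rm -rf is only an assumed reading of "nuke" since the page does not give the exact delete command. Both functions only print commands.

```shell
# Dry-run helpers (hypothetical names); nothing is deleted by either function.

# Reads `qstat -u <user>` output on stdin and prints a qdel command for each
# job whose 8th column (state) is R or Q, matching the one-liner in step 3.
print_qdel_cmds() {
    awk '$8 == "R" || $8 == "Q" { print "qdel " $1 }'
}

# Prints an assumed rm command for each group id's worker dir for the given
# year and month. The glob is quoted, so it is printed, not expanded.
print_purge_cmds() {
    year="$1"
    month="$2"
    for gid in s2037 s2252 s2310; do
        echo "rm -rf /nobackupp12/esi_sar/${gid}/worker/${year}/${month}/*"
    done
}
```

For example, qstat -u esi_sar | print_qdel_cmds lists the qdel commands that step 3 would run, and print_purge_cmds 2020 11 lists the three work dir paths from step 4. Review the output before piping it to sh.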