This task list outlines the recommended items that Operations Engineers review on a daily and weekly basis to ensure ARIA operations are running nominally. Operations support is limited to the standard work day unless a prior agreement states otherwise.

Daily Items

  1. Check that services are up and running

    1. Confirm Mozart, Tosca, and ARIA Products pages are all accessible.

    2. Confirm jobs are being processed by reviewing the job status in Mozart and the queue status in RabbitMQ.

    3. Confirm there are no stale queues in RabbitMQ or stale jobs in Mozart.

  2. Review Slack alert messages

    1. Resolve the alerts defined in the messages.

  3. Review failed jobs

    1. Investigate the cause of each failure. Resolve it if possible, or contact the relevant PGE developer for assistance.

  4. Generate product accountability reports

    1. Generate AOI reports for the recently processed AOIs to assess the status of processing campaigns.

  5. Reporting

    1. Notify customers of processing updates.

    2. Update any relevant Jira tickets.

  6. Review AWS

    1. Ensure there are no runaway EC2 instances in the Auto Scaling group (ASG).

      1. Terminate any stale EC2 instances.

    2. Verify that the AWS Billing Daily Cost View is at expected levels.
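Two of the daily checks above, stale RabbitMQ queues and runaway EC2 instances, can be partly scripted. The following is a minimal sketch, not the ARIA team's actual tooling. It assumes the queue list has already been fetched from the RabbitMQ management API (`GET /api/queues`) and the instance list from an EC2 DescribeInstances call. The function names, the definition of "stale" (messages waiting with no consumers), and the 24-hour age threshold are all illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def find_stale_queues(queues):
    """Return names of queues that hold messages but have no consumers.

    `queues` is the parsed JSON list returned by the RabbitMQ management
    API endpoint GET /api/queues. Treating "messages waiting, no consumers"
    as stale is an assumption, not an ARIA-defined rule.
    """
    return [
        q["name"]
        for q in queues
        if q.get("messages", 0) > 0 and q.get("consumers", 0) == 0
    ]


def find_runaway_instances(instances, now, max_age_hours=24):
    """Return IDs of running instances launched more than max_age_hours ago.

    `instances` is a flat list of dicts shaped like the entries under
    Reservations[].Instances[] in an EC2 DescribeInstances response.
    The 24-hour default is an assumed threshold; tune it to the workload.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return [
        i["InstanceId"]
        for i in instances
        if i["State"]["Name"] == "running" and i["LaunchTime"] < cutoff
    ]


if __name__ == "__main__":
    # Synthetic sample data for illustration only.
    sample_queues = [
        {"name": "queue-with-backlog", "messages": 12, "consumers": 0},
        {"name": "queue-being-worked", "messages": 4, "consumers": 2},
    ]
    now = datetime.now(timezone.utc)
    sample_instances = [
        {
            "InstanceId": "i-example-old",
            "State": {"Name": "running"},
            "LaunchTime": now - timedelta(hours=72),
        },
        {
            "InstanceId": "i-example-new",
            "State": {"Name": "running"},
            "LaunchTime": now - timedelta(hours=1),
        },
    ]
    print("Stale queues:", find_stale_queues(sample_queues))
    print("Runaway instances:", find_runaway_instances(sample_instances, now))
```

Both functions only flag candidates. Terminating instances or purging queues remains a manual decision for the operator, since a long-running instance may be doing legitimate work.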

Weekly Items

  1. Reduce storage costs by purging SLCs from S3

    1. Assess AOIs that are end-dating soon.

    2. Purge the SLCs in each such AOI's region.

  2. Clean up trigger rules in Tosca

    1. Delete any trigger rules for AOIs that have finished.

    2. Deactivate a trigger rule instead of deleting it if you may want to reference its parameters in the future.
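The "end-dating soon" assessment in the weekly SLC purge can be sketched as a simple date filter. This is an illustration under stated assumptions, not the actual ARIA workflow: the AOI records are assumed to be dicts with hypothetical `id` and `endtime` fields, and the 7-day window is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def aois_ending_soon(aois, now, within_days=7):
    """Return IDs of AOIs whose end time falls within the next within_days.

    `aois` is a list of dicts with hypothetical keys "id" (str) and
    "endtime" (timezone-aware datetime). AOIs that have already ended
    are excluded, as are those ending after the window.
    """
    horizon = now + timedelta(days=within_days)
    return [a["id"] for a in aois if now <= a["endtime"] <= horizon]


if __name__ == "__main__":
    # Synthetic sample data for illustration only.
    now = datetime.now(timezone.utc)
    sample_aois = [
        {"id": "AOI_example_soon", "endtime": now + timedelta(days=2)},
        {"id": "AOI_example_later", "endtime": now + timedelta(days=60)},
    ]
    print("AOIs end-dating soon:", aois_ending_soon(sample_aois, now))
```

The resulting list is only a review aid. Which SLCs to purge, and when, still depends on whether other active AOIs overlap the same region.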
