...

  • once all acq-lists have been generated, facet on such acq-lists in Tosca

  • then submit the localizer jobs:

    • Action: Standard Product S1-GUNW slc_localizer [develop]

    • Queue: the factotum queue is factotum-job_worker-aria-standard_product-slc_localizer (this is an ASG and is now tagged with Charlie; the ASG queue name is aria-standard_product-localizer)

    • asf_ngap_download_queue: factotum-job_worker-slc_sling-asf

      • Note the other queue slc-sling-extract-asf is an ASG

    • esa_download_queue: factotum-job_worker-slc_sling-scihub

      • Note the other queue slc-sling-extract-scihub is an ASG

    • spyddder_sling_extract_version: develop

    • Result: this job will iterate over the SLCs listed in the acq-list and submit a data sling job

      • these sling jobs take an acquisition-S1-IW_SLC as input and will download the corresponding SLC from ASF (for relatively old acquisitions) or Scihub (for acquisitions less than 2 weeks old) to S3 and register the SLC in the S1-IW_SLC dataset in GRQ (see the sketch after this list)

    • Notes:

      • Acquisition lists are in one-to-one correspondence with ifg-cfgs

      • SLCs can be shared among acquisition lists and ifg-cfgs within an AOI. Therefore, #SLCs < #acq-lists = #ifg-cfgs within your AOI. As an example, within one AOI there were ~700 SLCs for 2300 ifg-cfgs.

      • Say you run the localizer and see that a bunch of ifg-cfgs haven’t been created even though most of the sling jobs have completed successfully. You may only have a few SLCs left to download (far fewer than the number of missing ifg-cfgs). Check the unique SLCs in the ops report.

      • If you have the proper trigger rules set up and activated, every time a new SLC is slinged and put into the system, an ifg-cfg is created. This is a helpful trigger rule to have. Currently it is called acqlist_evaluator.

  • if you ever need to download a particular SLC, facet on the corresponding acquisition-S1-IW_SLC (the SLC id is a substring of the acquisition id) and submit the following job

    • Action: Data Sling and Extract for {asf, Scihub} [develop]

    • Queue: factotum-job_worker-{large,small}

  • sling jobs have a tendency to fail since certain products are archived in the DAACs

    • retrying/resubmitting the failed jobs a little later will usually complete them
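
The ASF-vs-Scihub selection mentioned in the sling job notes above follows a simple age cutoff. Below is a minimal sketch of that decision, assuming a hypothetical helper signature; only the two-week rule and the queue names come from this page.

Code Block
from datetime import datetime, timedelta, timezone

# Hypothetical sketch: pick the download queue for an acquisition-S1-IW_SLC
# based on its age, mirroring the rule above (Scihub for acquisitions newer
# than ~2 weeks, ASF otherwise). The function name and argument are illustrative.
def choose_download_queue(acquisition_start: datetime,
                          asf_queue: str = "factotum-job_worker-slc_sling-asf",
                          esa_queue: str = "factotum-job_worker-slc_sling-scihub") -> str:
    age = datetime.now(timezone.utc) - acquisition_start
    return esa_queue if age < timedelta(weeks=2) else asf_queue

# Example usage with a made-up acquisition time:
print(choose_download_queue(datetime(2021, 3, 1, tzinfo=timezone.utc)))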

...

  • at this point, we have completed the necessary processing to now run topsapp and generate an ifg

  • facet on the ifg-cfgs and run the following job

    • Action: TopsApp PGE in *Standard-Product* Pipeline for S1-GUNW Interferograms [develop]

    • Queue: topsapp jobs take a while and run on expensive machines – therefore, this PGE significantly drives up costs for the pipeline! We have designated queues to tag the jobs with different accounts so customers can pay for these charges.

      • Current Recommended Queues (last updated 3/2021):

        • aria-standard_product-s1gunw-topsapp-NSLCT_Bekaert

        • aria-standard_product-s1gunw-topsapp-Access_Bekaert

        • aria-standard_product-s1gunw-topsapp-Volcano_Lundgren

        • aria-standard_product-s1gunw-topsapp-Rise_Limonadi

        • Note the last token in the above queue names indicates the project name; more tags can be seen in the Auto Scaling group setup in AWS.

    • dataset_tag: this is a comma-delimited list of tags that will be added to the produced S1-GUNW metadata.dataset_tags field and can be used to facet on the product in the future

      • for the standard product pipeline on AWS, standard_product,aws should always be included in this parameter

    • Result: a S1-GUNW product will be produced

    • Notes on Trigger Rules:

      • General trigger rules with topsApp must be created with care because a trigger rule that is too lenient can really run up costs. For a topsApp trigger rule, use the following facets:

        • Spatial extent of the AOI

        • The track number associated with the AOI

        • TODO: temporal spans associated with the enumerator

      • Due to the creation of the coseismic pipeline, there are some shared datasets. It is important to use NOT "Coseismic" in the query box to ensure coseismic datasets are ignored. More specific pipelines must ignore the machine tag called s1-gunw-coseismic.


...

    • Greylisting

      • This will likely require its own page at some point. Greylisting is important for the following reason: when we deliver products over an AOI, we only do so if all possible GUNWs for a date pair are completed. That means if a GUNW cannot be completed as outlined by a given acq-list/ifg-cfg, then we cannot deliver any of the other GUNWs that have been generated for that date pair. There are two types of errors, which we want to group together, that indicate an ifg-cfg will never finish with topsApp and that should not hamper our delivery of products:
        a. Exception: Could not determine a suitable burst offset
        b. No swaths contain any burst overlaps ... cannot continue for interferometry applications
        These occur when a GUNW is being produced over water or over an area with extremely low coherence, e.g. here.

        • Caution!

          • The two relevant trigger rules are:  standard-product-greylist-failed-gunw-burst-overlap and standard-product-greylist-failed-gunw-swaths-overlap (these correspond to the errors above)

          • The trigger rules for grey listing are in figaro (since they facet on jobs)

          • The trigger rules (those cited above) require specification of a container - if a topsapp job is being modified, be sure to update these trigger rules to reference the correct container so that they are correctly invoked - Charlie M. accidentally changed the name of the job-spec.<pge_container_name>, exchanging a - with a _, and the trigger rules were no longer valid.

        • Facet: On the jobs in figaro using the error codes above.

        • Action: Standard Product S1-GUNW - Greylist S1-GUNW from topsapp job [python3]

        • Result: You will get a greylist id associated with the ifg-cfg which can be identified using the hash-id.

    • Notes on Errors:

      • There are some error types that are worth mentioning as they can arise even if the pipeline has been run correctly. Make sure the errors match exactly those examples found below, as ISCE errors are very hard to catch and a slight difference in the error output can be the result of totally different sources (note both error examples below mention “burst”):

        • Burst overlap errors like this job - the SLCs (on two different dates) do not have an overlap. This occurs when the metadata used to enumerate the job and create the IFG-CFG was slightly off from what is on the ground and/or the overlap is just not sufficient for ISCE2 to do its processing. This means that the IFG-CFG is malformed and should be ignored.

        • DEM download errors like this job - this is likely a transient error and will go away on a re-run. Simply put, the DEM was not downloaded successfully from our S3 bucket during processing. If problems persist, please reach out to Nicholas Arenas.

        • Clobber errors like this job - although “short circuits” exist within the topsApp PGE, the PGE checks the completed GUNW database. Therefore, if two identical topsApp jobs were called on the same ifg-cfg before either could complete, we will get these clobber errors. Note the clobber errors will generally not all be identical because it depends which file is uploaded first. However, an easy way to determine whether such an error was due to duplication in the operator faceting is to facet on a single input ifg-cfg and check the related topsApp jobs. Here is an example of such faceting in figaro.

      • If the errors are beyond the scope of those listed above, the relevant logs will be saved on Tosca using the HySDS triaging functionality, which is currently enabled for the topsApp PGE; here is an example of triaged job datasets. Facet on one of the failing ifg-cfgs and send it to the current topsApp maintainer (as of March 2021, this is charlie.z.marshak@jpl.nasa.gov).

    • Trigger Rule:

      • Generally, you want to set a trigger rule related to topsApp prior to running the enumerator.

      • Trigger rules that are narrowly faceted can be hard to create if no matching dataset exists yet. Frequently, we take an existing trigger rule and simply edit it in the menu. For reference, here is a template (what would be in the final query window of the trigger rule):

Code Block
{
  "filtered": {
    "query": {
      "bool": {
        "must": [
          {
            "term": {
              "dataset.raw": "S1-GUNW-ifg-cfg"
            }
          },
          {
            "query_string": {
              "query": "metadata.track_number:<TRACK NUMBER>",
              "default_operator": "OR"
            }
          }
        ]
      }
    },
    "filter": {
      "geo_shape": {
        "location": {
          "shape": {
            "type": "polygon",
            "coordinates": [<AOI EXTENT>]
          }
        }
      }
    }
  }
}
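
For illustration, here is a small Python sketch that fills the template above programmatically. The track number and polygon below are made-up placeholder values, not values from this page; substitute your own AOI's track number and extent.

Code Block
import json

# Hypothetical example values -- replace with your AOI's track number and extent.
track_number = 124
aoi_extent = [[[-118.5, 34.0], [-117.5, 34.0], [-117.5, 35.0],
               [-118.5, 35.0], [-118.5, 34.0]]]

query = {
    "filtered": {
        "query": {
            "bool": {
                "must": [
                    {"term": {"dataset.raw": "S1-GUNW-ifg-cfg"}},
                    {"query_string": {
                        "query": f"metadata.track_number:{track_number}",
                        "default_operator": "OR"}},
                ]
            }
        },
        "filter": {
            "geo_shape": {
                "location": {
                    "shape": {"type": "polygon", "coordinates": aoi_extent}
                }
            }
        },
    }
}

# Paste the resulting JSON into the trigger rule's final query window.
print(json.dumps(query, indent=2))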


Generate AOI-Tracks product

  • AOI tracks are often too large to be covered by 1 S1-GUNW for a given date-pair

  • once all the ifgs for a specific date-pair are generated, an S1-GUNW-AOI_TRACK product is produced

  • to generate these products, facet on the S1-GUNW products and submit the following job:

    • Action: Standard Product S1-GUNW - S1-GUNW Completeness Evaluator [develop]

    • Queue: factotum-job_worker-standard_product-completeness_evaluator

    • Result: this job will look at an S1-GUNW and check if it “completes” the track for a given date-pair. If so, an S1-GUNW-AOI_TRACK product is generated; otherwise, the evaluator silently completes without producing anything (see the sketch below).

  • note that these jobs are automatically submitted by the trigger rule s1gunw-aws-s1gunw-completeness-evaluator and you should not normally need to submit them on-demand

  • once an S1-GUNW-AOI_TRACK product is produced, the S1-GUNW products will be published to ASF and ARIA-products via the following pipeline: <<TODO: delivery pipeline - there are a bunch of PGEs here and for most intents and purposes they work using trigger rules that have been set up>>

  • Checking the Delivery to ASF

    • In addition to using the various ops reports, you can go directly to ASF (https://search.asf.alaska.edu/) and use their search-by-“list” feature. Copying the GUNW ids into this feature shows the delivery publicly! This is generally a good method of “delivering” the final AOI to science customers. Here is an example.
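
As a rough illustration of the completeness idea above, here is a hedged sketch: it assumes you can obtain the set of ifg-cfgs expected for a track/date-pair, the set that already have S1-GUNWs, and any greylisted ones. The function and argument names are illustrative only and are not the real completeness evaluator API.

Code Block
# Hypothetical sketch: a date-pair on a track is "complete" only when every
# expected (non-greylisted) ifg-cfg has a corresponding S1-GUNW.
from typing import Set

def is_track_complete(expected_ifg_cfg_ids: Set[str],
                      produced_gunw_ifg_cfg_ids: Set[str],
                      greylisted_ifg_cfg_ids: Set[str]) -> bool:
    missing = expected_ifg_cfg_ids - produced_gunw_ifg_cfg_ids - greylisted_ifg_cfg_ids
    return not missing

# Example with made-up ids: one ifg-cfg is greylisted, the rest have GUNWs,
# so the track counts as complete and an AOI_TRACK product would be produced.
print(is_track_complete({"a", "b", "c"}, {"a", "b"}, {"c"}))  # True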

  • Delivery Failures (flavor 1)

    • An AOITrack exists but the GUNW wasn’t delivered. This means that the completeness evaluator succeeded but some PGE downstream didn’t. Below is taken verbatim from here. This is part of the “delivery pipeline”, which is not documented here but hopefully will be at some point.

    • Facet: AOITrack datasets that are not delivering

    • Action: Product Delivery of S1-GUNW-AOI_TRACK [develop]

    • Queue: factotum-job_worker-{small,large}

    • pub_sns_arn: arn:aws:sns:us-east-1:406893895021:ingest-prod-jobs

    • callback_sns_arn: arn:aws:sns:us-west-2:151169893255:aria-torresal-011-daac-cnm-response

    • Result: submits individual product delivery jobs of each ifg in the track

  • Notes on Other Errors:

    • If any GUNW from an entire date pair is absent, then none of the GUNWs for that date pair will deliver. So even if you facet on the missing GUNWs, this process will complete without error, but it will not create an AOITrack dataset and therefore will not deliver your desired GUNWs to ASF.

Generate AOI-Tracks product using Greylist IDs (Completeness via greylist)

Did you have some greylist IDs that were generated on demand? We did! The action below still needs to be tested, but we are pretty confident it works.

  • Facet: Greylist Ids.

  • Action: Standard Product S1-GUNW - S1-GUNW Completeness Evaluator By GreyList [develop]

  • Result: You will run the completeness evaluator on the new GUNWs.

Clean up

  • Purge localized SLCs as done here.

  • Delete the trigger rules associated with TopsApp (they clutter the trigger rules) - or turn them off!

  • Check for stray instances.

    • In certain cases, RabbitMQ does not accurately capture all the instances that are running from a given queue’s ASG. You can check the queues used for the standard product pipeline in EC2 > Auto Scaling Groups (sidebar); then, set the “desired capacity” to 0 in the topmost menu when you click on a given queue. Alternatively, to view all the running instances, go to AWS console > EC2 > Instances (sidebar) and check the instances that are “running” (see the sketch below).
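
The same clean-up checks can be scripted with boto3. A minimal sketch, assuming AWS credentials are configured; the ASG name below is a stand-in, not a value from this page:

Code Block
import boto3

# Hypothetical ASG name -- substitute one of the standard-product topsapp ASGs.
ASG_NAME = "aria-standard_product-s1gunw-topsapp-example"

# Scale the Auto Scaling group down to zero instances.
autoscaling = boto3.client("autoscaling")
autoscaling.set_desired_capacity(AutoScalingGroupName=ASG_NAME, DesiredCapacity=0)

# List any instances still in the "running" state (stray workers show up here).
ec2 = boto3.client("ec2")
reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]
for reservation in reservations:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance.get("Tags", []))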

...

Notes

Faceting in Tosca and Figaro

...

  • Ensuring that SLCs for an AOI are downloaded en masse - that is, every acquisition list has all of its SLCs. Of course, getting all the SLCs onto the system is never attainable in practice. However, the more SLCs from an AOI that are downloaded, the more directly (and thus faster) the topsApp processing can be done, and the purging can be done more quickly as well.

  • speed of processing staged SLCs (post enumeration) into GUNWs using topsapp - in other words, ensuring the topsApp jobs are run quickly once the SLCs have been staged so that you are not left waiting

    • this is most efficiently done with trigger rules on ifg-cfgs (see the topsapp section above).

  • Purging SLCs that are no longer needed

    • Removing the datasets also purges the SLCs from S3

    • While it is beneficial to purge SLCs that are no longer needed, note that figuring out which are needed and which are not is complicated, which is why it’s best to download as many of the required SLCs at once as possible

    • If you have a small number of GUNWs that are missing, it’s best to purge the existing SLCs and repeat the pipeline on only the acquisition lists/ifg-cfgs required to produce the missing GUNWs (see the sketch after this list).
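
To make the bookkeeping above concrete, here is a hedged sketch of the set arithmetic: given a mapping from the remaining (not yet produced) ifg-cfgs to the SLC ids they require, plus the set of SLCs currently localized, the SLCs safe to purge are those no remaining ifg-cfg references. The data structures are illustrative, not pulled from GRQ.

Code Block
# Hypothetical sketch: decide which localized SLCs are still needed.
# remaining_ifg_cfgs maps ifg-cfg id -> set of SLC ids it requires.
from typing import Dict, Set

def slcs_safe_to_purge(remaining_ifg_cfgs: Dict[str, Set[str]],
                       localized_slcs: Set[str]) -> Set[str]:
    still_needed = set().union(*remaining_ifg_cfgs.values()) if remaining_ifg_cfgs else set()
    return localized_slcs - still_needed

# Example with made-up ids: S1C is referenced by no remaining ifg-cfg, so it can go.
print(slcs_safe_to_purge({"cfg1": {"S1A", "S1B"}}, {"S1A", "S1B", "S1C"}))  # {'S1C'}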

Removing Jobs after bad Facets

It is inevitable there will be times

TopsApp Bug Documented (related to intermediate datasets)