Semantic Workflows for Cancer Clinical Omics

In collaboration with the Knight Cancer Institute at the Oregon Health and Science University, we are using Wings workflows to annotate patient sequence variants obtained through clinical DNA sequencing. The diagram below shows a workflow that annotates identified genomic variants with: 1) the potential protein consequence of the sequence variant, computed with Bioconductor, 2) previously known mutations found in COSMIC, 3) known sequence variants curated within Ensembl, and 4) variants from previous clinical samples held in a manually curated in-house database. These biological data sources are continually updated with new information and with corrections that could affect a patient’s annotations. It is therefore vital that the version of each data source is captured in the provenance records for each patient’s annotations. Reproducibility is also important to ensure consistency across patient annotations.
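To make the provenance requirement concrete, here is a minimal sketch in Python of a record that stores which version of each data source was used for a patient's annotations, together with a check for sources that have since been updated. It is an illustration only and not how Wings represents provenance: the names SourceVersion, build_provenance and stale_sources, the patient identifier, and the version labels are all hypothetical placeholders.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class SourceVersion:
    """One data source and the version used for an annotation run (hypothetical type)."""
    name: str
    version: str


def build_provenance(patient_id, sources):
    """Return a plain-dict provenance record listing every source version used."""
    return {
        "patient_id": patient_id,
        # Sort by name so that two runs with the same inputs give identical records.
        "sources": [asdict(s) for s in sorted(sources, key=lambda s: s.name)],
    }


def stale_sources(record, current_versions):
    """Return the names of sources whose current version differs from the recorded one."""
    return [
        s["name"]
        for s in record["sources"]
        if current_versions.get(s["name"]) != s["version"]
    ]


# Placeholder identifiers and version labels, not real releases.
record = build_provenance(
    "patient-001",
    [SourceVersion("Ensembl", "release-A"), SourceVersion("COSMIC", "release-A")],
)
print(json.dumps(record, indent=2))

# COSMIC has moved on to a newer placeholder release; Ensembl has not.
outdated = stale_sources(record, {"COSMIC": "release-B", "Ensembl": "release-A"})
print(outdated)
```

A record of this kind is what would let an annotation be reproduced later, or flagged for re-annotation when a source is corrected.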

Semantic workflows are crucial for this application:

  • The workflow that the users see has abstract classes for each step, which are automatically specialized to executable codes that use the latest versions. This way, the users interact with the same high-level workflow and do not have to manage version updates.
  • The versions used in the different steps have to be consistent, a requirement that is captured as semantic constraints in the workflow. For example, all the data sources should be based on the same genomic assembly.
  • The system guides the user by identifying data sources that are consistent with each patient’s data.
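The three points above can be sketched in a few lines of Python. This is a simplified stand-in, not the Wings API or its constraint language: the DataSource type, the catalog, the integer release numbers, and both function names are hypothetical, and the assembly labels serve only as examples. The sketch picks, for each abstract data source, the newest release built on the same genomic assembly as the patient's data, then checks the same-assembly constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """A release of a data source (hypothetical type for illustration)."""
    name: str
    release: int   # placeholder release counter; higher means newer
    assembly: str  # genomic assembly the release is based on


def consistent_sources(catalog, patient_assembly):
    """For each source name, choose the newest release on the patient's assembly."""
    chosen = {}
    for src in catalog:
        if src.assembly != patient_assembly:
            continue  # violates the same-assembly constraint
        best = chosen.get(src.name)
        if best is None or src.release > best.release:
            chosen[src.name] = src
    return chosen


def same_assembly(sources):
    """Return True if all given sources share one genomic assembly."""
    return len({s.assembly for s in sources}) <= 1


# Placeholder catalog; the release numbers are not real releases.
CATALOG = [
    DataSource("COSMIC", 1, "GRCh37"),
    DataSource("COSMIC", 2, "GRCh38"),
    DataSource("Ensembl", 1, "GRCh37"),
    DataSource("Ensembl", 2, "GRCh37"),
    DataSource("Ensembl", 3, "GRCh38"),
]

selected = consistent_sources(CATALOG, "GRCh37")
for name, src in sorted(selected.items()):
    print(name, src.release, src.assembly)
```

In this sketch the user only names the abstract sources and the patient's assembly. Choosing a release is left to the selection function, which mirrors the idea that users do not manage version updates themselves.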

The diagram below shows the high-level workflow, followed by the executable workflow indicating all the versions of data sources and codes automatically tracked by the system.

For more details, see: