WINGS is a semantic workflow system that assists scientists with the design of computational experiments. A unique feature of WINGS is that its workflow representations incorporate semantic constraints about datasets and workflow components, and are used to create and validate workflows and to generate metadata for new data products. WINGS submits workflows to execution frameworks such as Pegasus and OODT to run workflows at large scale in distributed resources.

Wings Standalone Versions Released

For quick installation, we now also provide a Standalone Wings Bundle with all other software packages that it depends on. You can download the Standalone versions of Wings (with in-built Apache, MySQL, PHP, Tomcat and pre-installed Wings) from here:

Download Wings Standalone Bundle

Exporting Wings provenance using the emerging W3C PROV standard

We are now exporting provenance records for Wings workflows as Linked Open Data using the W3C PROV provenance standard.


OPMW (Open Provenance Model for Workflows)

OPMW is an ontology for describing workflows based on the Open Provenance Model. OPMW allows the publication of workflow execution traces as well as the more abstract reusable workflows that were originally used.

Workflows as Linked Data

The goal of this work is to extend articles with scientific workflows to 1) represent computations carried out to obtain the published results, essentially capturing explicitly data analysis pipelines, and 2) represent an abstraction of those computations that captures the semantics of the data analysis method in an execution-independent manner. This would make scientific results more reproducible because articles would have not just a textual description of the computational process described in the article but also a workflow that, as a computational artifact, could be analyzed and re-run automatically.
In recent years, a variety of systems have been developed that export the workflows used to analyze data and make them part of published articles. The workflows that are published in current approaches are dependent on the specific codes used for execution, the specific workflow system used, and the specific workflow catalogs where they are published.

In this work, we take a new approach that addresses these shortcomings and makes workflows more reusable through: 1) the use of abstract workflows to complement executable workflows to make them reusable when the execution environment is different, 2) the publication of both abstract and executable workflows using standards such as the Open Provenance Model that can be imported by other workflow systems, 3) the publication of workflows as Linked Data that results in open web accessible workflow repositories. Our initial focus is a complex workflow that we re-created from an influential drug discovery publication that describes the generation of ‘drugomes’.

The TB Drugome Workflow

Our initial focus is on a reusable computational workflow for the method to derive the drug-target network of an organism (i.e., its drugome) published in (Kinnings et al 11) (a preprint is available, see also the project web site).
The article describes a computational pipeline that accesses data from the Protein Data Bank (PDB) and carries out a systematic analysis of the proteome of Mycobacterium tuberculosis (TB) against all approved drugs. The process uncovers protein receptors in the organism that could be targeted by drugs currently in use for other purposes. The result is a drug-target network (a “drugome”) that includes all known approved drugs. Although the article focuses on a particular organism (TB), the method itself can be used for other pathogens or pathways and has the potential to be a key resource for developing new, more comprehensive treatments for other diseases of interest.
With the help of the authors of the article, we created an executable workflow that reflects the steps described in the original article and ran it with the data used in the original experiments.
The final executable workflow can be seen here:
Drugome Executable Workflow

To export the workflows we developed OPMW as an extension of OPM that can represent abstract workflows.

OPM is a widely used, domain-independent provenance model that resulted from the Provenance Challenge Series and from years of workflow provenance exchange and standardization in the scientific workflow community.

There are several reasons to use OPM. First, OPM has already been used successfully in many scientific workflow systems, which makes our published workflows more reusable. Second, the core definitions in OPM are domain independent and extensible to accommodate other purposes, in our case workflow representations. In addition, OPM can be considered the basis of the emerging W3C Provenance Interchange Language (PROV), which is currently being developed by the W3C Provenance Working Group as a standard for representing and publishing provenance on the Web.

OPM offers several core concepts and relationships to represent provenance. OPM models the resources (datasets) as artifacts (immutable pieces of state), processes (action or series of actions performed on artifacts), and agents (controllers of processes). Their relationships are modeled in a provenance graph with five causal edges: used (a process used some artifact), wasControlledBy (an agent controlled some process), wasGeneratedBy (a process generated an artifact), wasDerivedFrom (an artifact was derived from another artifact) and wasTriggeredBy (a process was triggered by another process). It also introduces the concept of roles to assign the type of activity that artifacts, processes or agents played when interacting with each other, and the notion of accounts and provenance graphs to group sets of OPM assertions into different subgraphs. An account represents a particular view on the provenance of an artifact based on what was executed. We mapped Wings ontologies to the OPM core model, extending OPM core concepts and relationships according to our needs in a new profile called OPMW.
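The node kinds and causal edges described above can be sketched as a small data structure. This is only an illustration of the model, not part of Wings or of any OPM library: the class name ProvenanceGraph, the method names, and the node identifiers (rawData, analysisStep, result, scientist) are assumptions made for this example.

```python
from dataclasses import dataclass, field

NODE_KINDS = {"artifact", "process", "agent"}

# Each causal edge points from an effect to its cause.
# Values are (kind of source node, kind of target node), following the text.
EDGE_SIGNATURES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasDerivedFrom": ("artifact", "artifact"),
    "wasTriggeredBy": ("process", "process"),
}


@dataclass
class ProvenanceGraph:
    """Hypothetical in-memory provenance graph (illustrative names only)."""
    nodes: dict = field(default_factory=dict)  # node id -> node kind
    edges: list = field(default_factory=list)  # (source, edge type, target, role)

    def add_node(self, node_id, kind):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = kind

    def add_edge(self, source, edge_type, target, role=None):
        # Reject edges whose endpoints do not match the edge's signature.
        expected = EDGE_SIGNATURES[edge_type]
        actual = (self.nodes[source], self.nodes[target])
        if actual != expected:
            raise ValueError(f"{edge_type} expects {expected}, got {actual}")
        self.edges.append((source, edge_type, target, role))

    def causes(self, node_id):
        """Return every node reachable from node_id along causal edges."""
        seen, stack = set(), [node_id]
        while stack:
            current = stack.pop()
            for source, _, target, _ in self.edges:
                if source == current and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# One process that reads one artifact and writes another, run by one agent.
graph = ProvenanceGraph()
graph.add_node("rawData", "artifact")
graph.add_node("result", "artifact")
graph.add_node("analysisStep", "process")
graph.add_node("scientist", "agent")
graph.add_edge("analysisStep", "used", "rawData", role="input")
graph.add_edge("result", "wasGeneratedBy", "analysisStep", role="output")
graph.add_edge("analysisStep", "wasControlledBy", "scientist")
graph.add_edge("result", "wasDerivedFrom", "rawData")
```

Calling graph.causes("result") walks the edges back from the output and collects the process, the input artifact, and the agent that contributed to it. Accounts are left out of this sketch; they could be modeled as named subsets of the edge list.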

We use two OPM ontologies for our mapping. OPMV is a lightweight RDF vocabulary implementation of the OPM model that covers only a subset of the concepts in OPM but facilitates modeling and query formulation. OPMO covers the full functionality of the OPM model, and we use it for mapping to OPM concepts that are not in OPMV, such as Account or OPM Graph.

Figure 1: OPMW extension

Figure 1 shows a high-level diagram of the mappings to OPM, with an abstract workflow on the left and a specific execution on the right. The workflow shown here has one step (executionNode1), which runs the workflow component (specComp1) that has one input (execInput1) and one output (executionOutput1). For some of the concepts there is a straightforward mapping: datasets are a subtype of Artifacts, while workflow steps, also called nodes, map to OPM Processes. Notice that each node has a link to the component that is run in that step, for example the workflow in Figure 1 has two nodes that run the same component SMAPV2. There is no OPM term that can be mapped to components, so we used our own terms (represented with the ac prefix in Figure 1).

In the figure, the terms taken from OPMO and OPMV are indicated using their namespaces. The new terms that we defined in our extension profile use the OPMW prefix. The ontology can be browsed here.
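The mapping described for Figure 1 can be sketched as a function that turns one executed workflow step into subject-predicate-object statements. This is a minimal sketch under several assumptions: terms are written as prefixed strings rather than full URIs, the ex: prefix is a placeholder, and ac:Component and ac:hasComponent are hypothetical names standing in for the component terms, which the text says have no OPM equivalent. The opmv: terms follow the OPM concepts named above; the ontology itself is the authority on exact term names.

```python
# Wings concept -> class used in the export, following the mapping in the text.
WINGS_TO_OPM = {
    "dataset": "opmv:Artifact",
    "node": "opmv:Process",
    "component": "ac:Component",  # hypothetical term: OPM has no component concept
}


def export_step(node, component, inputs, outputs):
    """Return (subject, predicate, object) statements for one workflow step."""
    triples = [
        (node, "rdf:type", WINGS_TO_OPM["node"]),
        (component, "rdf:type", WINGS_TO_OPM["component"]),
        (node, "ac:hasComponent", component),  # hypothetical property name
    ]
    for dataset in inputs:
        triples.append((dataset, "rdf:type", WINGS_TO_OPM["dataset"]))
        triples.append((node, "opmv:used", dataset))
    for dataset in outputs:
        triples.append((dataset, "rdf:type", WINGS_TO_OPM["dataset"]))
        triples.append((dataset, "opmv:wasGeneratedBy", node))
    return triples


# The single-step execution described for Figure 1.
triples = export_step(
    node="ex:executionNode1",
    component="ex:specComp1",
    inputs=["ex:execInput1"],
    outputs=["ex:executionOutput1"],
)
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

Keeping the concept-to-class table separate from the export function means the same function could be pointed at a different vocabulary by swapping the table. The link between an execution and its abstract workflow is not shown here.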

WINGS workflow system

WINGS is a workflow system that assists scientists with the design of computational experiments. A computational experiment specifies how selected datasets are to be processed by a series of software components in a particular configuration. Earth scientists use computational experiments to estimate seismic hazard through simulations of earthquake forecasts. Biologists use computational experiments for analysis of gene expression microarray data or molecular interaction networks and pathways. Social scientists analyze large social networks to discover structural regularities based on mining relations among individuals.

We use workflows to represent computational experiments. Workflows represent application components and their dependencies in terms of dataflow among them. Workflow systems have been developed to assist users with some aspect of the process, for example to assemble workflows out of large component libraries, to optimize execution performance, and for workflow sharing. None of these systems provides comprehensive support for workflow design and exploration. To learn more about the state of the art in workflow systems, please visit

Jena semantic framework

Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine.
Jena is open source and grew out of work with the HP Labs Semantic Web Program.

AllegroGraph database

AllegroGraph is a modern, high-performance, persistent graph database. AllegroGraph uses efficient memory utilization in combination with disk-based storage, enabling it to scale to billions of quads while maintaining superior performance. AllegroGraph supports SPARQL, RDFS++, and Prolog reasoning from numerous client applications.


Pubby can be used to add Linked Data interfaces to SPARQL endpoints. Much Semantic Web data lives inside triple stores and can be accessed only by sending SPARQL queries to a SPARQL endpoint. It is hard to connect information in these stores with other external data sources.
Pubby makes it easy to turn a SPARQL endpoint into a Linked Data server. It is implemented as a Java web application.
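The general idea of serving Linked Data from a SPARQL endpoint can be sketched as follows: a public URI is rewritten to the URI used inside the triple store, and a DESCRIBE query for that URI is sent to the endpoint through the standard query parameter of the SPARQL protocol. This is not Pubby's code or configuration format. The three addresses and both function names are assumptions made for this example.

```python
from urllib.parse import urlencode

# Hypothetical addresses, chosen only for illustration.
WEB_BASE = "http://example.org/resource/"     # URIs that clients dereference
DATASET_BASE = "http://data.example.org/id/"  # URIs used inside the triple store
SPARQL_ENDPOINT = "http://example.org/sparql"


def to_dataset_uri(web_uri):
    """Rewrite a public URI into the URI stored in the triple store."""
    if not web_uri.startswith(WEB_BASE):
        raise ValueError(f"not a URI served by this front end: {web_uri}")
    return DATASET_BASE + web_uri[len(WEB_BASE):]


def describe_request_url(web_uri):
    """Build the GET URL that asks the endpoint to describe the resource."""
    query = f"DESCRIBE <{to_dataset_uri(web_uri)}>"
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


request_url = describe_request_url(WEB_BASE + "executionNode1")
print(request_url)
```

A front end built this way would fetch request_url and return the result to the client in the requested RDF serialization. That step is omitted here because it depends on the HTTP library and endpoint in use.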

Project Members



This project is sponsored by Elsevier Labs, the National Science Foundation with award number CCF-0725332, the Air Force Office of Scientific Research with award number FA9550-11-1-0104, and by internal funds from the University of Southern California's Information Sciences Institute and from the University of California, San Diego.

A Framework for Efficient Text Analysis in Wings

We have developed a framework to assist scientists with computational experiments for text analysis. It takes advantage of the unique capabilities of Wings to reason about constraints and encompasses over 50 components for machine learning and text analysis tasks. The framework also contains workflows for text classification, text clustering and the visualization of results and can be used with several of the most common datasets from the text classification research community.

Wings Workflows for measuring "Water Metabolism"

We are using Wings to measure "Water Metabolism" for the Aqua-Flow Project. The idea behind the project is to observe, model and manage water resources to optimize stream ecology while sustaining society’s water needs.

With regard to Wings, we would like to demonstrate knowledge-rich workflow design and large-scale distributed execution capabilities that assist users in the development and management of complex analyses through guidance and automation.

Workflows for "Next Generation Sequencing data"

We are using Wings to execute workflows for Next Generation Sequencing data. The diagram shows a workflow for processing mRNA-Seq data, which performs seven alignment steps (genome, junction, fusion, polyA, polyT, miR, and paired) in the analysis. The workflow computes expression measures for each gene, exon, and splice form from a given set of samples.

Tutorial / Reference for the Wings Workflow Portal

We've created a simple tutorial that works as a reference for users already familiar with the Workflow Portal, and provides new users with steps explaining how to go about creating data, components and workflows, as well as how to run the workflows and access output data.

The tutorial can be accessed here:

New Wings web site

Wings now has a new web site, with more comprehensive information as well as interactive features for our collaborators and users.
