Provenance challenge myGrid David De Roure


Provenance challenge --- myGrid

  • David De Roure

  • University of Southampton

  • Jun Zhao, Carole Goble and Daniele Turi

  • University of Manchester


  • Short team introduction

  • Workflow implementation

  • Provenance schema and storage

  • Provenance queries

  • Suggestions

  • Reflection

  • Acknowledgement

Provenance Challenge Overview

  • Given an abstract workflow

  • Implement this workflow in your system

  • Collect provenance from runs of this workflow

  • Present the implemented workflow and collected provenance

  • Answer a list of provenance questions and present these answers

Taverna and myGrid

  • A UK e-Science project to build middleware for in silico experiments by individual life scientists, stuck in under-resourced labs, who use other people’s applications.

  • Sequence analysis, microarray analysis, proteomics, chemoinformatics, image processing, rendering Dilbert cartoons.


  • Data links

  • Control links: limited support

  • Failure tolerance: retry and alternative services

  • Implicit iterations: cross/dot iterations

  • Nested workflows

  • Semantic metadata annotations
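The difference between the cross and dot iteration strategies can be sketched in a few lines. This is only an illustration of the two pairing rules, not Taverna code; the `align` function and the input names are hypothetical stand-ins for a workflow processor and its inputs.

```python
from itertools import product


def dot_iterate(func, xs, ys):
    """Dot strategy: pair the inputs position by position."""
    return [func(x, y) for x, y in zip(xs, ys)]


def cross_iterate(func, xs, ys):
    """Cross strategy: apply func to every combination of inputs."""
    return [func(x, y) for x, y in product(xs, ys)]


def align(image, reference):
    # Hypothetical processor that expects one image and one reference.
    return f"{image}->{reference}"


images = ["img1", "img2"]
references = ["refA", "refB"]

dot_result = dot_iterate(align, images, references)      # 2 invocations
cross_result = cross_iterate(align, images, references)  # 4 invocations
```

With two items on each input, the dot strategy yields two invocations and the cross strategy yields four, which is why the choice matters for both run time and the amount of provenance collected.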

What has to be done

  • Design the workflow using Scufl in Taverna

  • Build services (Web services, Soaplab services, local Java, or beanshell scripts) to implement each process

  • Gather and process the real data products

Doing it properly

  • Wrap each procedure as a service

  • Process the real data as a real experiment

  • Use the iterations, nested workflows, or interactive workflows that Taverna supports

  • Real examples:

    • Chimatica supports high-throughput workflows using Taverna 1.X
    • MIAS-Grid uses myGrid to build medical image processing workflows

What we actually did

  • Realize each procedure as a beanshell script, to avoid real service implementation and deployment

  • Pass pseudo data products rather than real image data products

  • But keep the metadata about data products along with provenance to answer semantic questions

Implemented Scufl workflow in Taverna

Provenance schema

  • Four aspects

    • Workflow provenance
    • Data provenance
    • Organization provenance
    • Knowledge provenance
  • Provenance ontology

    • RDFS
    • OWL-lite

Provenance Pyramid Model

Workflow provenance ontology

Data provenance ontology

Organization & Knowledge provenance ontology

  • userPredicate

    • Semantic concept about a data product or a service, e.g. nucleotide_sequence
    • Semantic (knowledge) relationships between two data products, e.g. similar_sequence_to

Collected & stored provenance

  • LSIDs used to identify:

    • data, workflows, workflow runs
    • LSIDs are names of graphs
  • Named RDF graphs

    • retrieve whole workflow runs
    • implementation in
      • Sesame2 native store
        • scalable
        • alpha release (bugs)
      • NG4J (Jena + MySQL)
        • scalability issues
      • Future implementations: Oracle and Boca
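The idea of using an LSID as the name of an RDF graph, so that a whole workflow run can be retrieved in one step, can be illustrated with a toy in-memory store. This is a minimal sketch and not the Sesame2 or NG4J API; the `NamedGraphStore` class, the LSIDs and the `ex:` predicate names are all assumptions made for the example.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy store mapping a graph name (an LSID string) to a set of triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph_name, subject, predicate, obj):
        self._graphs[graph_name].add((subject, predicate, obj))

    def graph(self, graph_name):
        # Return a copy so callers cannot mutate the stored graph.
        return set(self._graphs.get(graph_name, set()))


# Hypothetical LSIDs naming two workflow runs.
RUN_1 = "urn:lsid:example.org:wfrun:1"
RUN_2 = "urn:lsid:example.org:wfrun:2"

store = NamedGraphStore()
store.add(RUN_1, "urn:lsid:example.org:data:d1", "ex:outputOf", "ex:softmean")
store.add(RUN_1, "urn:lsid:example.org:data:d0", "ex:outputOf", "ex:convert")
store.add(RUN_2, "urn:lsid:example.org:data:d9", "ex:outputOf", "ex:pnmtojpeg")

# Retrieving a whole run is a single lookup by graph name.
run1_triples = store.graph(RUN_1)
```

Keeping each run in its own named graph means statements from different runs never mix unless a query asks for them together.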

Answer matrix

  • Find the process that led to d0 (Atlas X Graphic)

  • Find the process that led to d0 (Atlas X Graphic) excluding everything prior to d1 (the averaging of images with softmean)

  • Find the Stage 3, 4 and 5 details of the process that led to d0 (Atlas X Graphic)

  • Find all invocations of procedure align_warp using p0 (a twelfth order nonlinear 1365 parameter model)
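The first two questions above are lineage queries: walk backwards from a data product through the processes that produced it, optionally stopping at a given process. The sketch below shows that traversal over a heavily simplified run record. It is not the query mechanism used by the myGrid team; the record layout and the data item names are hypothetical.

```python
from collections import deque

# (process, input, output) records for one run: hypothetical and simplified
# to a single chain through the five stages of the challenge workflow.
RUN = [
    ("align_warp", "anatomy_image", "warp_params"),
    ("reslice", "warp_params", "resliced_image"),
    ("softmean", "resliced_image", "atlas_image"),
    ("slicer", "atlas_image", "atlas_x_slice"),
    ("convert", "atlas_x_slice", "atlas_x_graphic"),
]


def lineage(records, target, stop_process=None):
    """List the processes that led to target, nearest first.

    If stop_process is given, that process is still reported, but
    nothing upstream of it is followed.
    """
    producers = {}
    for process, src, dst in records:
        producers.setdefault(dst, []).append((process, src))

    seen, found = set(), []
    queue = deque([target])
    while queue:
        item = queue.popleft()
        if item in seen:
            continue
        seen.add(item)
        for process, src in producers.get(item, []):
            if process not in found:
                found.append(process)
            if process != stop_process:
                queue.append(src)
    return found


full_history = lineage(RUN, "atlas_x_graphic")
from_softmean = lineage(RUN, "atlas_x_graphic", stop_process="softmean")
```

Here `full_history` corresponds to the first question and `from_softmean` to the second, which cuts the traversal off at softmean.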

Answer matrix

  • 7. A user has run the workflow twice, in the second instance replacing each convert procedure in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs.
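One simple reading of this question is a set difference over the processes recorded for each run. The sketch below assumes each run is available as a list of (process, input, output) records; that layout, the helper names and the data item names are hypothetical, and a real comparison would likely also look at data and parameters.

```python
def processes_used(records):
    """Collect the distinct process names recorded for a run."""
    return {process for process, _src, _dst in records}


def diff_runs(run_a, run_b):
    """Report which processes appear in only one of the two runs."""
    a, b = processes_used(run_a), processes_used(run_b)
    return {"only_in_first": sorted(a - b), "only_in_second": sorted(b - a)}


# Hypothetical final-stage records for the two runs in the question.
first_run = [
    ("slicer", "atlas_image", "atlas_x_slice"),
    ("convert", "atlas_x_slice", "atlas_x_graphic"),
]
second_run = [
    ("slicer", "atlas_image", "atlas_x_slice"),
    ("pgmtoppm", "atlas_x_slice", "atlas_x_ppm"),
    ("pnmtojpeg", "atlas_x_ppm", "atlas_x_graphic"),
]

difference = diff_runs(first_run, second_run)
```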

Suggested Workflow Variants

Suggested Queries

  • Compare, merge and union provenance from different workflow runs

  • Explain why different outputs were produced in repeated workflow runs

  • Replay a workflow run

Categorisation of queries

  • Four levels:

  • 1. queries to support the provenance browser

  • 2. semantic queries

  • 3. integration queries

  • 4. pre-canned queries to support provenance usage scenarios.

Live systems

  • Taverna:


  • Provenance plugin and browser beta release: bundled with the Taverna release 1.4.

  • Provenance ontology:


  • System requirements:

    • Windows, Linux, Mac
    • Java 5.0
    • MySQL database (optional)


  • A systematic provenance query framework is needed

  • Separate data and provenance metadata

    • Better storage scalability
    • Avoid archiving duplicate data products
  • A consensus on provenance models


  • The myGrid Taverna team: Tom Oinn, Stuart Owen, Stian Soiland, David Withers, Katy Wolstencroft and June Finch

  • Daniele Turi: provenance plugin

  • Matthew Gamble: Taverna provenance browser

  • Chris Wroe from the original myGrid project
