Provenance challenge myGrid David De Roure





Provenance challenge --- myGrid

  • David De Roure

  • University of Southampton

  • Jun Zhao, Carole Goble and Daniele Turi

  • University of Manchester


Outline

  • Short team introduction

  • Workflow implementation

  • Provenance schema and storage

  • Provenance queries

  • Suggestions

  • Reflection

  • Acknowledgement



Provenance Challenge Overview

  • Given an abstract workflow

  • Implement this workflow in your system

  • Collect provenance from runs of this workflow

  • Present the implemented workflow and collected provenance

  • Answer a list of provenance questions and present these answers



Taverna and myGrid

  • A UK e-Science project to build middleware for in silico experiments by individual life scientists, stuck in under-resourced labs, who use other people’s applications.

  • Sequence analysis, microarray analysis, proteomics, chemoinformatics, image processing, rendering Dilbert cartoons.





Scufl

  • Data links

  • Control links: limited support

  • Failure tolerance: retry and alternative services

  • Implicit iterations: cross/dot iterations

  • Nested workflows

  • Semantic metadata annotations
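The two implicit iteration strategies and the failure-tolerance idea listed above can be illustrated with a minimal Python sketch. This is not Scufl or Taverna code: the function names `cross`, `dot` and `invoke_with_fallback`, the equal-length check in `dot`, and the `attempts` parameter are all assumptions made for illustration.

```python
from itertools import product


def cross(*inputs):
    """Cross iteration: every combination of one item from each input list."""
    return list(product(*inputs))


def dot(*inputs):
    """Dot iteration: pair items position by position.

    Requiring equal lengths is a choice made for this sketch.
    """
    if len({len(values) for values in inputs}) > 1:
        raise ValueError("dot iteration needs lists of equal length")
    return list(zip(*inputs))


def invoke_with_fallback(services, args, attempts=2):
    """Retry each service `attempts` times, then move to the next alternative."""
    last_error = None
    for service in services:
        for _ in range(attempts):
            try:
                return service(*args)
            except Exception as error:  # any failure triggers a retry
                last_error = error
    raise RuntimeError("all services failed") from last_error


images = ["img1", "img2"]
headers = ["hdr1", "hdr2"]
print(dot(images, headers))         # [('img1', 'hdr1'), ('img2', 'hdr2')]
print(len(cross(images, headers)))  # 4
```

A processor with two list inputs is invoked twice under dot iteration and four times under cross iteration in this example, which is why the choice of strategy matters for both cost and the volume of provenance recorded.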



What has to be done

  • Design the workflow using Scufl in Taverna

  • Build services (Web services, Soaplab services, local Java, or Beanshell scripts) to implement each process

  • Gather and process the real data products



Doing it properly

  • Wrap each procedure as a service

  • Process the real data as a real experiment

  • Use iterations, nested workflows or interactive workflows supported by Taverna

  • Real examples:

    • Chimatica (http://www.chimatica.co.uk/) supports high throughput workflows using Taverna 1.X
    • MIAS-Grid (http://www.mias-irc.net/) uses myGrid to build medical image processing workflows


What we actually did

  • Realize each procedure as a beanshell script, to avoid real service implementation and deployment

  • Pass pseudo data products rather than real image data products

  • But keep the metadata about data products along with provenance to answer semantic questions
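The shortcut described above, stub procedures that pass pseudo data products while still carrying metadata, might look roughly like the following. It is written in Python rather than Beanshell and is only a sketch: `PseudoProduct`, `run_stub`, the record layout and the metadata keys are hypothetical names, not the scripts used in the challenge entry.

```python
from dataclasses import dataclass, field


@dataclass
class PseudoProduct:
    """Stands in for a real image: a name plus descriptive metadata only."""
    name: str
    metadata: dict = field(default_factory=dict)


def run_stub(procedure, inputs, output_name, log, **metadata):
    """Pretend to run `procedure`: record what was consumed and produced."""
    output = PseudoProduct(output_name, dict(metadata))
    log.append({
        "procedure": procedure,
        "inputs": [product.name for product in inputs],
        "output": output.name,
    })
    return output


log = []
anatomy = PseudoProduct("anatomy1.img", {"kind": "anatomy image"})
warp = run_stub("align_warp", [anatomy], "warp1", log, kind="warp parameters")
atlas = run_stub("softmean", [warp], "atlas.img", log, kind="atlas image")
print(len(log))  # 2
```

No image processing happens here, but the log and the metadata attached to each pseudo product are enough to trace which procedure produced which product.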



Implemented Scufl workflow in Taverna



Provenance schema

  • Four aspects

    • Workflow provenance
    • Data provenance
    • Organization provenance
    • Knowledge provenance
  • Provenance ontology

    • RDFS
    • OWL-lite


Provenance Pyramid Model





Workflow provenance ontology



Data provenance ontology



Organization & Knowledge provenance ontology

  • userPredicate

    • Semantic concept about a data product or a service, e.g. nucleotide_sequence
    • Semantic (knowledge) relationships between two data products, e.g. similar_sequence_to


Collected & stored provenance

  • LSIDs used to identify:

    • data, workflows, workflow runs
    • LSIDs are names of graphs
  • Named RDF graphs

    • retrieve whole workflow runs
    • implementations:
      • Sesame2 native store
        • scalable
        • alpha release (bugs)
      • NG4J (Jena + MySQL)
        • scalability issues
      • Future implementations: Oracle and Boca
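The idea of using an LSID as the name of an RDF graph, so that a whole workflow run can be retrieved in one step, can be sketched with a small in-memory store. This is a toy stand-in for Sesame2 or NG4J, not their APIs; the class `NamedGraphStore`, the `ex:` predicates and the LSID values are hypothetical.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy quad store: each graph name (here an LSID) maps to a set of triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph_name, subject, predicate, obj):
        self._graphs[graph_name].add((subject, predicate, obj))

    def graph(self, graph_name):
        """Return every triple recorded under one graph name."""
        return set(self._graphs.get(graph_name, ()))

    def match(self, subject=None, predicate=None, obj=None):
        """Yield (graph, s, p, o) for triples matching the given pattern."""
        for name, triples in self._graphs.items():
            for s, p, o in triples:
                if ((subject is None or s == subject)
                        and (predicate is None or p == predicate)
                        and (obj is None or o == obj)):
                    yield name, s, p, o


# Hypothetical LSIDs in the urn:lsid:authority:namespace:object form.
RUN_1 = "urn:lsid:example.org:workflow_run:1"
RUN_2 = "urn:lsid:example.org:workflow_run:2"

store = NamedGraphStore()
store.add(RUN_1, RUN_1, "ex:executedProcess", "align_warp")
store.add(RUN_1, RUN_1, "ex:executedProcess", "softmean")
store.add(RUN_2, RUN_2, "ex:executedProcess", "softmean")

print(len(store.graph(RUN_1)))  # 2
runs_with_softmean = sorted(g for g, _, _, _ in store.match(obj="softmean"))
```

Retrieving a whole run is a single lookup by graph name, while pattern matching across graphs answers questions that span several runs.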


Answer matrix

  • Find the process that led to d0 (Atlas X Graphic)

  • Find the process that led to d0 (Atlas X Graphic) excluding everything prior to d1 (the averaging of images with softmean)

  • Find the Stage 3, 4 and 5 details of the process that led to d0 (Atlas X Graphic)

  • Find all invocations of procedure align_warp using p0 (a twelfth order nonlinear 1365 parameter model)
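Lineage questions such as those above amount to walking backwards from a data product through the invocations that produced it. The sketch below uses a deliberately shortened toy trace, not the full challenge workflow, and a plain list of dictionaries in place of the RDF store; the record layout, the function `lineage` and the identifiers `img1` and `warp1` are assumptions. Stopping at d1 is interpreted here as keeping softmean but not following its inputs.

```python
def lineage(invocations, target, stop_at=()):
    """Return the procedures that led to `target`, nearest first.

    Items listed in `stop_at` keep their producing procedure, but
    nothing upstream of them is followed.
    """
    producer = {out: inv for inv in invocations for out in inv["outputs"]}
    procedures, seen, pending = [], set(), [target]
    while pending:
        item = pending.pop()
        if item in seen or item not in producer:
            continue
        seen.add(item)
        invocation = producer[item]
        if invocation["procedure"] not in procedures:
            procedures.append(invocation["procedure"])
        if item not in stop_at:
            pending.extend(invocation["inputs"])
    return procedures


# Toy trace: intermediate stages of the real workflow are omitted.
trace = [
    {"procedure": "align_warp", "inputs": ["img1", "p0"], "outputs": ["warp1"]},
    {"procedure": "softmean", "inputs": ["warp1"], "outputs": ["d1"]},
    {"procedure": "convert", "inputs": ["d1"], "outputs": ["d0"]},
]

print(lineage(trace, "d0"))                  # ['convert', 'softmean', 'align_warp']
print(lineage(trace, "d0", stop_at={"d1"}))  # ['convert', 'softmean']

# Invocations of align_warp that used parameter set p0.
with_p0 = [inv for inv in trace
           if inv["procedure"] == "align_warp" and "p0" in inv["inputs"]]
```

The same backward walk answers the first two questions; only the stopping condition changes, and the last question is a simple filter on invocation records.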



Answer matrix

  • 7. A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs.
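One simple way to approach this comparison is a set difference over the procedures recorded for each run. The sketch below is an assumption-laden simplification: it compares procedure names only, ignores data products and ordering, and the function name `diff_runs` and the abbreviated run contents are hypothetical.

```python
def diff_runs(first_run, second_run):
    """Report which procedures appear in only one of the two runs."""
    first, second = set(first_run), set(second_run)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
    }


# Abbreviated procedure lists for the two runs described in the question.
run_1 = ["align_warp", "softmean", "convert"]
run_2 = ["align_warp", "softmean", "pgmtoppm", "pnmtojpeg"]

print(diff_runs(run_1, run_2))
# {'only_in_first': ['convert'], 'only_in_second': ['pgmtoppm', 'pnmtojpeg']}
```

A fuller comparison would diff the provenance graphs of the two runs rather than bare procedure names, but the set-difference idea carries over.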



Suggested Workflow Variants





Suggested Queries

  • Compare, merge and union provenance from different workflow runs

  • Explain why different outputs were produced in repeated workflow runs

  • Replay a workflow run



Categorisation of queries

  • Four levels:

    • 1. Queries to support the provenance browser
    • 2. Semantic queries
    • 3. Integration queries
    • 4. Pre-canned queries to support provenance usage scenarios



Live systems

  • Taverna: http://taverna.sourceforge.net

  • Provenance plugin and browser beta release: bundled with Taverna release 1.4

  • Provenance ontology: http://cvs.mygrid.org.uk/cgi-bin/viewcvs.cgi/mygrid/miasgrid/rdf-provenance/etc/ontology/

  • System requirements:

    • Windows, Linux, Mac
    • Java 5.0
    • MySQL database (optional)


Reflection

  • A systematic provenance query framework is needed

  • Separate data and provenance metadata

  • A consensus on provenance models is needed



Acknowledgement

  • The myGrid Taverna team: Tom Oinn, Stuart Owen, Stian Soiland, David Withers, Katy Wolstencroft and June Finch

  • Daniele Turi: provenance plugin

  • Matthew Gamble: Taverna provenance browser

  • Chris Wroe from the original myGrid project



