Main Page

Wings and Pegasus Provenance - Provenance Challenge 2006

http://twiki.ipaw.info/bin/view/Challenge/FirstProvenanceChallenge

Participating Team

  • Short team name: USC/ISI
  • Participant names: Ewa Deelman, Yolanda Gil, Jihie Kim, Gaurang Mehta, Varun Ratnakar
  • Project URL: WINGS [1], PEGASUS [2]

Workflow Representation

Wings Template

The Template was created for the fMRI domain using the fMRI Component Library and the fMRI File Ontology.

Wings Template for fMRI

The Wings DAX generator was run several times over this Template with randomly chosen parameters. These runs produced a set of DAXes and workflow instances (compact and expanded).

Wings File and Component Library

The fMRI domain File Library was populated with descriptions of all the files used or generated in the runs above, together with the metadata and parameters that we provided for those runs. The provenance queries are answered in two steps: a SPARQL query against this File Library finds the associated Workflow Instance(s), and each workflow instance is then queried for more detailed information.
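The two-step lookup described above can be illustrated with a toy in-memory triple store. This is only a sketch: it uses plain Python pattern matching as a stand-in for SPARQL, and the file name, instance name, and predicate names (`generatedBy`, `hasNode`, `usesComponent`) are hypothetical, not the vocabulary of the actual File Library.

```python
# Toy stand-in for the File Library: (subject, predicate, object) triples.
# All identifiers and predicate names below are hypothetical.
TRIPLES = [
    ("atlas-x.gif", "generatedBy", "workflow-instance-1"),
    ("atlas-x.gif", "hasFormat", "gif"),
    ("workflow-instance-1", "hasNode", "convert-1"),
    ("workflow-instance-1", "hasNode", "slice-1"),
    ("convert-1", "usesComponent", "convert"),
    ("slice-1", "usesComponent", "slice"),
]


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def components_behind(file_name):
    """Step 1: file -> workflow instance(s). Step 2: instance -> components."""
    instances = [o for _, _, o in match(s=file_name, p="generatedBy")]
    result = {}
    for inst in instances:
        nodes = [o for _, _, o in match(s=inst, p="hasNode")]
        result[inst] = sorted(
            o for n in nodes for _, _, o in match(s=n, p="usesComponent")
        )
    return result


print(components_behind("atlas-x.gif"))
```

In a real deployment the two `match` steps would presumably be SPARQL queries, the first against the File Library and the second against the workflow instance description.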

The fMRI Component Library provides five components for use in the fMRI domain: align_warp, reslice, softmean, slice, and convert. These components are used when creating a Template.

Workflow Instance (Dax)

Wings creates a workflow instance in the VDS DAX format. The image below shows a Graphviz visualization of the DAX file. The XML DAX file, which is given as input to the Pegasus Workflow Planner, is also provided below.

Workflow Instance (DAX)

Download the FMRI dax file fmri.xml
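To give a feel for what a DAX contains, the sketch below parses a heavily simplified DAX-like fragment and recovers the job dependencies and output files. The fragment is hypothetical: the real fmri.xml carries XML namespaces and more attributes, and its job ids and file names may differ.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical DAX-like fragment (no namespaces, two jobs only).
DAX = """
<adag name="fmri">
  <job id="ID000001" name="align_warp">
    <uses file="anatomy1.img" link="input"/>
    <uses file="warp1.warp" link="output"/>
  </job>
  <job id="ID000002" name="reslice">
    <uses file="warp1.warp" link="input"/>
    <uses file="resliced1.img" link="output"/>
  </job>
  <child ref="ID000002">
    <parent ref="ID000001"/>
  </child>
</adag>
"""

root = ET.fromstring(DAX)

# Job id -> transformation name.
jobs = {j.get("id"): j.get("name") for j in root.iter("job")}

# (parent id, child id) pairs taken from the <child>/<parent> elements.
edges = [
    (p.get("ref"), c.get("ref"))
    for c in root.iter("child")
    for p in c.iter("parent")
]

# Job id -> files the job declares as outputs.
outputs = {
    j.get("id"): [u.get("file") for u in j.iter("uses") if u.get("link") == "output"]
    for j in root.iter("job")
}

for parent, child in edges:
    print(f"{jobs[parent]} -> {jobs[child]}")
```

The edge list is the same information that a Graphviz rendering of the DAX draws as arrows between jobs.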

Pegasus Planned Workflow (Dag)

Planner Workflow (Condor Dag)

Download the planned workflow condor dag and submit files fmridag.tar.gz

Provenance Trace

WINGS Provenance

Wings file provenance is contained in the File Library.

Wings process flow provenance is contained in the specific Workflow Instance or in the Workflow Template.

Pegasus Provenance

Certain steps in planning the workflow are not currently stored in any database or catalog, although we would like to store them. The following slides present the kind of provenance that is generated in the planning process.


Pegasus, as part of VDS, stores the execution provenance in the Provenance Tracking Catalog, which is held in a database. This information is collected at execution time by wrapping the executables in a wrapper called Kickstart (credit: University of Chicago and Jens Voeckler). The wrapper collects runtime information such as start and end times, system usage, and the exit code of the executables. The schema of the tables is as follows.

describe ptc_invocation;
+--------------+--------------+------+-----+---------------------+-------+
| Field        | Type         | Null | Key | Default             | Extra |
+--------------+--------------+------+-----+---------------------+-------+
| id           | bigint(20)   |      | PRI | 0                   |       |
| creator      | varchar(16)  |      |     |                     |       |
| creationtime | datetime     |      |     | 0000-00-00 00:00:00 |       |
| wf_label     | varchar(32)  | YES  |     | NULL                |       |
| wf_time      | datetime     | YES  |     | NULL                |       |
| version      | varchar(4)   | YES  |     | NULL                |       |
| start        | datetime     |      |     | 0000-00-00 00:00:00 |       |
| duration     | double       |      |     | 0                   |       |
| tr_namespace | varchar(255) | YES  |     | NULL                |       |
| tr_name      | varchar(255) | YES  |     | NULL                |       |
| tr_version   | varchar(20)  | YES  |     | NULL                |       |
| dv_namespace | varchar(255) | YES  |     | NULL                |       |
| dv_name      | varchar(255) | YES  |     | NULL                |       |
| dv_version   | varchar(20)  | YES  |     | NULL                |       |
| resource     | varchar(48)  | YES  |     | NULL                |       |
| host         | varchar(16)  | YES  |     | NULL                |       |
| pid          | int(11)      | YES  |     | NULL                |       |
| uid          | int(11)      | YES  |     | NULL                |       |
| gid          | int(11)      | YES  |     | NULL                |       |
| cwd          | text         | YES  |     | NULL                |       |
| arch         | bigint(20)   | YES  | MUL | NULL                |       |
| total        | bigint(20)   | YES  | MUL | NULL                |       |
+--------------+--------------+------+-----+---------------------+-------+

describe ptc_job;
+----------+--------------+------+-----+---------------------+-------+
| Field    | Type         | Null | Key | Default             | Extra |
+----------+--------------+------+-----+---------------------+-------+
| id       | bigint(20)   |      | PRI | 0                   |       |
| type     | char(1)      |      | PRI |                     |       |
| start    | datetime     |      |     | 0000-00-00 00:00:00 |       |
| duration | double       |      |     | 0                   |       |
| pid      | int(11)      | YES  |     | NULL                |       |
| rusage   | bigint(20)   | YES  | MUL | NULL                |       |
| stat     | bigint(20)   | YES  | MUL | NULL                |       |
| exitcode | int(11)      |      |     | 0                   |       |
| exit_msg | varchar(255) | YES  |     | NULL                |       |
| args     | mediumblob   | YES  |     | NULL                |       |
+----------+--------------+------+-----+---------------------+-------+

describe ptc_lfn;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| id      | bigint(20)   |      | MUL | 0       |       |
| stat    | bigint(20)   |      | MUL | 0       |       |
| initial | char(1)      | YES  |     | NULL    |       |
| lfn     | varchar(255) | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+

describe ptc_rusage;
+----------+------------+------+-----+---------+-------+
| Field    | Type       | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| id       | bigint(20) |      | PRI | 0       |       |
| utime    | double     |      |     | 0       |       |
| stime    | double     |      |     | 0       |       |
| minflt   | int(11)    | YES  |     | NULL    |       |
| majflt   | int(11)    | YES  |     | NULL    |       |
| nswaps   | int(11)    | YES  |     | NULL    |       |
| nsignals | int(11)    | YES  |     | NULL    |       |
| nvcsw    | int(11)    | YES  |     | NULL    |       |
| nivcsw   | int(11)    | YES  |     | NULL    |       |
+----------+------------+------+-----+---------+-------+

describe ptc_stat;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | bigint(20)  |      | PRI | 0       |       |
| errno | smallint(6) |      |     | 0       |       |
| fname | text        | YES  |     | NULL    |       |
| fdesc | int(11)     | YES  |     | NULL    |       |
| size  | bigint(20)  | YES  |     | NULL    |       |
| mode  | int(11)     | YES  |     | NULL    |       |
| inode | bigint(20)  | YES  |     | NULL    |       |
| atime | datetime    | YES  |     | NULL    |       |
| ctime | datetime    | YES  |     | NULL    |       |
| mtime | datetime    | YES  |     | NULL    |       |
| uid   | int(11)     | YES  |     | NULL    |       |
| gid   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+

describe ptc_uname;
+----------+-------------+------+-----+---------+-------+
| Field    | Type        | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| id       | bigint(20)  |      | PRI | 0       |       |
| archmode | varchar(16) | YES  | MUL | NULL    |       |
| sysname  | varchar(64) | YES  |     | NULL    |       |
| release  | varchar(64) | YES  |     | NULL    |       |
| machine  | varchar(64) | YES  |     | NULL    |       |
+----------+-------------+------+-----+---------+-------+
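The following sketch shows how an execution-provenance question, such as "which jobs of a workflow failed, and on which host", might be asked of these tables. It is a minimal illustration under several assumptions: it uses an in-memory SQLite database rather than the actual catalog database, it recreates only a subset of the columns, it assumes that ptc_job.id refers to ptc_invocation.id, and all rows (including the type code) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Subset of the columns listed above; types simplified for SQLite.
conn.executescript("""
CREATE TABLE ptc_invocation (
    id INTEGER PRIMARY KEY, wf_label TEXT, tr_name TEXT,
    start TEXT, duration REAL, host TEXT
);
CREATE TABLE ptc_job (
    id INTEGER, type TEXT, start TEXT, duration REAL,
    exitcode INTEGER, PRIMARY KEY (id, type)
);
""")

# Hypothetical rows, for illustration only.
conn.executemany("INSERT INTO ptc_invocation VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "fmri", "align_warp", "2006-09-01 10:00:00", 12.5, "node1"),
    (2, "fmri", "softmean", "2006-09-01 10:05:00", 30.0, "node2"),
])
conn.executemany("INSERT INTO ptc_job VALUES (?, ?, ?, ?, ?)", [
    (1, "M", "2006-09-01 10:00:01", 12.0, 0),
    (2, "M", "2006-09-01 10:05:01", 29.5, 1),
])

# Join on the assumed key and keep only jobs with a non-zero exit code.
rows = conn.execute("""
    SELECT i.tr_name, i.host, j.duration, j.exitcode
    FROM ptc_invocation AS i
    JOIN ptc_job AS j ON j.id = i.id
    WHERE i.wf_label = ? AND j.exitcode <> 0
    ORDER BY i.start
""", ("fmri",)).fetchall()
print(rows)
```

The same pattern would extend to the other tables, for example following ptc_job.rusage into ptc_rusage for resource usage, if that column is indeed a reference to ptc_rusage.id.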

Provenance Queries

The queries are described on the Provenance Queries page.

Slides and References


This work is supported by the National Science Foundation under grant number SCI-0455361.