Wings and Pegasus Provenance - Provenance Challenge 2006
http://twiki.ipaw.info/bin/view/Challenge/FirstProvenanceChallenge
Participating Team
- Short team name: USC/ISI
- Participant names: Ewa Deelman, Yolanda Gil, Jihie Kim, Gaurang Mehta, Varun Ratnakar
- Project URL: WINGS [1], PEGASUS [2]
Workflow Representation
Wings Template
The Template was created for the fMRI domain using the fMRI Component Library and the fMRI File Ontology.
The Wings DAX generator was run several times over this Template with randomly chosen parameters, producing several DAXes and workflow instances (compact and expanded).
Wings File and Component Library
The fMRI domain File Library was populated with descriptions of all the files used or generated in the runs above, together with the metadata and parameters that we provided for those runs. Each provenance query is answered in two steps: a SPARQL query against this File Library retrieves the associated Workflow Instance(s), and that workflow instance is then queried for more detailed information.
The fMRI Component Library provides 5 different components for use in the fMRI domain: align_warp, reslice, softmean, slice, convert. These components are used while creating a Template.
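The two-step lookup described above can be sketched in plain Python. This is only a stand-in for SPARQL, not the Wings implementation: the triples, the property names `generatedBy` and `usesComponent`, and the file and instance names are all hypothetical, and the `match` helper merely imitates a single SPARQL triple pattern in which `None` plays the role of a variable.

```python
# Hypothetical (subject, predicate, object) triples standing in for the
# File Library and the workflow instances. Not the real fMRI ontology.
TRIPLES = [
    ("atlas-x.gif", "generatedBy", "workflow-instance-1"),
    ("atlas-y.gif", "generatedBy", "workflow-instance-1"),
    ("atlas-x.gif", "generatedBy", "workflow-instance-2"),
    ("workflow-instance-1", "usesComponent", "softmean"),
    ("workflow-instance-1", "usesComponent", "convert"),
    ("workflow-instance-2", "usesComponent", "slice"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None is a wildcard,
    much like an unbound variable in a SPARQL triple pattern."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def instances_for_file(triples, filename):
    """Step 1: ask the file library which workflow instances produced a file."""
    return sorted({o for _, _, o in match(triples, s=filename, p="generatedBy")})


def components_of(triples, instance):
    """Step 2: ask a workflow instance for more detail (here, its components)."""
    return sorted({o for _, _, o in match(triples, s=instance, p="usesComponent")})


if __name__ == "__main__":
    for wf in instances_for_file(TRIPLES, "atlas-x.gif"):
        print(wf, components_of(TRIPLES, wf))
```

In a real deployment the first function would correspond to a SPARQL `SELECT` over the File Library, and the second to a query over the retrieved workflow instance.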
Workflow Instance (Dax)
Wings creates a workflow instance in the VDS DAX format. The image below shows a Graphviz visualization of the DAX file. The XML DAX file, which serves as the input to the Pegasus Workflow Planner, is also provided below.
Download the FMRI dax file fmri.xml
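As a rough illustration of what such a visualization captures, the sketch below builds a stage-level dependency map for the five components and renders it as Graphviz DOT text. The linear ordering (align_warp, reslice, softmean, slice, convert) is an assumption based on the Provenance Challenge workflow; the real DAX contains one job per input file rather than one node per component, and `to_dot` is a hypothetical helper, not part of Wings or Pegasus.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Assumed stage-level dependencies: each key depends on the stages in its set.
DEPENDS_ON = {
    "align_warp": set(),
    "reslice": {"align_warp"},
    "softmean": {"reslice"},
    "slice": {"softmean"},
    "convert": {"slice"},
}


def to_dot(depends_on, name="fmri"):
    """Render a dependency map as a Graphviz DOT digraph string."""
    lines = [f"digraph {name} {{"]
    for job in sorted(depends_on):
        for parent in sorted(depends_on[job]):
            lines.append(f'  "{parent}" -> "{job}";')
    lines.append("}")
    return "\n".join(lines)


# A valid execution order: every stage appears after the stages it depends on.
execution_order = list(TopologicalSorter(DEPENDS_ON).static_order())
dot_text = to_dot(DEPENDS_ON)

if __name__ == "__main__":
    print(execution_order)
    print(dot_text)
```

The DOT text can be passed to the Graphviz `dot` tool to draw the graph.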
Pegasus Planned Workflow (Dag)
Download the planned workflow condor dag and submit files fmridag.tar.gz
Provenance Trace
WINGS Provenance
Wings file provenance is contained in the File Library.
Wings process flow provenance is contained in the specific Workflow Instance or in the Workflow Template.
Pegasus Provenance
Some steps in planning the workflow are not currently stored in any database or catalog, although we would like to record them. The following slides present the kind of provenance that is generated in the planning process.
- Provenance of the workflow refinement process (download slides)
Pegasus, as part of VDS, stores the execution provenance in the Provenance Tracking Catalog, which is kept in a database. This information is collected at run time by wrapping the executables in a wrapper called Kickstart (credit: University of Chicago and Jens Voeckler). The wrapper collects runtime information such as start and end times, system usage, and the exit code of each executable. The schema of the tables is as follows.
describe ptc_invocation;
+--------------+--------------+------+-----+---------------------+-------+
| Field        | Type         | Null | Key | Default             | Extra |
+--------------+--------------+------+-----+---------------------+-------+
| id           | bigint(20)   |      | PRI | 0                   |       |
| creator      | varchar(16)  |      |     |                     |       |
| creationtime | datetime     |      |     | 0000-00-00 00:00:00 |       |
| wf_label     | varchar(32)  | YES  |     | NULL                |       |
| wf_time      | datetime     | YES  |     | NULL                |       |
| version      | varchar(4)   | YES  |     | NULL                |       |
| start        | datetime     |      |     | 0000-00-00 00:00:00 |       |
| duration     | double       |      |     | 0                   |       |
| tr_namespace | varchar(255) | YES  |     | NULL                |       |
| tr_name      | varchar(255) | YES  |     | NULL                |       |
| tr_version   | varchar(20)  | YES  |     | NULL                |       |
| dv_namespace | varchar(255) | YES  |     | NULL                |       |
| dv_name      | varchar(255) | YES  |     | NULL                |       |
| dv_version   | varchar(20)  | YES  |     | NULL                |       |
| resource     | varchar(48)  | YES  |     | NULL                |       |
| host         | varchar(16)  | YES  |     | NULL                |       |
| pid          | int(11)      | YES  |     | NULL                |       |
| uid          | int(11)      | YES  |     | NULL                |       |
| gid          | int(11)      | YES  |     | NULL                |       |
| cwd          | text         | YES  |     | NULL                |       |
| arch         | bigint(20)   | YES  | MUL | NULL                |       |
| total        | bigint(20)   | YES  | MUL | NULL                |       |
+--------------+--------------+------+-----+---------------------+-------+

describe ptc_job;
+----------+--------------+------+-----+---------------------+-------+
| Field    | Type         | Null | Key | Default             | Extra |
+----------+--------------+------+-----+---------------------+-------+
| id       | bigint(20)   |      | PRI | 0                   |       |
| type     | char(1)      |      | PRI |                     |       |
| start    | datetime     |      |     | 0000-00-00 00:00:00 |       |
| duration | double       |      |     | 0                   |       |
| pid      | int(11)      | YES  |     | NULL                |       |
| rusage   | bigint(20)   | YES  | MUL | NULL                |       |
| stat     | bigint(20)   | YES  | MUL | NULL                |       |
| exitcode | int(11)      |      |     | 0                   |       |
| exit_msg | varchar(255) | YES  |     | NULL                |       |
| args     | mediumblob   | YES  |     | NULL                |       |
+----------+--------------+------+-----+---------------------+-------+

describe ptc_lfn;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| id      | bigint(20)   |      | MUL | 0       |       |
| stat    | bigint(20)   |      | MUL | 0       |       |
| initial | char(1)      | YES  |     | NULL    |       |
| lfn     | varchar(255) | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+

describe ptc_rusage;
+----------+------------+------+-----+---------+-------+
| Field    | Type       | Null | Key | Default | Extra |
+----------+------------+------+-----+---------+-------+
| id       | bigint(20) |      | PRI | 0       |       |
| utime    | double     |      |     | 0       |       |
| stime    | double     |      |     | 0       |       |
| minflt   | int(11)    | YES  |     | NULL    |       |
| majflt   | int(11)    | YES  |     | NULL    |       |
| nswaps   | int(11)    | YES  |     | NULL    |       |
| nsignals | int(11)    | YES  |     | NULL    |       |
| nvcsw    | int(11)    | YES  |     | NULL    |       |
| nivcsw   | int(11)    | YES  |     | NULL    |       |
+----------+------------+------+-----+---------+-------+

describe ptc_stat;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | bigint(20)  |      | PRI | 0       |       |
| errno | smallint(6) |      |     | 0       |       |
| fname | text        | YES  |     | NULL    |       |
| fdesc | int(11)     | YES  |     | NULL    |       |
| size  | bigint(20)  | YES  |     | NULL    |       |
| mode  | int(11)     | YES  |     | NULL    |       |
| inode | bigint(20)  | YES  |     | NULL    |       |
| atime | datetime    | YES  |     | NULL    |       |
| ctime | datetime    | YES  |     | NULL    |       |
| mtime | datetime    | YES  |     | NULL    |       |
| uid   | int(11)     | YES  |     | NULL    |       |
| gid   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+

describe ptc_uname;
+----------+-------------+------+-----+---------+-------+
| Field    | Type        | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| id       | bigint(20)  |      | PRI | 0       |       |
| archmode | varchar(16) | YES  | MUL | NULL    |       |
| sysname  | varchar(64) | YES  |     | NULL    |       |
| release  | varchar(64) | YES  |     | NULL    |       |
| machine  | varchar(64) | YES  |     | NULL    |       |
+----------+-------------+------+-----+---------+-------+
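To show how tables shaped like these might be queried, here is a minimal sketch using Python's built-in `sqlite3` module in place of the MySQL catalog. It makes several assumptions that the page does not confirm: that `ptc_job.id` refers to `ptc_invocation.id`, that a `type` of `M` marks the main job, and that only the handful of columns shown are needed. All rows are invented sample data, and the two query functions are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down versions of two PTC tables (a subset of the columns above).
conn.executescript("""
CREATE TABLE ptc_invocation (
    id INTEGER PRIMARY KEY, wf_label TEXT, tr_name TEXT,
    start TEXT, duration REAL, host TEXT
);
CREATE TABLE ptc_job (
    id INTEGER, type TEXT, start TEXT, duration REAL, exitcode INTEGER,
    PRIMARY KEY (id, type)
);
""")

# Invented sample rows; not taken from an actual run.
conn.executemany(
    "INSERT INTO ptc_invocation VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "fmri", "align_warp", "2006-09-01 10:00:00", 12.5, "node1"),
        (2, "fmri", "reslice", "2006-09-01 10:00:13", 3.0, "node1"),
        (3, "fmri", "softmean", "2006-09-01 10:00:17", 8.0, "node2"),
    ],
)
conn.executemany(
    "INSERT INTO ptc_job VALUES (?, ?, ?, ?, ?)",
    [
        (1, "M", "2006-09-01 10:00:00", 12.4, 0),
        (2, "M", "2006-09-01 10:00:13", 2.9, 0),
        (3, "M", "2006-09-01 10:00:17", 7.9, 1),  # assumed failure
    ],
)


def failed_jobs(conn, wf_label):
    """List (transformation name, exit code) for jobs that exited non-zero."""
    rows = conn.execute(
        """SELECT i.tr_name, j.exitcode
           FROM ptc_invocation AS i
           JOIN ptc_job AS j ON j.id = i.id  -- assumed join key
           WHERE i.wf_label = ? AND j.exitcode <> 0
           ORDER BY i.start""",
        (wf_label,),
    )
    return rows.fetchall()


def runtime_by_transformation(conn, wf_label):
    """Sum the recorded invocation durations per transformation name."""
    rows = conn.execute(
        """SELECT tr_name, SUM(duration) FROM ptc_invocation
           WHERE wf_label = ? GROUP BY tr_name""",
        (wf_label,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    print(failed_jobs(conn, "fmri"))
    print(runtime_by_transformation(conn, "fmri"))
```

The same `SELECT` statements should carry over to the MySQL catalog with little change, provided the assumed join key holds.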
Provenance Queries
The queries and their answers are on the Provenance Queries Page.
Slides and References
- Provenance of the workflow refinement process slides
- "Semantic Metadata Generation for Large Scientific Workflows", Jihie Kim, Yolanda Gil, Varun Ratnakar. In International Semantic Web Conference, ISWC-2006. (http://www.isi.edu/ikcap/scec-it/papers/Wings-metadata-ISWC-2006.pdf)
- WINGS/Pegasus: Semantic Metadata Reasoning for Large Scientific Workflows (powerpoint slides at http://www.isi.edu/ikcap/scec-it/Wings-provenance.pdf)
