ChangeLog

From Pegasus Wiki

2.2.0

Dirmanager modifications

Vahi 18:55, 29 August 2008 (PDT)

The dirmanager executable can now remove and create multiple directories. This is achieved by passing a whitespace-separated list of directories to the --dir option.

Worker Package Deployment

Vahi 18:49, 29 August 2008 (PDT)

If the cleanup option is passed to Pegasus, cleanup nodes will be added that remove the deployed worker package from the remote sites.


Minor Changes to Ranking Module

Vahi 17:12, 25 August 2008 (PDT)

The ranking file created by rank-dax now lists the makespan of the DAX in the third column. Additionally, it writes out the values for all the DAXes. If the number of DAXes exceeds the top n parameter, the values for the remaining DAXes are written as comments in the ranking file.

Fixed the specification of storage directory in properties file

Vahi 15:22, 18 August 2008 (PDT)

Fixed a bug in Pegasus where the storage directory was not constructed correctly for the output site when the user specified an absolute path in the properties file. This was a transient error that cropped up during site catalog integration and is now fixed.

Added Windward Specific Transfer Refiner and Transfer Implementation

Vahi 16:35, 15 August 2008 (PDT)

Checked in a Windward transfer refiner. This refiner is a composite refiner that calls out to the Bundle refiner for staging in the data. However, it short-circuits the stage-out and the inter-site data transfers, because each workflow runs against a workflow-specific knowledge base in an AllegroGraph server. To use it, set the following property

      pegasus.transfer.refiner Windward

Additionally, a new Transfer implementation called BAE was checked in. This uses the BAE-provided dc-transfer client to stage in data to a knowledge base. To use it, set the following property

      pegasus.transfer.*.impl BAE


Fixed loading of sites in the internal Site Store

Vahi 15:22, 13 August 2008 (PDT)

In the new Site Catalog implementation, only the sites that are required are loaded into the internal site store instead of loading all the sites present. The following sites are loaded

- execution sites specified at command line
- output site if specified at command line
- site local

However, there was a bug because of which the output site was not loaded. This is now fixed.

Vahi 17:13, 25 August 2008 (PDT)

This, however, resulted in a new bug where the output site was added to the list of execution sites. This is now fixed.

Two Wings related Properties added

Vahi 16:00, 11 August 2008 (PDT)

Added two new properties

pegasus.wings.properties   the path to wings properties file
pegasus.wings.request.id   the request id that is associated with the DAX being planned.

The above were required for the SR Submit Client and for passing the request id to the Windward PC implementations.


Refactoring of the Transformation Catalog API

Vahi 16:00, 11 August 2008 (PDT)

Earlier, the Transformation Catalog API relied on singleton access. Additionally, there was no way to pass extra parameters to the implementations, due to the lack of an initialize method.

This is fixed now. The Transformation Catalog API now has an initialize method. Additionally, the Factory was fixed so that there is no reliance on singleton access to the Transformation Catalog.


Fixed third party transfers

Vahi 15:22, 12 August 2008 (PDT)

After the integration of the new site catalog format, third party transfers had stopped working. There was a bug in how the destination URLs were being constructed.

This is now fixed.

This change was tracked via bugzilla http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=36


idle-nodes and total-nodes for GridGateway class

Vahi 16:48, 8 August 2008 (PDT)

Added the attributes idle-nodes and total-nodes to GridGateway. Fixed the parser and the adapter from the old format to the new format accordingly. This cropped up while getting the ranking of workflows to work with the new site catalog format.

Re-factoring of the Create Directory Refiner

Vahi 16:36, 28 July 2008 (PDT)

Refactored the Create Directory Refiner in Pegasus. The strategy ( how the nodes are added in the graph ) and the implementation ( how the create dir jobs are rendered and which executable is used ) can now be specified separately.

This was required for the Windward project where in addition to creating a directory on the remote site, the create dir job needs to add a knowledge base to AllegroGraph Store.

Vahi 18:00, 1 August 2008 (PDT)

There are now Windward implementations for both Strategy and Implementation. The Windward Strategy uses the hourglass strategy. In addition, it adds a root node that creates a workflow-specific KB in the AllegroGraph server using the Windward implementation. The Windward Implementation uses seqexec to create a directory in the AllegroGraph base and create a KB in the AllegroGraph server.

The following properties need to be set to create the KB

pegasus.windward.allegro.site  isi_wind
pegasus.windward.allegro.host wind.isi.edu
pegasus.windward.allegro.port 4567
pegasus.windward.allegro.basekb /var/spool/agraph 

Preservation of Line Breaks in DAX parsing

Vahi 11:23, 18 July 2008 (PDT)

Added a Boolean property pegasus.parser.dax.preserver.linebreaks. On setting it to true, the DAX parser will preserve any line breaks that appear in the arguments section of the job in the DAX. This was required for the Windward project, where the jobs expect them in a CSV configuration file.

Added a Cluster Transfer Refiner

Vahi 23:01, 7 July 2008 (PDT)

A cluster refiner that builds upon the Bundle Refiner. It clusters the stage-in jobs and stage-out jobs per level of the workflow. The differences from the Bundle refiner are

         - stagein is also clustered/bundled per level. In Bundle it was for the
           whole workflow.
         - the keys that control the clustering ( old name: bundling ) are
           cluster.stagein and cluster.stageout

 In order to use the transfer refiner
         - the property pegasus.transfer.refiner  must be set to value Cluster


This refiner also adds dependencies between the stagein transfer jobs on different levels of the workflow to ensure that stagein for the top level happens first and so on.

Image: black_tx_cluster.jpg

Rescue option to pegasus-plan

Vahi 17:44, 27 June 2008 (PDT)

A rescue option to pegasus-plan has been added. The rescue option takes an integer value that determines the number of times re-planning is triggered in case of failures in deferred planning. For this to work, Condor 7.1.0 or higher is required, as it relies on the recently implemented auto rescue feature in Condor DAGMan.

Vahi 23:01, 7 July 2008 (PDT)

Even though re-planning is triggered, Condor DAGMan still ends up submitting the rescue dag, as it auto-detects it. The fix is to remove the rescue dag files in case of re-planning. This is still to be implemented.

Added color-file option to show-job

Vahi 17:44, 26 June 2008 (PDT)

Added a --color-file option to show-job in $PEGASUS_HOME/contrib/showlog to pass a file that has the mappings from transformation name to colors.

The format of each line is as follows

transformation-name color

This can be used to assign different colors to compute jobs in a workflow. The default color assigned is gray if none is specified.
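
The line format above can be read with a few lines of code. The following is only a sketch of how such a file might be parsed, not the show-job implementation; the function names are hypothetical, and skipping blank lines and lines starting with # is an assumption that the entry does not state.

```python
DEFAULT_COLOR = "gray"  # default color named in the entry above


def parse_color_file(text):
    """Parse lines of the form 'transformation-name color' into a dict."""
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines are ignored.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'transformation-name color', got: {line!r}")
        name, color = parts
        colors[name] = color
    return colors


def color_for(colors, transformation):
    """Return the mapped color, falling back to the default."""
    return colors.get(transformation, DEFAULT_COLOR)


colors = parse_color_file("preprocess red\nanalyze blue\n")
print(color_for(colors, "preprocess"))  # red
print(color_for(colors, "findrange"))   # gray (no mapping given)
```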

Added jobstate-summary tool

Akumar 11:53, 23 Jun 2008 (PDT)

The jobstate-summary tool helps in debugging failed jobs. It shows all the information associated with a failed job. It gets the list of failed jobs from the jobstate.log file. It then parses the latest kickstart file for each failed job and shows the exit code and all the other information.

Usage: jobstate-summary --i <input directory> [--v(erbose)] [--V(ersion)] [--h(elp)]

  • input directory is the place where all the log files including jobstate.log file reside.
  • v option is for verbose debugging.
  • V option gives the pegasus version.
  • h option prints the help message.

A sample run looks like

 jobstate-summary -i /batigol/project/tutorial/dags/akumar/pegasus/diamond/run0013 -v

Dynamic Staging of Worker Package to remote sites

Vahi 11:42, 18 June 2008 (PDT)

Pegasus now supports staging of worker package as part of the workflow.

This feature is also being tracked through pegasus bugzilla. http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=35

The first stab at the implementation does the following

  • The worker package is staged automatically to the remote site, by adding a setup transfer job to the workflow. The setup transfer job by default uses GUC to stage the data. However, this can be configured by setting the property pegasus.transfer.setup.impl. If you have pegasus.transfer.*.impl set in your properties file, then you need to set pegasus.transfer.setup.impl to GUC.
  • The code discovers the worker package by looking up pegasus::worker in the transformation catalog. For the time being, you will have to put the entries in the transformation catalog. The location of the appropriate worker package can be picked up from http://pegasus.isi.edu/mapper/code.php#Worker_Packages . In future, we will automatically look up this link to determine the locations. Note that the basename of the URLs should not be changed, because Pegasus parses the basename to determine the version of the worker package.
  • An untar job is added to the workflow after the setup job; it untars the worker package on the remote site. It defaults to /bin/tar. However, this can be overridden by specifying the entry tar in the transformation catalog for a particular site.

Vahi 11:14, 19 June 2008 (PDT)

Pegasus now automatically determines the location of the worker package to deploy on the remote site.

Currently the default mappings are as follows

 INTEL32 => x86
 AMD64   => x86_64, or x86 if not available
 INTEL64 => x86

 OS LINUX = rhel3
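
The default mappings can be read as a preference-ordered lookup. The sketch below illustrates that reading only; it is not the Pegasus code, and the function name and the shape of the `available` set are hypothetical.

```python
# Preference-ordered package architectures per site architecture,
# taken from the default mappings listed above.
ARCH_PREFERENCES = {
    "INTEL32": ["x86"],
    "AMD64": ["x86_64", "x86"],  # x86 only if x86_64 is not available
    "INTEL64": ["x86"],
}
OS_MAPPING = {"LINUX": "rhel3"}


def select_worker_package(arch, os_name, available):
    """Return (package_arch, package_os) for the first preferred
    architecture present in `available`, or None if nothing matches.

    `available` is a hypothetical set of (arch, os) pairs for which a
    worker package exists.
    """
    package_os = OS_MAPPING.get(os_name)
    if package_os is None:
        return None
    for package_arch in ARCH_PREFERENCES.get(arch, []):
        if (package_arch, package_os) in available:
            return (package_arch, package_os)
    return None


print(select_worker_package("AMD64", "LINUX", {("x86", "rhel3")}))
# ('x86', 'rhel3'), because no x86_64 package is available
```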

GUC from Globus 4.x series support

Vahi 12:19, 11 June 2008 (PDT)

Introduced a new transfer implementation named GUC that allows us to transfer multiple files in a single job. The globus-url-copy client that ships in the Globus 4.x series is compatible with this implementation.

In order to use this transfer implementation

         - the property pegasus.transfer.*.impl must be set to value GUC.
  

There should be an entry in the transformation catalog with the fully qualified name as globus::guc for all the sites where workflow is run, or on the local site in case of third party transfers.

Pegasus can automatically construct the path to the globus-url-copy client, if the environment variable GLOBUS_LOCATION is specified in the site catalog for the site.

The arguments with which the client is invoked can be specified

         - by specifying the property pegasus.transfer.arguments
         - by associating the Pegasus profile key transfer.arguments
 


Updated the Heft implementation

Vahi 12:19, 11 June 2008 (PDT)

The expected runtime for the jobs can now be specified in the DAX as a pegasus profile. The key for the profile is runtime. The profile in the DAX can be overridden by associating the profile with the transformation in the transformation catalog.

Added -j|--job-prefix option to pegasus-plan

Vahi 14:23, 29 May 2008 (PDT)

pegasus-plan can now be passed the -j|--job-prefix option to designate the prefix that needs to be used for constructing the job submit files.

Vahi 10:59, 3 June 2008 (PDT)

The bundled transfer jobs and the directory creation jobs did not have the prefix set. Fixed now.

Properties Documentation

Vahi 16:14, 1 May 2008 (PDT)

Documented the following properties

  • pegasus.gridstart.kickstart.stat
  • pegasus.gridstart.generate.lof
  • pegasus.dagman.[category].maxjobs

Support for DAGMan node categories

Vahi 12:02, 1 May 2008 (PDT)

Added support for DAGMan node categories in Pegasus. DAGMan now allows users to specify categories for jobs, and then to specify tuning parameters ( like maxjobs ) per category. This functionality is exposed in Pegasus as follows

The user can associate a dagman profile key category with the jobs. The key attribute of the profile is category, and the value is the category to which the job belongs. For example, you can set the dagman category in the DAX for a job as follows

 <job id="ID000001" namespace="vahi" name="preprocess" version="1.0" level="3" dv-namespace="vahi" dv-name="top" dv-version="1.0">
     <profile namespace="dagman" key="CATEGORY">short-running</profile>
     <argument>-a top -T 6  -i <filename file="david.f.a"/>  -o <filename file="vahi.f.b1"/>
         <filename file="vahi.f.b2"/>
     </argument>
     <uses file="david.f.a" link="input" register="false" transfer="true" type="data"/>
     <uses file="vahi.f.b1" link="output" register="true" transfer="true" />
     <uses file="vahi.f.b2" link="output" register="true" transfer="true" />
 </job>

The property pegasus.dagman.[category].maxjobs can be used to control the maxjobs value for a category. For the above example, we can set the property as follows

pegasus.dagman.short-running.maxjobs 2

In the DAG file generated you will see the category associated with jobs. For the above example, it will look as follows

MAXJOBS short-running 2

CATEGORY preprocess_ID000001 short-running
JOB preprocess_ID000001 preprocess_ID000001.sub
RETRY preprocess_ID000001 2
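
The relationship between the property and the generated DAG lines can be sketched as follows. This is an illustration under assumptions, not the Pegasus code generator: the helper names are hypothetical, and properties are modelled as a plain dictionary.

```python
import re

# Matches pegasus.dagman.[category].maxjobs but not the plain
# pegasus.dagman.maxjobs property, which has no category part.
MAXJOBS_KEY = re.compile(r"^pegasus\.dagman\.(?P<category>.+)\.maxjobs$")


def maxjobs_lines(properties):
    """Build one 'MAXJOBS <category> <n>' line per category property."""
    lines = []
    for key, value in sorted(properties.items()):
        match = MAXJOBS_KEY.match(key)
        if match:
            lines.append(f"MAXJOBS {match.group('category')} {int(value)}")
    return lines


def category_line(job_name, category):
    """Build the 'CATEGORY <job> <category>' line for one job."""
    return f"CATEGORY {job_name} {category}"


props = {"pegasus.dagman.short-running.maxjobs": "2"}
print("\n".join(maxjobs_lines(props)))
print(category_line("preprocess_ID000001", "short-running"))
```

Run against the example property, this prints the MAXJOBS and CATEGORY lines shown in the DAG excerpt above.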

Handling of pass through LFN

Vahi 16:06, 22 April 2008 (PDT)

If a job in a DAX specifies the same LFN as both an input and an output, it is a pass through LFN. Internally, the LFN is tagged only as an input for the job. The reason is that we need to make sure that the replica catalog is queried for the location of the LFN. If this is not handled specially, the LFN is tagged internally as inout ( meaning it is generated during workflow execution ). LFNs with type inout are not queried for in the Replica Catalog in the force mode of operation.
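
The tagging rule can be illustrated with a small sketch. It models only the rule stated in this entry (an LFN listed as both input and output ends up tagged as input); the function name and data shapes are hypothetical and do not mirror the planner's internal classes.

```python
def tag_lfns(inputs, outputs):
    """Return {lfn: 'input' | 'output'} for a single job.

    A pass through LFN (listed as both input and output) is tagged
    only as an input, so that it is still looked up in the replica
    catalog.
    """
    tags = {}
    for lfn in outputs:
        tags[lfn] = "output"
    for lfn in inputs:
        tags[lfn] = "input"  # overrides 'output' for pass through LFNs
    return tags


tags = tag_lfns(inputs=["f.a", "f.pass"], outputs=["f.b", "f.pass"])
print(tags["f.pass"])  # input
```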

Updated sample.properties

Vahi 16:04, 8 April 2008 (PDT)

Updated the sample properties. Added documentation for

  • pegasus.clusterer.job.aggregator.seqexec.firstjobfail
  • DCLauncher mode of gridstart

Support for OSU Datacutter jobs

Vahi 17:25, 7 April 2008 (PDT)

  • Pegasus has new gridstart mode called DCLauncher. This allows us to launch the Data Cutter jobs using the wrapper that Vijay wrote.
  • There is a new pegasus profile key gridstart.path to specify the path to the gridstart.
  • Pegasus now supports the condor parallel universe.

To launch a job using DCLauncher, the following pegasus profile keys need to be associated with the job

gridstart          to DCLauncher
gridstart.path     the path to the DCLauncher script

Tripping seqexec on first job failures

Vahi 17:19, 18 March 2008 (PDT)

By default, seqexec does not stop execution even if one of the clustered jobs it is executing fails. This is because seqexec tries to get as much work done as possible. If, for some reason, you want to make seqexec stop on the first job failure, set the following property in the properties file

pegasus.clusterer.job.aggregator.seqexec.firstjobfail true
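
The difference between the two behaviours can be sketched in a few lines. This is a toy model, not seqexec itself: each job is represented by a hypothetical zero-argument callable that returns an exit code, and the first_job_fail parameter stands in for the property above.

```python
def run_sequentially(jobs, first_job_fail=False):
    """Run jobs in order and return the exit codes of the jobs that ran.

    Each job is a zero-argument callable returning an integer exit
    code (a stand-in for launching a real executable).
    """
    exit_codes = []
    for job in jobs:
        code = job()
        exit_codes.append(code)
        if code != 0 and first_job_fail:
            break  # trip on the first failure, skip the remaining jobs
    return exit_codes


def ok():
    return 0


def bad():
    return 3


print(run_sequentially([ok, bad, ok]))                       # [0, 3, 0]
print(run_sequentially([ok, bad, ok], first_job_fail=True))  # [0, 3]
```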

2.1.0

Release Pegasus 2.1.0

Vahi 17:18, 21 February 2008 (PDT)

SVN was tagged and pegasus 2.1.0 was released. It is available for download on the website.

Fixed the parsing of DAX while partitioning

Vahi 13:54, 6 February 2008 (PST)
With the new DAX 2.1 schema, the type attribute could be specified. However, it was not being parsed correctly during partitioning. It is now fixed.

Added --submit|-S option to pegasus-plan

Vahi 13:53, 6 February 2008 (PST)
If this option is specified, the workflow is submitted automatically using pegasus-run.

Updated the Stork Transfer Interfaces

Vahi 16:39, 4 February 2008 (PST)
The Stork Transfer interfaces in Pegasus have been updated to the latest version of Stork. To use Stork as a transfer mechanism in Pegasus, a user needs to set the following properties

pegasus.transfer.refiner = SDefault
pegasus.transfer.*.impl=Stork

Enabled transfer of kickstart

Vahi 16:30, 17 January 2008 (PST)
A user can now transfer kickstart from the submit host to the remote execution site for the jobs. This is achieved by leveraging the transfer_executable feature in Condor. Setting the condor profile key transfer_executable to true for a particular job results in kickstart being transferred. The transfer of the compute executables is managed separately through the staging of executables module in Pegasus.


Fixed the clustering interfaces to support worker node execution

Vahi 15:53, 8 January 2008 (PST)
The clustering and job aggregator interfaces were extended to be initialized with a PegasusBag object. Changed the interfaces, implementations, and associated factories.

Vahi 16:28, 17 January 2008 (PST)
Additionally, the enabling of the constituent jobs in a clustered job had to be disabled. The whole clustered job undergoes worker node execution enabling; the constituent jobs are not enabled individually. This was to suppress the multiple file transfers that were otherwise associated.

Modified Condor Transfer Refiner for stageout

Vahi 15:53, 3 January 2008 (PST)
The stageout using transfer_output_files did not work, as transfer_output_files in Condor does not take paths. It only takes base filenames. Hence, I have changed the behaviour of the Condor refiner.

The stageout in the condor refiner adds a separate transfer node, that transfers the output data from the submit directory to the output directory on the local site. The transfer node is a /bin/true job that uses Condor file transfer mechanism to transfer the data.

Added support for Condor File Transfer Mechanism

Vahi 13:50, 2 November 2007 (PDT)
Added support for using the Condor file transfer mechanism to transfer the input files from the submit host to the shared filesystem of the remote grid site. This is achieved by adding a /bin/true job for a transfer job and specifying the transfer_input_files classad for that job.

To make the above work, the user needs to use a Local replica selector. This is because, by default, the replica selectors in Pegasus pick up a file URL only if the pool attribute associated with it matches the execution site.

To summarize, to use the Condor file transfer mechanism, set the following properties

pegasus.transfer.stagein.impl Condor
pegasus.selector.replica      Local

Only file URLs with the pool attribute set to local will be considered.

The above change was tracked via bugzilla http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=15

Specifying relative submit directory

Vahi 13:50, 30 October 2007 (PDT)
Since pegasus 2.0, pegasus-plan creates a directory structure in the base submit directory. The base submit directory is specified by the --dir option to pegasus-plan.

If a user wants to specify a relative submit directory, the --relative-dir option to pegasus-plan can be used.

The above change was tracked via bugzilla http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=14

Relative Path to DAX

Vahi 13:50, 30 October 2007 (PDT)
An incorrect path to the dax was generated internally when a user specified a relative path to the dax to pegasus-plan.

This is fixed now, and was tracked via bugzilla http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=13


Moved to DAX 2.1

Vahi 14:38, 30 October 2007 (PDT)

Moved to the new DAX schema version 2.1

http://pegasus.isi.edu/schema/dax-2.1.xsd

The main change in it is that the dontTransfer and dontRegister flags have been replaced by transfer and register flags.

Changes were made both to the Java DAX Generator and Pegasus to conform to the new schema.

Additionally, the parser in Pegasus looks at the schema version to determine whether to pick up the dontTransfer and dontRegister flags ( to support backward compatibility with the older DAXes ).

Also, a type attribute was added to the filename type. It defaults to data. Additionally, it can have the values executable|pattern.

The above change was tracked via bugzilla http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=6


Using multiple grid ftp servers for stageout

Vahi 15:41, 23 October 2007 (PDT)
There was a bug in Pegasus whereby, even if the user specified multiple grid ftp servers for the output site in the site catalog, not all of them were used. The above feature is useful if a lot of files need to be transferred to the output site for each workflow, as in the SCEC case.
More info can be found at http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=3

Specifying the jobmanager universe for the compute jobs in the DAX

Vahi 15:41, 19 October 2007 (PDT)
Introduced an enhancement in Pegasus, that allows the user to specify the jobmanager type for the compute jobs in the DAX. This is achieved by specifying the jobmanager.universe profile key in the hints namespace.

Valid values for this are transfer|vanilla.

This is useful for users who are running on a grid site, with the worker nodes behind a firewall and want a subset of their jobs to run on the head node.


More info can be found at http://vtcpc.isi.edu/bugzilla/show_bug.cgi?id=5

Completed Integration with System Research AC

Vahi 07:58, 1 October 2007 (PDT)
Implemented the Windward interface to the Transformation Catalog Interface. This talks to the System Research AC, which in turn interfaces with both the ISI internal AC and the Windward AC. Currently, all the testing has been done with the SR-ISI AC.

Integrated the System Research DC

Vahi 13:42, 28 September 2007 (PDT)
Implemented the Windward interface to the Replica Catalog Interface. This talks to the System Research DC, which in turn interfaces with both the ISI internal DC and the Windward DC. Currently, all the testing has been done with the SR-ISI DC.

Workflow and Planner Metrics Logging

Vahi 13:42, 25 September 2007 (PDT)
Workflow and Planning metrics are now logged for each workflow that is planned by Pegasus. By default, they are logged to $PEGASUS_HOME/var/pegasus.log
To turn metrics logging off, set

pegasus.log.metrics false

To change the file to which the metrics are logged set

pegasus.log.metrics.file  path/to/log/file

Here is a snippet from the log file that shows what is logged

{
user = vahi
vogroup = pegasus-ligo
submitdir.base = /nfs/asd2/vahi/jbproject/Pegasus/dags
submitdir.relative = /vahi/pegasus-ligo/blackdiamond/run0064
planning.start = 2007-09-24T18:14:23-07:00
planning.end = 2007-09-24T18:14:29-07:00
properties =/nfs/asd2/vahi/jbproject/Pegasus/dags/vahi/pegasus-ligo/blackdiamond/run0064/pegasus.6766.properties
dax = /nfs/asd2/vahi/jbproject/Pegasus/blackdiamond_dax.xml
dax-label = blackdiamond
compute-jobs.count = 3
si-jobs.count = 1
so-jobs.count = 3
inter-jobs.count = 0
reg-jobs.count = 3
cleanup-jobs.count = 2
total-jobs.count = 14
}

Support for querying multiple replica catalogs

Vahi 15:19, 7 September 2007 (PDT)
Checked in a multiple replica catalog implementation that allows users to query multiple replica catalogs at the same time.
To use it set

 pegasus.catalog.replica MRC
 


Each associated replica catalog can be configured via properties as follows. The user associates a variable name referred to as [value] for each of the catalogs, where [value] is any legal identifier (concretely [A-Za-z][_A-Za-z0-9]*)

For each associated replica catalog, the user specifies the following properties.
 pegasus.catalog.replica.mrc.[value]      to specify the type of replica catalog.
 pegasus.catalog.replica.mrc.[value].key  to specify a property name key for a
                                          particular catalog
 

For example, if a user wants to query two LRCs at the same time, they can be specified as follows

    pegasus.catalog.replica.mrc.lrc1 LRC
    pegasus.catalog.replica.mrc.lrc1.url rls://sukhna

    pegasus.catalog.replica.mrc.lrc2 LRC
    pegasus.catalog.replica.mrc.lrc2.url rls://smarty

In the above example, lrc1 and lrc2 can be any valid identifier names, and url is the property key that needs to be specified.
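
The property naming scheme can be made concrete by grouping the keys per catalog. The following sketch is illustrative only and is not the MRC implementation; the function name and the returned dictionary layout are hypothetical. It assumes one url per LRC, with the first catalog's url keyed under lrc1.

```python
import re

PREFIX = "pegasus.catalog.replica.mrc."
# Legal identifier pattern quoted in the entry above.
IDENTIFIER = re.compile(r"[A-Za-z][_A-Za-z0-9]*")


def parse_mrc(properties):
    """Group pegasus.catalog.replica.mrc.* properties by catalog name."""
    catalogs = {}
    for key, value in properties.items():
        if not key.startswith(PREFIX):
            continue
        name, _, sub_key = key[len(PREFIX):].partition(".")
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"illegal catalog identifier: {name!r}")
        entry = catalogs.setdefault(name, {"type": None, "properties": {}})
        if sub_key:
            entry["properties"][sub_key] = value  # e.g. url
        else:
            entry["type"] = value                 # e.g. LRC
    return catalogs


props = {
    "pegasus.catalog.replica": "MRC",
    "pegasus.catalog.replica.mrc.lrc1": "LRC",
    "pegasus.catalog.replica.mrc.lrc1.url": "rls://sukhna",
    "pegasus.catalog.replica.mrc.lrc2": "LRC",
    "pegasus.catalog.replica.mrc.lrc2.url": "rls://smarty",
}
for name, entry in sorted(parse_mrc(props).items()):
    print(name, entry["type"], entry["properties"]["url"])
```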

Updated the sample.properties

Vahi 15:44, 28 August 2007 (PDT)
Updated the sample.properties file to include information about the new HEFT site selector.

Changed the Site Selector API

Vahi 15:44, 28 August 2007 (PDT)
The Site Selector API was changed to map a whole workflow. The earlier API allowed for mapping at a job level only. All the site selector implementations were modified to conform to the new API.

Heft Based Site Selection

Vahi 15:41, 28 August 2007 (PDT)
Added a new site selector that is based on the HEFT processor scheduling algorithm.

The implementation assumes default data communication costs when jobs are not scheduled on to the same site. Later on this may be made more configurable.
The runtime for the jobs is specified in the transformation catalog by associating the pegasus profile key runtime with the entries.

The number of processors in a site is picked up from the attribute idle-nodes associated with the vanilla jobmanager of the site in the site catalog.

Logging of the pegasus build timestamp

Vahi 10:07, 16 August 2007 (PDT)
The timestamp of the pegasus build that is used to plan a workflow is now logged in the braindump file. The key is pegasus_build. This will allow users to verify later which build of pegasus was used to plan workflows.

RLS java api bug fix 4114

Gmehta 12:59, 10 August 2007 (PDT)
globus_rls_client.jar updated with the bug fix 4114. Also added the jar for java 1.4 in lib/java1.4

Documented Properties

Vahi 16:48, 6 August 2007 (PDT)
The following properties were documented in the properties file

pegasus.dir.useTimestamp
pegasus.dir.storage.deep

Deep Directory structure on the stageout site

Vahi 16:44, 6 August 2007 (PDT)
Modified the deep directory structure creation on the remote output site, to create a Hashed Directory structure using the HashedFileFactory.
The Hashed Directory structure is in decimal format, rather than the default hex format, to account for SCEC's requirements.
This is triggered by setting the property

pegasus.dir.storage.deep true

Passing of DAGMan parameters via properties in case of deferred planning

Vahi 17:27, 24 July 2007 (PDT)
In case of deferred planning, the properties that control DAGMan execution were not being passed as options to DAGMan. This is now fixed.
The following properties are being handled

pegasus.dagman.maxjobs
pegasus.dagman.maxpre
pegasus.dagman.maxidle
pegasus.dagman.maxpost


Scalable Directory structure on the stageout site

Vahi 16:44, 20 July 2007 (PDT)
Introduced a Boolean property pegasus.dir.storage.deep. On setting this property to true, the relative submit directory structure is replicated on the output site.
This allows for each partition's output files to be transferred to a partition-specific directory on the stageout site.
Also introduced a property pegasus.dir.useTimestamp. This results in the timestamp being used to create the run number for the relative submit directory.

2.0.1

2.0.0
