10.7 years ago by
My favorite way of defining pipelines is by writing Makefiles, about which you can find a very good introduction in Software Carpentry for Bioinformatics: http://swc.scipy.org/lec/build.html .
Although they have been originally developed for compiling programs, Makefiles allow to define which operations are needed to create each file, with a declarative syntax that it is a bit old-style but still does its job. Each Makefile is composed of a set of rules, which define operations needed to calculate a file and that can be combined together to make a pipeline. Other advantages of makefiles are conditional execution of tasks, so you can stop the execution of a pipeline and get back to it later, without having to repeat calculations. However, one of the big disadvantages of Makefiles is its old syntax... in particular, rules are identified by the names of the files that they create, and there is no such thing as 'titles' for rules, which make more tricky.
I think one of the best solutions would be to use BioMake, that allow to define tasks with titles that are not the name of the output files. To understand it better, look at this example: you see that each rule has a title and a series of parameters like its output, inputs, comments, etc.
Unfortunately, I can't make biomake to run on my computer, as it requires very old dependencies and it is written in a very difficult perl. I have tried many alternatives and I think that rake is the one that is more close to biomake, but unfortunately I don't understand ruby's syntax.
So, I am still looking for a good alternative... Maybe one day I will have to time to re-write BioMake in python :-)