Berlin BOSC Codefest 2013, day 1

I’m at the 4th edition of the Bioinformatics Open Source Conference (BOSC) Codefest, in a warm and sunny Berlin.

After everyone found the venue, a preliminary brainstorming session helped organize the tasks across several workgroups:
BioRuby and Illumina BaseSpace, visualization, CloudBioLinux/CloudMan, ontologies and metadata, data handling, Biopython, etc…

Our contribution

Valentine, Guillermo and I sat down with Rory Kirchner and Brad Chapman to whip up SLURM support in their ipython-cluster-helper module. That should help SciLifeLab move from the old bcbio-nextgen pipeline to the new IPython-backed version, with all the neat parallelization tricks needed to run up to 1,500 human whole-genome sequencing (WGS) samples.

The motivation behind our specific task is to:

  1. Implement basic SLURM support by understanding the existing classes in ipython-cluster-helper, which already support the SGE, LSF, Torque and Condor schedulers.
  2. Building on that, introduce the DRMAA connector, generalizing the scheduler-specific classes.
  3. Ultimately, port that generalization into IPython itself, so that scientific Python computations can be executed efficiently across different clusters around the world.
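To give an idea of what those scheduler classes do, here is a minimal sketch of the general pattern: a launcher fills in a batch-script template and hands it to the scheduler's submit command (`sbatch` on SLURM). The `SLURM_TEMPLATE` and `render_slurm_script` names are illustrative, not the actual ipython-cluster-helper code.

```python
# Illustrative sketch of a scheduler launcher: render a SLURM batch-script
# template per engine, which would then be submitted with `sbatch`.
# (Hypothetical names; not the real ipython-cluster-helper classes.)

SLURM_TEMPLATE = """#!/bin/sh
#SBATCH --partition={queue}
#SBATCH --job-name=ipengine
#SBATCH --ntasks={ntasks}
srun ipengine --profile-dir={profile_dir}
"""

def render_slurm_script(queue, ntasks, profile_dir):
    """Fill in the batch-script template for one engine set."""
    return SLURM_TEMPLATE.format(queue=queue, ntasks=ntasks,
                                 profile_dir=profile_dir)

print(render_slurm_script("core", 4, "/tmp/ipython_profile"))
```

Each supported scheduler (SGE, LSF, Torque, Condor) keeps its own template and submit/stop commands, which is exactly the duplication a DRMAA-based generalization would remove.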

That was the plan. What really happened, as with any software journey, is that planning the trip differs somewhat from actually walking it:

  1. We realized that array jobs are not supported in SLURM versions before 2.6.x, and implemented a [workaround using srun][14].

— Roman Valls (@braincode) July 17, 2013

  2. We realized that, since DRMAA does not generate job templates to submit via the command line, it might be wiser to put that support directly into IPython.
  3. Guillermo got his hands dirty installing SLURM on a couple of Vagrant machines, so that we wouldn’t have to wait in long queues on our compute cluster.
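The srun workaround from point 1 can be sketched as follows: since older SLURM has no `sbatch --array`, one `srun` invocation is issued per pseudo array index instead. This is a simplified illustration (the `array_workaround_cmds` helper and the script name are made up for the example); the commands are only built here, not executed.

```python
# Sketch of the array-job workaround for SLURM < 2.6: emulate
# `sbatch --array` by building one `srun` command per task index.
# (Illustrative helper; commands are constructed but not executed.)

def array_workaround_cmds(script, n_tasks):
    """Build one srun invocation per pseudo array index."""
    return ["srun --ntasks=1 {} {}".format(script, i)
            for i in range(1, n_tasks + 1)]

for cmd in array_workaround_cmds("process_sample.sh", 3):
    print(cmd)
```

On SLURM >= 2.6 the same effect is achieved natively with `sbatch --array=1-3`, with the index exposed to the job as an environment variable.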

Other stuff happening outside our coding bubble

During the day, the original draft ideas outlined on the board changed as participants got to talk to each other. If anything, that would be the highlight and the common modus operandi of most hackathons I’ve been involved in: self-organized groups turning vague questions such as “what are you up to?” into useful working code and collaborations.

During a very quick walk around the room, I discovered a variant analysis pipeline based on Ruffus used by the Victorian Life Sciences Computation Initiative, University of Melbourne. This is meant to play well with Enis Afgan’s integration, or CloudBioLinux flavor, for the Australian national cloud infrastructure.

From provenance standardization to workflow systems, plus a prototype by Per Unneberg to collect runtime metrics, there was plenty here to give a grasp of the exciting road ahead for genomics and computational biology.

Definitely looking forward to some more action tomorrow morning :)