Berlin BOSC Codefest 2013, day 2

Here I am on my second day of the BOSC hackathon, polishing work from yesterday, but also seeing interesting new projects taking off. These are my notes from the second day. See also my notes from the first day.

Pencils down for the coding

So today we try to wrap up the SLURM support in ipython-cluster-helper. We realized that generalizing job managers is hard. Even though at the basic level they all do the same thing, namely submit jobs and manage hardware resources, the different flavors exist for a reason.

Extra arguments or “native specifications” that do not fit the normal job scheduler blueprint must be passed along cleanly, and that last-mile effort takes some time to nail down.

Furthermore, a generalized DRMAA patch for upstream ipython parallel requires more than 2 days to whip up, so we instead move on to optimizing what we have on two different fronts:

  1. Getting old SLURM versions to work with ipython-cluster-helper without job arrays in an efficient way.
  2. Automating the deployment of SLURM server(s) with a configuration management tool: Saltstack
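
For the first point, one way to get by without job arrays is to submit one job per task and hand each job its own index. What follows is only a minimal sketch of that idea, not what ipython-cluster-helper actually does: the function name, the run_engine.sh script and the TASK_INDEX variable are all made up for illustration, and the commands are only built, not submitted.

```python
def emulate_job_array(script, n_tasks, job_name="engine"):
    """Build one sbatch command per task instead of a single array job."""
    commands = []
    for task_index in range(n_tasks):
        commands.append([
            "sbatch",
            f"--job-name={job_name}-{task_index}",
            f"--output={job_name}-{task_index}.out",
            # TASK_INDEX is a hypothetical variable name; the job script
            # would read it to know which slice of the work it owns.
            f"--export=ALL,TASK_INDEX={task_index}",
            script,
        ])
    return commands


# run_engine.sh is a placeholder name for the job script.
commands = emulate_job_array("run_engine.sh", n_tasks=3)
for command in commands:
    print(" ".join(command))
```

On a machine with SLURM installed, each of these lists could be handed to subprocess.run(command, check=True) to actually submit the jobs. The obvious downside is one scheduler call per task, which is exactly the inefficiency we are trying to keep under control.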

Other projects

Per Unneberg manages to set up a proof of concept for a metrics client that reports runtime statistics and system information from different running processes to a web service. This idea stems from bioplanet’s Genome Comparison Analytics Testing. On that site, several pipelines are compared from the accuracy perspective, but nothing is shown about performance, which leaves open questions such as:

  • How long did it take to run such a pipeline from beginning to end?
  • Which hardware resources, such as CPUs and memory, were you using?
  • Which organism(s) and to which depth were you running that pipeline against?
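
To make the questions above a bit more concrete, here is a rough sketch of the kind of report such a client could collect. It is not Per's code: the function name and the field names are assumptions of mine, and instead of talking to a real web service it only builds the JSON payload that would be sent.

```python
import json
import os
import platform
import time

try:
    import resource  # only available on Unix-like systems
except ImportError:
    resource = None


def run_with_metrics(func, *args, pipeline="example-pipeline",
                     organism="unknown", **kwargs):
    """Run func and return its result plus a small runtime report."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start

    peak_rss = None
    if resource is not None:
        # Units differ by platform: kilobytes on Linux, bytes on macOS.
        peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    report = {
        "pipeline": pipeline,    # hypothetical field names throughout
        "organism": organism,
        "wall_seconds": elapsed,
        "cpu_count": os.cpu_count(),
        "platform": platform.platform(),
        "peak_rss": peak_rss,
    }
    return result, report


# A toy "pipeline" standing in for a real analysis step.
result, report = run_with_metrics(sum, range(1000), pipeline="toy-sum")
payload = json.dumps(report)
print(payload)
```

The payload could then be posted to whatever endpoint the metrics service exposes. Sequencing depth is left out here because it depends on the input data rather than on the machine.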

Some interesting talk around biolite, a data provenance system for bioinformatics, arises as a side result of this work. In fact, the bcbio-nextgen pipeline includes preliminary support for such a system.

Guillermo unearths a cool project that he has wanted to revive for a while: pytravis, a python API to interact with our favourite continuous integration system at SciLifeLab.

Meanwhile, the guys over at Cloudbiolinux come up with nice automation and deployment scripts using puppet/chef that should eventually ease the pain for those genomics centers trying to tame their reference genomes.

Those are just a few of the many initiatives going on in this hackathon, which ends today. Tomorrow BOSC starts. If you want to know more, don’t miss the CodeFest 2013 official wiki, where there’s a nice wrap-up of the many parallel projects.