Galaxy Community Conference 2011

It has been a while since GCC2011 took place in Lunteren, the Netherlands. My visit gave me some valuable insight into what I like to call the Metasploit of computational biology, if such an analogy can be drawn between computer security and biology.

A few words about Galaxy

With a core team of 15+ people and a very active contributor base, Galaxy is trying hard to fix the biomedical Babel in which life scientists work nowadays.

From its modest origin as a single Perl script, later morphing into a Python web framework, Galaxy has evolved rapidly. In short, Galaxy can be thought of as the glue code that wraps a considerable number of bioinformatics programs and unifies them behind a consistent web interface.

But there’s much more under the hood: cluster job management, data conversion, dataset access controls, security and web services, to name a few components and features.

“Everything is possible in Galaxy. As long as you can run it on the command line, you can incorporate it into Galaxy.”
– Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research
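
To make the "glue code" idea a bit more concrete, here is a minimal sketch of what wrapping a command-line program behind one uniform interface could look like. This is not Galaxy's own mechanism or API: the `CommandLineTool` class, its `command_template` field and the placeholder names are all hypothetical, made up for illustration only.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CommandLineTool:
    """Hypothetical wrapper (not Galaxy code): one interface over any CLI program."""

    name: str
    command_template: str  # placeholders such as {input} are filled from params

    def build_argv(self, **params: str) -> list[str]:
        # Quote every value so that spaces or shell metacharacters
        # inside a parameter stay within a single argument.
        quoted = {key: shlex.quote(str(value)) for key, value in params.items()}
        return shlex.split(self.command_template.format(**quoted))

    def run(self, **params: str) -> str:
        # Run the program without a shell and return its standard output.
        # A non-zero exit status raises subprocess.CalledProcessError.
        result = subprocess.run(
            self.build_argv(**params),
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


if __name__ == "__main__":
    # The Python interpreter stands in for an arbitrary bioinformatics program.
    tool = CommandLineTool(name="demo", command_template="{python} -c {code}")
    print(tool.run(python=sys.executable, code="print(6 * 7)"))
```

Every tool ends up with the same `run(**params)` entry point regardless of what the underlying program looks like, which is roughly the property a web front end needs in order to generate a form for each tool and dispatch jobs in a uniform way.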

But not everything shines in the galaxy: the inclusion of NGS tools hogged its main site at some point. This only proves that single sites like Galaxy main, which handles 130,000 cluster jobs per month and 1 TiB of uploads per week, face sustainability issues in the big-data era we’re living in. As a result, in addition to reasonable cluster quotas being imposed, interesting scaling strategies are being tested on real research projects. Federation and cloud computing are the next steps on this particular quest to the bio-universe.

One interesting realization at the conference was that it is not only labs rolling their own Galaxy instances. A big sequencing industry player showed some interest in it too:

“Galaxy is an attractive workflow engine candidate”
– Kirt Haden, Illumina Inc
