March 4th, 2013 — Uncategorized

Sometimes education can be a daunting process. From the student’s side this is quite obvious: we have all gone through exercises and corrections, learned what we did wrong, fixed those errors, and repeated the cycle. That’s how it generally works.
On the teacher’s side, correcting assignments is easy and unbiased as long as the number of students stays small. For our now official KTH course “DD3436 Scientific Programming in Python for Computational Biology”, I was asked to hold a session on software testing and continuous integration in Python… for around 50 students.
Continue reading →
November 23rd, 2011 — Uncategorized
Dealing with software package management can be a daunting task, even for experienced sysadmins. From the long-forgotten graft, through the modern and insanely tweakable portage, to the (allegedly) multiplatform pkgsrc and the very promising xbps, several projects have tried to build a packaging system that is simple, easy to use, community-driven, optimal, reliable, generic, portable, and good at dependency handling.
In my experience on both sides of the iron, as a sysadmin and as a developer, none of them works as one would like.
But first, let’s explore what several HPC centers have adopted as a solution and why… and most importantly, how to fix it eventually.
Continue reading →
July 11th, 2011 — Uncategorized
It has been a while since the Galaxy Community Conference 2011 (GCC2011) took place in Lunteren, in the Netherlands. As a result of my visit, I gained some valuable insight into what I like to call the metasploit of computational biology, if such an analogy can be made between computer security and biology.
A few words about Galaxy
With a 15+ core team and a very active contributor base, Galaxy is trying hard to provide a fix for the biomedical Babel in which life scientists work nowadays.
From its modest origin as a single Perl script, later morphing into a Python web framework, Galaxy evolved rapidly. In short, Galaxy can be thought of as the glue code that wraps a considerable number of bioinformatics programs and unifies them behind a more consistent web interface.
But there’s much more under the hood: cluster job management, data conversion, dataset access controls, security and web services, to name a few components and features.
“Everything is possible in Galaxy, As long as you can run it on the command line, you can incorporate it into Galaxy.”
– Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research
But not everything shines in the galaxy, since the inclusion of NGS tools hogged its main site at some point. This only proves the point that single sites like Galaxy main, handling 130,000 cluster jobs per month and 1 TiB of uploads per week, face sustainability issues in the era of big datasets we’re living in. As a result, besides imposing reasonable cluster quotas, interesting scaling strategies are being tested on real research projects. Federation and cloud computing are therefore the next steps on this particular quest to the bio-universe.
One interesting realization at the conference is that labs are not the only ones rolling their own Galaxy instances; a big sequencing industry player showed some interest in it too:
“Galaxy is an attractive workflow engine candidate”
– Kirt Haden, Illumina Inc
Continue reading →
June 23rd, 2011 — Uncategorized
UPDATE:
The documentation below has been superseded by a much simpler, more general and automated alternative: VirtualEnv-burrito.
Continue reading →
April 21st, 2011 — Uncategorized

When developing a daemonized service, it’s common to run into minor errors that need nothing more than a restart of the daemon. One example is being unable to connect to a remote machine for some time:
Traceback (most recent call last):
  (...)
  File "python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "python2.6/urllib2.py", line 1145, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
Granted, we want to fix this in the code so that the daemon does not die, but meanwhile it’s good to have a safety net that we can rely on. That’s where supervisord comes in handy. Let’s see how it’s done.
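As for the code-side fix mentioned above, here is a minimal sketch of one way the daemon could survive a refused connection by retrying instead of dying. It is not the original daemon's code: the traceback comes from Python 2.6's urllib2, while this sketch assumes Python 3's urllib, and the function name fetch_with_retry and its attempts, delay, opener and sleep parameters are hypothetical names chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=5, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on URLError instead of letting the daemon die.

    Hypothetical helper: `opener` and `sleep` are injectable so the retry
    logic can be exercised without touching the network.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError:
            # Out of attempts: re-raise and let the safety net take over.
            if attempt == attempts:
                raise
            # Wait a little longer after each failed attempt.
            sleep(delay * attempt)
```

If every attempt fails, the exception still propagates, so a process supervisor remains useful as the last line of defence.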
Continue reading →