The “module system”: The good, the bad and the ugly

Dealing with software package management can be a daunting task, even for experienced sysadmins. From the long-forgotten graft, through the modern and insanely tweakable portage, to the (allegedly) multiplatform pkgsrc and the very promising xbps, several projects have tried to build a packaging system that is easy to use, community-driven, simple, optimal, reliable, generic and portable, with good dependency handling.

In my experience on both sides of the iron, as a sysadmin and as a developer, none of them works as one would like.

But first, let’s explore what several HPC centers have adopted as a solution and why… and most importantly, how to fix it eventually.


Galaxy on UPPMAX, simplified

This post is intended to be shortened over time, eventually becoming an automated procedure… a wiki-post from dahlo’s magic until upstream patches settle down. All commands are issued on the cluster, unless otherwise stated.

Please report any issues via comments!

  1. Firstly, follow my earlier post on how to set up your own Python virtual environment on UPPMAX.
  2. Once you have a prompt similar to: (devel) hostname ~$, you can continue; otherwise, go back to step 1.
  3. pip install drmaa Mercurial PyYAML
  4. Add the following env variables to your .bashrc:
    export DRMAA_LIBRARY_PATH=/bubo/sw/apps/build/slurm-drmaa/lib/libdrmaa.so
    export DRMAA_PATH=$DRMAA_LIBRARY_PATH
    
  5. Create a file ~/.slurm_drmaa.conf with the contents:
    job_categories: {
          default: "-A <your project_account> -p devel"
    }
    
  6. hg clone http://bitbucket.org/brainstorm/galaxy-central
  7. Edit universe_wsgi.ini from the provided sample so that it contains:
    admin_users = <your_admin_user>@example.com
    enable_api = True
    start_job_runners = drmaa
    default_cluster_job_runner = drmaa://-A <your project_account> -p devel
    
  8. On your local machine: ssh -f <your_user>@<uppmax> -L 8080:localhost:8080 -N
  9. On your local machine: Fire up your browser and connect to http://localhost:8080
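
Before launching Galaxy, it may help to double-check the prerequisites from the steps above. What follows is only a minimal sketch: the helper name check_drmaa_setup is made up for illustration, and it assumes that an activated virtualenv exports VIRTUAL_ENV (which the standard activate script does). It does not validate the contents of the config file, only that it exists.

```python
import os


def check_drmaa_setup(env, home):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper: `env` is a mapping such as os.environ and
    `home` is the directory expected to hold .slurm_drmaa.conf.
    """
    problems = []

    # Step 4: the DRMAA library must be exported and point to a real file.
    lib = env.get("DRMAA_LIBRARY_PATH")
    if not lib:
        problems.append("DRMAA_LIBRARY_PATH is not set")
    elif not os.path.isfile(lib):
        problems.append("DRMAA_LIBRARY_PATH does not point to a file: " + lib)

    # Steps 1-2: the activate script of a virtualenv exports VIRTUAL_ENV.
    if not env.get("VIRTUAL_ENV"):
        problems.append("no active virtualenv (VIRTUAL_ENV is not set)")

    # Step 5: the slurm-drmaa configuration file must exist.
    conf = os.path.join(home, ".slurm_drmaa.conf")
    if not os.path.isfile(conf):
        problems.append("missing " + conf)

    return problems


if __name__ == "__main__":
    for problem in check_drmaa_setup(os.environ, os.path.expanduser("~")):
        print("WARNING: " + problem)
```

Run it on the cluster from the same shell you will start Galaxy in; no output means the three checks passed.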

As a beta tester, you should expect some issues when running Galaxy this way. Firstly, keep in mind that it will not perform as fast as a production-quality setup; it is just a developer instance. Furthermore, the node you are on might have time limit restrictions, meaning that your instance will be killed after 30 minutes unless you reserve a slot beforehand, as Martin recommended in the section “Run galaxy on a node”.

Galaxy community conference 2011

It has been a while since GCC2011 took place in Lunteren, in the Netherlands. As a result of my visit, I gained some valuable insight into what I like to call the metasploit of computational biology, if such an analogy can be made between computer security and biology.

A few words about Galaxy

With a 15+ core team and a very active contributor base, Galaxy is trying hard to provide a fix for the biomedical Babel in which life scientists work nowadays.

From its modest origin as a single Perl script, later morphing into a Python web framework, Galaxy evolved rapidly. In short, Galaxy can be thought of as the glue code that wraps a considerable number of bioinformatics programs and unifies them behind a more consistent web interface.

But there’s much more under the hood: cluster job management, data conversion, dataset access controls, security and web services, to name a few components and features.

“Everything is possible in Galaxy. As long as you can run it on the command line, you can incorporate it into Galaxy.”
– Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research

But not everything shines in the galaxy: the inclusion of NGS tools hogged its main site at some point. This shows that single sites like Galaxy main, which handles 130,000 cluster jobs per month and 1 TiB of uploads per week, face sustainability issues in the era of big datasets we are living in. As a result, reasonable cluster quotas are being imposed and interesting scaling strategies are being tested on real research projects. Federation and cloud computing are therefore the next steps on this particular quest to the bio-universe.

One interesting realization at the conference was that labs are not the only ones rolling their own Galaxy instances; a big player in the sequencing industry showed some interest in it too:

“Galaxy is an attractive workflow engine candidate”
– Kirt Haden, Illumina Inc


How to install python modules with VirtualEnv… on UPPMAX

Why bother ?

Both virtualenv and virtualenvwrapper ease the hassle of managing Python modules when one does not have root access on a system. In addition, no more “--prefix” flags are needed when installing modules. Or, perhaps better explained by the official docs:

The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.

Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.

Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.

In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).

After this howto you’ll be able to create an isolated, clean Python environment where you can install as many Python modules as you want and where your PYTHONPATH, PYTHONHOME and friends are not tainted… unless there’s a module system in the way, oh my!

We’ll see how to tame that beast too. Keep reading.
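
As a quick sanity check of the isolation described above, here is a small sketch that reports whether the running interpreter lives inside a virtual environment and whether PYTHONPATH or PYTHONHOME are set in the shell. The function names are mine, chosen for illustration. The detection assumes the usual conventions: older virtualenv releases set sys.real_prefix, while Python 3 venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.

```python
import os
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a virtual environment."""
    # Older virtualenv releases add sys.real_prefix to the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # Python 3 venv and newer virtualenv releases keep the system
    # location in sys.base_prefix and point sys.prefix at the environment.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def tainting_variables(env):
    """Names of the variables that could leak outside modules into the environment."""
    return [name for name in ("PYTHONPATH", "PYTHONHOME") if env.get(name)]


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("environment prefix:", sys.prefix)
    for name in tainting_variables(os.environ):
        print("WARNING: %s is set to %s" % (name, os.environ[name]))
```

If a module system has exported PYTHONPATH behind your back, the warning at the end is where it would show up.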


supervisord: one process to rule them all


When developing a daemonized service, it is common to encounter minor errors that require no more attention than restarting the daemon. One example is being unable to connect to a remote machine for some time:


Traceback (most recent call last):
  (...)
  File "python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "python2.6/urllib2.py", line 1145, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

Granted, we want to fix this in the code so that the daemon does not die, but meanwhile it’s good to have a safety net that we can rely on. That’s where supervisord comes in handy. Let’s see how it’s done.
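
For the "fix it in the code" half of the story, one possible approach is to retry the request a few times before letting the error escape. This is a rough sketch, not the daemon's actual code: it uses the Python 3 names (urllib.request and urllib.error) instead of the urllib2 module seen in the traceback, and fetch_with_retry and its parameters are hypothetical, picked for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=5, delay=2.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body, retrying when the connection fails.

    `opener` and `sleep` are injectable so the function can be
    exercised without touching the network or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.URLError:
            if attempt == attempts:
                # Out of retries: re-raise so the process exits and the
                # safety net (a process supervisor) can restart it.
                raise
            # Linear back-off: wait a little longer after each failure.
            sleep(delay * attempt)
```

The two mechanisms complement each other: the retry loop absorbs short outages, and whatever still escapes after the last attempt is left to the supervisor.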
