The "module system": The good, the bad and the ugly

By Roman Valls Guimerà

Dealing with software package management can be a daunting task, even for experienced sysadmins. From the long-forgotten graft, through the modern and insanely tweakable portage, to the (allegedly) multiplatform pkgsrc or the very promising xbps, several projects have tried to build a packaging system that is easy to use, simple, community-driven, optimal, reliable, generic and portable, with good dependency handling.

In my experience on both sides of the iron, as a sysadmin and as a developer, none of them works the way one would like.

But first, let’s explore what several HPC centers have adopted as a solution and why… and most importantly, how to fix it eventually.

The good

Widely used in research facilities, the module system lets users choose between different versions of the installed software. The approach is simple: just type “module load program/1.0” and off you go.

On the sysadmin side, it’s the same old familiar spell “tar xvfz && make && make install”, followed by a “vim program” to write the module script that sets PATH, LD_LIBRARY_PATH and whatever other variables are needed.
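To make that concrete, here is a minimal sketch of what loading such a module amounts to, written as plain shell rather than as an actual module script. The install prefix /opt/program/1.0 and the helper prepend_dir are hypothetical names chosen for illustration; a real module script would express the same idea in the module system's own syntax.

```shell
#!/bin/sh
# Sketch of the environment changes a module script typically performs.
# /opt/program/1.0 is a hypothetical install prefix, not a real path.
prefix=/opt/program/1.0

# Prepend a directory to a colon-separated list, avoiding a dangling
# colon when the list starts out empty.
# $1 = directory to add, $2 = current value of the variable
prepend_dir() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

PATH=$(prepend_dir "$prefix/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_dir "$prefix/lib" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Everything the user runs afterwards resolves binaries and shared libraries from the hypothetical prefix first, which is the whole trick behind "module load".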

Consequently, the time required to wrap a piece of software is minimal, granting sysadmins quick-hack superpowers. After all, velocity does matter in research, and getting things done so that research can continue is mandatory.

Moreover, user-written modules can be shared easily within the same cluster by simply tweaking the MODULEPATH variable. What’s the catch? Technical debt and, most importantly, lack of automation.

The bad

Software packaging is a time-consuming task whose results shouldn’t be kept inside institutional cluster firewalls, but openly published. As things stand, a single program may be re-packaged over and over, once on each academic cluster of each university department that has HPC resources. When a new version of a package comes out, the sysadmin has to bump it by creating directories and additional recipes. How does one justify this time investment? It just doesn’t scale. Skip to the “Solutions?” section for some relief.

From a technical perspective, using package systems that are not shipped with the operating system introduces an extra layer of complexity. More often than not, updates on the base distribution will break compiled programs that rely on old libraries. Stacking package managers should be considered harmful.

Ruby, Python and Perl each have their own mature way to install packages on most UNIXes. Stacking package managers by rpm-packaging Python or Ruby modules has several bad consequences. Granted, there are some concerns about uniformity, updates and security, but those can again be solved by the individual package managers.

But getting back to the module system, how well does it play with cloud computing?

It doesn’t, thankfully!

One would have to install all the modules and re-package the software for the virtual instances. In contrast, existing package systems, be it rpm, deb, pip, gem or lein, have already solved that by themselves. On top of that, the module system tweaks crucial environment variables such as $PATH or $LD_LIBRARY_PATH, with bad side effects for Python virtual environments or any other user-defined PATHs.
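The PATH side effect is easy to reproduce without the module system at all. The sketch below uses throwaway directories and a hypothetical command called tool, standing in for the python of a virtual environment; the directory names venv and module are likewise made up for the demonstration.

```shell
#!/bin/sh
# Two fake installations of the same command in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/venv/bin" "$work/module/bin"

printf '#!/bin/sh\necho "tool from virtualenv"\n' > "$work/venv/bin/tool"
printf '#!/bin/sh\necho "tool from module"\n' > "$work/module/bin/tool"
chmod +x "$work/venv/bin/tool" "$work/module/bin/tool"

# The user activates a virtual environment first...
PATH="$work/venv/bin:$PATH"
before=$(tool)

# ...and a later module load prepends its own bin directory.
PATH="$work/module/bin:$PATH"
hash -r   # forget any cached command locations
after=$(tool)

echo "before: $before"
echo "after:  $after"
rm -rf "$work"
```

The first lookup finds the virtual environment's copy and the second finds the module's copy, even though the user never deactivated anything: whoever prepends to PATH last wins.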

Lastly, from a human resources perspective, writing modules does not add value or expertise to your IT toolbox. On the other hand, search engines have plenty to say when you look up terms such as “job experience packaging software rpm deb”. Moreover, being involved in open source communities via packaging can give you some very valuable insights into how open source projects work.

The ugly

With aging and relatively unmaintained software, bugs arise. Under some circumstances, here’s what occurs:

$ modulecmd bash purge
*** glibc detected *** modulecmd: free(): invalid next size (fast): 0x0000000001b88050 ***
======= Backtrace: =========
(...)

$ export MODULEPATH=AAAAAAAAAAAAAAAAAAAAAAAAA:AAAAAAAAAAAAAAAAAAAAA:/bin/bash && modulecmd bash purge
*** glibc detected *** modulecmd: corrupted double-linked list: 0x00000000009c4600 ***

I would like to light a candle for those who dare to run modulecmd with the suid bit set. I can only think of one sysadmin who could do that while being totally self-confident.

Solutions?

Here’s some brainstorming that might help in the long run:

  1. Instead of complicating infrastructure, just state the software versions you are running in your publications. In Python, pip freeze helps. Want an older version? DIY.
  2. Use CDE, rbenv and virtualenv before and after publishing if you are concerned about platform updates during your research.
  3. Use virtual machine images to reproduce experiments.
  4. If you really need the module system, at least publish the modules somewhere for people to reuse them.
  5. If you are a sysadmin, get started with FPM as a first approach to the world of package management for your distribution.
  6. Try to get those packages accepted upstream (best!) and/or create your own rpm/deb repo.
  7. Learn puppet and/or chef.
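
As a rough illustration of the first suggestion, the sketch below writes a small version manifest that could accompany a publication. The file name versions.txt is an arbitrary choice, and the pip step is skipped when pip is not installed; treat it as a starting point rather than a complete provenance record.

```shell
#!/bin/sh
# Record the platform and the installed Python packages in one file.
manifest=versions.txt
{
    echo "# generated on $(date -u +%Y-%m-%d)"
    # Kernel name and release of the machine the analysis ran on.
    uname -sr
    # pip freeze prints installed Python packages, one requirement per line.
    if command -v pip >/dev/null 2>&1; then
        pip freeze
    fi
} > "$manifest"
```

The pip portion of that output can later be fed to "pip install -r" inside a fresh virtualenv to get the same package versions back.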