<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>BrainBlog</title>
	<atom:link href="http://blogs.nopcode.org/brainstorm/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.nopcode.org/brainstorm</link>
	<description>braindumping myself</description>
	<lastBuildDate>Wed, 23 Nov 2011 10:20:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The &#8220;module system&#8221;: The good, the bad and the ugly</title>
		<link>http://blogs.nopcode.org/brainstorm/2011/11/23/module-system-bad-and-ugly/</link>
		<comments>http://blogs.nopcode.org/brainstorm/2011/11/23/module-system-bad-and-ugly/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 23:41:44 +0000</pubDate>
		<dc:creator>brainstorm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[university]]></category>
		<category><![CDATA[unix]]></category>
		<category><![CDATA[uppnex]]></category>

		<guid isPermaLink="false">http://blogs.nopcode.org/brainstorm/?p=579</guid>
		<description><![CDATA[Dealing with software package management can be a daunting task, even for experienced sysadmins. From the long forgotten graft, going through the modern and insanely tweakable portage to the (allegedly) multiplatform pkgsrc or the very promising xbps, several have tried to build an easy to use, community-driven, simple, with good dependency-handling, optimal, reliable, generic and [...]]]></description>
			<content:encoded><![CDATA[<p>Dealing with software <a href="http://ianmurdock.com/solaris/how-package-management-changed-everything/">package management</a> can be a daunting task, even for experienced sysadmins. From the long forgotten <a href="http://peters.gormand.com.au/Home/tools/graft/graft-html">graft</a>, going through the modern and insanely tweakable <a href="http://www.gentoo.org/proj/en/devrel/handbook/handbook.xml?part=2&#038;chap=1">portage</a> to the (allegedly) multiplatform <a href="http://www.netbsd.org/docs/software/packages.html">pkgsrc</a> or the very promising <a href="http://code.google.com/p/xbps/">xbps</a>, several projects have tried to build a packaging system that is easy to use, community-driven, simple, optimal, reliable, generic and portable, with good dependency handling.</p>
<p>In my experience on both sides of the iron, as a sysadmin and developer, <strong>none</strong> of them works as one would like.</p>
<p>But first, let&#8217;s explore what several <acronym title='High Performance Computing'>HPC</acronym> centers have adopted as a solution and why&#8230; and most importantly, how to fix it eventually.</p>
<p><span id="more-579"></span></p>
<h2>The good</h2>
<p>Widely used in different research facilities, the <a href="http://modules.sf.net/">module system</a> lets users choose among different versions of the installed software. The approach is simple: just type <strong>&#8220;module load program/1.0&#8221;</strong> and off you go.</p>
<p>On the sysadmin side, it&#8217;s the <strong>same old familiar spell &#8220;tar xvfz &#038;&#038; make &#038;&#038; make install&#8221;</strong>, and a &#8220;vim program&#8221; to define the module script that will set PATH, LD_LIBRARY_PATH or other variables and whatnot.</p>
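<p>To make concrete what such a module script ends up doing, here is a minimal Python sketch. It is not how modulecmd is implemented (real modulefiles are written in Tcl); it only mimics the effect on the environment. The <code>/sw/apps/program/1.0</code> prefix and the <code>load_module</code> and <code>prepend_path</code> names are assumptions made up for illustration.</p>

```python
def prepend_path(env, var, directory):
    """Return a copy of env with directory placed first in the colon-separated var."""
    updated = dict(env)
    current = updated.get(var, "")
    # Keep existing entries, dropping empty ones and any older copy of directory.
    parts = [p for p in current.split(":") if p and p != directory]
    updated[var] = ":".join([directory] + parts)
    return updated


def load_module(env, prefix):
    """Mimic what a typical module script sets for software installed under prefix."""
    env = prepend_path(env, "PATH", prefix + "/bin")
    env = prepend_path(env, "LD_LIBRARY_PATH", prefix + "/lib")
    return env


if __name__ == "__main__":
    # Hypothetical install prefix standing in for "program/1.0".
    demo = load_module({"PATH": "/usr/bin:/bin"}, "/sw/apps/program/1.0")
    print(demo["PATH"])
    print(demo["LD_LIBRARY_PATH"])
```

<p>The sketch works on a copy of the environment rather than on <code>os.environ</code>, which makes the side effects on PATH easy to inspect before anything is actually changed.</p>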
<p>Consequently, the time required to wrap a piece of software is minimal, granting sysadmins quick-hack superpowers. After all, <a href="http://blog.jcuff.net/2011/04/velocity-in-research-computing-really.html">velocity in research does matter</a>, and <strong>getting things done</strong> so that research can continue on its way is mandatory.</p>
<p>Moreover, user-coded modules can be shared easily within the same cluster by simply tweaking the MODULEPATH variable. What&#8217;s the catch ? <a href="http://en.wikipedia.org/wiki/Technical_debt">Technical debt</a> and, most importantly, lack of <a href="http://www.opscode.com/chef/">automation</a>.</p>
<h2>The bad</h2>
<p>Software <strong>packaging is a time consuming task</strong> that shouldn&#8217;t be kept inside institutional cluster firewalls, <a href="https://github.com/scilifelab/modules.sf.net">but openly published</a>. Indeed, a single program may well have been <strong>re-packaged</strong> a number of times, once on each academic cluster of each university department that has HPC resources. Whenever a new version of a package comes out, the sysadmin has to bump it by creating directories and additional recipes. How does one justify this time investment ? <strong>It just doesn&#8217;t scale</strong>. Skip to the &#8220;Solutions?&#8221; section for some relief.</p>
<p>From a technical perspective, using package systems that are not shipped with the operating system introduces an <strong>extra layer of complexity</strong>. More often than not, updates on the base distribution will break compiled programs that rely on old libraries. <strong>Stacking package managers should be considered harmful</strong>.</p>
<p>Ruby, Python and Perl have their own mature ways to install packages on most UNIXes; stacking package managers by rpm-packaging Python or Ruby modules has <a href="http://stakeventures.com/articles/2008/12/04/rubygem-is-from-mars-aptget-is-from-venus">several</a> <a href="http://www.b-list.org/weblog/2008/dec/14/packaging/">bad</a> consequences. Granted, there are some concerns about uniformity, updates and security, but those again can be solved by the individual package managers.</p>
<p>But getting back to the <a href="http://modules.sf.net">module system</a>, <strong>how well does it play with cloud computing</strong> ?</p>
<p>It doesn&#8217;t, thankfully !</p>
<p>One would have to install all the modules, and re-package the software for the virtual instances. In contrast, existing package systems, be it rpm, deb, <a href="http://pypi.python.org/pypi">pip</a>, <a href="http://rubygems.org/">gem</a> or <a href="http://clojars.org/">lein</a> solved that by themselves. On top of that, the module system will tweak crucial system variables such as $PATH or $LD_LIBRARY_PATH with bad side effects for <a href="http://blogs.nopcode.org/brainstorm/2011/06/23/how-to-install-python-modules-with-virtualenv-on-uppmax/">python virtual environments</a> or any other user-defined PATHs.</p>
<p>Lastly, from a human resources perspective, writing <a href="http://modules.sf.net">modules</a> does not <strong>add value or expertise to your IT toolbox</strong>. On the other hand, search engines have something to say when looking up search terms such as <a href="http://duckduckgo.com/?q=experience+packaging+software+rpm+deb+job">job experience packaging software rpm deb</a>. Actually, being involved in open source communities via packaging can give you some very valuable insights on how open source projects work.</p>
<h2>The ugly</h2>
<p>With aging and relatively unmaintained software, some bugs arise. Under some circumstances here&#8217;s what occurs:</p>
<pre>
$ modulecmd bash purge
*** glibc detected *** modulecmd: free(): invalid next size (fast): 0x0000000001b88050 ***
======= Backtrace: =========
(...)

$ export MODULEPATH=AAAAAAAAAAAAAAAAAAAAAAAAA:AAAAAAAAAAAAAAAAAAAAA:/bin/bash &#038;&#038; modulecmd bash purge
    *** glibc detected *** modulecmd: corrupted double-linked list: 0x00000000009c4600 ***
</pre>
<p>I would like to light a candle for those who dare to run modulecmd with the <a href="http://en.wikipedia.org/wiki/Setuid">suid bit</a> set. I can only think of <a href="http://isp.surfnet.fi/aktuelltbilder/schukken.jpg">one sysadmin</a> who could do that while being totally self-confident.</p>
<h2>Solutions?</h2>
<p>Here&#8217;s some brainstorming that might help in the long run:</p>
<ol>
<li>Instead of complicating infrastructure, just state the software versions you are running in your publications. In python, <strong>pip freeze</strong> helps. Want an older version ? DIY.</li>
<li>Use <a href="http://stanford.edu/~pgbovine/cde.html">CDE</a>, <a href="http://rbenv.org/">rbenv</a> and <a href="http://pypi.python.org/pypi/virtualenv">virtualenv</a> before and after publishing if concerned about platform updates during your research.</li>
<li>Use <a href="http://cloudbiolinux.org/">virtual machine images</a> to reproduce experiments.</li>
<li>If you really need the module system, at least <a href="https://github.com/scilifelab/modules.sf.net">publish the modules somewhere</a> for people to reuse them.</li>
<li>If you are a sysadmin, get started with <a href="http://www.semicomplete.com/blog/tags/deb">FPM</a> as a first approach to the world of package management for your distribution.</li>
<li>Try to get those packages accepted upstream (<strong>best!</strong>) and/or create your own rpm/deb <a href="http://www.howtoforge.com/creating_a_local_yum_repository_centos">repo</a>.</li>
<li>Learn <a href="http://puppetlabs.com/">puppet</a> and/or <a href="http://www.opscode.com/chef/">chef</a>.</li>
</ol>
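<p>As a sketch of the first point, the snippet below lists the installed Python distributions as <code>name==version</code> lines, which is roughly the kind of record <strong>pip freeze</strong> produces. It is a stand-in, not pip itself: it assumes Python 3.8 or later (for <code>importlib.metadata</code>), and the <code>freeze</code> function name is mine.</p>

```python
from importlib import metadata


def freeze():
    """Return sorted 'name==version' strings for the installed distributions."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip broken installs with incomplete metadata
            pins.add("{}=={}".format(name, version))
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Paste this output into the methods section or a requirements file.
    print("\n".join(freeze()))
```

<p>Keeping that list next to the publication is cheap, and it lets a reader rebuild the same set of versions in a fresh virtual environment.</p>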
]]></content:encoded>
			<wfw:commentRss>http://blogs.nopcode.org/brainstorm/2011/11/23/module-system-bad-and-ugly/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Galaxy on UPPMAX, simplified</title>
		<link>http://blogs.nopcode.org/brainstorm/2011/08/22/galaxy-on-uppmax-simplified/</link>
		<comments>http://blogs.nopcode.org/brainstorm/2011/08/22/galaxy-on-uppmax-simplified/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 14:36:28 +0000</pubDate>
		<dc:creator>brainstorm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bio]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[uppnex]]></category>

		<guid isPermaLink="false">http://blogs.nopcode.org/brainstorm/?p=550</guid>
		<description><![CDATA[This post is intended to be shortened over time, eventually becoming an automated procedure&#8230; a wiki-post from dahlo&#8217;s magic until upstream patches settle down. All commands are issued on the cluster, unless otherwise stated. Please report any issues via comments ! Firstly, follow my earlier post on how to set up your own python virtual environment [...]]]></description>
			<content:encoded><![CDATA[<p>This post is intended to be shortened over time, eventually becoming an automated procedure&#8230; a wiki-post from <a href="http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html">dahlo&#8217;s magic</a> until upstream patches settle down. All commands are issued on the cluster, unless otherwise stated.</p>
<p>Please report any issues via comments !</p>
<ol>
<li>Firstly, follow my earlier <a href="http://blogs.nopcode.org/brainstorm/2011/06/23/how-to-install-python-modules-with-virtualenv-on-uppmax/" title="How to install python modules with VirtualEnv… on UPPMAX">post</a> on how to set up your own python virtual environment on UPPMAX.</li>
<li>Once you have a prompt similar to: <em>(devel) hostname ~$</em>, you can continue; otherwise, go back to step 1.</li>
<li>pip install drmaa Mercurial PyYAML</li>
<li>Add the following env variables to your .bashrc:
<pre>
export DRMAA_LIBRARY_PATH=/bubo/sw/apps/build/slurm-drmaa/lib/libdrmaa.so
export DRMAA_PATH=$DRMAA_LIBRARY_PATH
</pre>
</li>
<li>Create a file ~/.slurm_drmaa.conf with the contents:
<pre>
job_categories: {
      default: "-A &lt;your project_account&gt; -p devel"
}
</pre>
</li>
<li>hg clone http://bitbucket.org/brainstorm/galaxy-central</li>
<li>Edit universe_wsgi.ini from the provided sample so that it contains:
<pre>
admin_users = &lt;your_admin_user&gt;@example.com
enable_api = True
start_job_runners = drmaa
default_cluster_job_runner = drmaa://-A &lt;your project_account&gt; -p devel
</pre>
</li>
<li>On your local machine: <em>ssh -f &lt;your_user&gt;@&lt;uppmax&gt; -L 8080:localhost:8080 -N</em></li>
<li>On your local machine: Fire up your browser and connect to http://localhost:8080</li>
</ol>
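<p>The same project account has to appear in both ~/.slurm_drmaa.conf and universe_wsgi.ini, so it can help to generate the two fragments from one place. The following is only a sketch: the function names and the <code>b2011999</code> project account are hypothetical, and the output simply mirrors the snippets shown in the steps above.</p>

```python
def slurm_drmaa_conf(project, partition="devel"):
    """Render the contents of ~/.slurm_drmaa.conf for a project account."""
    return (
        "job_categories: {{\n"
        '      default: "-A {0} -p {1}"\n'
        "}}\n"
    ).format(project, partition)


def galaxy_runner_settings(admin_email, project, partition="devel"):
    """Render the universe_wsgi.ini lines that enable the DRMAA job runner."""
    lines = [
        "admin_users = {0}".format(admin_email),
        "enable_api = True",
        "start_job_runners = drmaa",
        "default_cluster_job_runner = drmaa://-A {0} -p {1}".format(project, partition),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "b2011999" is a made-up project account used only for the demo.
    print(slurm_drmaa_conf("b2011999"))
    print(galaxy_runner_settings("admin@example.com", "b2011999"))
```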
<p>As a beta tester you may expect some issues when running Galaxy this way. Firstly, keep in mind that it will not perform as fast as a <a href="http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Production%20Server" title="Running galaxy on production">production-quality setup</a>; it&#8217;s just a developer instance. Furthermore, the node you&#8217;re on might have time limit restrictions, meaning that your instance will be killed after 30 minutes if you don&#8217;t reserve a slot beforehand, as <a href="http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html" title="Galaxy on UPPMAX by Martin Dahlo">Martin</a> recommended in the section &#8220;Run galaxy on a node&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.nopcode.org/brainstorm/2011/08/22/galaxy-on-uppmax-simplified/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Galaxy community conference 2011</title>
		<link>http://blogs.nopcode.org/brainstorm/2011/07/11/galaxy-community-conference-2011/</link>
		<comments>http://blogs.nopcode.org/brainstorm/2011/07/11/galaxy-community-conference-2011/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 14:00:20 +0000</pubDate>
		<dc:creator>brainstorm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bio]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[unix]]></category>
		<category><![CDATA[uppnex]]></category>

		<guid isPermaLink="false">http://blogs.nopcode.org/brainstorm/?p=474</guid>
		<description><![CDATA[It has been a while since the GCC2011 took place in Lunteren, in the Netherlands. As a result of my visit, I gained some more valuable insight about what I like to call the metasploit of computational biology, if such an analogy could be made between computer security and biology. A few words about Galaxy [...]]]></description>
			<content:encoded><![CDATA[<p>It has been a while since the <a href="http://wiki.g2.bx.psu.edu/GCC2011">GCC2011</a> took place in <a href="http://en.wikipedia.org/wiki/Lunteren">Lunteren</a>, in the Netherlands. As a result of my visit, I gained some more valuable insight about what I like to call the <a href="http://www.metasploit.com/">metasploit</a> of <a href="http://en.wikipedia.org/wiki/Computational_biology">computational biology</a>, if such an analogy could be made between computer security and biology.</p>
<h2>A few words about Galaxy</h2>
<p>With a <a href="http://wiki.g2.bx.psu.edu/Galaxy%20Team">15+ core team</a> and a <a href="http://dir.gmane.org/gmane.science.biology.galaxy.devel">very active</a> contributor base, <a href="http://galaxy.psu.edu/">Galaxy</a> is trying hard to provide a fix for the biomedical Babel in which life scientists work nowadays.</p>
<p>From its modest origin as a <strong>single perl script</strong>, later on morphing into a <strong>python web framework</strong>, Galaxy evolved rapidly. In short, Galaxy can be thought of as the glue code that wraps and <strong>uniformizes</strong> a considerable number of <strong>bioinformatics programs</strong> into a more consistent web interface.</p>
<p>But there&#8217;s much more under the hood: cluster job management, data conversion, dataset access controls, security, web services, etc&#8230; to name a few components and features.</p>
<blockquote><p><a href="http://wiki.g2.bx.psu.edu/Events/GCC2011?action=AttachFile&#038;do=get&#038;target=SixKeyInsights.pdf">&#8220;Everything is possible in Galaxy, As long as you can run it on the command line, you can incorporate it into Galaxy.&#8221;</a><br />
&#8211; Hans-Rudolf Hotz, Friedrich Miescher Institute for Biomedical Research</p></blockquote>
<p>But not everything shines in the galaxy: the inclusion of NGS tools hogged its <a href="http://main.g2.bx.psu.edu/">main site</a> at some point. This only proves the point that single sites like Galaxy main, handling 130,000 cluster jobs/month and 1 TiB of uploads per week, face sustainability issues in the era of big datasets we&#8217;re living in. As a result, besides imposing reasonable cluster quotas, interesting <a href="http://www.biomedcentral.com/1471-2105/11/S12/S4">scaling</a> <a href="https://bitbucket.org/steder/galaxy-globus">strategies</a> are being tested on real research projects. Federation and cloud computing are therefore the <a href="http://wiki.g2.bx.psu.edu/Future/Distributed%20Galaxy">next steps</a> on this particular quest to the bio-universe.</p>
<p>One interesting realization at the conference was that labs are not the only ones rolling their own Galaxy instances: a big sequencing industry player showed some interest in it too:</p>
<blockquote><p><a href="http://wiki.g2.bx.psu.edu/Events/GCC2011?action=AttachFile&#038;do=get&#038;target=RunningGalaxyDRMAAJobsAsDifferentUsers.pdf">&#8220;Galaxy is an attractive workflow engine candidate&#8221;</a><br />
&#8211; Kirt Haden, Illumina Inc</p></blockquote>
<p><span id="more-474"></span></p>
<h2>Common concerns</h2>
<p>There are some <a href="http://wiki.g2.bx.psu.edu/Learn/FAQ#Central_Galaxy_server_or_Galaxy_source_distribution">FAQs</a> that colleagues have asked me and that also came up at the conference. I would like to keep them here for future reference; feedback and further questions are very welcome via the comments&#8230; and their <a href="http://wiki.g2.bx.psu.edu/Support">support options</a>.</p>
<h3>Our compute cluster is not used due to IT policy restrictions, what should I do ?</h3>
<p>You might want to run it as a single user, as <a href="http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html">Martin Dahlö</a> describes, setting up an SSH tunnel. Obviously, this option has many problems when it&#8217;s aimed at non-developer scenarios: <a href="http://wiki.g2.bx.psu.edu/Admin/Config/Performance/Production%20Server">it does not scale</a>. Make sure you explain the <a href="http://wiki.g2.bx.psu.edu/Big%20Picture/Choices">big picture</a> and aim to reach consensus with your IT department. Again, some basic <a href="http://wiki.g2.bx.psu.edu/Events/GCC2011?action=AttachFile&#038;do=get&#038;target=SixKeyInsights.pdf">key insights</a> and common sense might help here.</p>
<h3>Are software versions kept in the history when running a workflow ?</h3>
<p>During the conference <a href="https://bitbucket.org/kanwei">Kanwei Li</a> came up with a patch that keeps track of software versions by adding a &lt;version&gt; tag to a particular tool&#8217;s xml. This tag just runs the tool with &#8220;--version&#8221; when it is used and records the output in the history. You might want to ask him through the <a href="http://lists.bx.psu.edu/listinfo/galaxy-dev">galaxy-dev</a> mailing list if you&#8217;re interested in this feature.</p>
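<p>The idea of recording a tool&#8217;s version next to its results can be sketched in a few lines of Python. This is not Galaxy&#8217;s code nor the patch mentioned above; the <code>tool_version</code> helper and the shape of the history entry are assumptions for illustration, and the demo queries the Python interpreter itself because it is guaranteed to be present.</p>

```python
import subprocess
import sys


def tool_version(executable):
    """Run 'executable --version' and return the first line it prints."""
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some tools print their version on stderr
        universal_newlines=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    # Hypothetical history entry: the tool name plus whatever it reported.
    history_entry = {"tool": "python", "version": tool_version(sys.executable)}
    print(history_entry)
```

<p>Not every program understands the same version flag, so a real implementation would probably let each tool declare its own version command.</p>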
<h3>How&#8217;s Galaxy&#8217;s sample tracking moving forward ?</h3>
<p>There are currently two big sample tracking systems present on the galaxy sphere: the one already <a href="http://wiki.g2.bx.psu.edu/Admin/Sample%20Tracking/Demo">present on main</a> and <a href="http://wiki.g2.bx.psu.edu/Admin/Sample%20Tracking/Next%20Gen">some nextgen patches by Brad Chapman</a>. Try them out, or better yet, join the upcoming <a href="http://www.open-bio.org/wiki/BOSC_2011">BOSC2011</a> <a href="http://www.open-bio.org/wiki/Codefest_2011">Codefest</a> and improve or merge the current systems.</p>
<h3>What is the API status ?</h3>
<p>In short: <a href="https://bitbucket.org/galaxy/galaxy-central/src/8b97f197b759/lib/galaxy/web/api/">it is growing</a> as needed.<br />
Shorter: <strong>@web.expose_api</strong> decorator.<br />
Better explained: see <a href="http://wiki.g2.bx.psu.edu/Events/GCC2011?action=AttachFile&#038;do=get&#038;target=GalaxyDeploymentandAPI.pdf">slide 26</a>.</p>
<h3>My cluster uses a custom job scheduler, will galaxy work with it ?</h3>
<p>If your batch system supports DRMAA, there&#8217;s a better chance of getting it rolling. Check out the <a href="http://mdahlo.blogspot.com/2011/06/galaxy-on-uppmax.html">recent progress on the SLURM</a> system for instance.</p>
<h3>How does Galaxy split datasets for embarrassingly parallel tools ?</h3>
<p>There&#8217;s <a href="https://bitbucket.org/galaxy/galaxy-central/issue/79/split-large-jobs-over-multiple-nodes-for">a ticket</a> for that.</p>
<h2>Next stop: <a href="http://www.open-bio.org/wiki/BOSC_2011">BOSC2011</a> and <a href="http://www.iscb.org/ismbeccb2011">ISMB</a></h2>
<p>I have just covered a few of the topics presented at the conference, but you can take the time to <a href="http://wiki.g2.bx.psu.edu/GCC2011">explore it further</a>, through both videos and slides. Obviously the galaxy evolution does not end here; there&#8217;s still more to come in a few days in Vienna.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.nopcode.org/brainstorm/2011/07/11/galaxy-community-conference-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to install python modules with VirtualEnv&#8230; on UPPMAX</title>
		<link>http://blogs.nopcode.org/brainstorm/2011/06/23/how-to-install-python-modules-with-virtualenv-on-uppmax/</link>
		<comments>http://blogs.nopcode.org/brainstorm/2011/06/23/how-to-install-python-modules-with-virtualenv-on-uppmax/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 22:47:53 +0000</pubDate>
		<dc:creator>brainstorm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[howto]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[unix]]></category>
		<category><![CDATA[uppnex]]></category>

		<guid isPermaLink="false">http://blogs.nopcode.org/brainstorm/?p=484</guid>
		<description><![CDATA[Why bother ? Both virtualenv and virtualenvwrapper ease the hassle of managing python modules when one does not have root access on a system. In addition, no more &#8220;--prefix&#8221; flags are needed when installing modules. Or maybe better explained, from the official docs: The basic problem being addressed is one of dependencies and versions, and [...]]]></description>
			<content:encoded><![CDATA[<h2>Why bother ?</h2>
<p>Both <a href="http://pypi.python.org/pypi/virtualenv">virtualenv</a> and <a href="http://www.doughellmann.com/projects/virtualenvwrapper/">virtualenvwrapper</a> ease the hassle of managing python modules when one does not have root access on a system. In addition, no more &#8220;--prefix&#8221; flags are needed when installing modules. Or maybe better explained, from the official docs:</p>
<blockquote><p>
The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform&#8217;s standard location is), it&#8217;s easy to end up in a situation where you unintentionally upgrade an application that shouldn&#8217;t be upgraded.</p>
<p>Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.</p>
<p>Also, what if you can&#8217;t install packages into the global site-packages directory? For instance, on a shared host.</p>
<p>In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn&#8217;t share libraries with other virtualenv environments (and optionally doesn&#8217;t access the globally installed libraries either).
</p></blockquote>
<p>After this howto you&#8217;ll be able to create an isolated clean python environment where you can install as many python modules as you want and where your PYTHONPATH, PYTHONHOME and friends are not tainted&#8230; unless there&#8217;s a <a href="http://modules.sourceforge.net/">module system</a> in the way, oh, my !</p>
<p>We&#8217;ll see how to tame that beast too. Keep reading.</p>
<p><span id="more-484"></span></p>
<h2>Go ahead</h2>
<p>First we&#8217;ll edit our <strong>.bashrc</strong>. The &#8220;~/opt/mypython&#8221; directory is needed in order to bootstrap the virtualenvs:</p>
<p><code><br />
# User specific aliases and functions</p>
<p>export PATH=$PATH:~/opt/mypython/bin<br />
export PYTHONPATH=~/opt/mypython/lib/python2.6/site-packages<br />
source ~/opt/mypython/bin/virtualenvwrapper.sh<br />
export WORKON_HOME=~/.virtualenvs<br />
</code></p>
<p>Then, we install virtualenv(wrapper) by running 3 commands:</p>
<p><code><br />
$ source ~/.bashrc >&#038; /dev/null &#038;&#038; mkdir -p $HOME/opt/mypython/lib/python2.6/site-packages<br />
$ easy_install --prefix=~/opt/mypython pip<br />
$ pip install virtualenvwrapper --install-option="--prefix=~/opt/mypython" &#038;&#038; source ~/.bashrc<br />
</code></p>
<p>Once we have that, we create a virtual environment called &#8220;devel&#8221;, or whatever name you prefer. It will ignore whatever is installed on the cluster (note the --no-site-packages flag):</p>
<p><code><br />
$ mkvirtualenv --python=python2.6 --no-site-packages devel<br />
</code></p>
<h2>Module system fixes</h2>
<p>Finally, if you want this virtual environment to go along well with <a href="http://modules.sourceforge.net/">the module system</a> in <a href="http://www.uppmax.uu.se/">UPPMAX</a> you should define the following code in <strong>~/.virtualenvs/devel/bin/postactivate</strong>:</p>
<p><code><br />
#!/bin/bash<br />
# This hook is run after this virtualenv is activated.</p>
<p>source ~/bin/reload_uppmax_modules.sh</p>
<p># We don't want UPPMAX's custom python<br />
PATH="${PATH/'/sw/comp/python/2.6.6_kalkyl/bin'}"</p>
<p># We unset PYTHONHOME set by the module system,<br />
# otherwise the system will not use the python<br />
# of virtualenv<br />
unset PYTHONHOME<br />
</code></p>
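<p>A substring substitution like the one in the hook above removes the matched text but, as far as I can tell, leaves the surrounding colon separators in place, so an empty PATH entry may remain. As a sketch of a more careful alternative, the Python helper below filters PATH entry by entry. The <code>remove_from_path</code> name is made up for this example; the directory is the one used in the hook.</p>

```python
import os


def remove_from_path(path_value, unwanted):
    """Drop every entry equal to unwanted from a colon-separated PATH string."""
    # Empty entries are dropped too, so no stray '::' is left behind.
    kept = [entry for entry in path_value.split(":") if entry and entry != unwanted]
    return ":".join(kept)


if __name__ == "__main__":
    cleaned = remove_from_path(
        os.environ.get("PATH", ""),
        "/sw/comp/python/2.6.6_kalkyl/bin",
    )
    # Print the result so a shell hook could capture it, e.g. PATH="$(python this_script.py)"
    print(cleaned)
```

<p>Whether dropping empty entries is desirable depends on the setup, since shells usually treat an empty PATH entry as the current directory.</p>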
<p>You can have a look <a href="https://raw.github.com/brainstorm/scilifelab/master/scripts/reload_uppmax_modules.sh">at how I load my modules</a>, or skip reload_uppmax_modules entirely if you use another approach.</p>
<h2>Installing my crazy module</h2>
<p>Installing new modules is as simple as running &#8220;pip&#8221;; no more 90&#8217;s-style .tar.bz2 unpacking or manual munging of environment variables is needed:</p>
<p><code><br />
(devel)$ pip search cutadapt</p>
<p>cutadapt                  - trim adapters from high-throughput sequencing reads</p>
<p>(devel)$ pip install cutadapt</p>
<p>Downloading/unpacking cutadapt<br />
  Downloading cutadapt-0.9.4.tar.gz (46Kb): 46Kb downloaded<br />
  Running setup.py egg_info for package cutadapt<br />
Installing collected packages: cutadapt<br />
  Running setup.py install for cutadapt<br />
    building 'cutadapt.calign' extension<br />
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c lib/cutadapt/calignmodule.c -o build/temp.linux-x86_64-2.6/lib/cutadapt/calignmodule.o<br />
    gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.6/lib/cutadapt/calignmodule.o -o build/lib.linux-x86_64-2.6/cutadapt/calign.so<br />
    changing mode of build/scripts-2.6/cutadapt from 644 to 755<br />
    changing mode of /home/romanvg/.virtualenvs/devel/bin/cutadapt to 755<br />
Successfully installed cutadapt<br />
Cleaning up...<br />
</code></p>
<p>See your &#8220;(devel)&#8221; prefix on your prompt ? Remember that you can create as many virtual environments as you want with the <strong>mkvirtualenv</strong> command shown above&#8230; And switch between those environments with the &#8220;workon&#8221; command:</p>
<p><code><br />
$ cutadapt<br />
-bash: cutadapt: command not found<br />
$ workon devel<br />
(devel)$ cutadapt --version<br />
0.9.4<br />
</code></p>
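<p>The &#8220;(devel)&#8221; prefix is only a visual hint. If a script needs to know whether it runs inside a virtual environment, a best-effort check can be sketched as follows. It relies on two conventions I am fairly sure of: classic virtualenv sets <code>sys.real_prefix</code>, while the standard-library venv and recent virtualenv releases make <code>sys.base_prefix</code> differ from <code>sys.prefix</code>. The <code>in_virtualenv</code> name is an assumption.</p>

```python
import sys


def in_virtualenv():
    """Best-effort check for an active Python virtual environment."""
    # Classic virtualenv records the original interpreter prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv keep the original prefix in sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("inside a virtualenv" if in_virtualenv() else "using the system python")
```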
<p>For the <a href="http://www.ruby-lang.org/en/">rubyists</a> out there, there&#8217;s a similar tool called <a href="https://rvm.beginrescueend.com/">RVM</a> and for the <a href="http://www.perl.org/">camels</a>, there&#8217;s <a href="http://terrarum.net/development/perl-virtual-environments.html#locallib">local::lib</a>.</p>
<p>Choose your weapon and enjoy your new <a href="http://en.wikipedia.org/wiki/User_space">userland</a> freedom <img src='http://blogs.nopcode.org/brainstorm/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.nopcode.org/brainstorm/2011/06/23/how-to-install-python-modules-with-virtualenv-on-uppmax/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>supervisord: one process to rule them all</title>
		<link>http://blogs.nopcode.org/brainstorm/2011/04/21/supervisord-one-process-to-rule-them-all/</link>
		<comments>http://blogs.nopcode.org/brainstorm/2011/04/21/supervisord-one-process-to-rule-them-all/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 12:26:48 +0000</pubDate>
		<dc:creator>brainstorm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[admin]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[sysadmin]]></category>
		<category><![CDATA[unix]]></category>

		<guid isPermaLink="false">http://blogs.nopcode.org/brainstorm/?p=463</guid>
		<description><![CDATA[When one is developing a daemonized service, it&#8217;s rather usual to encounter minor errors that require no further attention than just restarting the daemon. That could be like not being able to connect to a remote machine for some time: Traceback (most recent call last): (...) File "python2.6/urllib2.py", line 1170, in http_open return self.do_open(httplib.HTTPConnection, req) [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://supervisord.org/"><img src="http://blogs.nopcode.org/brainstorm/wp-content/uploads/2011/04/supervisord-150x60.gif" alt="supervisord logo" title="supervisord" width="150" height="60" class="alignleft size-thumbnail wp-image-470" /></a></p>
<p>When one is developing a daemonized service, it&#8217;s rather usual to encounter minor errors that require no further attention than just restarting the daemon. That could be like not being able to connect to a remote machine for some time:</p>
<p><code><br />
Traceback (most recent call last):<br />
(...)<br />
File "python2.6/urllib2.py", line 1170, in http_open<br />
   return self.do_open(httplib.HTTPConnection, req)<br />
File "python2.6/urllib2.py", line 1145, in do_open<br />
   raise URLError(err)<br />
urllib2.URLError: &lt;urlopen error [Errno 111] Connection refused&gt;<br />
</code></p>
<p>Granted, we want to fix this in the code so that the daemon does not die, but meanwhile it&#8217;s good to have a safety net that we can rely on. That&#8217;s where <a href="http://supervisord.org/">supervisord</a> comes in handy. Let&#8217;s see how it&#8217;s done.</p>
<p><span id="more-463"></span></p>
<p>Its <a href="http://supervisord.org/index.html">documentation</a> is comprehensive and well written, but one may need a &#8220;simplest&#8221; example out of the full-featured output from &#8220;echo_supervisord_conf&#8221;. The absolute barebones to get up and running is what I wanted to share with you:</p>
<pre>
; Ideally to be put under /etc/supervisord.conf
; you can refer to it via "supervisord -c"
; if the config file lives somewhere else
[supervisord]
nodaemon=true

[program:name_the_daemon_to_monitor]
command=/path/to/your/daemon
</pre>
<p><strong>IMPORTANT:</strong> Since supervisord is meant to be the parent of all your monitored processes, you should not have instances of your program running before supervisord. Instead, supervisord will take care of (re)spawning them for you.</p>
<p>Now, we launch supervisord:</p>
<pre>
$ supervisord
2011-04-21 13:01:34,159 INFO supervisord started with pid 14806
2011-04-21 13:01:35,161 INFO spawned: 'analyze_sequences' with pid 14807
2011-04-21 13:01:36,162 INFO success: analyze_sequences entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
</pre>
<p>Now, let&#8217;s do a reality check: kill the program it&#8217;s monitoring. In my case, &#8220;analyze_sequences&#8221;:</p>
<pre>
$ kill 14807
</pre>
<p>And supervisord should gracefully detect that its child has died and respawn it shortly after:</p>
<pre>
(...)
2011-04-21 14:09:12,764 INFO exited: analyze_sequences (exit status 143; not expected)
2011-04-21 14:09:13,768 INFO spawned: 'analyze_sequences' with pid 18380
2011-04-21 14:09:14,769 INFO success: analyze_sequences entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
</pre>
<p>So supervisord got that the exit status was unexpected (143 instead of 0 or 2) and put the program back into RUNNING state, sweet !</p>
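<p>The restart-on-unexpected-exit behaviour seen in the log can be sketched in plain Python. This is a toy loop, not supervisord&#8217;s implementation (which also handles logging, backoff, signals and much more); the <code>supervise</code> name and the <code>max_restarts</code> limit are assumptions. The expected exit codes 0 and 2 are the ones mentioned above, and 143 is consistent with the common shell convention of 128 plus the signal number, 15 for SIGTERM.</p>

```python
import subprocess
import sys

EXPECTED_EXIT_CODES = {0, 2}  # exit statuses that do not trigger a restart


def supervise(argv, max_restarts=3):
    """Run argv, restarting it while it exits with an unexpected status.

    Returns the list of exit statuses observed, one per run.
    """
    statuses = []
    for _ in range(max_restarts + 1):
        status = subprocess.call(argv)
        statuses.append(status)
        if status in EXPECTED_EXIT_CODES:
            break  # clean exit, nothing to respawn
    return statuses


if __name__ == "__main__":
    # A child that always fails with status 3 is respawned until the limit is hit.
    failing_child = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(failing_child, max_restarts=2))
```

<p>Capping the number of restarts keeps a permanently broken daemon from being respawned forever, which is the kind of policy a real supervisor exposes as configuration.</p>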
<p>After this check, one would want to get rid of the &#8220;nodaemon&#8221; directive so that supervisord runs in the background, and revisit the output of the echo_supervisord_conf command to enable the fancier features it offers. But after what I&#8217;ve shown, the default settings seem very reasonable to me.</p>
<p>Det var lätt som en plett, eller ? <img src='http://blogs.nopcode.org/brainstorm/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.nopcode.org/brainstorm/2011/04/21/supervisord-one-process-to-rule-them-all/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

