September 5th, 2013 — Uncategorized
Disclaimer: These are my opinions only, not my employer's, etc, etc…
Today marks two months since I joined the International Neuroinformatics Coordinating Facility (INCF), located on the Karolinska Institutet campus in Stockholm. Coincidentally, I happened to land the job in the midst of a Neuroinformatics conference.
Before that, I spent almost 3 years in another field of (data) science: genomics. I think I'm extremely lucky to be involved in these two different cutting-edge computational life science disciplines, so rich and diverse at the science level and yet pretty similar in infrastructure needs: more storage, more processing and more standards.
Also today I got to answer a series of seemingly routine questions prior to attending a workshop (EUDAT). While I was writing, I realized that I was drawing a nice portrait of today’s data science wins and struggles, be it genomics, neuroscience or other data-hungry sciences I might encounter during my career.
I couldn't resist sharing my braindump; I hope you enjoy the read.
July 19th, 2013 — Uncategorized
Here I am on my second day of the BOSC hackathon, polishing work from yesterday, but also seeing interesting new projects take off. These are my notes from the second day. See also my notes from the first day.
Pencils down for the coding
So today we tried to wrap up the support for SLURM in ipython-cluster-helper. We realized that generalizing job managers is hard: even though at the basic level they all do the same thing, namely submit jobs and manage hardware resources, the different flavors exist for a reason.
Extra arguments or “native specifications” that do not fit in the normal job scheduler blueprint must be passed along nicely and that final mile effort takes some time to nail down.
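The kind of translation involved can be sketched as follows. This is a hypothetical helper, not the actual ipython-cluster-helper code; the flag names follow the SLURM and SGE manuals:

```python
# Hypothetical sketch: mapping generic resource requests onto
# scheduler-native flags. Real job managers have many more corner cases.
def native_spec(scheduler, cores, mem_gb, extra=""):
    """Build the scheduler-specific resource arguments for a job."""
    if scheduler == "slurm":
        spec = "--cpus-per-task=%d --mem=%dG" % (cores, mem_gb)
    elif scheduler == "sge":
        spec = "-pe smp %d -l h_vmem=%dG" % (cores, mem_gb)
    else:
        raise ValueError("unsupported scheduler: %s" % scheduler)
    # "Native specifications" that fit no generic blueprint are passed through.
    return (spec + " " + extra).strip()
```

The `extra` passthrough is where the final-mile effort goes: `native_spec("slurm", 8, 16, extra="--qos=short")` appends a SLURM-only option that has no equivalent in the generic blueprint.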
Furthermore, a generalized DRMAA patch for upstream IPython parallel requires more than 2 days to whip up, so we instead moved on to optimizing what we have on two different fronts:
- Getting old SLURM versions to work with ipython-cluster-helper without job arrays in an efficient way.
- Automating the deployment of SLURM server(s) with a configuration management tool: SaltStack.
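For the first point, the fallback is conceptually simple: when `--array` is unavailable, expand the array into individual submissions yourself. A minimal sketch (hypothetical, not the actual ipython-cluster-helper implementation):

```python
# Hypothetical sketch: emulate a job array on SLURM versions that
# predate sbatch --array, by submitting one job per task index.
def expand_array(script, n_tasks):
    """Return the sbatch command lines for n_tasks individual jobs."""
    return ["sbatch --export=TASK_ID=%d %s" % (i, script)
            for i in range(n_tasks)]
```

Each generated job reads its own index from the `TASK_ID` environment variable, mimicking what `SLURM_ARRAY_TASK_ID` would provide on newer releases; the efficiency work is then about batching those submissions sensibly.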
July 18th, 2013 — Uncategorized
I’m at the 4th Bioinformatics Open Source Conference in a warm and sunny Berlin.
After everyone found the venue, a preliminary brainstorming session helped organize the tasks across several workgroups:
BioRuby and Illumina BaseSpace, visualization, CloudBioLinux/CloudMan, ontologies and metadata, data handling, Biopython, etc…
May 20th, 2013 — Uncategorized
Some PEPs have revolved around the problem of software versioning and dependency tracking.
So, in addition to having blueprints such as the Semantic Versioning guidelines, one needs specifics on how to integrate those practices into our day-to-day work with version control systems.
Setuptools saves the day by introducing versioning via git tags. In a post, Douglas Creager devised a strategy for using setuptools with git tags. The workflow for tagging a new version is:
- Tag your release via git tag if the changes are significant.
- Run python setup.py install to bump the version on the filesystem.
- git push --tags (a plain git push does not push tags by default).
The following code makes it happen:
# Fetch version from git tags, and write to version.py.
# Also, when git is not available (PyPI package), use stored version.py.
import os
import subprocess

version_py = os.path.join(os.path.dirname(__file__), 'version.py')

try:
    version_git = subprocess.check_output(["git", "describe"]).decode().rstrip()
except (OSError, subprocess.CalledProcessError):
    # No git checkout around (e.g. installed from a PyPI tarball):
    # fall back to the previously stored version.py.
    with open(version_py, 'r') as fh:
        version_git = fh.read().strip().split('=')[-1].replace('"', '')

version_msg = "# Do not edit this file, pipeline versioning is governed by git tags"
with open(version_py, 'w') as fh:
    fh.write(version_msg + os.linesep + '__version__="%s"' % version_git)
As an addition to the git tags workflow proposed by Douglas, the '__version__' attribute is stored in a version.py file. This allows the version to be tracked even when our git repository is not available (e.g. after installing from a PyPI package), or when the version needs to be queried from inside your own package.
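For that in-package query, a small helper can parse the stored file back at runtime. This is an illustrative sketch; the name read_version is mine, not from Douglas' post:

```python
# Illustrative sketch: read the version that setup.py stored in version.py,
# so the running package can report it even without git around.
def read_version(version_py):
    """Return the __version__ string stored in a generated version.py."""
    with open(version_py) as fh:
        for line in fh:
            if line.startswith("__version__"):
                return line.split("=", 1)[-1].strip().replace('"', '')
    raise ValueError("no __version__ found in %s" % version_py)
```

Calling it on a version.py containing `__version__="v0.2.1-4-gdeadbee"` returns the bare `v0.2.1-4-gdeadbee` string.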
Thanks Guillermo and Brad for the feedback and suggestions on this strategy.
April 29th, 2013 — Uncategorized
This is an on-demand blog post; none of the actors are real institutions or people, and any resemblance to real life is pure coincidence.
So there’s a day, that day when an organization realizes that there’s a real need to have a solid cloud platform as an official infrastructure offering. Admit it, we all have some idle cycles we could make better use of.
A bad cloud
Then someone owning some compute resources types commands frenetically into a console and voilà, a beta cloud service is born. A fictional dialog between user and cloud provider follows:
- This is great! I want to have an account on your new service, where can I get it?
- Well, you have to come to our offices, we will scan your passport, have an hour-long meeting and then give you an account.
- Ermm, ok, I just want to use that service…
During the meeting there's an introduction on how to use the service. Many people did not prepare their machines before the session and get stuck on the overcomplicated client installation instructions, which involve installing pre-compiled binaries and config files from a .tar.bz2 (ABI issues galore!). Next, you are given a password via SMS, "abc123", which you cannot change (nor are encouraged to); you have to explicitly ask the admins to change it for you.
Dirty secret: if they are not forced to change it, nobody ever does.
After editing some cloud templates text files, your first instance is up and running. Time to clone your CloudBioLinux copy and get some bioinformatics software installed in it… Unfortunately, it does not take very long to discover that the base distribution is more than 3 releases old. The user emails the imaginary beta cloud support and says:
- Hi cloud-support! Is it possible to have the newest Ubuntu release as an image?
- No, it's not at the moment.
- Emm, ok, I tried to apt-get dist-upgrade but it just runs out of space, can I get more space in the VM to do that upgrade myself then? One cannot do much with 4GB of disk these days, you know.
- No, it is not possible, you can use the 1TB NFS-mounted scratch space instead.
- Why isn't it possible? Anyway, I see no straightforward way to use that scratch space as an extension of the OS while the VM is running, and I cannot access the filesystem offline to move, say, /usr away without some hackish stuff involving squashfs, ramdisks, etc… this is actually giving me more headaches than it's worth.
- I’m sorry, we cannot bundle another distribution for you.
So what can we learn from that experience? What can a bad cloud do to become a better cloud?
- Distributing a readily installable, tested client CLI package for the most popular platforms instead of a precompiled .tar.gz would have cut that hour-long meeting down to nil. Documentation should never be a substitute or shortcut for a tested, directly installable package.
- Distributing passwords, even via SMS, should adhere to basic good password policies at all times, even in beta services. Go with two-factor authentication if you fancy it.
- All services should be auto-provisioned. Asking sysadmins to perform routine operations like changing passwords should be off the table.
- Dimensioning a cloud (disk, memory, network interfaces) is not an easy task when users have wildly different needs, but at the very least it should be possible to easily increase VM image space within reasonable limits. Other resources such as RAM, network interfaces, DNS records and mountpoints should be directly accessible to the user, auto-provisioned.
- Automated creation of new cloud images from vanilla OSes should be in place before launching the beta.
One year passes; some more console typing and a move to new hardware give the service a new face, ready for a second try.
The same issues arise: instead of automating the deployment of the whole cloud, it has simply been moved to the new hardware. There is no evidence of any automation since last year.
A better cloud
Automated testing is about software, and since clouds are software, why not automate bits and pieces of the deployment until it becomes fully automatic? It is easier said than done; it takes a great deal of patience to:
- Build and test a cloud component.
- Automate its deployment, testing it elsewhere.
- Take the whole cloud stack down and recreate it from scratch.
- Automate basic user-side (stress)-testing: create instance, record a DNS change, attach new volumes, destroy instance, etc…
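That last point can be captured as code written against a minimal driver interface, so the same smoke test runs against the real cloud or a stub in CI. The interface below is entirely hypothetical, sketched for illustration; a real setup would target an actual cloud API (e.g. via Apache Libcloud):

```python
# Hypothetical sketch: a user-side smoke test, written against a minimal
# driver interface so it can run against any cloud (or a stub in CI).
def smoke_test(driver):
    """Exercise the basic instance lifecycle; return the steps that ran."""
    steps = []
    node = driver.create_instance("smoke-test")
    steps.append("create")
    driver.record_dns("smoke-test.example.org", node)
    steps.append("dns")
    driver.attach_volume(node, size_gb=10)
    steps.append("volume")
    driver.destroy_instance(node)
    steps.append("destroy")
    return steps
```

Run on a schedule, a test like this catches regressions in exactly the operations a new user would hit on day one.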
Automation and testing are hard; it takes time to get them right without overfitting your immediate environment. But look, those guys over there seem to have gotten it right:
- So you only need my public SSH key? That’s all? No meetings nor passport, fingerprints, blood samples or photos?
- That's exactly right: just log in as root and break as much as you want in your own cloud; we can wipe all your stuff in less than 20 minutes. We'll of course be gathering metrics from the outside, just in case we detect something bad coming out of your instance(s). We don't want to get in your way.
- Nice! What about having the latest Ubuntu release…
- We just provisioned it as we speak (true story).
- I'm launching some Hadoop jobs right now. It took me a few minutes to provision the nodes. Thank you guys, you're awesome!