My take on the ELK stack: the best open-source dashboard

A dashboard with pretty plots and numbers about your organization… It shouldn’t be that difficult, right?

Luckily, Elasticsearch + Logstash + Kibana makes a nice stack for solving this problem. Here’s my result and how I solved some integration hiccups. The resulting beta dashboard, as it is now, looks like this:

DataSpace beta operations panel

Many, many other blog posts have addressed this topic, but I would like to share a couple of tweaks I came up with while working on it.

Continue reading

OwnCloud + iRODS: a small step towards popularizing scientific data sharing?

iRODS logo


OwnCloud logo

Context and iDROP

This is an open, on-demand blog post and a draft software specification.

When I started at INCF, among other duties I inherited the supervision of an ambitious data sharing project called DataSpace. After going through its system architecture and Python helper tools, I think the system follows good design principles.

Still, on the operational and usability sides, there is plenty of room for improvement.

One of the most commonly mentioned drawbacks of the system has to do with how the data is presented to researchers on the web. At the time of writing, an iDROP web interface is the canonical interface through which end users access DataSpace. The web client integrates well with the underlying iRODS infrastructure. One can perform common data operations plus manage their metadata attributes, just as any command-line user would with the underlying iRODS i-commands.
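To make the parallel concrete, here is a small sketch of driving those same operations from Python by building i-command invocations. The i-commands themselves (`iput`, `imeta`) are real iRODS tools, but the wrapper functions, paths and attribute names below are purely illustrative:

```python
# Illustrative sketch: building iRODS i-command argument lists from Python.
# The commands (iput, imeta) are real; the helper names and paths are
# hypothetical examples, not part of DataSpace or iDROP.

def iput(local_path, collection):
    """Build the argument list for uploading a file into a collection."""
    return ["iput", local_path, collection]

def imeta_add(obj_path, attribute, value, unit=""):
    """Build the argument list for tagging a data object with an
    AVU (attribute-value-unit) triple via `imeta add -d`."""
    cmd = ["imeta", "add", "-d", obj_path, attribute, value]
    if unit:
        cmd.append(unit)
    return cmd

# Example: upload a dataset, then attach a metadata attribute to it.
upload = iput("results.csv", "/tempZone/home/alice")
tag = imeta_add("/tempZone/home/alice/results.csv", "experiment", "EXP-42")
# In a real script these lists would be handed to subprocess.run(cmd, check=True).
```

This mirrors what the iDROP web client exposes through its GUI: data transfer plus metadata management on the same objects.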

One can even publish a file as a (non-shortened, ugly) public link to share data with researchers:

OwnCloud development at INCF

I also took charge of leading an ongoing contract with OwnCloud Inc to completion on the first prototype of an iRODS-OwnCloud integration. After some weeks of testing, debugging, running into legacy PHP limitations, and plenty of help from Chris Smith and a proficient OwnCloud core developer, a working proof of concept was put together with PRODS, a PHP-based iRODS client.

Today it works on both OwnCloud community and enterprise editions.

Despite being a proof of concept with some serious performance issues, at least two scientific facilities have reported that they are already testing it. Even though this is good news, it needs more work to become a robust solution. In the next lines I will describe what is needed to make it so, trying to be as specific as possible.

But first, please have a look at the following GitHub pull request for some context on the current issues that need fixing:

Continue reading

INCF and the quest for global data sharing in neuroscience

Disclaimer: these are my opinions only and not my employer’s, etc, etc…

Today marks two months since I joined the International Neuroinformatics Coordinating Facility (INCF), located on the Karolinska Institutet campus in Stockholm. Coincidentally, I happened to land the job in the midst of a neuroinformatics conference:

INCF Neuroinformatics congress 2013

Before that, I spent almost 3 years in another field of (data) science: genomics. I think I’m extremely lucky to be involved in these two different cutting-edge computational life science disciplines, so rich and diverse at the science level and yet pretty similar in their infrastructure needs: more storage, more processing and more standards.

Also today I got to answer a series of seemingly routine questions prior to attending a workshop (EUDAT). While I was writing, I realized that I was drawing a nice portrait of today’s data science wins and struggles, be it genomics, neuroscience or other data-hungry sciences I might encounter during my career.

Suddenly I couldn’t resist sharing my brain dump. I hope you enjoy the read :)

Continue reading

Berlin BOSC Codefest 2013, day 2

Here I am in my second day of the BOSC hackathon, polishing work from yesterday, but also seeing new interesting projects taking off. These are my notes from the second day. See also my notes from the first day.

Pencils down for the coding

So today we try to wrap up SLURM support in ipython-cluster-helper. We realized that generalizing job managers is hard. Even though at a basic level they all do the same thing, namely submit jobs and manage hardware resources, the different flavors exist for a reason.

Extra arguments or “native specifications” that do not fit the generic job scheduler blueprint must be passed along nicely, and that last-mile effort takes some time to nail down.
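The pass-through problem can be sketched in a few lines. This is a hypothetical illustration, not the actual ipython-cluster-helper code: the idea is to merge scheduler-agnostic settings with SLURM-only flags into one submission header, without the generic layer needing to understand them.

```python
# Hypothetical sketch of passing "native specifications" through to SLURM.
# The flag values below (--qos, --constraint) are examples, not defaults
# used by any real tool.

def build_sbatch_header(job_name, native_specs):
    """Combine generic job settings with opaque SLURM-specific flags
    into a single sbatch script header."""
    lines = ["#!/bin/bash", f"#SBATCH --job-name={job_name}"]
    for flag in native_specs:
        # Native specs are passed through verbatim; the generic layer
        # does not try to interpret them.
        lines.append(f"#SBATCH {flag}")
    return "\n".join(lines)

header = build_sbatch_header("engine", ["--qos=normal", "--constraint=ib"])
```

The design choice is that the generic layer stays ignorant of scheduler dialects: anything it cannot model is forwarded untouched.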

Furthermore, a generalized DRMAA patch against upstream IPython.parallel would require more than two days to whip up, so we instead move on to optimizing what we have on two fronts:

  1. Getting old SLURM versions to work with ipython-cluster-helper without job arrays in an efficient way.
  2. Automating the deployment of SLURM server(s) with a configuration management tool: SaltStack.
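Point 1 can be sketched as follows. On SLURM releases that predate job arrays (`--array`), one fallback, shown here as an illustrative sketch rather than ipython-cluster-helper’s actual implementation, is to expand what would be a single array submission into one script per task:

```python
# Illustrative sketch: emulating a SLURM job array on old SLURM versions
# by generating one sbatch script per task. Names and the command
# template are hypothetical examples.

def expand_array_job(job_name, n_tasks, command_template):
    """Return one sbatch script per task instead of a single
    `--array=0-(n_tasks-1)` submission.

    `command_template` uses `{task_id}` where the array index would go.
    """
    scripts = []
    for task_id in range(n_tasks):
        scripts.append(
            "#!/bin/bash\n"
            f"#SBATCH --job-name={job_name}_{task_id}\n"
            f"{command_template.format(task_id=task_id)}\n"
        )
    return scripts

scripts = expand_array_job("engine", 3, "ipengine --cluster-id=run{task_id}")
# Three standalone scripts, each with its own job name and task index.
```

The obvious cost is N submissions instead of one, which is why doing this "in an efficient way" on large clusters takes some care.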

Continue reading