Running bcbio-nextgen and CWL with Docker for Mac

The case for local, dockerized bioinformatics

Dockerized portable bioinformatics

Ever since I started using docker on my local computer (an 11” MacBook Air at the time of writing), I have run into issues with the boot2docker+VirtualBox combo. The setup involved installing VirtualBox (plus guest additions) and Docker Toolbox via Brew Casks, and most of the problems stemmed from volume sharing and UID/GID mappings between the host and docker containers.

Then I requested access to the private Docker for Mac beta program, which, to my relief, uses a lightweight hypervisor and base image (hyperkit+alpine) to run containers on OSX, conveniently hiding the installation woes. This setup worked quite well, and while docker on OSX does not yet support GPU passthrough (for those interested in things like Tensorflow and Keras), Docker for Mac is a really convenient local docker setup.

Frustratingly, my local docker setup was always accompanied by tests on our local HPC cluster, which has limited docker support, and on a small AWS instance. Almost invariably I resumed my development efforts on the HPC/AWS setups instead, because, you know, beta.

Meanwhile, a large share of my colleagues use OSX as a workstation with a mosh shell to their respective HPC clusters, where they launch test runs. Why don’t we use the local CPUs a bit more for testing?

That was the state of the art with bcbio+docker+osx: not using it locally…

Until today!

bcbio-nextgen and CWL

As peers in the bioinformatics community have noticed, workflow and pipeline engines are migrating to the Common Workflow Language (CWL) as their underlying workflow representation, including but not limited to Arvados, Galaxy and bcbio-nextgen.

To provide a minimal development environment while bcbio-nextgen's internal logic is migrated to CWL, Brad Chapman put together a small demo that can launch bcbio_vm, CWL and Toil (SLURM support under Toil is a WIP right now).

So please go ahead and:

  1. Install Docker for Mac.
  2. Install Miniconda if you don’t have it already.
  3. conda install bcbio-nextgen-vm -c bioconda.
  4. wget https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz && tar xvfz test_bcbio_cwl.tar.gz && cd test_bcbio_cwl.
  5. chmod +x run_cwltool.sh && ./run_cwltool.sh.

These steps will download a ~2GB bcbio docker image and then run a sample bioinformatics workflow under Docker for Mac on your computer.
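
For convenience, steps 3 to 5 can be collected into one script that first checks whether the prerequisites are on your PATH. This is only a rough sketch, not part of Brad's demo: the `missing_tools` and `run_demo` helpers and the `RUN_DEMO` switch are names made up for this example, and the script name `run_cwltool.sh` is assumed from the chmod step.

```shell
#!/bin/sh
# Sketch only: prerequisite check plus steps 3-5 from the list above.
# missing_tools, run_demo and RUN_DEMO are hypothetical names for this example.

# Print the name of every command in "$@" that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Steps 3-5, stopping at the first command that fails.
# Assumes the downloaded script is called run_cwltool.sh.
run_demo() {
    conda install -y bcbio-nextgen-vm -c bioconda &&
    wget https://s3.amazonaws.com/bcbio/cwl/test_bcbio_cwl.tar.gz &&
    tar xvfz test_bcbio_cwl.tar.gz &&
    cd test_bcbio_cwl &&
    chmod +x run_cwltool.sh &&
    ./run_cwltool.sh
}

# Nothing is downloaded unless RUN_DEMO=1 is set explicitly.
if [ "${RUN_DEMO:-0}" = "1" ]; then
    missing=$(missing_tools docker conda wget tar)
    if [ -n "$missing" ]; then
        echo "Missing prerequisites:" $missing >&2
        exit 1
    fi
    run_demo
fi
```

Saved as, say, `bcbio_demo.sh`, it would be started with `RUN_DEMO=1 sh bcbio_demo.sh`. Without the variable it only defines the helpers, so you can read it through before letting it pull a ~2GB image.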

Thanks to Robin Andeer for being one of the first brave souls to test this out. Please feel free to report back your experiences running this experimental setup in the comments section below!