Posts Tagged ‘Jupyter’

Installing the R kernel for Jupyter on linux

Monday, March 7th, 2016

To paraphrase a saying:

Give a scientist a script and they will analyse data for a day. Teach a scientist to script and they won’t have any more time to do analysis for the rest of their lifetime.

Scripts are a great way to make reproducible workflows, but they are too technical for many situations where you have to report to scientists. Jupyter notebooks are a great way to do an analysis, and report the results at the same time. A Jupyter notebook can contain the analysis, the results, and the documentation that explains the results together in a single file, making it at once understandable and reproducible.

Jupyter started its life as IPython, or “interactive python”. But since they added support for other languages besides python, they had to rename. In principle you can install support for a wide range of scripting languages, but in practice it may be a little difficult to set up. Jupyter consists of multiple ‘kernels’, to get support for a different language you have to install that language, and then install the Jupyter kernel for it. It took me a while to get that working for the R scripting language. What follows are some notes I took during that process, in the hope that they are useful for anybody else trying to do the same thing.

So here is the ‘howto’

First you need to have R and Jupyter installed, but I’m assuming you already got that far. Anyway, that is the easy bit.

The R kernel for Jupyter is available here, and the installation instructions are on the same page. I’ve copied them here for convenience:

install.packages(c('rzmq','repr','IRkernel','IRdisplay'),
repos = c('http://irkernel.github.io/', getOption('repos')))
IRkernel::installspec()

When R is installing one of the dependencies, rzmq, on linux, you immediately run into Problem #1. It will complain:

missing header zmq.h

This happens quite often when installing R packages. It happens when the R package is a wrapper for a C library, and needs the development version of that C library to compile the wrapper. In general, the pattern to solve such a problem is simple. When you encounter:

Missing header xxx.h

the solution is to install the development version of the library with

apt-get install libxxx-dev

But then you get to Problem #2

sudo apt-get install libzmq-dev
... try installing the rzmq package again
interface.cpp:123:49: error: call to zmq::context_t::context_t(int, int) uses the default argument for parameter 2, which is not yet defined
zmq::context_t* context = new zmq::context_t(1);

Annoying! It turns out that there are multiple versions of zmq, and you need to install the right one:

sudo apt-get install libzmq3-dev

We’re not there yet. When trying to install rzmq again, we run into Problem #3:

g++ -I/usr/share/R/include -DNDEBUG -I../inst/cppzmq -fpic -O3 -pipe -g -c interface.cpp -o interface.o
interface.cpp:23:14: error: expected constructor, destructor, or type conversion before '(' token
static_assert(ZMQ_VERSION_MAJOR >= 3,"The minimum required version of libzmq is 3.0.0.");

Apparently there is a check for the version of libzmq, but it isn’t working. The installation doesn’t fail because we have the wrong version of libzmq. The installation fails because the R package doesn’t properly detect the version of libzmq. And the problem isn’t that libzmq is somehow misreporting its own version. The problem looks more like a syntax error, which is a weird internal error that shouldn’t occur in a published library. The syntax error is caused by the fact that rzmq uses C++11, (the 2011 version of C++), which is not the default version. We’ll have to fix rzmq. First get the source code:

git clone https://github.com/armstrtw/rzmq.git
cd rzmq

We have to edit one line of src/Makevars, as you can see from the following ‘patch’:

diff --git a/src/Makevars b/src/Makevars
index 7d6771c..c6fdce6 100644
--- a/src/Makevars
+++ b/src/Makevars
@@ -1,5 +1,5 @@
## -*- mode: makefile; -*-

CXX_STD = CXX11
-PKG_CPPFLAGS = -I../inst
+PKG_CPPFLAGS = -I../inst -std=c++11
PKG_LIBS = -lzmq

Now let’s install our own hacked package:

R CMD build .
R CMD INSTALL rzmq_0.8.0.tar.gz

(btw, why is build lowercase and INSTALL uppercase? this is exactly the type of thing why R is not my favourite scripting language)

We’re still not there.

Problem #4 – we installed the very latest rzmq fresh from source code, which now requires R version 3.1.0. I’m using a Long-Term-Support (LTS) version of Linux Mint and I don’t want to switch R versions as that could create a hassle elsewhere. Does rzmq really require 3.1.0, or does it merely say it does because it’s pretending to be cutting edge? Let’s hack it up some more.

git diff
diff --git a/DESCRIPTION b/DESCRIPTION
index ba50840..228e3e0 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -5,7 +5,7 @@ Maintainer: Whit Armstrong <armstrong.whit@gmail.com>
Author: Whit Armstrong <armstrong.whit@gmail.com>
Description: Interface to the ZeroMQ lightweight messaging kernel (see <http://www.zeromq.org/> for more information).
License: GPL-3
-Depends: R (>= 3.1.0)
+Depends: R (>= 3.0.0)
SystemRequirements: ZeroMQ >= 3.0.0 libraries and headers (see <http://www.zeromq.org/>; Debian packages libzmq3, libzmq3-dev, Fedora packages zeromq3, zeromq3-devel)
URL: http://github.com/armstrtw/rzmq
BugReports: http://github.com/armstrtw/rzmq/issues

And now the R kernel installs without further problems for me. I haven’t noticed any incompatibilities with earlier R versions yet, R 3.0.0 seems to be just fine.