Using virtual environments with Jupyter is broken

Why does it have to be so freaking complicated?


After trying to use Jupyter with virtual environments. (From Pexels by Cottonbro)


Jupyter Notebook and the IDE-like Jupyter Lab are really popular among data scientists and researchers in general. They are great tools for exploring datasets with code, doing simple computations, plotting data and results, and much more.


I like to use Jupyter notebooks for many reasons when trying new things or when looking at new data, but more often than not it drives me crazy with its many deficiencies.


One thing that really bothers me:

Why is it so cumbersome to use a virtual environment together with Jupyter?


Virtual environments (venvs) are one of the best things about Python, and they are widely used to get a reproducible runtime environment. Unfortunately, there are (at least) two different ways to use them in combination with Jupyter, and both are needlessly convoluted. The complexity leads to unexpected behavior if you are not careful.


In the next few paragraphs I will show you two ways to use Jupyter with virtual environments and one of the pitfalls to be aware of.


1. Register a virtual environment’s kernel

Imagine you have Jupyter installed on your system, because that is a very easy way to start experimenting, and you don’t always need a virtual environment to open a notebook that someone else gave you.


However, for your own projects, you always create a virtual environment, because it keeps your system and your scripts’ requirements clean:

$ python -m venv .venv
$ source .venv/bin/activate
(.venv) $ pip install ipykernel

Now you want to use this venv with your system’s Jupyter Lab. For this to work, you have to register the ipykernel from your virtual environment with your system’s Jupyter:

(.venv) $ python -m ipykernel install --user --name <internal venv name> --display-name "<name that is displayed in Jupyter>"

On Linux, the kernel definition is saved in ~/.local/share/jupyter/kernels. If you don’t want to use that kernel anymore, or the underlying venv doesn’t exist anymore, you just delete the corresponding subdirectory in the kernels directory.
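To see which kernels are registered and which interpreter each one points to, you can read the kernel.json files in that directory. The following is a minimal sketch, not an official tool: the function name list_kernels is made up for illustration, and it assumes the usual ipykernel layout where the first entry of "argv" is the Python interpreter.

```python
import json
from pathlib import Path


def list_kernels(kernels_dir):
    """Map each registered kernel name to its display name and interpreter."""
    kernels = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text(encoding="utf-8"))
        argv = spec.get("argv") or [None]
        kernels[spec_file.parent.name] = {
            "display_name": spec.get("display_name"),
            # Assumption: for ipykernel the first argv entry is the interpreter.
            "interpreter": argv[0],
        }
    return kernels


if __name__ == "__main__":
    # Per-user kernel directory on Linux, as mentioned above.
    user_kernels = Path.home() / ".local/share/jupyter/kernels"
    for name, info in list_kernels(user_kernels).items():
        print(f"{name}: {info['display_name']} -> {info['interpreter']}")
```

If the interpreter path of an entry no longer exists, that kernel belongs to a deleted venv and its subdirectory is a candidate for removal.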


That’s the point where I am at a loss. Why isn’t this part a menu entry in the Jupyter notebook or lab environment? It’s not really complicated, but it is cumbersome to do this every time you want to use a virtual environment.


2. Install Jupyter in the virtual environment

If you don’t have a Jupyter environment on your system, you can install Jupyter (or, in this case, Jupyter Lab) into your virtual environment:

(.venv) $ pip install jupyterlab

This works fine if you have only a few virtual environments. If you have many different environments, you will need to install Jupyter in each of them, which takes up space on your hard drive. And even though hard drives have become cheaper and bigger, many people working with data have large datasets that take up space as well, and in my experience, disk space is a valuable resource for many.


Anyway, back to business. After installing jupyterlab you have to deactivate and reactivate your virtual environment:

(.venv) $ deactivate
$ source .venv/bin/activate

If you skip that step and then start Jupyter Lab, it will either not work or, if you installed Jupyter on your system before, behave in a super confusing way.
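Before starting Jupyter Lab, it can help to check which jupyter executable a lookup along PATH finds and whether it lives inside the active venv. This is only a sketch under a few assumptions: belongs_to_venv is a made-up helper name, and VIRTUAL_ENV is the variable that the venv's activate script sets. Note that this does a fresh lookup. As far as I can tell, an already running shell may still remember an older location of jupyter until you reactivate the venv, so treat the output as a hint, not as proof.

```python
import os
import shutil
from pathlib import Path


def belongs_to_venv(executable, venv_dir):
    """Return True if `executable` is located somewhere below `venv_dir`."""
    if not executable or not venv_dir:
        return False
    return Path(venv_dir) in Path(executable).parents


if __name__ == "__main__":
    venv_dir = os.environ.get("VIRTUAL_ENV")  # set by the venv's activate script
    jupyter = shutil.which("jupyter")         # fresh lookup along PATH
    print("active venv:  ", venv_dir)
    print("jupyter found:", jupyter)
    print("from the venv:", belongs_to_venv(jupyter, venv_dir))
```

Run it with the venv's python right after activating the environment.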


This is easy, right? But what if you need to run notebooks that depend on different environments? You will need to install Jupyter in both virtual environments and start both of them. Unfortunately, you now have two browser tabs from different Jupyter instances that are hard to tell apart.


Co-existence of your system’s and your venv’s Jupyter

Your system’s Jupyter Lab can peacefully co-exist with Jupyter Labs in your virtual environments, but be careful because otherwise you can (and likely will) shoot yourself in the foot.


If you skip the step where you de- and reactivate your venv, you will get into trouble. Even worse, everything inside your Jupyter Lab seems to tell you that you are in your virtual environment, but when you run a cell, it isn’t run with the interpreter from your virtual environment. The behavior is hard to debug because of conflicting outputs:



One would think that the Jupyter Lab from the virtual environment is running, but it isn’t. Due to the exclamation marks in cells 1, 3, 4, and 5, those commands are run in a subshell that inherits the environment (including PATH) of the shell that started Jupyter Lab. The subprocess call also goes through a shell, because of the shell=True parameter. Only the last cell is run by the actual ipykernel in the notebook, and that is the one from your system’s Python.


The command line output actually tells you that it runs the system’s Jupyter but since most desktop environments automatically change the focus to the browser’s window, you just don’t notice it.
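To find out from inside a notebook which interpreter the kernel really uses, ask Python itself instead of the shell. The cell below is a small sketch (kernel_report is a name I made up); it relies only on the standard sys and os modules.

```python
import os
import sys


def kernel_report():
    """Describe the interpreter that is executing this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a venv, sys.prefix points at the venv while sys.base_prefix
        # points at the Python installation the venv was created from.
        "in_venv": sys.prefix != sys.base_prefix,
        # Inherited from the shell that launched Jupyter; it says nothing
        # about which interpreter the kernel itself runs on.
        "VIRTUAL_ENV": os.environ.get("VIRTUAL_ENV"),
    }


for key, value in kernel_report().items():
    print(f"{key}: {value}")
```

If VIRTUAL_ENV names your venv but executable points somewhere else, you are in exactly the situation described above.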


Conclusion


As you can see, working with Jupyter and virtual environments is not as straightforward as it could (and should!) be. It certainly has some pitfalls that you should be aware of. It took me a couple of hours to figure out why the second approach wasn’t working as I expected.


The first approach is probably the most sensible if you have many custom environments; however, it’s still a lot more cumbersome than I would like it to be.
