Python notebooks are a wonderful tool for sharing and collaborating on code. Built on the open-source backbone of the Python programming language, Jupyter notebooks (their formal name) allow you to include code, text, formulas and images all in a single shareable file. What’s more, the ecosystem for sharing and running these files has expanded over the years, making it easy for many more people to use them. And most of these services are available for free!
For those of us in education, this is a game-changer. Students now have access to a powerful tool with just a few clicks. As my friend Dax says, “with a web browser, even on a clunky machine, you now have access to a supercomputer.”
However, there are a lot of options out there, and if you’re just getting started with Python, it’s probably a bit confusing. So here is a quick list of some of the most common ways to share and run Python notebooks.
Tools for Viewing Notebooks
The following tools allow you to share and view Python notebooks.
GitHub is a code sharing and collaboration service, built on top of the git version control system. While you can’t actually run any code on GitHub, it is an essential tool for making your code available to others. In fact, GitHub can be used for far more than just code. Thanks to its collaborative workflow and history-tracking features, you can use it to share small datasets or even to write a textbook with your friends.
When it comes to Python notebooks, GitHub provides a great way to post your notebooks to share with others, such as your fellow researchers or students. It also renders them, so they can see what the file looks like without having to load the notebook in another program. It won’t show you everything (such as any interactive elements), and others can’t run the code directly on GitHub, but they will see all the text, code and images at once, which is a lot more helpful than digging through a bunch of code-only files.
I will use this GitHub linked file in the other examples as well, which is why GitHub’s ability to share files is such an essential tool.
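If you’re curious why GitHub can render a notebook without running it, it helps to know that an .ipynb file is just JSON text: a list of cells, each tagged as markdown or code, with any saved outputs stored alongside. Here is a minimal sketch using only the standard library. The tiny notebook below is a made-up example I wrote by hand for illustration, not the file linked above.

```python
import json
from collections import Counter

# A tiny hand-written notebook in the nbformat 4 layout (illustrative only).
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# A made-up notebook title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["print('hello')"]},
    ],
})


def summarize_notebook(text):
    """Count how many cells of each type a notebook's JSON text contains."""
    notebook = json.loads(text)
    return Counter(cell["cell_type"] for cell in notebook["cells"])


summary = summarize_notebook(notebook_text)
print(dict(summary))
```

A static viewer only has to read this structure and draw it, which is why no Python server is needed just to look at a notebook.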
The nbviewer service allows you to enter a URL and see a static version of a notebook. Like GitHub, you can’t actually run or edit the notebook on nbviewer, but at least you can get a good look. This is helpful when you run across a Python notebook file on the web (usually with the .ipynb extension) that you would like to view. In addition, GitHub sometimes has issues rendering notebooks, but if you copy the GitHub URL over to nbviewer, you should be able to see what it looks like.
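You can also skip the copy-and-paste step, because nbviewer’s address for a GitHub-hosted notebook is simply the GitHub path with a different prefix. Here is a small sketch of that conversion. The function name and the example user, repository and file names are hypothetical placeholders.

```python
from urllib.parse import urlparse


def github_to_nbviewer(github_url):
    """Turn a github.com notebook URL into the matching nbviewer URL."""
    parts = urlparse(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # nbviewer mirrors the GitHub path under its /github/ prefix.
    return "https://nbviewer.org/github" + parts.path


# Placeholder repository, not a real one.
print(github_to_nbviewer(
    "https://github.com/example-user/example-repo/blob/main/demo.ipynb"
))
```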
Tools for Running Notebooks
The following tools allow you to run notebooks yourself, in decreasing order of complexity.
1. The Cloud
Setting up servers in the cloud is not for the faint of heart. But if you have the resources and ability, you can gain supercomputer-like power without having to buy an entire supercomputer yourself. Once you have a cloud system set up, you can install JupyterLab for running Python notebooks as a single user, or JupyterHub, which provides interactive notebook support for multiple users.
Pangeo is one community effort that aims to help researchers set up and use cloud computing resources (such as those from Microsoft Azure, Amazon AWS or Google Cloud) for their projects. If you’re at a university, you may also have High Performance Computing (HPC) available as part of an Advanced Research Computing (ARC) effort. If you’re really lucky, your department may already have a server set up with JupyterHub.
If you find yourself with very large datasets (on the scale of hundreds of GBs to TBs), a resource-intensive calculation or model, or a need to set up a dedicated server to support an entire research group or class, you will likely need to look into this option. But before that, you should try the far simpler options below.
2. Your “local” machine (i.e. your laptop)
It turns out that many machines come with Python already installed, especially Mac and Linux systems. But while you can use your system’s default Python, it’s unlikely to include many of the libraries you will need. For example, many oceanographic datasets are provided as NetCDF files over THREDDS, and in order to work with them you will need to install the xarray and netcdf4 libraries, which are rarely included by default.
To set up a custom Python environment on your machine, you can use virtual environments, Anaconda or Miniconda (which is basically a bare-bones version of Anaconda, and the one I prefer). These tools let you create multiple environments, so you can install multiple versions of Python and various libraries alongside each other. (They also prevent you from messing up your system-installed version, which could cause other problems.) With this approach, you can easily switch between different environments to debug code or run different applications that may have different, and sometimes conflicting, dependencies. Once you have one of these environment managers installed, you can use pip or conda to install specific versions of jupyter, xarray, netcdf, and other libraries as needed.
This approach is very powerful, especially when used in a cloud computing environment, but installing and configuring Python environments correctly takes some time to learn. And if you’re not careful, you may end up with this. That’s why I usually recommend starting with Google Colab (which we’re getting to).
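To make the idea of an isolated environment a bit more concrete, here is a minimal sketch that creates one with Python’s built-in venv module (the “virtual environments” option above). The environment name is a hypothetical placeholder, and I’m creating it in a temporary folder without pip just to keep the demo quick. For real work you would choose a permanent location and pass with_pip=True so you can install libraries into it.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target_dir):
    """Create an isolated Python environment and return its config file path."""
    # with_pip=False keeps this demo fast; use with_pip=True for real work.
    venv.create(target_dir, with_pip=False)
    return Path(target_dir) / "pyvenv.cfg"


# Build a throwaway environment in a temporary folder (placeholder name).
with tempfile.TemporaryDirectory() as tmp:
    config_file = create_environment(Path(tmp) / "ocean-env")
    created = config_file.exists()
    print("environment created:", created)
```

Each environment lives in its own folder with its own set of installed libraries, which is what lets conflicting versions sit side by side.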
3. Binder
Binder is basically a cloud-based Jupyter server which you can use for free. It was initially sponsored by the Moore Foundation, and is now maintained as a resource for the community to “share their interactive repositories publicly.” On the plus side, it’s free, and you can configure the specific Python environment required to run your notebooks. On the downside, when you launch a notebook on Binder, it only exists for a short period of time, so any work you do will need to be exported before the server shuts down.
As such, Binder is a great tool for demonstrating notebooks in class or as an activity, because it allows others to interactively follow along using just their web browser. That said, it takes some time for Binder to first load a repository, because it needs to install any custom libraries you specify. This can take several minutes or more, depending on how bogged down the server is. So, if you plan to use Binder in a class, it’s best to have students open up the web page several minutes before you really need it.
But I wouldn’t recommend it for research work, as you will have to jump through a bunch of hoops to save your work each time you work on your notebook.
To use Binder, you first need to upload your Python notebook to GitHub, and then tell Binder to open that URL. However, it’s unlikely to work properly if you haven’t also set up a requirements.txt file with the proper dependencies (see this doc on Sample Binder Repositories for more). But this is easy to add. For example, here is the requirements.txt file in the same repository as our earlier example.
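As a rough sketch of the two pieces involved, the snippet below builds a Binder launch link for a GitHub repository and prints what a simple requirements.txt might contain. The user, repository and notebook names are hypothetical placeholders, and the library list is only my guess at a typical set for NetCDF work, not the contents of the file mentioned above. The labpath option is how I understand Binder opens a specific notebook in JupyterLab, so double-check it against the link that the mybinder.org form generates for you.

```python
from urllib.parse import quote


def binder_launch_url(user, repo, ref="HEAD", notebook=None):
    """Build a mybinder.org launch link for a public GitHub repository."""
    url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    if notebook:
        # Assumption: labpath opens the given file once the server starts.
        url += "?labpath=" + quote(notebook, safe="")
    return url


# Hypothetical dependency list; one library per line in requirements.txt.
requirements_text = "\n".join(["xarray", "netCDF4", "matplotlib"]) + "\n"

print(binder_launch_url("example-user", "example-repo", "main", "demo.ipynb"))
print(requirements_text)
```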
4. Google Colab
If you want easy, I highly recommend checking out Google Colab. Under the hood, it’s essentially a Jupyter server that lets you run Python notebooks right in your browser, just like the other tools, but Google has done all the hard work of setting up and maintaining the server for you.
Colab also provides a similar set of features to other Google Drive products, like Docs and Sheets. Your notebook files will appear in your Drive (as Colab files). You can also share files with your colleagues. In theory you can interactively collaborate on them as well, but it is rather rough (reminiscent of Docs from 10 years ago) and error-prone if more than one person tries to run code at the same time, so I don’t recommend it.
When you first open a notebook in Colab, it connects to a fresh server with only a default set of Python libraries installed. Thus, if you need any other libraries, you will have to run !pip install libraryname at the top of your code to install them each time you open the notebook. This generally doesn’t take long.
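If you’d rather not reinstall something that is already there, one option is to check first and only install what is missing. Here is a minimal sketch in plain Python, with function names I made up for illustration. It calls pip through the same interpreter the notebook is running, which has the same effect as the !pip install line.

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name, pip_name=None):
    """Install a library with pip only if it is missing; return True if installed."""
    if is_installed(module_name):
        return False
    # Some libraries use a different name on pip than when imported.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pip_name or module_name]
    )
    return True


# json ships with Python, so nothing is installed here.
print(ensure_installed("json"))
```

In a notebook you would put a call like ensure_installed("xarray") in the first cell, before any import lines that depend on it.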
Much like Binder, your connection to a processing server is temporary, so any libraries you install or files that you create will be lost when the server shuts down. But unlike Binder, your actual Python notebook file is constantly being saved in your Google Drive (like any other doc), so you won’t lose your work.
In my opinion, this makes Colab an ideal tool for students to quickly start playing with Python. As long as they have a Google account, they can a) open up a blank notebook, b) upload a notebook file that you give them to their Drive and open it, or c) open up a notebook directly from a GitHub URL, and start working.
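For option c), Colab uses the same kind of address trick as nbviewer: the GitHub path is appended to a Colab prefix. Here is a short sketch for turning a GitHub notebook URL into a link you could hand to students. The function name and repository details are hypothetical placeholders.

```python
from urllib.parse import urlparse


def github_to_colab(github_url):
    """Turn a github.com notebook URL into a link that opens it in Colab."""
    parts = urlparse(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # Colab accepts the GitHub path under its /github/ prefix.
    return "https://colab.research.google.com/github" + parts.path


# Placeholder repository, not a real one.
print(github_to_colab(
    "https://github.com/example-user/example-repo/blob/main/demo.ipynb"
))
```

Students who open a notebook this way get their own working session, and they can save a copy to their Drive if they want to keep their changes.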
But the proof is in the pudding…
Bottom line, if you’d like your students to start playing with Python, so they can dive into some cool oceanographic datasets and start doing their own research, I’d suggest starting with Google Colab (and GitHub) while they learn the basics. Then, as they move on to more complex problems, they can look into the other tools, though they might not even need to if the datasets they hope to work on are simple enough.