EPI2ME Labs Tutorial

This meta-tutorial aims to introduce some of the advantages of using the EPI2ME Labs notebook environment over a vanilla JupyterLab notebook setup. We will walk through some of the unique features of the environment and how they can be leveraged to:

This tutorial does not aim to be a full introductory guide to JupyterLab notebooks (see References), but by the end of this tutorial you should have the knowledge to make full use of EPI2ME Labs to create and share reproducible analyses.

This notebook has no special compute requirements.

Introduction

EPI2ME Labs builds on JupyterLab to provide a working environment useful for common bioinformatic analyses. It comes with "batteries included": tools for manipulating common bioinformatic file formats and Python modules for writing code.

Included software

The notebook server of EPI2ME Labs is built from Docker container templates provided by the Jupyter project. The base Jupyter container uses conda to install the Jupyter notebook server and its prerequisites. EPI2ME Labs builds out the base conda environment with common bioinformatics command-line tools. At the time of writing (11/01/2021) these include:

blast 2.9.0 | flye 2.8.1 | miniasm 0.3 | samtools 1.10
bedtools 2.29.2 | guppy_barcoder 4.2.2 | mosdepth 0.2.9 | seqkit 0.13.2
bcftools 1.10.2 | medaka 1.2.1 | pomoxis 0.3.4 | sniffles 1.0.11
centrifuge 1.0.4 | minimap2 2.17 | racon 1.4.10 | tabix 0.2.6

Most of these tools can be run simply and directly in a notebook. For example, the environment includes the samtools suite of tools for handling BAM and SAM files:

Here we have used the ! command prefix to instruct Jupyter to interpret the following text as a Linux shell command rather than Python code. The standard output (and error) of the samtools index command is shown as the cell output.
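For comparison, a similar effect can be approximated in plain Python. The sketch below is only an illustration and not how Jupyter implements the ! prefix: it uses the standard-library subprocess module, the helper name run_shell is hypothetical, and a harmless Python one-liner stands in for samtools so that the example runs without any bioinformatics tools installed.

```python
import subprocess
import sys


def run_shell(args):
    """Run a command and return (exit_code, stdout, stderr) as text.

    `run_shell` is a hypothetical helper for illustration; it is not
    part of Jupyter or EPI2ME Labs.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Stand-in for a tool such as samtools: a child Python process that
# writes one line to standard output and one to standard error.
code, out, err = run_shell(
    [sys.executable, "-c",
     "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"])
print(code, out.strip(), err.strip())
```

Both streams are captured separately here, whereas a notebook cell using the ! prefix shows them together as the cell output.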

Commandline tools outside the base conda environment

Some tools are undesirable to install into the base conda environment. These have instead been installed into their own conda or virtual environments. An example of this is medaka. If we try to run medaka directly we find:

For these tools, EPI2ME Labs includes a helper run script to locate and activate the environment in which the tool resides. For example, to run medaka successfully it is sufficient to invoke:

The section Installing additional software has more information on how the !run command can be used with other software that developers may wish to install into EPI2ME Labs.

Included Python packages

To provide a useful starting point for Python development, EPI2ME Labs also includes a number of well-known Python packages pre-installed (as of 11/01/2021):

bokeh 2.2.* | pyranges 0.0.76 | seaborn 0.10.*
ipywidgets 7.5.* | pysam 0.16.0.1 | xlrd 1.2.0
matplotlib 3.2.* | scikit-learn 0.23.* |
pandas 1.1.* | scipy 1.5.* |

These packages are immediately available to the Python interpreter used for notebook execution. For example, we can import pandas and bokeh to start plotting data:

Although it is perfectly legitimate to use any plotting library within Jupyter, EPI2ME Labs notebooks use the aplanat package, which acts as a convenience layer over bokeh to provide consistent styling and an unfussy, direct plotting interface for developers. See the Plotting with aplanat section below for more details.

Installing additional software

While the pre-installed software in EPI2ME Labs should be sufficient for many notebook workflows, it may be desirable to install additional software. Software can be installed into either the base environment or a new conda environment. In the following sections we will discuss the benefits of each and how to choose which method to use.

Software installed into the EPI2ME Labs environment is not persistent across restarts of the EPI2ME Labs server. This can be both useful and a hindrance. If there is software that you would like to see preinstalled into the environment, please create an issue on the Tutorials project on GitHub.

Installing software into the base environment

For the most part it should be possible to install software directly into the base conda environment hosting the preinstalled software. This is the simplest method and allows all software to be immediately available. If a piece of software has dependencies that conflict with other installed software the alternative method in the following section can be used.

In order to install software we recommend developers use mamba, a fast reimplementation of the conda package manager. The mamba command is a drop-in replacement for conda. The following will install Rebaler into the base conda environment:

In addition to installing new programs, we can also use mamba to install Python packages to be used within notebook code. For example the following code cell will fail:

as snakemake is not installed into EPI2ME Labs by default. We can install it,

after which the import works:
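Whether a package is importable can also be checked without triggering an ImportError. The following is a minimal sketch using the standard-library importlib module; the helper name is_installed is hypothetical, and the module names are chosen so that the example runs anywhere rather than depending on snakemake being present.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# `json` ships with Python, so it is always found; the second name is a
# deliberately made-up module that should not exist anywhere.
print(is_installed("json"))
print(is_installed("a_module_that_does_not_exist_12345"))
```

A check of this kind could be run before and after an install to confirm that the package has become available to the notebook's interpreter.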

Installing software into a new environment

Occasionally it may not be possible to install software into the base EPI2ME Labs environment. This can happen when a new piece of software (or its dependencies) conflicts with software that is already installed. When attempting to install such software, either the install will fail completely or it will change the software installed in the base environment so radically that EPI2ME Labs is not guaranteed to work correctly.

The resolution to the above issue is to install software into a distinct conda environment.

This method is not applicable to Python packages that are to be imported and used within the notebook, only software that is used with a command-line interface.

To install software into a distinct environment, simply create a new environment following the usual procedure:

Here we have created a new conda environment named my_environment and installed porechop into it. To use porechop we can use the !run command:

It is possible, if required, to run commands within the new conda environment. This does, however, require jumping through a small hoop. The below recipe can be used to run arbitrary commands in a conda environment:
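As a separate illustration (this is not the recipe from the notebook cell), conda itself provides a run subcommand that executes a command inside a named environment. The sketch below only builds the argument list for such a call and does not execute it; the helper name conda_run_args is hypothetical, and the environment name and the pip command are taken from the surrounding text as an example.

```python
def conda_run_args(env_name, command):
    """Build an argument list for `conda run -n ENV COMMAND...`.

    Hypothetical helper for illustration: nothing is executed here,
    the list is only constructed.
    """
    return ["conda", "run", "-n", env_name, *command]


# Example: the arguments that would install the cowsay package with pip
# inside the environment named my_environment.
args = conda_run_args("my_environment", ["pip", "install", "cowsay"])
print(" ".join(args))
```

Passing the resulting list to subprocess.run would execute the command, assuming conda is on the PATH and the environment exists.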

Here we have installed the Python implementation of the classic cowsay program into the my_environment environment created above. We can see that this is not installed in the base environment from the following:

But as before, we can run the program using the !run helper:

JupyterLab Extensions

To enhance the default JupyterLab experience, a small set of extensions have been included. For example you may have noticed our spiffing theme, which is based on the Mexico theme.

Cell play button

Toward the right-hand side of cells in EPI2ME Labs we have placed a Play button.

This allows users to execute a cell without moving the mouse cursor away from the cell to the top of the screen, or using keyboard shortcuts.

Code folding and hiding

JupyterLab contains functionality to hide the contents of code cells by double-clicking the blue rectangle to the left of the cell. For example, the contents of the code cell below should be initially hidden and replaced with ...:

The code can be revealed by clicking on ....

For notebooks of modest complexity these ... can become lost to the user and do not convey any information regarding the cell's contents. To improve this situation, EPI2ME Labs allows a cell's contents to be collapsed except for a preliminary comment line:

The functionality is activated by clicking the grey box to the left of the cell. Naturally this behaviour is valid only for code cells containing an initial comment line.

Autorunning code cells

Notebooks often require some setup code to be run before they are usable by end users. A casual user may miss these cells, and consequently the notebook may fail to run subsequent code correctly. For this situation EPI2ME Labs has functionality to automatically execute code cells when a notebook is initially opened.

To set a code cell to run automatically, right-click on the cell and select Toggle autorun cell at launch.

When active, a purple dot will be displayed in the top right-hand side of the cell.

This functionality should be used sparingly; it was developed to allow automatic initialisation of Jupyter Widgets (see Creating user interfaces).

Creating user interfaces

EPI2ME Labs contains the Jupyter Widgets Python module in order to construct user interfaces which do not rely on editing code to provide inputs.

Standard Jupyter Widgets

These can be imported and used directly, as in the documentation:

Form creation API

The epi2melabs Python module provides a template to rapidly construct complete forms with a consistent presentation:

Each InputSpec comprises three items:

The current values of the inputs are available as attributes of the form instance:

A summary report can be generated directly from the object:

This report contains a summary of invalid values according to the validation functions given; this information is available from the .validate method:

The system allows us to keep all the logic for accepting and checking inputs in one place.
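The general idea of declaring each input together with its check can be sketched in plain Python. This is not the epi2melabs form API: the Field class, the validate function, and the input names threads and sample are all hypothetical, chosen only to show defaults and validation living side by side.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Field:
    """One hypothetical form input: a default plus an optional check."""
    default: Any
    # Returns an error message for a bad value, or None when it is valid.
    check: Optional[Callable[[Any], Optional[str]]] = None


def validate(fields: Dict[str, Field], values: Dict[str, Any]) -> Dict[str, str]:
    """Return {input name: error message} for every invalid value."""
    errors = {}
    for name, field in fields.items():
        # Fall back to the declared default when no value was supplied.
        value = values.get(name, field.default)
        message = field.check(value) if field.check else None
        if message is not None:
            errors[name] = message
    return errors


fields = {
    "threads": Field(4, lambda v: None if v > 0 else "must be positive"),
    "sample": Field("sample01"),
}
print(validate(fields, {"threads": 0}))
```

Because each check is declared next to its input, adding a new input or tightening a rule touches a single definition.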

Best practices for form elements

We recommend applying two settings to forms to enhance the notebook experience for end users:

  1. Collapse the code cell containing the form construction code by double-clicking the blue vertical bar to its left, to leave just the ... hint.
  2. Set the code cell to autorun when a notebook is opened such that the widget will render for the user immediately and default values are populated.

See the JupyterLab Extensions section for more information on these behaviours.

Advanced actions

Commonly we might want to provide more advanced actions and code to trigger from a form. This can be done through use of a Button:

The form above will populate the global cpuinfo variable with details from the input file. Here we just show the number found:

Plotting with aplanat

In the previous sections we have listed the common pre-installed software that is available within EPI2ME Labs for creating charts and reporting results. In this section we introduce aplanat, a Python library built on top of the popular bokeh charting library. aplanat was designed to simplify the use of bokeh to create interactive plots, particularly in a notebook setting.

Design philosophy

aplanat attempts to make constructing common plots as simple as possible by directly translating a user's inputs into displayed data. Most plotting functions are of the form:

plot = plot_function(
    [series_1_x, series_2_x, ...], [series_1_y, series_2_y, ...],
    name=[series_1_name, series_2_name, ...],
    colors=[series_1_color, series_2_color, ...])

Data is provided to aplanat in a lowest-common-denominator fashion: as lists of vectors. This is distinct from other plotting libraries that rely on values being presented in specific data-structures.
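To make the "lists of vectors" convention concrete, the sketch below reshapes row-oriented records into one x vector and one y vector per series using only plain Python. The data and the helper name to_vectors are hypothetical, and no aplanat function is called.

```python
# Hypothetical measurements stored as (series name, x, y) rows.
rows = [
    ("sample_a", 1, 10.0),
    ("sample_a", 2, 12.5),
    ("sample_b", 1, 7.0),
    ("sample_b", 2, 9.5),
]


def to_vectors(rows):
    """Split rows into parallel lists: names, x vectors and y vectors."""
    grouped = {}
    for name, x, y in rows:
        xs, ys = grouped.setdefault(name, ([], []))
        xs.append(x)
        ys.append(y)
    # dicts preserve insertion order, so series keep first-seen order.
    names = list(grouped)
    x_vectors = [grouped[name][0] for name in names]
    y_vectors = [grouped[name][1] for name in names]
    return names, x_vectors, y_vectors


names, x_vectors, y_vectors = to_vectors(rows)
print(names, x_vectors, y_vectors)
```

The three returned lists line up position by position, which is the shape the plotting form shown above expects for its data and naming arguments.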

Simple examples

To give an overview of aplanat let's start by sampling data from the normal distribution:
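As a minimal sketch of such sampling using only the Python standard library (the notebook's own cell may well use a different library), the following draws two independent samples from a standard normal distribution. The seed and the sample size are arbitrary choices made for this illustration.

```python
import random

# Fixed seed so repeated runs give the same values; the seed and the
# sample size are arbitrary choices for this sketch.
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# The sample mean should be close to the distribution mean of zero.
mean_x = sum(x) / len(x)
print(len(x), len(y), round(mean_x, 2))
```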

It is simple to create a scatter plot of these data:

Note that the x- and y-values are given within lists; this is because the arguments are lists of potentially more than one data series. For example, to plot both y against x, and x against y:

In creating the plots above, aplanat has applied sensible default attributes to the plots and taken care of some bokeh boilerplate that the developer would otherwise have to include. These include:

In addition, note that the show() function accepts a background argument. This sets the colour of all background elements of the plot, here to the colour of the notebook cell output background.

For other plot types, developers are encouraged to peruse the source code, or review use cases presented in EPI2ME Labs notebooks. We don't currently have automatically generated documentation.

Infographics

Included in aplanat is the ability to create "infographics". These can be used to highlight key performance metrics in notebooks and reports:

The code above constructs an instance of an InfoGraphItems helper class. Each item added requires a label, value, icon, and optionally unit. Each of these should be self-explanatory. Numbers passed to the infographic are formatted using SI suffixes (k, M, G etc.) with the provided unit. To avoid any conversion developers can pass values of str type. Specification of the displayed icon is via the name of a font-awesome glyph (version 5.13.1).
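The SI-suffix formatting described above can be illustrated with a short standalone function. This is a sketch under assumptions, not aplanat's implementation: the name si_format, the rounding to one decimal place, the lack of spacing, and the choice to append the unit to string values are all hypothetical.

```python
def si_format(value, unit=""):
    """Format a number with an SI suffix; leave strings unconverted."""
    if isinstance(value, str):
        # Assumption: strings are displayed as given, with the unit added.
        return f"{value}{unit}"
    suffixes = ["", "k", "M", "G", "T"]
    magnitude = float(value)
    index = 0
    # Divide by 1000 until the number fits the largest suitable suffix.
    while abs(magnitude) >= 1000 and index < len(suffixes) - 1:
        magnitude /= 1000.0
        index += 1
    # One decimal place, with a trailing ".0" trimmed for whole numbers.
    text = f"{magnitude:.1f}".rstrip("0").rstrip(".")
    return f"{text}{suffixes[index]}{unit}"


print(si_format(1500, "b"))
print(si_format(2_500_000))
print(si_format("1500", "b"))
```

Passing a string skips the scaling entirely, which mirrors the advice above for values that should be shown exactly as supplied.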

Faceted plots

Most plotting functions within aplanat follow the "list of vectors" approach illustrated above. Although simple, this is not to everyone's taste. A particularly common plot to want to produce is a facet (or trellis) plot, displaying up to five dimensions of data. For this aplanat includes the facet_grid() function, which accepts a structured array or pandas dataframe and a specification mapping column names to plotting coordinates.

Creating standalone reports

Interactive notebooks are great, but sometimes it is desirable to create a report for posterity. It is possible to simply export a notebook to any number of formats from the File menu of the EPI2ME Labs interface.

The result however is often filled with code when all we really want is a document containing tables and graphs displaying our data. aplanat makes this possible through its report API:

The above code creates a report with a title and lead statement, adds an introductory comment, and adds two sections to the report. Sections can be used to logically group items together but otherwise serve no functional purpose. To add a plot to a section (or the main report), simply call the .plot() method with a bokeh Figure instance:

pandas dataframes can be added directly to reports as a formatted table using the .table() method:

It is also possible to add text commentary using markdown syntax:

The report can be saved as a self-contained HTML document using the .write() method:

Structuring report elements

In the above example we added report elements to report sections in an "in order" style: the report elements will appear in the report in the chronological order in which the code adding the elements was executed. Using report sections provides one method to control the order of elements. Another method is to use placeholder items:

Here we have constructed a named placeholder, before adding a markdown element. The placeholder is then filled with a markdown element by providing the key option to the .markdown() method.

Placeholders allow us to define the structure of a report ahead of time before any explicit report items are available; this can be useful in the context of writing fluent code in a notebook setting.

The placeholder functionality is possible as all report elements are given a unique key. Elements are assigned random keys by default, but the .plot(), .table(), and .markdown() methods can all be passed an optional key argument. To replace a report element, add an element with the same key as a previously added item:
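The behaviour of keyed elements and placeholders can be modelled with an ordinary Python dictionary. The class below is a toy illustration and does not reproduce aplanat's report classes: the name KeyedReport and its render method are hypothetical, and only text elements are handled.

```python
import uuid


class KeyedReport:
    """Toy model of a report whose elements are stored by key.

    A hypothetical illustration only; it does not reproduce aplanat's
    report classes.
    """

    def __init__(self):
        # dicts keep insertion order, and reassigning an existing key
        # keeps that key's original position.
        self.elements = {}

    def placeholder(self, key):
        """Reserve a position in the report for a later element."""
        self.elements[key] = None

    def markdown(self, text, key=None):
        """Add text, or fill/replace the element stored under key."""
        if key is None:
            key = uuid.uuid4().hex  # random key when none is given
        self.elements[key] = text

    def render(self):
        """Return the filled elements in report order."""
        return [text for text in self.elements.values() if text is not None]


report = KeyedReport()
report.placeholder("summary")
report.markdown("Details come second.", key="details")
report.markdown("Summary comes first.", key="summary")
print(report.render())
```

Filling the placeholder after the details element was added still puts the summary first, and adding another element under the key "details" would replace the existing text without changing its position.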

When initially creating a report it is possible to declare that all added items must be given an explicit key through the require_keys argument:

Feedback

Finally, EPI2ME Labs is a continually evolving product. If you have feedback or would like to suggest a feature, enhancement, or improvement for EPI2ME Labs, please create an issue on the Tutorials project on GitHub.

References