Analysis of EPI2ME 16S CSV Output

Expected Duration: 10 minutes

The EPI2ME 16S (and WIMP) analyses allow a summary table of the results to be downloaded. However, this table does not contain full lineage information, so it cannot be used directly to create the Sankey tree diagrams that EPI2ME displays in its web interface.

The following short code fragments demonstrate how to decorate the EPI2ME data table with lineage information and aggregate the read counts.

An example 16S report can be found here.

Computational requirements for this tutorial:

Getting started

Before anything else we will create and set a working directory:
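A minimal sketch of this step in Python. The directory name `epi2me_16s_tutorial` is a hypothetical placeholder, not a path taken from this tutorial.

```python
import os
from pathlib import Path

def set_working_directory(path):
    """Create `path` if needed, make it the current directory, and return it."""
    workdir = Path(path).expanduser().resolve()
    workdir.mkdir(parents=True, exist_ok=True)
    os.chdir(workdir)
    return workdir

# Hypothetical directory name; substitute a location of your choosing.
workdir = set_working_directory("epi2me_16s_tutorial")
print(f"Working in {workdir}")
```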

Install additional software

To decorate the EPI2ME results file with lineage information we will use the taxonkit tool. The codebox below will download this tool along with csvkit, which we will use to convert the EPI2ME file from a comma-separated file to a tab-separated file (the format taxonkit requires).

Please note that the software installed is not persistent and this step will need to be re-run if you stop and restart the EPI2ME Labs server.
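Because the installation is not persistent, it can be useful to confirm that the tools are on the PATH before running the rest of the notebook. The sketch below uses only the Python standard library. It assumes the executables are named `taxonkit` and `csvformat` (`csvformat` is one of the command-line tools that csvkit provides); the helper name is illustrative.

```python
import shutil

def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = missing_tools(["taxonkit", "csvformat"])
if missing:
    print("Re-run the installation step; not found:", ", ".join(missing))
else:
    print("All required tools are available.")
```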

Taxonkit requires the NCBI taxonomy database to function. Let's download and decompress it:
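One way to do this from Python is sketched below. The URL points at the NCBI `taxdump.tar.gz` archive, and `~/.taxonkit` is the directory in which taxonkit looks for the database by default. The function names are illustrative, and the download call is left commented out because the archive is large.

```python
import tarfile
import urllib.request
from pathlib import Path

TAXDUMP_URL = "https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz"
# The four files that taxonkit reads from its data directory.
REQUIRED = ("names.dmp", "nodes.dmp", "delnodes.dmp", "merged.dmp")

def extract_taxdump(tarball, data_dir):
    """Extract the files taxonkit needs from `tarball` into `data_dir`."""
    data_dir = Path(data_dir).expanduser()
    data_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:gz") as archive:
        for name in REQUIRED:
            archive.extract(name, path=data_dir)
    return data_dir

def download_taxdump(data_dir="~/.taxonkit", url=TAXDUMP_URL):
    """Download the NCBI taxonomy dump and unpack it for taxonkit."""
    data_dir = Path(data_dir).expanduser()
    data_dir.mkdir(parents=True, exist_ok=True)
    tarball = data_dir / "taxdump.tar.gz"
    urllib.request.urlretrieve(url, tarball)
    return extract_taxdump(tarball, data_dir)

# download_taxdump()  # uncomment to fetch and unpack the database
```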

We now have all we need to get going:

Using your own data

If you wish to analyse your own data rather than the sample data, you can edit the value of the input variable below. To find the full path of a file, navigate to it in the Files browser on the left-hand side, right-click on the file and select Copy path.

The location shared with the EPI2ME Labs server from your computer will show as /epi2melabs. For example, when the /data folder is shared, a file located at /data/my_gridion_run/fastq_pass on your computer will appear as /epi2melabs/my_gridion_run/fastq_pass.
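The mapping described above can be sketched as a small helper. The function name is illustrative, and the default `/data` shared folder is simply the one used in the example.

```python
from pathlib import PurePosixPath

def to_server_path(host_path, shared_root="/data", mount_point="/epi2melabs"):
    """Translate a path on your computer to the path seen by the server."""
    # relative_to raises ValueError if host_path is outside the shared folder.
    relative = PurePosixPath(host_path).relative_to(shared_root)
    return str(PurePosixPath(mount_point) / relative)

print(to_server_path("/data/my_gridion_run/fastq_pass"))
# prints: /epi2melabs/my_gridion_run/fastq_pass
```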

Data Entry

Please select to use the sample data, or enter your own.

After entering your inputs above and pressing Enter, select Run selected Cell and All Below from the Run menu at the top of the page. If you need to analyse a second file, come back here, change epi2me_results_file to a new value and then select Runtime > Run after.

Analysis

Let's first write a function to read in the EPI2ME file and perform several conversions on it. If you would like to use the methods of this notebook in your own code, this code cell is about all you will need:
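A sketch of such a function, under stated assumptions: the EPI2ME CSV has a column named `taxid`, `taxonkit` is on the PATH with its database in the default location, and Python's `csv` module stands in for csvkit when writing the tab-separated output. The function names, the `lookup` parameter and the rank column names are illustrative rather than part of EPI2ME or taxonkit.

```python
import csv
import subprocess

RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]

def run_taxonkit(taxids):
    """Return {taxid: [name at each rank in RANKS]} using the taxonkit CLI."""
    if not taxids:
        return {}
    stdin = "\n".join(taxids) + "\n"
    lineage = subprocess.run(
        ["taxonkit", "lineage"], input=stdin,
        capture_output=True, text=True, check=True).stdout
    # `reformat` appends a column in its default "{k};{p};{c};{o};{f};{g};{s}" format.
    formatted = subprocess.run(
        ["taxonkit", "reformat"], input=lineage,
        capture_output=True, text=True, check=True).stdout
    result = {}
    for line in formatted.splitlines():
        fields = line.split("\t")
        names = fields[2].split(";") if len(fields) > 2 else []
        # Pad or trim so that every taxid maps to exactly len(RANKS) names.
        result[fields[0]] = (names + [""] * len(RANKS))[:len(RANKS)]
    return result

def annotate(csv_path, out_path, taxid_column="taxid", lookup=run_taxonkit):
    """Copy the EPI2ME CSV to a TSV, adding one column per taxonomic rank."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    # Query each distinct, non-empty taxid once.
    taxids = sorted({row[taxid_column] for row in rows if row[taxid_column]})
    lineages = lookup(taxids)
    blank = [""] * len(RANKS)
    for row in rows:
        row.update(zip(RANKS, lineages.get(row[taxid_column], blank)))
    fieldnames = list(rows[0].keys()) if rows else []
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Only the distinct taxids are sent to taxonkit, so reads with an empty taxid never reach it and simply receive blank rank columns. Note that an input column whose name matches one of `RANKS` would be overwritten; rename the entries in `RANKS` if that applies to your file.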

We can now read the input file, annotate it with the lineage information, and write out the annotated table.

The output table is the same as the input EPI2ME results file, with additional columns indicating the taxonomic ranks to which each read has been assigned. For example, here is the start of the table:

With this data table we can now extract counts of reads at any of the taxonomic ranks, for example:
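As a sketch, assuming the annotated table is a tab-separated file with one column per rank (the file name `annotated.tsv` and the column name `genus` are hypothetical), read counts can be tallied with the standard library:

```python
import csv
from collections import Counter

def count_reads(tsv_path, rank="genus"):
    """Count rows (reads) per value of the `rank` column.

    Rows with a blank value at this rank are grouped as "Unassigned".
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row[rank] or "Unassigned" for row in reader)

# Hypothetical usage:
# for name, n in count_reads("annotated.tsv", rank="genus").most_common(10):
#     print(name, n)
```

Because each row of the table is one read, counting rows per value gives the read count at that rank; passing a different `rank` aggregates at another level.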

Some notes

The EPI2ME table provides a taxid, a species_taxid and a genus_taxid. EPI2ME performs some sanity checking on its classification: if the top hits of a read are from different genera, the taxid will be empty, that is to say the read is "Unclassified". The code above uses the value of the taxid field to derive the lineage information.
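To illustrate, reads with an empty taxid can be separated from classified reads before any lineage lookup. This sketch assumes the column is literally named `taxid`; the function name is illustrative.

```python
import csv

def split_by_taxid(csv_path, taxid_column="taxid"):
    """Return (classified, unclassified) lists of rows from an EPI2ME CSV."""
    classified, unclassified = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # A missing or whitespace-only taxid counts as unclassified.
            taxid = (row.get(taxid_column) or "").strip()
            (classified if taxid else unclassified).append(row)
    return classified, unclassified
```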