Mpox virus is a double-stranded DNA virus. There is an ongoing outbreak of clade II (formerly the West African clade) of the virus in multiple countries. Data have been generated for a number of these cases using Oxford Nanopore Technologies sequencing. Here we describe wf-mpx, a decentralised workflow to analyse these data on device, anywhere.

Data Analysis

A dive into the many excellent community posts on virological.org indicated that:

people were mapping to existing references,
creating a consensus based on this mapping,
or they were creating de novo assemblies;
and in either case, performing some manual review.

We wanted to empower users who are keen to sequence mpox using Oxford Nanopore Technologies devices but don’t have the expertise or resources to put together an analysis workflow. We have therefore released wf-mpx. Releasing this workflow in its nascent state means that anyone with ONT mpox data, be it metagenomic or something more targeted, can get a draft consensus using EPI2ME Labs.

wf-mpx is by no means a comprehensive workflow for the creation of mpox consensus sequences or assemblies, but it might get you started analysing your data.

You should be particularly careful using this workflow if you have amplicon or other targeted data: the workflow does not trim adapters or primers.

If you have any issues, thoughts, or suggestions please don’t hesitate to raise an issue for us on GitHub: epi2me-labs/wf-mpx.

Workflow Details

You can run the workflow in two ways:

In EPI2ME Labs - you can click the workflow and complete the path to your fastq files. You can download EPI2ME Labs from here
On the command line:
```
nextflow run epi2me-labs/wf-mpx --fastq <PATH_TO_FOLDER_OF_FASTQ_FILES>
```

Workflow Steps

The workflow takes a single folder of fastq files (more coming soon) and:

Maps the reads using minimap2 to a reference from a choice of:
- ON568298.1 - German sequence described here
- MT903344.1 - mpox virus isolate MPXV-UK_P2 NCBI
- MN648051.1 - mpox virus strain Israel_2018 NCBI
- ON563414.1 - USA Center for Disease Control sequence NCBI
Assesses coverage
Keeps only reads mapping to the reference, to exclude potential human reads
Calls variants with respect to that reference using medaka
Filters out variants with <20x depth
Creates a draft consensus using bcftools from the variants and reference:
- Coverage <20x is masked with ‘N’
- Deletions are represented by ’-’
- Insertions are in lowercase
Produces an independent de-novo assembly using flye and medaka
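
The consensus conventions listed above (masking below 20x, ’-’ for deletions, lowercase insertions) can be illustrated with a short pure-Python sketch. This is not how bcftools builds the consensus inside wf-mpx; the function name `draft_consensus`, the 0-based `(pos, ref, alt)` tuples, and the left-anchored VCF-style alleles are all assumptions made for illustration.

```python
def draft_consensus(reference, depth, variants, min_depth=20):
    """Apply variants to a reference and mask low-depth positions.

    reference: reference sequence as a string
    depth:     per-position depth, one int per reference base
    variants:  (pos, ref, alt) tuples, 0-based, left-anchored like VCF
               (hypothetical input format for this sketch)
    """
    by_pos = {pos: (ref, alt) for pos, ref, alt in variants}
    out = []
    i = 0
    while i < len(reference):
        if i in by_pos and depth[i] >= min_depth:
            ref, alt = by_pos[i]
            if len(ref) == len(alt):
                # Substitution: emit the alternate allele
                out.append(alt.upper())
            elif len(ref) > len(alt):
                # Deletion: keep the anchor, mark deleted bases with '-'
                out.append(alt.upper() + "-" * (len(ref) - len(alt)))
            else:
                # Insertion: anchor in uppercase, inserted bases in lowercase
                out.append(alt[:len(ref)].upper() + alt[len(ref):].lower())
            i += len(ref)
        elif depth[i] < min_depth:
            # Low coverage: mask with N (a variant here is also dropped)
            out.append("N")
            i += 1
        else:
            out.append(reference[i].upper())
            i += 1
    return "".join(out)


reference = "ACGTACGTAC"
depth = [30] * 8 + [5] * 2
variants = [(1, "C", "T"), (3, "TA", "T"), (6, "G", "GTT")]
consensus = draft_consensus(reference, depth, variants)
print(consensus)  # ATGT-CGttTNN
```

The sketch only checks depth at the anchor position of each variant, which is a simplification; a real consensus step would need to handle variants that span a coverage drop.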

Sample Report

The report contains a few useful plots for quality control of your data, which are described in more detail below. An example can be found here.

Read summary

This section contains two basic plots to show your read length distribution and the read quality scores. These are useful for troubleshooting your experiment.
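
If you want to reproduce the numbers behind these two plots yourself, a minimal sketch along the following lines may help. It assumes uncompressed, four-line FASTQ records with Phred+33 qualities, and the helper name `read_stats` is hypothetical; it is not the code used by the workflow report.

```python
import io
import math


def read_stats(handle):
    """Yield (read_length, mean_quality) for each four-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        # Average the error probabilities rather than the raw Q values
        probs = [10 ** (-(ord(c) - 33) / 10) for c in qual]
        mean_q = -10 * math.log10(sum(probs) / len(probs)) if probs else 0.0
        yield len(seq), mean_q


# Two toy reads: 'I' encodes Q40 and '?' encodes Q30 under Phred+33
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\n??????\n")
stats = list(read_stats(example))
lengths = [length for length, _ in stats]
qualities = [round(q, 3) for _, q in stats]
```

Averaging in probability space means a few poor bases pull the mean quality down more than a simple average of Q scores would.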

Genome coverage

This plot shows the depth of coverage at each position along the mpox virus reference you chose to map reads to. This plot also shows the location of:

SNPs: grey dots
Insertion/Deletions: blue bars
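
As a rough illustration of what a per-position depth track is, the sketch below computes depth from a list of alignment intervals and then reports runs that fall under a threshold. The function names and the 0-based half-open `(start, end)` intervals are assumptions for this example, and gaps inside an alignment are ignored for simplicity.

```python
from itertools import accumulate


def depth_per_position(ref_length, alignments):
    """Depth at each reference position from (start, end) intervals."""
    diff = [0] * (ref_length + 1)
    for start, end in alignments:
        diff[start] += 1  # coverage begins at start
        diff[end] -= 1    # and stops just before end
    # Running sum of the difference array gives the depth
    return list(accumulate(diff[:-1]))


def low_coverage_regions(depth, min_depth=20):
    """Return half-open (start, end) runs where depth < min_depth."""
    regions, start = [], None
    for pos, d in enumerate(depth):
        if d < min_depth and start is None:
            start = pos
        elif d >= min_depth and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


depth = depth_per_position(10, [(0, 5), (2, 8), (4, 10)])
low = low_coverage_regions(depth, min_depth=2)
```

With a real dataset the threshold would be the 20x used by the workflow; the toy value of 2 here just keeps the example readable.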

Variant Context

It has been noted that the mutations identified in the genome appear to be in a context that would suggest APOBEC3 host enzyme action. This plot categorises SNPs by their context to help highlight this observation. More information can be found in this excellent post by Áine O’Toole & Andrew Rambaut.
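
To make the idea of "context" concrete: the APOBEC3-like signature is usually described as C to T changes where the C follows a T (TC>TT), which appears as G to A followed by an A (GA>AA) when read from the opposite strand. The sketch below labels SNPs on that basis. The function name, the 0-based positions, and the uppercase reference are assumptions for illustration, and this is not the code behind the report plot.

```python
from collections import Counter


def snp_context(reference, pos, ref, alt):
    """Label a SNP at 0-based pos with its dinucleotide context."""
    # C>T preceded by T on the reference strand
    if ref == "C" and alt == "T" and pos > 0 and reference[pos - 1] == "T":
        return "TC>TT"
    # G>A followed by A: the same change seen from the other strand
    if (ref == "G" and alt == "A"
            and pos + 1 < len(reference) and reference[pos + 1] == "A"):
        return "GA>AA"
    return "other"


reference = "ATCGGAT"
snps = [(2, "C", "T"), (4, "G", "A"), (3, "G", "A"), (0, "A", "G")]
counts = Counter(snp_context(reference, *snp) for snp in snps)
```

A strong excess of the TC>TT and GA>AA classes over "other" is the pattern that the plot is designed to highlight.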

All Variants

This section simply lists all of the variants called by medaka, filtered only by depth (>20x).

Flye Assembly

This plot shows the contigs produced by flye when attempting to assemble the reads.

Software Versions & Workflow Parameters

These sections detail the versions of the tools used in this workflow and the parameters set at execution.

Test Data

The Git repository for wf-mpx includes test data provided by GSTT: Adela Medina, Luke Snell, Themis Charalampous, Rahul Batra, and Jonathon Edgeworth. The data can be found at wf-mpx/test_data/fastq/barcode01. The original source data can also be found on SRA here.