Mpox Workflow

By Matt Parker
Published in How Tos
June 01, 2022

Mpox virus is a double-stranded DNA virus. There is an ongoing outbreak of clade II (formerly the West African clade) of the virus in multiple countries. Data have been generated for a number of these cases using Oxford Nanopore Technologies sequencing, and here we describe wf-mpx, a decentralised workflow to analyse these data on device, anywhere.

Data Analysis

A dive into the many excellent community posts on virological.org indicated that:

  • people were mapping to existing references,
  • creating a consensus based on this mapping,
  • or they were creating de novo assemblies;
  • and in either case, performing some manual review.

We wanted to empower users who are keen to sequence mpox using Oxford Nanopore Technologies devices but don’t have the expertise or resources to put together an analysis workflow. We have therefore released wf-mpx. By releasing this workflow in its nascent state, anyone with ONT mpox data, be it metagenomic or something more targeted, can get a draft consensus using EPI2ME Labs.

wf-mpx is by no means a comprehensive workflow for the creation of mpox consensus sequences or assemblies, but it might get you started analysing your data.

You should be particularly careful using this workflow if you have amplicon or other targeted data, because the workflow does not trim adapters or primers.

If you have any issues, thoughts, or suggestions please don’t hesitate to raise an issue for us on GitHub: epi2me-labs/wf-mpx.

Workflow Details

You can run the workflow in two ways:

  1. In EPI2ME Labs - you can click the workflow and complete the path to your fastq files. You can download EPI2ME Labs from here
  2. On the command line:
    nextflow run epi2me-labs/wf-mpx --fastq <PATH_TO_FOLDER_OF_FASTQ_FILES>
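
If you want to launch the command-line route from a script, the invocation above can be assembled programmatically. The following is a minimal sketch, not part of wf-mpx: the helper names `build_wf_mpx_command` and `has_fastq_files` are hypothetical, and the list of accepted file extensions is an assumption. Only the `--fastq` option shown above is taken from this post.

```python
import shlex
from pathlib import Path

# Assumption: file extensions treated as FASTQ for this pre-flight check.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def build_wf_mpx_command(fastq_dir):
    """Return the argument list for the nextflow command shown in this post."""
    return ["nextflow", "run", "epi2me-labs/wf-mpx", "--fastq", str(Path(fastq_dir))]


def has_fastq_files(fastq_dir):
    """Return True if fastq_dir is a folder containing at least one FASTQ file."""
    folder = Path(fastq_dir)
    return folder.is_dir() and any(
        entry.name.endswith(FASTQ_SUFFIXES) for entry in folder.iterdir()
    )


# Print the command for inspection rather than running it.
command = build_wf_mpx_command("test_data/fastq/barcode01")
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` once nextflow is installed; printing it first makes it easy to check the path before starting a run.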

Workflow Steps

The workflow takes a single folder of fastq files (more coming soon) and:

  • Maps the reads using minimap2 to a reference from a choice of:
    • ON568298.1 - German sequence described here
    • MT903344.1 - mpox virus isolate MPXV-UK_P2 NCBI
    • MN648051.1 - mpox virus strain Israel_2018 NCBI
    • ON563414.1 - USA Center for Disease Control sequence NCBI
  • Assesses coverage
  • Keeps only reads that map to the reference, to exclude potential human reads
  • Calls variants with respect to that reference using medaka
  • Filters out variants with <20x depth
  • Creates a draft consensus using bcftools from the variants and reference:
    • Coverage <20x is masked with ‘N’
    • Deletions are represented by ‘-’
    • Insertions are in lowercase
  • Produces an independent de novo assembly using flye and medaka
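
To make the consensus conventions above concrete, here is a toy sketch in plain Python. It is not how the workflow builds its consensus (that is done with bcftools); it only illustrates the three rules: positions below 20x become ‘N’, deleted bases become ‘-’, and inserted bases are lowercase. The function name `draft_consensus`, the dictionary layout of `variants`, and the assumption that indels are left-anchored as in VCF are all illustrative choices.

```python
MIN_DEPTH = 20  # depth threshold stated in the workflow steps


def draft_consensus(reference, depth, variants, min_depth=MIN_DEPTH):
    """Build a toy consensus string.

    reference: reference sequence (str)
    depth:     per-position depth, same length as reference
    variants:  {0-based position: (ref_allele, alt_allele)}, VCF-style anchoring
    """
    if len(depth) != len(reference):
        raise ValueError("depth must have one value per reference position")
    out = []
    pos = 0
    while pos < len(reference):
        if depth[pos] < min_depth:
            out.append("N")  # low coverage is masked, variants here are ignored
            pos += 1
        elif pos in variants:
            ref, alt = variants[pos]
            if reference[pos:pos + len(ref)] != ref:
                raise ValueError(f"ref allele mismatch at position {pos}")
            if len(ref) == len(alt):
                out.append(alt)  # substitution
            elif len(ref) < len(alt):
                # insertion: keep the anchor, lowercase the inserted bases
                out.append(alt[:len(ref)] + alt[len(ref):].lower())
            else:
                # deletion: keep the anchor, mark deleted bases with '-'
                out.append(alt + "-" * (len(ref) - len(alt)))
            pos += len(ref)
        else:
            out.append(reference[pos])
            pos += 1
    return "".join(out)


reference = "ACGTACGTAC"
depth = [5, 5, 30, 30, 30, 30, 30, 30, 30, 30]
variants = {3: ("T", "C"), 5: ("C", "CGG"), 7: ("TA", "T")}
print(draft_consensus(reference, depth, variants))  # NNGCACggGT-C
```

In the example, the first two positions are masked, the SNP at position 3 is applied, the two inserted bases appear as `gg`, and the single deleted base appears as `-`.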

Sample Report

The report contains a few useful plots for quality control of your data, which are described in more detail below. An example can be found here.

Read summary

This section contains two basic plots to show your read length distribution and the read quality scores. These are useful for troubleshooting your experiment.
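
If you want to look at the same two quantities outside the report, the sketch below computes read lengths and a per-read mean quality from FASTQ text. It is an illustration only and not the code behind the report. It assumes four-line FASTQ records with Phred+33 qualities, and it averages error probabilities before converting back to a Q score, which is one common convention; the function names are hypothetical.

```python
import io
import math


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line
        quality = handle.readline().strip()
        yield header.strip()[1:], sequence, quality


def mean_qscore(quality, offset=33):
    """Mean quality of one read, averaged in error-probability space."""
    if not quality:
        raise ValueError("empty quality string")
    error_probs = [10 ** (-(ord(char) - offset) / 10) for char in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\n555555\n")
for name, sequence, quality in read_fastq(example):
    print(name, len(sequence), round(mean_qscore(quality), 1))
```

For real data, the same generator can be given a file opened with `open(path)` or `gzip.open(path, "rt")`.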

Genome coverage

This plot shows the depth of coverage at each position along the mpox virus reference you chose to map reads to. The plot also shows the location of:

  • SNPs: grey dots
  • Insertion/Deletions: blue bars
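
A per-position depth table can also be summarised numerically. The sketch below assumes a three-column, tab-separated layout (reference name, 1-based position, depth), which is the layout written by `samtools depth`. Whether wf-mpx uses that tool internally is not stated here, so treat the input format and the function name `coverage_summary` as assumptions. The 20x threshold matches the masking threshold described above.

```python
def coverage_summary(depth_lines, min_depth=20):
    """Summarise depth from lines of 'reference<TAB>position<TAB>depth'."""
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    if not depths:
        raise ValueError("no depth records found")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return {
        "positions": len(depths),
        "mean_depth": sum(depths) / len(depths),
        "fraction_at_min_depth": covered / len(depths),
    }


lines = ["ref\t1\t10", "ref\t2\t20", "ref\t3\t30", "ref\t4\t0"]
print(coverage_summary(lines))
```

Note that `samtools depth` skips zero-coverage positions unless run with `-a`, so the fraction is only meaningful when every reference position is present in the input.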

Variant Context

It has been noted that the mutations identified in the genome appear to be in a context that would suggest APOBEC3 host enzyme action. This plot categorises SNPs by their context to help highlight this observation. More information can be found in this excellent post by Áine O’Toole & Andrew Rambaut.
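
As a rough illustration of what categorising SNPs by context can mean, the sketch below labels each SNP by its dinucleotide context. It assumes the commonly described APOBEC3 pattern of TC>TT changes, which appear as GA>AA when the edit occurred on the opposite strand. That definition is an assumption made for this example and may differ from the categories used in the report; the function name `classify_snp_context` is hypothetical.

```python
from collections import Counter


def classify_snp_context(reference, pos, alt):
    """Label a SNP at 0-based pos as 'TC>TT', 'GA>AA' or 'other'."""
    ref = reference[pos]
    # C>T where the preceding base is T
    if ref == "C" and alt == "T" and pos > 0 and reference[pos - 1] == "T":
        return "TC>TT"
    # G>A where the following base is A (the same change seen from the other strand)
    if ref == "G" and alt == "A" and pos + 1 < len(reference) and reference[pos + 1] == "A":
        return "GA>AA"
    return "other"


reference = "ATCGGAT"
snps = [(2, "T"), (4, "A"), (3, "A"), (0, "G")]
counts = Counter(classify_snp_context(reference, pos, alt) for pos, alt in snps)
print(dict(counts))  # {'TC>TT': 1, 'GA>AA': 1, 'other': 2}
```

A high share of SNPs in the first two categories is the kind of signal the plot is intended to make visible.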

All Variants

This is simply all of the variants called by medaka, filtered only by depth (variants with <20x depth are removed).

Flye Assembly

This plot shows the contigs produced by flye when attempting to assemble the reads.

Software Versions & Workflow Parameters

These sections detail the versions of the tools used in this workflow and the parameters set at execution.

Test Data

The git repository for wf-mpx includes test data provided by GSTT; Adela Medina, Luke Snell, Themis Charalampous, Rahul Batra, Jonathon Edgeworth. This can be found at wf-mpx/test_data/fastq/barcode01. The original source data can also be found on SRA here.


Matt Parker

Director, Clinical Bioinformatics Software


© 2020 - 2024 Oxford Nanopore Technologies plc. All rights reserved. Registered Office: Gosling Building, Edmund Halley Road, Oxford Science Park, OX4 4DQ, UK | Registered No. 05386273 | VAT No 336942382. Oxford Nanopore Technologies, the Wheel icon, EPI2ME, Flongle, GridION, Metrichor, MinION, MinIT, MinKNOW, Plongle, PromethION, SmidgION, Ubik and VolTRAX are registered trademarks of Oxford Nanopore Technologies plc in various countries. Oxford Nanopore Technologies products are not intended for use for health assessment or to diagnose, treat, mitigate, cure, or prevent any disease or condition.