Monkeypox (MPX) is a double-stranded DNA virus. There is an ongoing outbreak of the West African clade of the virus in
multiple countries. Data has been generated for a number of these cases using Oxford Nanopore Technologies sequencing,
and here we describe wf-mpx, a decentralised workflow to analyse this data on device, anywhere.
A dive into the many excellent community posts on virological.org indicated that:
We wanted to empower users who are keen to sequence MPX using Oxford Nanopore Technologies
devices but don’t have the expertise or resources to put together an analysis workflow. We have therefore released
wf-mpx. By releasing this workflow in its nascent state, anyone with ONT Monkeypox data, be it metagenomic or something more targeted, can get a draft consensus using EPI2ME Labs.
wf-mpx is by no means a comprehensive workflow for the creation of Monkeypox consensus sequences or assemblies,
but it might get you started analysing your data.
You should be particularly careful using this workflow if you have amplicon or other targeted data: no trimming of adapters or primers is carried out by this workflow.
If you have any issues, thoughts, or suggestions please don’t hesitate to raise an issue for us on GitHub: epi2me-labs/wf-mpx.
You can run the workflow in two ways:
nextflow run epi2me-labs/wf-mpx --fastq <PATH_TO_FOLDER_OF_FASTQ_FILES>
The workflow takes a single folder of fastq files (more coming soon) and:
minimap2 to a reference from a choice of:
bcftools from the variants and reference:
The report contains a few useful plots to quality control your data which are described in more detail below. An example can be found here.
This section contains two basic plots to show your read length distribution and the read quality scores. These are useful for troubleshooting your experiment.
This plot shows the depth of coverage at each position along the Monkeypox virus reference you chose to map reads to. This plot also shows the location of:
It has been noted that the mutations identified in the genome appear to be in a context that would suggest APOBEC3 host enzyme action. This plot categorises SNPs by their context to help highlight this observation. More information can be found in this excellent post by Áine O’Toole & Andrew Rambaut.
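To make the idea of categorising SNPs by context concrete, here is a minimal Python sketch. It is not the code used in wf-mpx; the function name `classify_snp`, the 0-based coordinates and the three category labels are assumptions made for illustration. It relies on the commonly reported APOBEC3 signature: C>T changes where the preceding base is T (TC>TT) and, on the opposite strand, G>A changes where the following base is A (GA>AA).

```python
from collections import Counter


def classify_snp(reference: str, pos: int, ref: str, alt: str) -> str:
    """Classify a SNP by its dinucleotide context (hypothetical helper).

    `pos` is a 0-based index into `reference`. Returns "TC>TT", "GA>AA"
    or "other". The labels are illustrative, not the workflow's own.
    """
    seq = reference.upper()
    ref, alt = ref.upper(), alt.upper()
    if seq[pos] != ref:
        raise ValueError(f"reference base at {pos} is {seq[pos]}, not {ref}")

    # C>T with a T immediately upstream: the APOBEC3-like context.
    if ref == "C" and alt == "T" and pos > 0 and seq[pos - 1] == "T":
        return "TC>TT"
    # G>A with an A immediately downstream: the same edit seen on the
    # opposite strand.
    if ref == "G" and alt == "A" and pos + 1 < len(seq) and seq[pos + 1] == "A":
        return "GA>AA"
    return "other"


# Toy reference and SNPs, made up purely for this example.
toy_reference = "ATCGGAAC"
toy_snps = [(2, "C", "T"), (4, "G", "A"), (3, "G", "A"), (7, "C", "T")]
context_counts = Counter(
    classify_snp(toy_reference, pos, ref, alt) for pos, ref, alt in toy_snps
)
print(context_counts)
```

A high proportion of SNPs falling into the first two categories is the pattern the plot is designed to make visible.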
This is simply a table of all of the variants called by medaka, filtered only by depth (>20x).
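As a rough illustration of what a depth-only filter means, the sketch below keeps variants whose depth is strictly greater than 20. The `Variant` record, its field names and the `filter_by_depth` function are hypothetical and are not taken from wf-mpx or medaka; the real workflow applies its filter to the actual variant calls.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Minimal stand-in for a variant call (hypothetical structure)."""
    pos: int
    ref: str
    alt: str
    depth: int


def filter_by_depth(variants: Iterable[Variant], min_depth: int = 20) -> List[Variant]:
    """Keep variants with depth strictly greater than `min_depth`."""
    return [v for v in variants if v.depth > min_depth]


# Made-up calls for illustration only.
calls = [
    Variant(pos=100, ref="C", alt="T", depth=5),
    Variant(pos=250, ref="G", alt="A", depth=20),
    Variant(pos=400, ref="G", alt="A", depth=21),
    Variant(pos=900, ref="C", alt="T", depth=150),
]
kept = filter_by_depth(calls)
print([v.pos for v in kept])
```

Note that no quality or strand-bias filtering is applied in this sketch, which mirrors the statement that depth is the only filter.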
This plot shows the contigs produced by flye when attempting to assemble the reads.
These sections detail the versions of the tools used in this workflow and the parameters used at execution.
The git repository for wf-mpx includes test data provided by GSTT: Adela Medina, Luke Snell, Themis Charalampous, Rahul Batra, Jonathon Edgeworth.
This can be found at wf-mpx/test_data/fastq/barcode01.
The original source data can also be found on SRA here.