Workflow Amplicon Sequencing report

Introduction

This report contains tables and plots to help interpret the results of wf-amplicon. The workflow was run in variant calling mode. The individual sections of the report summarize the outcomes of the different steps of the workflow (read filtering, mapping against the reference file containing the amplicon sequences, variant calling).

Note: If the sequence IDs in the reference file contained special characters, they were replaced with underscores.
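The sketch below shows the kind of ID sanitization described in the note. It assumes that every character other than a letter, digit or underscore counts as "special". The exact character set used by the workflow may differ. The input header is hypothetical: it was chosen only because it maps onto an amplicon ID used in this report.

```python
import re


def sanitize_id(seq_id: str) -> str:
    """Replace every character that is not a letter, digit or underscore with '_'."""
    # Assumption: the workflow's exact definition of "special characters" may differ.
    return re.sub(r"[^A-Za-z0-9_]", "_", seq_id)


# Hypothetical original FASTA header (not taken from the input data).
print(sanitize_id("katG_NC_000962.3:2154725-2155670"))
# -> katG_NC_000962_3_2154725_2155670
```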

The input data contained:

3 samples:
barcode01, barcode02, barcode03

2 amplicons:
katG_NC_000962_3_2154725_2155670, rpoB_NC_000962_3_760285_761376

Note: The reads were downsampled to at most 1500 per sample.
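The parameter table at the end of this report lists the settings that control downsampling:

- drop_frac_longest_reads = 0.05
- take_longest_remaining_reads = True
- reads_downsampling_size = 1500

Below is a minimal sketch of one plausible reading of these parameters. It assumes the longest 5% of each sample's reads are dropped first (rounding the kept count down), and the longest remaining reads are then kept up to the downsampling size. This is an interpretation of the parameter names, not the workflow's own code; the function name is hypothetical.

```python
import random


def downsample(read_lengths, drop_frac_longest=0.05, max_reads=1500,
               take_longest=True, seed=42):
    """Sketch of length-based downsampling (assumed behaviour, not wf-amplicon's code)."""
    # Sort longest first so the longest fraction can be dropped from the front.
    ordered = sorted(read_lengths, reverse=True)
    # Drop the longest reads (e.g. possible concatemers); rounding down is an assumption.
    n_keep = int(len(ordered) * (1 - drop_frac_longest))
    remaining = ordered[len(ordered) - n_keep:]
    if len(remaining) <= max_reads:
        return remaining
    if take_longest:
        # Keep the longest reads that survived the previous step.
        return remaining[:max_reads]
    # Otherwise draw a reproducible random subset.
    return random.Random(seed).sample(remaining, max_reads)
```

Keeping the longest reads rather than a random subset favours reads that span the whole amplicon. The random branch corresponds to take_longest_remaining_reads = False under the same assumption.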

At a glance

Key results for the individual samples are shown below.

Metric	barcode01	barcode02	barcode03
Reads	165	132	89
Bases	162,744	130,231	83,668
Mean length	986.3	986.6	940.1
Mean quality	13.8	13.8	13.5
Amplicons detected	2 / 2	2 / 2	2 / 2
Mean coverage across all amplicons	68.9	55.1	30.5
Smallest mean coverage for any amplicon	66.8	49.4	3.9
SNVs	2	2	2
Indels	0	0	0
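The mean read length is the total number of bases divided by the number of reads. The sketch below reproduces the mean lengths from the reads and bases reported for each sample; the helper name is hypothetical. The same relation, applied in reverse, gives 89 reads for barcode03, matching its read count in the Summary table.

```python
def mean_length(reads: int, bases: int) -> float:
    """Mean read length: total bases divided by the number of reads."""
    return bases / reads


# (reads, bases) per sample, as reported for this run.
samples = {
    "barcode01": (165, 162_744),
    "barcode02": (132, 130_231),
    "barcode03": (89, 83_668),
}

for sample, (reads, bases) in samples.items():
    print(f"{sample}: mean length {mean_length(reads, bases):.1f}")

# Recover the read count of barcode03 from its bases and mean length.
estimated_reads = round(83_668 / 940.1)
```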

Preprocessing

The table below summarizes the reads in three states, totalled across all samples:

- the raw reads;
- the reads remaining after the initial filtering step (based on read length and mean quality);
- the reads remaining after downsampling and trimming.

Condition	Reads	Bases	Min read length	Max read length	Mean quality
Raw	407	435.5 k	327	2,147	13.5
Filtered	407	435.5 k	327	2,147	13.5
Downsampled, trimmed	386	376.6 k	243	1,139	13.8
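The length and mean-quality filter can be sketched as below, using the thresholds from the parameter table: min_read_length = 300, max_read_length = None and min_read_qual = 10.

The per-read mean quality is computed by averaging the per-base error probabilities and converting the result back to a Phred score. This is a common convention for nanopore reads, but whether wf-amplicon (via fastcat) computes it exactly this way is an assumption. The function names are hypothetical.

```python
import math


def mean_qual(phred_scores):
    """Mean read quality: average per-base error probabilities, then convert back to Phred."""
    mean_err = sum(10 ** (-q / 10) for q in phred_scores) / len(phred_scores)
    return -10 * math.log10(mean_err)


def passes_filter(seq, phred_scores, min_len=300, max_len=None, min_qual=10):
    """Return True if a read meets the length and mean-quality thresholds."""
    if len(seq) < min_len:
        return False
    if max_len is not None and len(seq) > max_len:
        return False
    return mean_qual(phred_scores) >= min_qual
```

With this convention, a few low-quality bases pull the mean down strongly. For per-base scores of 10 and 30, the arithmetic mean is 20, while mean_qual returns about 13.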

The following plots show the read quality and length distributions as well as the base yield after filtering (but before downsampling and trimming) for each sample.

Summary

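The two tables below summarize the main results of mapping the reads to the provided amplicon references and of the subsequent variant calling. The first table is per sample and the second is per amplicon.

In the per-sample table, percentages of unmapped reads are relative to the number of reads in that sample. All other percentages are relative to the total number of reads or bases across all samples.

"Mean cov." is the mean percentage of the amplicon covered by a read alignment, and "Mean acc." is the mean alignment accuracy (in %).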

Sample alias	Reads	Bases	Median read length	Amplicons	Unmapped	Variants (indels)
barcode01	165 (43%)	162.7 k (43%)	985	2	21 (13%)	2 (0)
barcode02	132 (34%)	130.2 k (35%)	987	2	17 (13%)	2 (0)
barcode03	89 (23%)	83.7 k (22%)	952	2	22 (25%)	2 (0)

Amplicon	Reads	Bases	Median read length	Samples	Mean cov.	Mean acc.	Variants (indels)
katG_NC_000962_3_2154725_2155670	185 (48%)	172.5 k (46%)	954	3	97.4	95.3	3 (0)
rpoB_NC_000962_3_760285_761376	141 (37%)	150.7 k (40%)	1099	3	97.1	95.9	3 (0)
Unmapped	60 (16%)	53.4 k (14%)	951	3	0.0	0.0	0 (0)
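The percentages in the two tables above can be reproduced from the read counts. The sketch below assumes they are rounded to the nearest integer. The sample share is taken relative to all 386 reads, while the unmapped share is taken relative to each sample's own reads.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer (rounding rule assumed)."""
    return round(100 * part / whole)


reads = {"barcode01": 165, "barcode02": 132, "barcode03": 89}
unmapped = {"barcode01": 21, "barcode02": 17, "barcode03": 22}
total_reads = sum(reads.values())  # reads across all samples

for sample, n in reads.items():
    # Sample share uses all reads; unmapped share uses only this sample's reads.
    print(sample, f"{pct(n, total_reads)}%", f"unmapped {pct(unmapped[sample], n)}%")
```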

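The following table breaks the results down further (one sample–amplicon combination per row). Here all percentages, including those of unmapped reads, are relative to the total number of reads or bases across all samples.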

Sample	Amplicon	Reads	Bases	Median read length	Mean cov.	Mean acc.	Variants (indels)
barcode01	katG_NC_000962_3_2154725_2155670	70 (18%)	64.9 k (17%)	954	96.9	95.4	1 (0)
barcode01	rpoB_NC_000962_3_760285_761376	74 (19%)	79.2 k (21%)	1099	97.2	95.8	1 (0)
barcode01	Unmapped	21 (5%)	18.6 k (5%)	964	0.0	0.0	0 (0)
barcode02	katG_NC_000962_3_2154725_2155670	52 (13%)	48.0 k (13%)	953	96.5	95.3	1 (0)
barcode02	rpoB_NC_000962_3_760285_761376	63 (16%)	67.1 k (18%)	1099	96.7	95.9	1 (0)
barcode02	Unmapped	17 (4%)	15.1 k (4%)	963	0.0	0.0	0 (0)
barcode03	katG_NC_000962_3_2154725_2155670	63 (16%)	59.5 k (16%)	952	98.9	95.2	1 (0)
barcode03	rpoB_NC_000962_3_760285_761376	4 (1%)	4.4 k (1%)	1098	99.8	97.0	1 (0)
barcode03	Unmapped	22 (6%)	19.7 k (5%)	937	0.0	0.0	0 (0)
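Summing the rows of the breakdown table by sample, or by amplicon, recovers the read counts of the two summary tables. The sketch below does this with the read counts from the table.

```python
from collections import defaultdict

KATG = "katG_NC_000962_3_2154725_2155670"
RPOB = "rpoB_NC_000962_3_760285_761376"

# (sample, amplicon, reads) rows from the per-sample/per-amplicon table.
rows = [
    ("barcode01", KATG, 70), ("barcode01", RPOB, 74), ("barcode01", "Unmapped", 21),
    ("barcode02", KATG, 52), ("barcode02", RPOB, 63), ("barcode02", "Unmapped", 17),
    ("barcode03", KATG, 63), ("barcode03", RPOB, 4), ("barcode03", "Unmapped", 22),
]

per_sample = defaultdict(int)
per_amplicon = defaultdict(int)
for sample, amplicon, n_reads in rows:
    # Each read is counted once for its sample and once for its amplicon (or "Unmapped").
    per_sample[sample] += n_reads
    per_amplicon[amplicon] += n_reads

print(dict(per_sample))
print(dict(per_amplicon))
```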

Depth of coverage

The plots show the depth of coverage along each individual amplicon.
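A coverage profile of this kind can be approximated as follows. The sketch assumes that per-base depths (such as those computed with mosdepth, listed under Software versions) are averaged over number_depth_windows (100 in this run) equally sized windows per amplicon. The function name and windowing scheme are illustrative assumptions, not the workflow's implementation.

```python
def windowed_mean_depth(depths, n_windows=100):
    """Split per-base depths into contiguous windows and return each window's mean depth."""
    n = len(depths)
    n_windows = min(n_windows, n)  # avoid empty windows for very short amplicons
    means = []
    for i in range(n_windows):
        # Integer boundaries so that every base falls into exactly one window.
        start = i * n // n_windows
        end = (i + 1) * n // n_windows
        window = depths[start:end]
        means.append(sum(window) / len(window))
    return means
```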

Variants

Haploid variant calling was performed with Medaka. Variants with low depth (i.e. below --min_coverage, which was 20 in this run) are listed separately in the second ("Low depth") table. The "Depth" column gives the sequencing depth at the variant position that was used for variant calling.

Sample	Amplicon	Position	Ref. allele	Alt. allele	Type	Depth
barcode01	katG_NC_000962_3_2154725_2155670	443	C	G	SNP	68
barcode01	rpoB_NC_000962_3_760285_761376	870	C	T	SNP	71
barcode02	katG_NC_000962_3_2154725_2155670	443	C	G	SNP	50
barcode02	rpoB_NC_000962_3_760285_761376	870	C	T	SNP	60
barcode03	katG_NC_000962_3_2154725_2155670	443	C	G	SNP	62

Sample	Amplicon	Position	Ref. allele	Alt. allele	Type	Depth
barcode03	rpoB_NC_000962_3_760285_761376	870	C	T	SNP	4
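The split between the two variant tables can be sketched as a simple depth threshold. This assumes that a depth exactly equal to --min_coverage counts as passing. Amplicon names are abbreviated for brevity.

```python
from typing import NamedTuple

MIN_COVERAGE = 20  # value of --min_coverage in this run


class Variant(NamedTuple):
    sample: str
    amplicon: str  # abbreviated amplicon name
    pos: int
    ref: str
    alt: str
    depth: int


variants = [
    Variant("barcode01", "katG", 443, "C", "G", 68),
    Variant("barcode01", "rpoB", 870, "C", "T", 71),
    Variant("barcode02", "katG", 443, "C", "G", 50),
    Variant("barcode02", "rpoB", 870, "C", "T", 60),
    Variant("barcode03", "katG", 443, "C", "G", 62),
    Variant("barcode03", "rpoB", 870, "C", "T", 4),
]

# Variants below the threshold go to the "Low depth" table.
passing = [v for v in variants if v.depth >= MIN_COVERAGE]
low_depth = [v for v in variants if v.depth < MIN_COVERAGE]
```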

Software versions

Name	Version
python	3.8.19
fastcat	0.18.6
ezcharts	0.11.2
dominate	2.9.1
numpy	1.24.4
pandas	2.0.3
pysam	0.22.0
si-prefix	1.3.3
seqkit	v2.8.2
porechop	0.2.4
samtools	1.19.2
minimap2	2.28-r1209
mosdepth	0.3.7
miniasm	0.3-r179
racon	1.5.0
csvtk	0.27.2
medaka	1.12.0

Workflow parameters

Key	Value
fastq	test_data/fastq
reference	test_data/reference.fasta
out_dir	wf-amplicon
sample	None
sample_sheet	None
analyse_unclassified	False
combine_results	True
igv	False
min_read_length	300
max_read_length	None
min_read_qual	10
drop_frac_longest_reads	0.05
take_longest_remaining_reads	True
reads_downsampling_size	1500
min_n_reads	40
number_depth_windows	100
min_coverage	20
override_basecaller_cfg	None
medaka_target_depth_per_strand	150
force_spoa_length_threshold	2000
spoa_minimum_relative_coverage	0.15
spoa_max_allowed_read_length	5000
minimum_mean_depth	30
primary_alignments_threshold	0.7
threads	2
store_dir	wf-amplicon/store_dir
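The parameter table can be turned back into a command line for re-running the analysis. The sketch below makes two assumptions:

- the workflow is launched with `nextflow run epi2me-labs/wf-amplicon`;
- as is usual for Nextflow pipelines, each parameter is passed as a `--name value` option.

Only a subset of the parameters above is included. Parameters shown as None are treated as unset and skipped. The helper name is hypothetical.

```python
import shlex

# Subset of the workflow parameters listed above.
params = {
    "fastq": "test_data/fastq",
    "reference": "test_data/reference.fasta",
    "out_dir": "wf-amplicon",
    "sample": None,
    "combine_results": True,
    "reads_downsampling_size": 1500,
    "min_coverage": 20,
}


def to_nextflow_args(params):
    """Turn a parameter dict into '--name value' options, skipping unset (None) values."""
    args = []
    for key, value in params.items():
        if value is None:
            continue  # parameters shown as None were left unset
        if isinstance(value, bool):
            value = str(value).lower()  # write booleans as true/false
        args += [f"--{key}", str(value)]
    return args


cmd = ["nextflow", "run", "epi2me-labs/wf-amplicon", *to_nextflow_args(params)]
print(shlex.join(cmd))
```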