Introduction

This report contains tables and plots to help interpret the results of wf-amplicon. The workflow was run in variant calling mode. The individual sections of the report summarize the outcomes of the different steps of the workflow (read filtering, mapping against the reference file containing the amplicon sequences, variant calling).

Note: If the sequence IDs in the reference file contained special characters, they were replaced with underscores.
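
As a rough illustration of the ID clean-up described in the note above, the following minimal sketch replaces every character outside letters, digits, and underscores with an underscore. The exact set of characters the workflow treats as "special" is not stated in this report, so the pattern below is an assumption, and both the function name `sanitize_seq_id` and the input ID are hypothetical.

```python
import re


def sanitize_seq_id(seq_id: str) -> str:
    """Replace every character that is not a letter, digit, or underscore with '_'."""
    # Assumption: "special" means anything outside [A-Za-z0-9_].
    return re.sub(r"[^A-Za-z0-9_]", "_", seq_id)


# Hypothetical input ID, chosen only for illustration.
print(sanitize_seq_id("geneA|chr1.2:100-200"))
```

An ID that already consists only of letters, digits, and underscores passes through unchanged.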

The input data contained:

3 samples:
barcode01, barcode02, barcode03

2 amplicons:
katG_NC_000962_3_2154725_2155670, rpoB_NC_000962_3_760285_761376

Note: The data was downsampled to 1500 reads per sample.

At a glance

Key results for the individual samples are shown below, one column per sample.

Metric barcode01 barcode02 barcode03
Reads 165 132 89
Bases 162,744 130,231 83,668
Mean length 986.3 986.6 940.1
Mean quality 13.8 13.8 13.5
Amplicons detected 2 / 2 2 / 2 2 / 2
Mean coverage across all amplicons 68.9 55.1 30.5
Smallest mean coverage for any amplicon 66.8 49.4 3.9
SNVs 2 2 2
Indels 0 0 0
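
The reported mean lengths are consistent with dividing the number of bases by the number of reads. The short sketch below checks this using the read and base counts from this section; the helper name `mean_read_length` is made up for illustration, and rounding to one decimal place is an assumption based on how the values are displayed.

```python
def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Mean read length in bases, rounded to one decimal place."""
    if n_reads == 0:
        raise ValueError("cannot compute a mean length for zero reads")
    return round(total_bases / n_reads, 1)


# (bases, reads) pairs taken from the key results in this section.
for bases, reads in [(162_744, 165), (130_231, 132), (83_668, 89)]:
    print(reads, bases, mean_read_length(bases, reads))
```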

Preprocessing

The table below shows basic statistics for the raw reads, for the reads remaining after the initial filtering step (based on length and mean quality), and for the reads remaining after downsampling and trimming.

Condition Reads Bases Min read length Max read length Mean quality
Raw 407 435.5 k 327 2,147 13.5
Filtered 407 435.5 k 327 2,147 13.5
Downsampled, trimmed 386 376.6 k 243 1,139 13.8
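
The length and quality filter can be pictured as a simple per-read predicate. The sketch below is not the workflow's implementation: the function name `passes_filter` is hypothetical, the defaults mirror the min_read_length, max_read_length, and min_read_qual values listed under "Workflow parameters", and treating the thresholds as inclusive is an assumption.

```python
from typing import Optional


def passes_filter(
    length: int,
    mean_qual: float,
    min_len: int = 300,
    max_len: Optional[int] = None,
    min_qual: float = 10.0,
) -> bool:
    """Return True if a read satisfies the length and mean-quality thresholds."""
    if length < min_len:
        return False
    # max_len=None means that no upper length limit is applied.
    if max_len is not None and length > max_len:
        return False
    return mean_qual >= min_qual


# Shortest and longest raw read lengths from the table, with the overall mean quality.
print(passes_filter(327, 13.5), passes_filter(2147, 13.5))
```

With a shortest raw read of 327 bases and no upper length limit set, the length thresholds alone would not remove any read, which is consistent with the identical "Raw" and "Filtered" rows.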

The following plots show the read quality and length distributions as well as the base yield after filtering (but before downsampling / trimming) for each sample (use the dropdown menu to view the plots for the individual samples).

Summary

The two tables below (one per tab) briefly summarize the main results of mapping the reads to the provided amplicon references and subsequent variant calling. Percentages of unmapped reads are relative to the number of reads for that particular sample. Other percentages are relative to the total number of reads / bases including all samples.

Sample alias Reads Bases Median read length Amplicons Unmapped Variants (indels)
barcode01 165 (43%) 162.7 k (43%) 985 2 21 (13%) 2 (0)
barcode02 132 (34%) 130.2 k (35%) 987 2 17 (13%) 2 (0)
barcode03 89 (23%) 83.7 k (22%) 952 2 22 (25%) 2 (0)

Amplicon Reads Bases Median read length Samples Mean cov. Mean acc. Variants (indels)
katG_NC_000962_3_2154725_2155670 185 (48%) 172.5 k (46%) 954 3 97.4 95.3 3 (0)
rpoB_NC_000962_3_760285_761376 141 (37%) 150.7 k (40%) 1099 3 97.1 95.9 3 (0)
Unmapped 60 (16%) 53.4 k (14%) 951 3 0.0 0.0 0 (0)
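
The two percentage conventions used in the summary tables (share of all reads versus share of a sample's own reads for the "Unmapped" column) can be reproduced from the read counts. The sketch below uses the per-sample counts from the first table; the helper name `pct` is made up, and rounding to whole percentages is an assumption based on the displayed values.

```python
def pct(part: float, whole: float) -> int:
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# Per-sample read counts and unmapped read counts from the summary table.
reads = {"barcode01": 165, "barcode02": 132, "barcode03": 89}
unmapped = {"barcode01": 21, "barcode02": 17, "barcode03": 22}
total_reads = sum(reads.values())

for sample in reads:
    share_of_total = pct(reads[sample], total_reads)       # relative to all samples
    unmapped_share = pct(unmapped[sample], reads[sample])  # relative to this sample
    print(f"{sample}: {share_of_total}% of all reads, {unmapped_share}% unmapped")
```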

The following table breaks the results down further (one sample–amplicon combination per row).

Sample Amplicon Reads Bases Median read length Mean cov. Mean acc. Variants (indels)
barcode01 katG_NC_000962_3_2154725_2155670 70 (18%) 64.9 k (17%) 954 96.9 95.4 1 (0)
barcode01 rpoB_NC_000962_3_760285_761376 74 (19%) 79.2 k (21%) 1099 97.2 95.8 1 (0)
barcode01 Unmapped 21 (5%) 18.6 k (5%) 964 0.0 0.0 0 (0)
barcode02 katG_NC_000962_3_2154725_2155670 52 (13%) 48.0 k (13%) 953 96.5 95.3 1 (0)
barcode02 rpoB_NC_000962_3_760285_761376 63 (16%) 67.1 k (18%) 1099 96.7 95.9 1 (0)
barcode02 Unmapped 17 (4%) 15.1 k (4%) 963 0.0 0.0 0 (0)
barcode03 katG_NC_000962_3_2154725_2155670 63 (16%) 59.5 k (16%) 952 98.9 95.2 1 (0)
barcode03 rpoB_NC_000962_3_760285_761376 4 (1%) 4.4 k (1%) 1098 99.8 97.0 1 (0)
barcode03 Unmapped 22 (6%) 19.7 k (5%) 937 0.0 0.0 0 (0)

Depth of coverage

Coverage along the individual amplicons (use the dropdown menu to view the plots for the individual amplicons).

Variants

Haploid variant calling was performed with Medaka. Variants with low depth (i.e. below --min_coverage, which was set to 20 in this run) are shown under the "Low depth" tab. The numbers in the "Depth" column relate to the sequencing depth used to perform variant calling.

Sample Amplicon Position Ref. allele Alt. allele Type Depth
barcode01 katG_NC_000962_3_2154725_2155670 443 C G SNP 68
barcode01 rpoB_NC_000962_3_760285_761376 870 C T SNP 71
barcode02 katG_NC_000962_3_2154725_2155670 443 C G SNP 50
barcode02 rpoB_NC_000962_3_760285_761376 870 C T SNP 60
barcode03 katG_NC_000962_3_2154725_2155670 443 C G SNP 62

"Low depth" tab:
Sample Amplicon Position Ref. allele Alt. allele Type Depth
barcode03 rpoB_NC_000962_3_760285_761376 870 C T SNP 4
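
The split between regular and low-depth variants can be sketched as a simple threshold on the depth column. The records below are copied from the variant tables in this section and the threshold from the min_coverage parameter; the function name `split_by_depth` is hypothetical, and this is only an illustration of the rule, not the workflow's own code.

```python
MIN_COVERAGE = 20  # value of min_coverage listed under "Workflow parameters"

# (sample, amplicon, position, ref, alt, depth), copied from the variant tables.
variants = [
    ("barcode01", "katG_NC_000962_3_2154725_2155670", 443, "C", "G", 68),
    ("barcode01", "rpoB_NC_000962_3_760285_761376", 870, "C", "T", 71),
    ("barcode02", "katG_NC_000962_3_2154725_2155670", 443, "C", "G", 50),
    ("barcode02", "rpoB_NC_000962_3_760285_761376", 870, "C", "T", 60),
    ("barcode03", "katG_NC_000962_3_2154725_2155670", 443, "C", "G", 62),
    ("barcode03", "rpoB_NC_000962_3_760285_761376", 870, "C", "T", 4),
]


def split_by_depth(records, min_coverage):
    """Separate variants into those at or above the depth threshold and those below it."""
    passing = [r for r in records if r[-1] >= min_coverage]
    low_depth = [r for r in records if r[-1] < min_coverage]
    return passing, low_depth


passing, low_depth = split_by_depth(variants, MIN_COVERAGE)
print(len(passing), "variants with sufficient depth;", len(low_depth), "with low depth")
```

With these inputs, only the barcode03 rpoB variant (depth 4) falls below the threshold.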

Software versions

Name Version
python 3.8.19
fastcat 0.18.6
ezcharts 0.11.2
dominate 2.9.1
numpy 1.24.4
pandas 2.0.3
pysam 0.22.0
si-prefix 1.3.3
seqkit v2.8.2
porechop 0.2.4
samtools 1.19.2
minimap2 2.28-r1209
mosdepth 0.3.7
miniasm 0.3-r179
racon 1.5.0
csvtk 0.27.2
medaka 1.12.0

Workflow parameters

Key Value
fastq test_data/fastq
reference test_data/reference.fasta
out_dir wf-amplicon
sample None
sample_sheet None
analyse_unclassified False
combine_results True
igv False
min_read_length 300
max_read_length None
min_read_qual 10
drop_frac_longest_reads 0.05
take_longest_remaining_reads True
reads_downsampling_size 1500
min_n_reads 40
number_depth_windows 100
min_coverage 20
override_basecaller_cfg None
medaka_target_depth_per_strand 150
force_spoa_length_threshold 2000
spoa_minimum_relative_coverage 0.15
spoa_max_allowed_read_length 5000
minimum_mean_depth 30
primary_alignments_threshold 0.7
threads 2
store_dir wf-amplicon/store_dir
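
The parameters drop_frac_longest_reads, take_longest_remaining_reads, and reads_downsampling_size suggest a read selection along the following lines. This is only a reading of the parameter names, not the workflow's actual code: the function name `downsample`, rounding the dropped fraction down, and the random fallback when the longest reads are not taken are all assumptions.

```python
import random
from typing import List, Optional


def downsample(
    lengths: List[int],
    drop_frac_longest: float = 0.05,
    target: int = 1500,
    take_longest: bool = True,
    seed: Optional[int] = None,
) -> List[int]:
    """Drop the longest fraction of reads, then keep at most `target` of the rest."""
    ordered = sorted(lengths, reverse=True)
    # Assumption: the number of dropped reads is rounded down.
    n_drop = int(len(ordered) * drop_frac_longest)
    remaining = ordered[n_drop:]
    if len(remaining) <= target:
        return remaining
    if take_longest:
        return remaining[:target]
    # Assumption: otherwise a random subset is drawn.
    return random.Random(seed).sample(remaining, target)


# Toy read lengths, for illustration only.
print(downsample([900, 950, 1000, 1100, 1200, 2100, 2200, 2300],
                 drop_frac_longest=0.25, target=3))
```

In this sketch, a sample with fewer reads than the target simply keeps everything that survives the drop step.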