Bonito basecalling with R9.4.1

By Chris Wright
Published in Data Releases
October 15, 2020

Updated 2020-10-30: This page was edited to reflect the formal release of Bonito v0.3.0.

We are pleased to announce the addition of Bonito basecalling results to the GM24385 dataset. Bonito is a research-grade, open-source basecaller built on the PyTorch library; its development explores basecalling frameworks that are alternatives to those used in the product-grade Guppy basecalling software.

Bonito basecalling for the GM24385 dataset was performed with version 0.3.0, driven by the same katuali analysis pipeline as the initial dataset release. As input, Bonito was given the per-chromosome .fast5 files that the initial pipeline created by aligning the Guppy 4.0.11 basecalls. This makes it easy to compare results on subsets of the data (though it may introduce subtle side-effects). For example, the analysis directory structure now contains entries of the form:

gm24385_2020.09/analysis/r9.4.1/{flowcell}/guppy_{suffix}/align_unfiltered/{chromosome}/bonito_v0.3.0/
├── align_unfiltered
│   ├── align_to_ref.log
│   ├── basecall_stats.log
│   ├── calls2ref.bam
│   ├── calls2ref.bam.bai
│   └── calls2ref_stats.txt
├── basecalls.fastq.gz
└── basecalls.fastq.gz_summary.tsv
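If you mirror part of the bucket locally (for example with aws s3 sync), the layout above can be walked with nothing more than pathlib. The sketch below is illustrative only: the helper name find_bonito_stats is hypothetical, and it assumes that root points at a local copy of the gm24385_2020.09/ prefix. The glob wildcards stand in for {flowcell}, {suffix} and {chromosome}.

```python
from pathlib import Path

# Hypothetical pattern mirroring the layout above; the wildcards stand in
# for {flowcell}, guppy_{suffix} and {chromosome}.
BONITO_STATS_GLOB = (
    "analysis/r9.4.1/*/guppy_*/align_unfiltered/*/"
    "bonito_v0.3.0/align_unfiltered/calls2ref_stats.txt"
)


def find_bonito_stats(root):
    """Map (flowcell, guppy_dir, chromosome) -> path of each Bonito summary.

    Assumes `root` is a local copy of the gm24385_2020.09/ prefix.
    """
    results = {}
    for path in sorted(Path(root).glob(BONITO_STATS_GLOB)):
        # Parts relative to root: analysis / r9.4.1 / flowcell / guppy_dir /
        # align_unfiltered / chromosome / bonito_v0.3.0 / align_unfiltered / file
        rel = path.relative_to(root).parts
        flowcell, guppy_dir, chromosome = rel[2], rel[3], rel[5]
        results[(flowcell, guppy_dir, chromosome)] = path
    return results
```

Keying the result on the directory components makes it straightforward to pair each Bonito summary with the Guppy summary two levels up.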

The file basecalls.fastq.gz contains the basecalls from Bonito. The quality scores in this file are mocked (placeholder values), as the pre-release build of Bonito used does not yet provide quality scores. As in the main folder structure, the align_unfiltered directory contains unfiltered alignments of the basecalls to the reference sequence (calls2ref.bam), along with text files summarizing the properties of the alignments.
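Because the quality strings are placeholders, they should not be used in downstream analysis. A quick way to inspect them is to tally the quality characters in basecalls.fastq.gz. The following is a minimal sketch using only the standard library; the helper name quality_char_counts is hypothetical, and it assumes plain four-line FASTQ records (no line-wrapped sequences). It makes no assumption about which placeholder values were used.

```python
import gzip
from collections import Counter


def quality_char_counts(fastq_gz, max_reads=None):
    """Count quality characters across the reads of a gzipped FASTQ file.

    Assumes four-line FASTQ records; stops after `max_reads` reads if given.
    """
    counts = Counter()
    with gzip.open(fastq_gz, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:  # the fourth line of each record holds the qualities
                counts.update(line.rstrip("\n"))
                if max_reads is not None and (i + 1) // 4 >= max_reads:
                    break
    return counts
```

For example, quality_char_counts("basecalls.fastq.gz", max_reads=1000) summarises the first thousand reads without decompressing the whole file.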

Comparison with Guppy 4.0.11 basecalls

As a basis for comparison with the current Guppy basecaller, we can use the alignment summary files for both the Guppy and Bonito basecalls. To simplify the analysis we compare only chromosome 1 data from a single flowcell; the files can be downloaded with:

aws s3 cp --no-sign-request s3://ont-open-data/gm24385_2020.09/analysis/r9.4.1/20200914_1354_6B_PAF27096_e7c9eae6/guppy_v4.0.11_r9.4.1_hac_prom/align_unfiltered/chr1/calls2ref_stats.txt guppy.stats
aws s3 cp --no-sign-request s3://ont-open-data/gm24385_2020.09/analysis/r9.4.1/20200914_1354_6B_PAF27096_e7c9eae6/guppy_v4.0.11_r9.4.1_hac_prom/align_unfiltered/chr1/bonito_v0.3.0/align_unfiltered/calls2ref_stats.txt bonito.stats
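The same two downloads can be done from Python with boto3, using unsigned requests as the equivalent of --no-sign-request. This is a sketch rather than part of the original pipeline: the helper names stats_keys and download_stats are hypothetical, and the key layout is taken directly from the two commands above. boto3 is imported lazily so that the key-building part runs without it.

```python
BUCKET = "ont-open-data"
PREFIX = "gm24385_2020.09/analysis/r9.4.1"


def stats_keys(flowcell, guppy_dir, chromosome):
    """Return the S3 keys of the Guppy and Bonito alignment summaries."""
    base = f"{PREFIX}/{flowcell}/{guppy_dir}/align_unfiltered/{chromosome}"
    return {
        "guppy": f"{base}/calls2ref_stats.txt",
        "bonito": f"{base}/bonito_v0.3.0/align_unfiltered/calls2ref_stats.txt",
    }


def download_stats(flowcell, guppy_dir, chromosome):
    """Download both summaries anonymously as guppy.stats and bonito.stats."""
    # Imported here so that building keys does not require boto3.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    for caller, key in stats_keys(flowcell, guppy_dir, chromosome).items():
        s3.download_file(BUCKET, key, f"{caller}.stats")
```

Calling download_stats("20200914_1354_6B_PAF27096_e7c9eae6", "guppy_v4.0.11_r9.4.1_hac_prom", "chr1") should fetch the same files as the aws commands; changing the arguments selects other flowcells or chromosomes.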

The following Python code,

from concurrent.futures import ProcessPoolExecutor

import pandas as pd
import aplanat.util
from aplanat import lines
from bokeh.plotting import show


def read_data(args):
    """Read one alignment summary and return a KDE of its accuracy column."""
    caller, filename = args
    df = pd.read_csv(filename, sep='\t')
    # Kernel density estimate of per-read alignment accuracy ('acc' column)
    xs, ys = aplanat.util.kernel_density_estimate(df['acc'], step=0.05)
    df = pd.DataFrame({'accuracy': xs, 'density': ys})
    df['caller'] = caller
    return df


data_sets = {
    'bonito': 'bonito.stats',
    'guppy': 'guppy.stats'}

if __name__ == '__main__':
    # Read the two summary files in parallel
    with ProcessPoolExecutor() as executor:
        dfs = list(executor.map(read_data, data_sets.items()))
    # One density curve per basecaller, labelled in data_sets order
    plot = lines.line(
        [df['accuracy'] for df in dfs],
        [df['density'] for df in dfs],
        colors=['red', 'blue'],
        names=list(data_sets.keys()),
        xlim=(85, 100),
        x_axis_label='Alignment accuracy',
        y_axis_label='Density')
    plot.legend.location = 'top_left'
    show(plot)  # display the bokeh figure

can be used to plot a kernel density estimate for the read alignment accuracy:

Figure: Accuracy comparison. Basecalls: Bonito CTC-CRF vs. Guppy 4.0.11

The plot indicates that, compared with Guppy 4.0.11, Bonito decreases the modal error of reads (the error at the peak of the accuracy distribution) by one-third.
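The one-third figure can also be checked numerically from the two summary files. The sketch below is a numpy-only stand-in for the aplanat KDE, not the method used to make the plot: it assumes a Gaussian kernel with a normal-reference bandwidth (1.06 σ n^(-1/5)), which may differ from aplanat's choices, and assumes the acc column is in percent. The helper names gaussian_kde_mode and modal_error_reduction are hypothetical. Because every kernel is evaluated at every grid point, a random subsample of reads keeps memory use reasonable for large files.

```python
import numpy as np


def gaussian_kde_mode(values, step=0.05, bandwidth=None):
    """Return the value at which a Gaussian kernel density estimate peaks.

    The default bandwidth is the normal-reference rule of thumb, an
    assumption that need not match the estimator used by aplanat.
    """
    values = np.asarray(values, dtype=float)
    if bandwidth is None:
        bandwidth = 1.06 * values.std(ddof=1) * len(values) ** (-1 / 5)
    grid = np.arange(values.min(), values.max() + step, step)
    # Sum of Gaussian kernels centred on each observation, evaluated on the
    # grid; normalisation is skipped because only the peak location matters.
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    return grid[np.argmax(density)]


def modal_error_reduction(guppy_acc, bonito_acc):
    """Fractional reduction in modal error, where error = 100 - modal accuracy."""
    guppy_err = 100.0 - gaussian_kde_mode(guppy_acc)
    bonito_err = 100.0 - gaussian_kde_mode(bonito_acc)
    return 1.0 - bonito_err / guppy_err
```

For example, passing pd.read_csv('guppy.stats', sep='\t')['acc'] and pd.read_csv('bonito.stats', sep='\t')['acc'] to modal_error_reduction returns the fractional reduction; a value near 0.33 would correspond to the one-third decrease read off the plot.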


Tags: #datasets #human cell-line #R9.4.1 #basecalling

Chris Wright, Senior Director, Customer Workflows

