Q20 single-read accuracy with ultra-long CliveOME dataset
May 21, 2021
We are pleased to announce a fresh release of the CliveOME using the latest Q20 pre-release chemistry.

Data location

As with previous releases, the new dataset is available for anonymous download from an Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data.

The data is located in the bucket at:


See the tutorials page for information on downloading the dataset.


The dataset comprises the direct output of the sequencing device software MinKNOW, along with basecalls computed post-run using the research-grade bonito basecaller with the “Q20 early access model” as follows:

pip install ont-bonito==0.4.0
bonito download --models
bonito basecaller dna_r10.3_q20ea <read directory> | bgzip -c > basecalls.fa.gz

Only reads passing the default quality filter (average Q-score > 10) were processed by bonito, i.e. only those .fast5 files located within the fast5_pass MinKNOW output folder.
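As a rough illustration of the quality filter described above, the sketch below computes an average Q-score for a single read from its FASTQ quality string and compares it with the threshold of 10. It assumes the average is taken over per-base error probabilities and then converted back to a Q-score, rather than averaging the Q-values directly; the function names mean_qscore and passes_filter are hypothetical and are not part of MinKNOW or bonito.

```python
import math


def mean_qscore(quality_string: str, offset: int = 33) -> float:
    """Average Q-score of a read from its Phred+33 quality string.

    Assumption: the mean is computed over per-base error
    probabilities, then converted back to the Phred scale.
    """
    if not quality_string:
        raise ValueError("quality string is empty")
    # Convert each quality character to an error probability.
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    mean_error = sum(error_probs) / len(error_probs)
    return -10 * math.log10(mean_error)


def passes_filter(quality_string: str, threshold: float = 10.0) -> bool:
    """True if the read's average Q-score exceeds the threshold."""
    return mean_qscore(quality_string) > threshold
```

Averaging error probabilities weights low-quality bases more heavily than a plain mean of Q-values would, so a read with a few very poor bases scores lower under this scheme.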

Data summary

The sequencing runs here represent data from pre-release versions of the sequencing and analysis components. Data throughput and quality do not reflect that of a released product.

The dataset comprises eight PromethION sequencing runs from our R&D lab using pre-release chemistry components and R10.3 flowcells. A separately prepared sample was run on each flowcell. The flowcells yielded between 10 Gbases and 18 Gbases, with N50 read lengths between 60 kb and 95 kb.
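For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it can be computed from a list of read lengths: the N50 is the length of the shortest read in the smallest set of longest reads that together contain at least half of all sequenced bases. The function name n50 is illustrative only and does not come from any tool used to prepare this dataset.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    Reads are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the
    total number of bases.
    """
    if not read_lengths:
        raise ValueError("no read lengths supplied")
    total_bases = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total_bases:
            return length
```

For example, for reads of length 2, 3, 4, 5 and 6 kb (20 kb in total), the two longest reads already cover 11 kb, so the N50 is 5 kb.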

Basecalling accuracy was assessed by aligning the reads to the GRCh38 human reference using minimap2; alignment statistics were then calculated using the stats_from_bam program from the pomoxis software package.
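To relate alignment accuracy to the "Q20" label, the sketch below converts a per-read accuracy into a Phred-scaled Q-score, under which Q20 corresponds to 99% accuracy. The accuracy definition used here (matches divided by matches plus mismatches, insertions and deletions) is one common convention and is an assumption; it is not confirmed to be the exact formula used by stats_from_bam. Both function names are hypothetical.

```python
import math


def alignment_accuracy(matches, mismatches, insertions, deletions):
    """Fraction of alignment columns that are matches.

    Assumed definition, for illustration only.
    """
    aligned_columns = matches + mismatches + insertions + deletions
    if aligned_columns == 0:
        raise ValueError("alignment has no columns")
    return matches / aligned_columns


def accuracy_to_phred(accuracy):
    """Convert an accuracy in [0, 1] to a Phred-scaled Q-score."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie between 0 and 1")
    if accuracy == 1.0:
        # A perfect alignment has no finite Q-score.
        return math.inf
    return -10 * math.log10(1.0 - accuracy)
```

Under this convention, a read with 990 matching columns out of 1,000 aligned columns has an accuracy of 0.99, which is approximately Q20.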

Basecalling accuracy distribution for Q20 (early access) CliveOME dataset.
Single-molecule read lengths for each of the eight flowcells.



