As with previous releases, the new dataset is available for anonymous download from an Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data.

The data is located in the bucket at:

s3://ont-open-data/Q20_ULK_Cliveome/

See the tutorials page for information on downloading the dataset.
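As a minimal sketch of anonymous access, assuming the AWS CLI is installed, the bucket path above can be listed and mirrored without credentials by passing `--no-sign-request`. The local folder name and the function names are hypothetical; the tutorials page remains the reference for the recommended procedure.

```shell
# Sketch: anonymous access to the dataset with the AWS CLI.
# --no-sign-request tells the AWS CLI not to sign requests, so no credentials are needed.

DATASET_URI="s3://ont-open-data/Q20_ULK_Cliveome/"   # path given in the text
DEST_DIR="Q20_ULK_Cliveome"                          # hypothetical local folder

# List the top level of the dataset.
list_dataset() {
    aws s3 ls --no-sign-request "$DATASET_URI"
}

# Mirror the dataset, or a sub-prefix passed as the first argument, to local disk.
fetch_dataset() {
    subpath="${1:-}"
    aws s3 sync --no-sign-request "${DATASET_URI}${subpath}" "${DEST_DIR}/${subpath}"
}
```

Calling `list_dataset` first shows what is available, so that `fetch_dataset` can be pointed at a single sub-prefix rather than the whole dataset.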

Basecalling

The dataset comprises the direct output of the sequencing device software MinKNOW, along with basecalls computed post-run using the research-grade bonito basecaller with the “Q20 early access model” as follows:

pip install ont-bonito==0.4.0
bonito download --models
bonito basecaller dna_r10.3_q20ea <read directory> | bgzip -c > basecalls.fa.gz

Only reads passing the default quality filter (average Q-score > 10) were processed by bonito, i.e. only those .fast5 files located within the fast5_pass MinKNOW output folder.
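For readers less familiar with Q-scores, the standard Phred relation Q = -10 log10(P) links a quality score to an error probability P. The sketch below applies it to the Q10 filter threshold and to the Q20 figure in the model name. The helper names are hypothetical, and how MinKNOW averages per-base qualities into a read-level score is not covered here.

```shell
# Convert a Phred quality score to an error probability: P = 10^(-Q/10).
q_to_error() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", exp(-q / 10 * log(10)) }'
}

# Convert an accuracy (a fraction between 0 and 1) to a Phred score:
# Q = -10 * log10(1 - accuracy).
accuracy_to_q() {
    awk -v acc="$1" 'BEGIN { printf "%.1f\n", -10 * log(1 - acc) / log(10) }'
}

q_to_error 10        # prints 0.1000: Q10 corresponds to a 10% error rate
accuracy_to_q 0.99   # prints 20.0: 99% accuracy corresponds to Q20
```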

Data summary

The sequencing runs here represent data from pre-release versions of the sequencing and analysis components. Data throughput and quality do not reflect that of a released product.

The dataset comprises eight PromethION sequencing runs from our R&D lab using pre-release chemistry components and R10.3 flowcells. A separately prepared sample was run on each flowcell. The flowcells yielded between 10 Gbases and 18 Gbases, with N50 read lengths between 60 and 95 kb.
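The N50 read length is the length L such that reads of length L or longer contain at least half of all sequenced bases. A minimal sketch of that calculation, assuming one read length per line on standard input (the function name `n50` and the toy input are illustrative only):

```shell
# Compute the N50 of a list of read lengths, one per line on standard input.
n50() {
    # Sort longest first, then accumulate until half of the total bases is reached.
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum >= half) { print len[i]; exit }
            }
        }'
}

# Toy example: 20 bases in total; the two longest reads (6 + 5) cover at least 10.
printf '%s\n' 2 3 4 5 6 | n50   # prints 5
```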

Basecalling accuracy was assessed by aligning the reads to the GRCh38 human reference using minimap2, and alignment statistics were calculated using the stats_from_bam program from the pomoxis software package.
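The exact commands and options used for this assessment are not given here. The following is a sketch of one way such a pipeline could be assembled, assuming minimap2, samtools and pomoxis are installed. The `map-ont` preset, the use of samtools for sorting and indexing, the function name and all file names are assumptions for illustration.

```shell
# Sketch: align basecalls to a reference and compute per-read alignment statistics.
# Arguments: reference FASTA, reads file, output prefix (all hypothetical names).
align_and_stats() {
    ref="$1"
    reads="$2"
    prefix="$3"

    # map-ont is minimap2's preset for Oxford Nanopore reads (an assumed choice here).
    minimap2 -ax map-ont "$ref" "$reads" | samtools sort -o "${prefix}.bam" -
    samtools index "${prefix}.bam"

    # stats_from_bam writes a per-read table to standard output.
    stats_from_bam "${prefix}.bam" > "${prefix}.stats.txt"
}
```

A call such as `align_and_stats GRCh38.fa basecalls.fa.gz flowcell1` would then leave a sorted, indexed BAM file and a statistics table alongside each other.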

Basecalling accuracy distribution for Q20 (early access) CliveOME dataset.

Single-molecule read lengths for each of the eight flowcells.

Chris Wright

Senior Director, Customer Workflows



Q20 single-read accuracy with ultra-long CliveOME dataset
