An experimental extremely high-accuracy, ultra-long sequencing kit

Published in Data Releases
December 06, 2023

We are pleased to announce a new experimental dataset comprising extremely high-accuracy, ultra-long sequencing reads, shared during our Nanopore Community Meeting technical update.

The following cell line sample was obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: GM24385.

Data location

As with previous releases, the new dataset is available for anonymous download from an Amazon Web Services S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data.

The data is located in the bucket at:

s3://ont-open-data/gm24385_2023.12/

See the tutorials page for information on downloading the dataset.
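
As a rough illustration of anonymous access, the sketch below splits the bucket location into bucket and prefix and builds AWS CLI commands that use the `--no-sign-request` flag, which tells the CLI to skip credential signing. The helper names and the local destination directory are hypothetical, and the commands are only constructed here, not run; the tutorials page remains the reference for the recommended download procedure.

```python
from urllib.parse import urlparse

# Location of the dataset as given in this post.
DATASET_URI = "s3://ont-open-data/gm24385_2023.12/"


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def anonymous_cli_command(uri, action="ls", dest=None):
    """Build (but do not run) an AWS CLI command for anonymous access.

    action="ls" lists the prefix; action="sync" copies it to `dest`.
    """
    if action not in ("ls", "sync"):
        raise ValueError("action must be 'ls' or 'sync'")
    command = ["aws", "s3", action, "--no-sign-request", uri]
    if action == "sync":
        # Hypothetical local directory; defaults to the current directory.
        command.append(dest or ".")
    return command


if __name__ == "__main__":
    bucket, prefix = split_s3_uri(DATASET_URI)
    print("bucket:", bucket, "prefix:", prefix)
    print(" ".join(anonymous_cli_command(DATASET_URI)))
    print(" ".join(anonymous_cli_command(DATASET_URI, "sync", "gm24385_2023.12")))
```

Listing the prefix first is a sensible precaution, since ultra-long PromethION datasets are large and a full sync may take considerable time and disk space.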

Sample preparation and analysis

Ultra-long libraries of native DNA from GM24385 (HG002) were prepared using a modified Ultra-Long DNA Sequencing Kit V14 motor protein and experimental high-accuracy run conditions. Sequencing was performed on a PromethION instrument to obtain 125 Gbp of sequencing data passing quality filters (read Q-score > Q10), with a read length N50 of 91 kbp. This data was basecalled using a bespoke dorado model to yield a median accuracy of Q26.4.
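
To put the Q-scores above in context, the following minimal sketch converts between Phred-scaled Q-scores and error probabilities using the standard relation Q = -10 log10(p). The function names are illustrative and are not part of dorado or any Oxford Nanopore Technologies tool.

```python
import math


def q_to_error(q):
    """Error probability for a Phred-scaled Q-score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_q(p):
    """Phred-scaled Q-score for an error probability: Q = -10 * log10(p)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 26.4):
        p = q_to_error(q)
        print(f"Q{q}: error rate {p:.5f}, accuracy {100 * (1 - p):.2f}%")
```

Under this relation, the Q10 read filter corresponds to 90% accuracy, and the reported median of Q26.4 corresponds to an error rate of about 0.23%.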

The per-base quality scores produced by the experimental basecaller model have not been calibrated and may not reflect the empirical accuracy of the called bases.

Read alignment accuracy was measured against the telomere-to-telomere consortium’s HG002 reference sequence using the bamstats tool from the fastcat package. The histogram outputs of bamstats were used to produce the plots in Fig. 1.

Figure 1. Sequencing summary metrics for a new experimental Oxford Nanopore Technologies sequencing chemistry. Alignment accuracy was measured with the bamstats program from the fastcat suite.

Genome assembly

With this dataset, parental sequencing reads, and the Hifiasm and RAFT tools, a diploid human genome assembly was constructed. The assembly included 19 telomere-to-telomere chromosomes. We hope release of this dataset will spur innovation from assembly algorithm developers as well as serve as a resource for researchers studying the most complicated and inaccessible reaches of the human genome.

Haplotype   Size / Gbp   Scaffold N50 / Mbp
Maternal    3.03         135
Paternal    2.94         133
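
The N50 statistic is quoted in this post for both read lengths and assembly scaffolds. As a minimal sketch of the common definition (the length L such that sequences of length L or longer contain at least half of the total bases), it can be computed as below. The function name and the toy lengths are illustrative only; the reported values were produced by the authors' own tooling, which may handle edge cases differently.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are visited from longest to shortest, and the first length at
    which the running total reaches half of all bases is returned.
    """
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids floating-point rounding at the halfway point.
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    toy_lengths = [2, 3, 4, 5, 6]  # toy values, not taken from the dataset
    print(n50(toy_lengths))
```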

Assembly karyogram from high-accuracy ultralong nanopore sequencing data.

Further information

For additional information regarding these data please contact support@nanoporetech.com.


Tags

#datasets #human cell-line #R10.4.1 #basecalling #dorado
