We are pleased to announce the release of a new addition to the Oxford Nanopore Open Data project: sequencing of the COLO829 / COLO829BL tumour/normal pair.
These reference samples were sequenced on three PromethION flow cells: two for the tumour sample and one for the normal. They should provide a valuable resource for cancer researchers.
As with previous releases, the new dataset is available for anonymous download from an Amazon Web Services (AWS) S3 bucket. The bucket is part of the Open Data on AWS project, which enables sharing and analysis of a wide range of data.
The data is located in the bucket at:
See the tutorials page for information on downloading the dataset.
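For readers who prefer to script the download, the following is a minimal sketch of how an anonymous (unsigned) listing of a public S3 bucket could be made using only the Python standard library. The bucket name and prefix in the usage note are placeholders rather than the real location of the dataset, and the helper names `listing_url`, `parse_listing` and `list_objects` are our own illustrative choices. The sketch assumes the bucket allows public listing and fetches only the first page of results (at most 1000 keys).

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used by S3 REST API responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(bucket: str, prefix: str) -> str:
    """Build an unsigned ListObjectsV2 URL for a publicly listable bucket."""
    query = urllib.parse.urlencode({"list-type": "2", "prefix": prefix})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text: str) -> list[tuple[str, int]]:
    """Return (key, size in bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_objects(bucket: str, prefix: str) -> list[tuple[str, int]]:
    """Fetch and parse the first page of an anonymous bucket listing."""
    with urllib.request.urlopen(listing_url(bucket, prefix)) as response:
        return parse_listing(response.read().decode("utf-8"))
```

Calling `list_objects("example-bucket", "example-prefix/")` with the real bucket and prefix substituted would return the object keys and their sizes. Each key can then be fetched from `https://<bucket>.s3.amazonaws.com/<key>`, or with the AWS command line tools as described in the tutorials.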
COLO829 melanoma fibroblasts (ATCC CRL-1974) and COLO829BL normal B lymphoblasts (ATCC CRL-1980) were cultured for three days in RPMI-1640 medium with 10% fetal bovine serum and 1% antibiotic-antimycotic, incubated at 37°C with 5% CO2. DNA was extracted from five million cells with the DNeasy Blood and Tissue Kit (Qiagen, Cat. No. 69504) following the manufacturer’s instructions. Sequencing libraries were prepared following the Oxford Nanopore Ligation Sequencing Kit instructions, and 20 fmol of library was loaded onto each R10.4.1 PromethION flow cell. Sequencing was performed on a PromethION 24 instrument with MinKNOW software version 22.12.5.
As a bonus, we have also included data from a second sample preparation aimed at extracting ultra-high molecular weight DNA. DNA was extracted from six million cells with the Monarch HMW DNA Extraction Kit for Tissue (New England Biolabs, T3060) following the modified instructions outlined in the Oxford Nanopore Ultra-long DNA Sequencing Kit (SQK-ULK114). Sequencing libraries were prepared and one third of each final library was loaded onto an R10.4.1 PromethION flow cell. The flow cells were flushed and reloaded twice as described in the section “Reloading ultra-long DNA library on a PromethION flow cell”. Samples were sequenced on a PromethION 24 instrument with MinKNOW software version 22.12.5.
The ultra-high molecular weight data were not used in the main analysis.
Three flowcells were used to sequence the samples to high depth:
| Genome | Description | Preparation | Flowcell | Yield (Gbase) |
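As a rough guide to what a yield figure means in terms of depth, mean coverage can be estimated by dividing the yield by the genome size. The sketch below assumes a genome size of about 3.1 Gbase for GRCh38. That value is an approximation we have chosen for illustration, and the function names are hypothetical. The estimate ignores unaligned and filtered reads, so the depth observed after alignment will generally be somewhat lower.

```python
# Approximate size of the GRCh38 human reference in Gbase.
# This is an assumption made for illustration, not a value from the release.
GENOME_SIZE_GBASE = 3.1


def estimate_depth(yield_gbase: float, genome_size_gbase: float = GENOME_SIZE_GBASE) -> float:
    """Estimate mean sequencing depth from a flowcell yield in Gbase."""
    if genome_size_gbase <= 0:
        raise ValueError("genome size must be positive")
    return yield_gbase / genome_size_gbase


def combined_depth(yields_gbase: list[float]) -> float:
    """Estimate depth for a sample sequenced across several flowcells."""
    return estimate_depth(sum(yields_gbase))
```

For a sample sequenced on two flowcells, passing both yields to `combined_depth` gives the approximate total depth for that sample.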
For each flowcell used in the sequencing, the primary sequencer outputs are available as .pod5 files. We also provide sequencing reads in CRAM format, produced by our wf-basecalling workflow. Reads are aligned to the GRCh38 human reference.
The data analyses presented here were performed using our workflows:
The somatic variant calling workflow uses ClairS to call variants in the tumour sample, eliminating those that are also found in the paired normal sample.
The variant calling workflow was run using data from both COLO829 (tumour) flowcells and the single COLO829BL (normal) flowcell. The results of the workflow are available at:
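The idea of removing variants shared with the normal sample can be illustrated with a toy example. The sketch below is not how ClairS works internally and is not part of the workflow. It only shows the tumour-minus-normal principle as a set difference over variant records. The `Variant` type, the `subtract_normal` function and all of the records shown are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A minimal variant record: position plus reference and alternate alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def subtract_normal(tumour: set[Variant], normal: set[Variant]) -> set[Variant]:
    """Keep only the tumour variants that are absent from the paired normal."""
    return tumour - normal


# Made-up records for illustration only; these are not calls from the dataset.
tumour_calls = {
    Variant("chr1", 1000, "A", "T"),  # present in both samples (germline-like)
    Variant("chr2", 2000, "G", "C"),  # present in the tumour only
}
normal_calls = {
    Variant("chr1", 1000, "A", "T"),
}

somatic_candidates = subtract_normal(tumour_calls, normal_calls)
for variant in sorted(somatic_candidates):
    print(variant.chrom, variant.pos, variant.ref, variant.alt)
```

A real somatic caller also has to weigh read support, sequencing error and tumour purity, which is why a dedicated tool is used rather than a plain set difference.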
For additional information regarding these data please contact email@example.com.
We hope that these data and analyses provide a useful resource to the community.