Read summary

Single cell sample summary

This table summarises the number of input reads, and the number of cells, genes and transcripts identified within each sample.

sample ID         reads  cells  genes  transcripts
3prime_v4         16004  1962   220    201
test_3prime_5k    4870   1263   69     60
test_5prime_5k    4825   690    57     46
test_multiome_5k  4970   1112   24     19

Alignment summary

Summaries of genome read alignment per sample.

Note that reads aligned may be less than the total number of input reads if non-full-length reads were filtered (see option: full_length_only).

sample            reads_aligned  primary  secondary  supplementary  unmapped
test_5prime_5k    3457           927      242        13             2530
3prime_v4         15397          15037    6628       290            360
test_multiome_5k  4329           1048     1701       80             3281
test_3prime_5k    4520           1314     833        28             3206

Read survival by stage

These plots detail the number of remaining reads at different stages of the workflow.

  • full length: Proportion of reads containing adapters in expected configurations.
  • total tagged: Proportion of reads that have been assigned corrected UMIs and barcodes.
  • gene tagged: Proportion of reads assigned to a gene.
  • transcripts tagged: Proportion of reads assigned to a transcript.

Primer configuration

Full length reads are identified by locating read segments flanked by known primers in expected orientations: adapter1---full_length_read---adapter2.

These full length reads can then be oriented in the same way and are used in the next stages of the workflow.

Every library prep will contain some level of artifact reads, including mis-primed reads and those without adapters. These are identified by non-standard primer configurations and are not used in subsequent stages of the workflow. The plots here show the proportions of different primer configurations within each sample, which can help diagnose library preparation issues. The majority of reads should be full_length.
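The distinction between full length and artifact reads can be illustrated with a minimal sketch. It is not the workflow's implementation: the hit labels (`adapter1_f`, `adapter2_r`, and so on), the `classify_read` function and the returned category names are all hypothetical, and the sketch assumes adapter hits have already been located and listed in the order they occur along the read.

```python
# Hypothetical labels: "_f" marks a forward-orientation hit, "_r" a
# reverse-complement hit. A full length segment is adapter1 followed by
# adapter2 on the forward strand, or the mirrored pair on the reverse strand.
FULL_LENGTH_PAIRS = {
    ("adapter1_f", "adapter2_f"),
    ("adapter2_r", "adapter1_r"),
}


def classify_read(adapter_hits):
    """Label a read from the ordered list of adapter hits found along it."""
    if not adapter_hits:
        return "no_adapters"
    # Look at each pair of neighbouring hits for an expected configuration.
    neighbours = zip(adapter_hits, adapter_hits[1:])
    if any(pair in FULL_LENGTH_PAIRS for pair in neighbours):
        return "full_length"
    return "other"


print(classify_read(["adapter1_f", "adapter2_f"]))
print(classify_read(["adapter1_f"]))
print(classify_read([]))
```

Reads labelled anything other than `full_length` in this sketch correspond to the artifact configurations that are excluded from later stages.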

The primers used to identify read segments vary slightly between the supported kits. They are:

3prime, multiome and visium kits:

  • Adapter1: Read1
  • Adapter2: TSO

5prime kit:

  • Adapter1: Read1
  • Adapter2: Non-Poly(dT) RT primer

Diagnostic plots

Knee plot

The knee plot is a quality control plot for single cell RNA-seq data and illustrates the procedure used to filter invalid cells. The X-axis shows barcodes ranked by number of reads, and the Y-axis shows the number of reads per barcode. The vertical dashed line shows the cutoff. Barcodes to the right of this line are assumed to be invalid cells, including dead cells and background from empty droplets.
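The ranking behind a knee plot can be sketched in a few lines. This is an illustration only: the function names and toy barcodes are hypothetical, and the cutoff rank is supplied by hand because the method the workflow uses to choose it is not described in this report.

```python
from collections import Counter


def rank_barcodes(read_barcodes):
    """Return (barcode, read_count) pairs sorted from most to fewest reads."""
    counts = Counter(read_barcodes)
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def split_at_cutoff(ranked, cutoff_rank):
    """Keep barcodes ranked before cutoff_rank; treat the rest as background."""
    kept = [barcode for barcode, _ in ranked[:cutoff_rank]]
    dropped = [barcode for barcode, _ in ranked[cutoff_rank:]]
    return kept, dropped


# Toy data: one barcode string per read (hypothetical values).
read_barcodes = ["AAA"] * 50 + ["CCC"] * 40 + ["GGG"] * 2 + ["TTT"] * 1
ranked = rank_barcodes(read_barcodes)
cells, background = split_at_cutoff(ranked, cutoff_rank=2)
print(cells)
print(background)
```

Plotting rank against read count for `ranked` on log axes would give the characteristic knee shape, with the cutoff drawn as the vertical line.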

Saturation plots

Sequencing saturation provides a view of the amount of library complexity that has been captured in the experiment. As read depth increases, the number of genes and distinct UMIs identified will increase at a rate that is dependent on the complexity of the input library. A steep slope indicates that new genes or UMIs could still be identified by increasing the read coverage. A slope which flattens towards higher read coverage indicates that the full library complexity is being well captured.

  • Gene saturation: Genes per cell as a function of depth.
  • UMI saturation: UMIs per cell as a function of read depth.
  • Sequencing saturation: This metric is a measure of the proportion of reads that come from a previously observed UMI, and is calculated with the following formula: 1 - (number of unique UMIs / number of reads).
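The sequencing saturation formula above can be worked through on a toy example. This is a minimal sketch, assuming each read is represented by a single hashable UMI key; the function name and the example UMIs are hypothetical, and in practice the key would likely need to combine the UMI with its cell barcode and gene.

```python
def sequencing_saturation(read_umis):
    """Return 1 - (number of unique UMIs / number of reads)."""
    read_umis = list(read_umis)
    if not read_umis:
        raise ValueError("at least one read is required")
    return 1 - len(set(read_umis)) / len(read_umis)


# Six reads carrying three distinct UMIs: 1 - 3/6 = 0.5
reads = ["AAAC", "AAAC", "GGTT", "CCAT", "GGTT", "AAAC"]
print(f"{sequencing_saturation(reads):.2f}")
```

A value near 0 means almost every read comes from a new UMI, so deeper sequencing would likely reveal more molecules; a value approaching 1 means most reads repeat UMIs that have already been observed.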

UMAP projections

This section presents various UMAP projections of the data. UMAP is an unsupervised algorithm that projects the multidimensional single cell expression data into 2 dimensions. This could reveal structure in the data representing, for example, different cell types or cells that share common regulatory pathways. The UMAP algorithm is stochastic; analysing the same data multiple times with UMAP, using identical parameters, can lead to visually different projections. To give some confidence in the observed results, the projection is run multiple times and the resulting series of UMAP projections can be viewed below.

Software versions

Name Version
pysam 0.22.0
parasail 1.2.3
pandas 2.0.3
rapidfuzz 2.13.7
scikit-learn 1.3.2
fastcat 0.15.2
minimap2 2.24-r1122
samtools 1.20
bedtools v2.30.0
gffread 0.12.7
seqkit v2.8.0
stringtie 2.2.2

Workflow parameters

Key Value
fastq wf-single-cell/data/test_data/fastq/
bam None
out_dir wf-single-cell
sample_sheet None
sample None
single_cell_sample_sheet wf-single-cell/data/test_data/samples.test.csv
kit_config None
kit None
threads 4
full_length_only True
min_read_qual None
fastq_chunk 2500
ref_genome_dir wf-single-cell/data/test_data/refdata-gex-GRCh38-2020-A
barcode_adapter1_suff_length 10
barcode_min_quality 15
barcode_max_ed 2
barcode_min_ed_diff 2
gene_assigns_minqv 30
matrix_min_genes 1
matrix_min_cells 1
matrix_max_mito 100
matrix_norm_count 10000
genes_of_interest None
umap_n_repeats 3
expected_cells None
mito_prefix MT-
stringtie_opts -c 2
store_dir wf-single-cell/store_dir