Read summaries

Single cell sample summary

This table summarises the number of input reads, and the number of cells, genes and transcripts identified within each sample.

sample ID         reads  cells  genes  transcripts
test_5prime_5k    4878   201    53     43
test_multiome_5k  4999   272    20     16
test_3prime_5k    4944   472    61     52

Read survival by stage

These plots show the proportion of reads remaining at different stages of the workflow.

  • full length: Proportion of reads containing adapters in expected configurations.
  • total tagged: Proportion of reads that have been assigned corrected UMIs and barcodes.
  • gene tagged: Proportion of reads assigned to a gene.
  • transcripts tagged: Proportion of reads assigned to a transcript.
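
As a rough illustration of how the proportions listed above could be derived, the sketch below divides the read count at each stage by the number of input reads. The function name, the stage labels used as dictionary keys, and the choice of total input reads as the denominator are assumptions made for this example; they are not taken from the workflow code.

```python
def stage_proportions(total_reads, stage_counts):
    """Return the fraction of input reads surviving at each stage.

    total_reads: number of input reads for the sample.
    stage_counts: mapping of stage name -> number of reads at that stage.
    Assumption: every proportion is relative to the total input reads.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {stage: count / total_reads for stage, count in stage_counts.items()}


# Made-up counts, purely to show the call; not results from this report.
example = stage_proportions(1000, {"full length": 800, "total tagged": 600})
```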

Primer configuration

Full length reads are identified by locating read segments flanked by known primers in expected orientations: adapter1---full_length_read---adapter2.

These full length reads can then be oriented in the same way and are used in the next stages of the workflow.

Every library prep will contain some level of artifact reads, including mis-primed reads and those without adapters. These are identified by non-standard primer configurations and are not used in subsequent stages of the workflow. The plots here show the proportions of the different primer configurations within each sample, which can help diagnose library preparation issues. The majority of reads should be full_length.

The primers used to identify read segments vary slightly between the supported kits. They are:

3prime and multiome kits:

  • Adapter1: Read1
  • Adapter2: TSO

5prime kit:

  • Adapter1: Read1
  • Adapter2: Non-Poly(dT) RT primer
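
The idea of classifying a read segment by its primer configuration can be sketched as follows. This is a minimal illustration and not the workflow's implementation: representing each primer hit as an (adapter name, strand) tuple, the labels "no_adapters" and "other", and the function name are all assumptions made for the example. It treats adapter1 followed by adapter2 as a full length segment, along with the mirrored arrangement that arises when the same molecule is read from the opposite strand.

```python
# Hypothetical representation: each primer hit is (adapter_name, strand),
# listed in the order the hits occur along the read.
FULL_LENGTH_FORWARD = [("adapter1", "+"), ("adapter2", "+")]
# Same molecule read from the other strand: order and strands are flipped.
FULL_LENGTH_REVERSE = [("adapter2", "-"), ("adapter1", "-")]


def classify_segment(hits):
    """Return (configuration, orientation) for a list of primer hits."""
    hits = list(hits)
    if not hits:
        return "no_adapters", None
    if hits == FULL_LENGTH_FORWARD:
        return "full_length", "+"
    if hits == FULL_LENGTH_REVERSE:
        return "full_length", "-"
    # Anything else (single adapter, repeated adapter, ...) is an artifact.
    return "other", None
```

Knowing the orientation lets reverse-strand segments be reverse-complemented so that all full length reads end up oriented the same way.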

Diagnostic plots

Knee plot

The knee plot is a quality-control plot for single-cell RNA-seq data that illustrates the procedure used to filter out invalid cells. The X-axis shows cell barcodes ranked by number of reads, and the Y-axis shows reads per barcode. The vertical dashed line marks the cutoff. Barcodes to the right of this line are assumed to be invalid cells, including dead cells and background from empty droplets.
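
The data behind a knee plot can be sketched in a few lines: count reads per barcode, sort the counts in descending order, and split the ranked list at a cutoff rank. How the workflow chooses the cutoff is not described here, so the sketch takes the cutoff rank as an input; the function names are hypothetical.

```python
from collections import Counter


def knee_curve(read_barcodes):
    """Return reads-per-barcode counts sorted from highest to lowest.

    read_barcodes: iterable with one barcode string per read.
    The index into the returned list is the barcode's rank (X-axis);
    the value is its read count (Y-axis).
    """
    counts = Counter(read_barcodes)
    return sorted(counts.values(), reverse=True)


def split_at_cutoff(ranked_counts, cutoff_rank):
    """Split ranked counts into (retained, discarded) at cutoff_rank.

    Assumption: cutoff_rank is supplied by some separate thresholding step.
    """
    return ranked_counts[:cutoff_rank], ranked_counts[cutoff_rank:]
```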

Saturation plots

Sequencing saturation provides a view of how much of the library complexity has been captured in the experiment. As read depth increases, the number of genes and distinct UMIs identified will increase at a rate that depends on the complexity of the input library. A steep slope indicates that new genes or UMIs could still be identified by increasing the read coverage. A slope that flattens towards higher read coverage indicates that the full library complexity is being well captured.

  • Gene saturation: Genes per cell as a function of read depth.
  • UMI saturation: UMIs per cell as a function of read depth.
  • Sequencing saturation: This metric is a measure of the proportion of reads that come from a previously observed UMI, and is calculated with the following formula: 1 - (number of unique UMIs / number of reads).
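
The sequencing saturation formula above can be written as a short function. This is a minimal sketch: it assumes one UMI string per read is already available, and the function name and the choice to return 0.0 for empty input are illustrative rather than taken from the workflow.

```python
def sequencing_saturation(read_umis):
    """Compute 1 - (number of unique UMIs / number of reads).

    read_umis: sequence with one UMI string per read.
    Returns 0.0 for empty input (an assumption for this sketch).
    """
    n_reads = len(read_umis)
    if n_reads == 0:
        return 0.0
    n_unique = len(set(read_umis))
    return 1 - n_unique / n_reads
```

For example, four reads carrying three distinct UMIs give 1 - 3/4 = 0.25, meaning a quarter of the reads came from a previously observed UMI.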

Software versions

Name Version
pysam 0.21.0
parasail 1.2.3
pandas 2.0.3
rapidfuzz 2.13.7
scikit-learn 1.3.0
fastcat 0.13.2
minimap2 2.24-r1122
samtools 1.17
bedtools v2.30.0
gffread 0.12.7
seqkit v2.5.1
stringtie 2.2.1

Workflow parameters

Key Value
help False
version False
fastq test_data/fastq/
out_dir wf-single-cell
sample_sheet None
sample None
single_cell_sample_sheet test_data/samples.test.csv
aws_image_prefix None
aws_queue None
disable_ping False
kit_config None
max_threads 4
plot_umaps False
full_length_only True
ref_genome_dir test_data/refdata-gex-GRCh38-2020-A
barcode_adapter1_suff_length 10
barcode_min_quality 15
barcode_max_ed 2
barcode_min_ed_diff 2
gene_assigns_minqv 30
matrix_min_genes 1
matrix_min_cells 1
matrix_max_mito 100
matrix_norm_count 10000
umap_plot_genes None
umap_n_repeats 3
resources_mm2_max_threads 4
resources_mm2_flags -I 16G
process_chunk_size 100000
adapter_scan_chunk_size 0
kit_name 3prime
kit_version v3
expected_cells 500
merge_bam False
mito_prefix MT-
stringtie_opts -c 2
monochrome_logs False
validate_params True
show_hidden_params False
schema_ignore_params show_hidden_params,validate_params,monochrome_logs,aws_queue,aws_image_prefix,wf
wf {'example_cmd': ['--expected_cells 100', '--fastq wf-single-cell-demo/chr17.fq.gz', '--kit_name 3prime', '--kit_version v3', '--ref_genome_dir wf-single-cell-demo', '--umap_plot_genes wf-single-cell-demo/umap_plot_genes.csv'], 'container_sha': 'sha8e7d91013029ea8721743bd087583e5205cdc1dc', 'common_sha': 'sha2816439fd5aa81902836ea5794447b54947fa056', 'agent': 'cw-ci'}
params {'max_threads': 4}