Workflow Single Cell report

Read summary

Single cell sample summary

This table summarises the number of input reads, and the number of cells, genes and transcripts identified within each sample.

sample ID	reads	cells	genes	transcripts
3prime_v4	16004	1962	220	201
test_3prime_5k	4870	1263	69	60
test_5prime_5k	4825	690	57	46
test_multiome_5k	4970	1112	24	19

Alignment summary

Per-sample summary of read alignments to the reference genome.

Note that the number of reads aligned may be lower than the total number of input reads if non-full-length reads were filtered out. See option: full_length_only.

sample	reads_aligned	primary	secondary	supplementary	unmapped
test_multiome_5k	4329	1048	1701	80	3281
test_5prime_5k	3457	927	242	13	2530
test_3prime_5k	4520	1314	833	28	3206
3prime_v4	15397	15037	6628	290	360
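
As a reading aid for the table above, the short Python sketch below computes the fraction of reads with a primary alignment for each sample. It also checks an observation that holds for these four rows: primary plus unmapped equals the reads_aligned column. Whether the workflow defines reads_aligned this way is an assumption, not something this report states. The dictionary and variable names are illustrative only.

```python
# Values copied from the alignment summary table:
# sample -> (reads_aligned, primary, unmapped)
alignment_rows = {
    "test_multiome_5k": (4329, 1048, 3281),
    "test_5prime_5k": (3457, 927, 2530),
    "test_3prime_5k": (4520, 1314, 3206),
    "3prime_v4": (15397, 15037, 360),
}

# Observation only: in every row, primary + unmapped == reads_aligned.
consistent = all(
    total == primary + unmapped
    for total, primary, unmapped in alignment_rows.values()
)

# Fraction of reads with a primary alignment, per sample.
primary_fraction = {
    sample: primary / total
    for sample, (total, primary, _unmapped) in alignment_rows.items()
}

for sample, fraction in primary_fraction.items():
    print(f"{sample}\t{fraction:.3f}")
```

A low primary fraction in the small test samples need not indicate a problem with the library; it may simply reflect the test data and reference used.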

Read survival by stage

These plots detail the number of remaining reads at different stages of the workflow.

full length: Proportion of reads containing adapters in expected configurations.
total tagged: Proportion of reads that have been assigned corrected UMIs and barcodes.
gene tagged: Proportion of reads assigned to a gene.
transcripts tagged: Proportion of reads assigned to a transcript.

Primer configuration

Full length reads are identified by locating read segments flanked by known primers in expected orientations: adapter1---full_length_read---adapter2.

These full length reads can then be oriented in the same way and are used in the next stages of the workflow.

Every library prep will contain some level of artifact reads, including mis-primed reads and reads without adapters. These are identified by non-standard primer configurations and are not used in subsequent stages of the workflow. The plots here show the proportions of the different primer configurations within each sample, which can help diagnose library preparation issues. The majority of reads should be full_length.

The primers used to identify read segments vary slightly between the supported kits. They are:

3prime, multiome and visium kits:

Adapter1: Read1
Adapter2: TSO

5prime kit:

Adapter1: Read1
Adapter2: Non-Poly(dT) RT primer
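
The adapter1---full_length_read---adapter2 rule can be illustrated with a minimal Python sketch. It assumes adapter hits have already been located, oriented to the same strand and listed in 5' to 3' order. The function name and the KIT_ADAPTERS mapping are hypothetical and are not the workflow's own code. Reads containing several segments are not handled here.

```python
# Adapter pairs per kit, taken from the lists above.
KIT_ADAPTERS = {
    "3prime": ("Read1", "TSO"),
    "multiome": ("Read1", "TSO"),
    "visium": ("Read1", "TSO"),
    "5prime": ("Read1", "Non-Poly(dT) RT primer"),
}


def is_full_length(adapter_hits, kit):
    """Return True if the hits are exactly adapter1 followed by adapter2.

    adapter_hits: adapter names found in one read, in 5' to 3' order.
    kit: one of the keys of KIT_ADAPTERS.
    """
    adapter1, adapter2 = KIT_ADAPTERS[kit]
    return list(adapter_hits) == [adapter1, adapter2]


# Expected configuration for a 3prime read.
print(is_full_length(["Read1", "TSO"], "3prime"))
# A read with only one adapter is treated as an artifact.
print(is_full_length(["Read1"], "3prime"))
```

Any other combination of hits, such as a single adapter, no adapters or a repeated adapter, would fall into one of the non-standard configurations shown in the plots.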

Diagnostic plots

Knee plot

The knee plot is a quality control plot for single-cell RNA-seq data and illustrates the procedure used to filter out invalid cells. The X-axis shows cell barcodes ranked by number of reads, and the Y-axis shows reads per barcode. The vertical dashed line marks the cutoff. Barcodes to the right of this line are assumed to be invalid cells, including dead cells and background from empty droplets.
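
The ranking behind a knee plot can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's implementation: the function names are hypothetical, and the cutoff rank is supplied by hand because this report does not describe how the workflow chooses it.

```python
from collections import Counter


def knee_curve(read_barcodes):
    """Return (rank, reads) pairs, with barcodes ranked by descending read count."""
    counts = Counter(read_barcodes)
    ranked = sorted(counts.values(), reverse=True)
    return list(enumerate(ranked, start=1))


def split_by_rank(curve, cutoff_rank):
    """Split the curve into points kept (rank <= cutoff) and points dropped."""
    kept = [point for point in curve if point[0] <= cutoff_rank]
    dropped = [point for point in curve if point[0] > cutoff_rank]
    return kept, dropped


# Hypothetical data: one barcode per read.
barcodes = ["A"] * 50 + ["B"] * 40 + ["C"] * 2 + ["D"] * 1
curve = knee_curve(barcodes)
kept, dropped = split_by_rank(curve, cutoff_rank=2)
print(kept)
print(dropped)
```

Plotting rank on the X-axis against reads on the Y-axis, usually on log scales, gives the characteristic knee shape, with the dropped points lying to the right of the cutoff.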

Saturation plots

Sequencing saturation provides a view of the amount of library complexity that has been captured in the experiment. As read depth increases, the number of genes and distinct UMIs identified will increase at a rate that is dependent on the complexity of the input library. A steep slope indicates that new genes or UMIs could still be identified by increasing the read coverage. A slope which flattens towards higher read coverage indicates that the full library complexity is being well captured.

Gene saturation: Genes per cell as a function of read depth.
UMI saturation: UMIs per cell as a function of read depth.
Sequencing saturation: The proportion of reads that come from a previously observed UMI, calculated as: 1 - (number of unique UMIs / number of reads).
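
The sequencing saturation formula above can be worked through with a small Python sketch. It is a simplification: each UMI string is treated as globally unique, whereas a real pipeline may count UMIs per barcode and gene. The function name and the example UMIs are hypothetical.

```python
def sequencing_saturation(umis):
    """Return 1 - (unique UMIs / reads) for a list holding one UMI per read."""
    n_reads = len(umis)
    if n_reads == 0:
        raise ValueError("at least one read is required")
    return 1 - len(set(umis)) / n_reads


# Hypothetical example: 8 reads carrying 5 distinct UMIs.
example_umis = ["AAAC", "AAAC", "GGTT", "GGTT", "GGTT", "CCGA", "TTAG", "ACGT"]
saturation = sequencing_saturation(example_umis)
print(f"{saturation:.3f}")  # 1 - 5/8 = 0.375
```

A value near 0 means almost every read reveals a new UMI, so deeper sequencing would likely find more. A value near 1 means most reads repeat UMIs that have already been seen.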

UMAP projections

This section presents various UMAP projections of the data. UMAP is an unsupervised algorithm that projects the multidimensional single cell expression data into two dimensions. This can reveal structure in the data representing, for example, different cell types or cells that share common regulatory pathways. The UMAP algorithm is stochastic: analysing the same data multiple times with identical parameters can produce visually different projections. To give some confidence in the observed results, it can be useful to run the projection multiple times, so a series of UMAP projections can be viewed below.

The following genes were not in the dataset and so have been filtered out: Fth1, Cox8a, mt-Co1, Gnaq, FTL, JSRP1, Armh3

Software versions

Name	Version
pysam	0.22.0
parasail	1.2.3
pandas	2.0.3
rapidfuzz	2.13.7
scikit-learn	1.3.2
fastcat	0.15.2
minimap2	2.24-r1122
samtools	1.20
bedtools	v2.30.0
gffread	0.12.7
seqkit	v2.8.0
stringtie	2.2.2

Workflow parameters

Key	Value
fastq	wf-single-cell/data/test_data/fastq/
bam	None
out_dir	wf-single-cell
sample_sheet	None
sample	None
single_cell_sample_sheet	wf-single-cell/data/test_data/samples.test.csv
kit_config	None
kit	None
threads	4
full_length_only	True
min_read_qual	None
fastq_chunk	2500
ref_genome_dir	wf-single-cell/data/test_data/refdata-gex-GRCh38-2020-A
barcode_adapter1_suff_length	10
barcode_min_quality	15
barcode_max_ed	2
barcode_min_ed_diff	2
gene_assigns_minqv	30
matrix_min_genes	1
matrix_min_cells	1
matrix_max_mito	100
matrix_norm_count	10000
genes_of_interest	None
umap_n_repeats	3
expected_cells	None
mito_prefix	MT-
stringtie_opts	-c 2
store_dir	wf-single-cell/store_dir