Introduction

The workflow quantifies non-target depletion and reports mapping characteristics that indicate how well the Cas9 targeted sequencing protocol performed. The plotted figures include depth of coverage over the target regions and strand bias over those regions. The location and peaks of coverage, together with local biases in strandedness, may be used to assess the performance of guide-RNA sequences and may highlight guide RNAs that are not performing. A review of likely off-target regions that are over-represented within the sequence collection may inform strategies to refine guide-RNA design.

Read summaries

Summary of on-target and off-target reads

On-target reads are defined here as any read that overlaps a target region by at least 1 bp. Off-target reads contain no target-overlapping bases.
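The on-target rule above can be sketched in a few lines of Python. This is a minimal illustration and not the workflow's own code: the `Interval` type, the helper names and the half-open (BED-style) coordinate convention are assumptions, and the example reads are invented. The two target intervals are taken from the target summary table in this report.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # assumed 0-based, inclusive (BED-style)
    end: int    # assumed exclusive


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_on_target(read: Interval, targets: list, min_overlap: int = 1) -> bool:
    """A read is on-target if it overlaps any target by at least min_overlap bases."""
    return any(overlap_bp(read, t) >= min_overlap for t in targets)


targets = [
    Interval("chr19", 13204400, 13211100),  # SCA6
    Interval("chr22", 45791500, 45799400),  # SCA10
]

# Hypothetical read alignments, for illustration only.
reads = [
    Interval("chr19", 13200000, 13204401),  # overlaps SCA6 by exactly 1 bp
    Interval("chr19", 13200000, 13204400),  # ends where SCA6 starts: 0 bp overlap
    Interval("chr22", 45795000, 45796000),  # fully inside SCA10
]

labels = ["on_target" if is_on_target(r, targets) else "off_target" for r in reads]
print(labels)
```

Under the half-open convention assumed here, a read that merely abuts a target shares no bases with it and is therefore counted as off-target.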

sample    mean_len               num_reads              kbases_mapped
          off_target  on_target  off_target  on_target  off_target  on_target
sample_1  8249        5576       1495        186        12332       1037
sample_2  8249        5576       1495        186        12332       1037

Targets

Target coverage plots

Each of the following plots shows the coverage per target, for each strand, in discretized bins of 100 bp.
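As a rough illustration of how per-strand coverage in 100 bp bins could be derived, the sketch below accumulates the mean depth of each bin from a list of read alignments. It is an assumption-laden example rather than the workflow's implementation: the function name `binned_coverage`, the `(start, end, strand)` read tuples and the choice of mean depth as the per-bin value are all hypothetical.

```python
def binned_coverage(reads, target_start, target_end, bin_size=100):
    """Mean read depth per bin and per strand across one target.

    reads: iterable of (start, end, strand) with half-open coordinates
    and strand given as "+" or "-".
    """
    n_bins = -(-(target_end - target_start) // bin_size)  # ceiling division
    cov = {"+": [0.0] * n_bins, "-": [0.0] * n_bins}
    for start, end, strand in reads:
        # Clip the read to the target; skip reads that do not overlap it.
        lo, hi = max(start, target_start), min(end, target_end)
        if lo >= hi:
            continue
        first = (lo - target_start) // bin_size
        last = (hi - 1 - target_start) // bin_size
        for b in range(first, last + 1):
            b_start = target_start + b * bin_size
            b_end = min(b_start + bin_size, target_end)
            covered = min(hi, b_end) - max(lo, b_start)
            # Each read adds the fraction of the bin that it covers.
            cov[strand][b] += covered / (b_end - b_start)
    return cov


# Toy target of 300 bp with three invented reads.
toy_reads = [(0, 150, "+"), (50, 300, "-"), (100, 200, "+")]
cov = binned_coverage(toy_reads, target_start=0, target_end=300)
print(cov["+"], cov["-"])
```

Plotting the two lists against bin position, one series per strand, gives the kind of per-target profile described above.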

Target summaries
This table provides summaries for each sample/target combination.
sample run_id chr start end target tsize kbases coverage_frac median_cov nreads mean_read_length strand_bias
sample_1 9b52beb8b4f9ec458eb28c28b35822acaff84952 chr19 13204400 13211100 SCA6 6700 422 1.0 53 62 6689 -0.35
sample_1 9b52beb8b4f9ec458eb28c28b35822acaff84952 chr22 45791500 45799400 SCA10 7900 624 1.0 109 124 5019 -0.06
sample_2 9b52beb8b4f9ec458eb28c28b35822acaff84952 chr19 13204400 13211100 SCA6 6700 422 1.0 53 62 6689 -0.35
sample_2 9b52beb8b4f9ec458eb28c28b35822acaff84952 chr22 45791500 45799400 SCA10 7900 624 1.0 109 124 5019 -0.06
Column descriptions:
  • chr, start, end: target location.
  • target: the target name.
  • nreads: number of reads aligning.
  • coverage_frac: fraction of bases within target with non-zero coverage.
  • tsize: length of target (in bases).
  • median_cov: median read depth across target.
  • mean_read_length: average read length of reads aligning.
  • strand_bias: proportional difference of reads aligning to each strand. A value of +1 or -1 indicates complete bias to the forward or reverse strand, respectively.
  • kbases: kilobases of sequence in reads overlapping target.
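The column descriptions above translate into short calculations. The following sketch assumes that strand_bias is computed as (forward - reverse) / (forward + reverse), which matches the stated +1 / -1 extremes but is not confirmed by the report; the function names and the per-base depth list are illustrative only.

```python
from statistics import median


def coverage_frac(depths):
    """Fraction of target bases with non-zero coverage."""
    return sum(1 for d in depths if d > 0) / len(depths)


def strand_bias(n_fwd, n_rev):
    """Assumed formula: (fwd - rev) / (fwd + rev), ranging from -1 to +1."""
    total = n_fwd + n_rev
    if total == 0:
        return 0.0  # no reads: report no bias (a choice made for this sketch)
    return (n_fwd - n_rev) / total


# Invented per-base depths for a tiny 8 bp "target".
depths = [0, 2, 2, 3, 5, 5, 4, 1]

print(coverage_frac(depths))  # share of bases covered at least once
print(median(depths))         # median_cov
print(strand_bias(3, 1))      # more forward than reverse reads gives a positive value
```

With this formula, a negative strand_bias such as those in the table would mean that more reads align to the reverse strand than to the forward strand.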

Coverage distribution

These plots show the on-target and off-target coverage distributions of genomic regions binned by 100 bp. Off-target regions are defined as any region not within 1 kb of a target. The background histogram should naturally be skewed heavily to the left; this noise is expected when many regions in the genome have a single read mapping. If the targeted sequencing approach has performed well, the on_target histogram should be skewed towards the right, indicating a depletion of non-target reads.
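The 1 kb rule can be made concrete with a small classifier for 100 bp bins. This is a sketch under assumptions: the report does not say how bins that lie within 1 kb of a target without overlapping it are handled, so they are labelled "excluded" here, and the function name, labels and single-chromosome simplification are hypothetical.

```python
def classify_bin(bin_start, bin_end, targets, flank=1000):
    """Label one bin on a single chromosome.

    targets: list of (start, end) half-open intervals on the same chromosome.
    Returns "on_target", "off_target", or "excluded" (within `flank` bases of
    a target but not overlapping it; this third label is an assumption).
    """
    for t_start, t_end in targets:
        if bin_start < t_end and bin_end > t_start:
            return "on_target"
    for t_start, t_end in targets:
        if bin_start < t_end + flank and bin_end > t_start - flank:
            return "excluded"
    return "off_target"


sca6 = [(13204400, 13211100)]  # chr19 target from the summary table

bin_labels = [
    classify_bin(13204400, 13204500, sca6),  # first bin of the target
    classify_bin(13203900, 13204000, sca6),  # 400 bp upstream of the target
    classify_bin(13200000, 13200100, sca6),  # more than 1 kb away
]
print(bin_labels)
```

Counting the reads per bin within each class then yields the two distributions that the histograms compare.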

Off-target hotspots

Off-target regions are again defined here as all regions of the genome not within 1 kb of a target region. An off-target hotspot is an off-target region with contiguous overlapping reads. These hotspots may indicate incorrectly performing guide RNAs. Only regions with 10 reads or more are included.

chr numReads start end hotspotLength
chr19 671 45770199 45770267 68
chr22 153 27798878 27798999 121
chr22 37 39857728 39859751 2023
chr22 37 36602417 36604784 2367
chr22 29 38375711 38376704 993
chr22 27 20564502 20564654 152
chr22 25 17687521 17688172 651
chr19 23 42036053 42036479 426
chr19 22 32247741 32248656 915
chr22 18 49340852 49343215 2363
chr22 16 11052776 11053084 308
chr22 16 17383972 17385468 1496
chr22 15 31361042 31361155 113
chr19 14 39613991 39615090 1099
chr22 13 50689429 50689736 307
chr22 12 36944343 36945138 795
chr22 11 46988442 46989726 1284
chr22 10 33211768 33213793 2025
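One way to read the hotspot definition is as a merge of overlapping read alignments into clusters, keeping only clusters with enough reads. The sketch below follows that reading; it is not the workflow's implementation and may not reproduce the start, end and hotspotLength values in the table. The function name and the input tuples are hypothetical, and the dictionary keys simply mirror the table's column names.

```python
def find_hotspots(reads, min_reads=10):
    """Merge overlapping reads into clusters and keep those with >= min_reads.

    reads: iterable of (chrom, start, end) off-target alignments, half-open.
    """
    clusters = []
    for chrom, start, end in sorted(reads):
        last = clusters[-1] if clusters else None
        if last and last["chr"] == chrom and start < last["end"]:
            # The read overlaps the current cluster: extend it.
            last["end"] = max(last["end"], end)
            last["numReads"] += 1
        else:
            clusters.append({"chr": chrom, "start": start, "end": end, "numReads": 1})
    kept = [c for c in clusters if c["numReads"] >= min_reads]
    for c in kept:
        c["hotspotLength"] = c["end"] - c["start"]
    # Most heavily covered hotspots first, as in the table.
    return sorted(kept, key=lambda c: c["numReads"], reverse=True)


# Invented alignments; a threshold of 3 keeps the example small.
toy = [
    ("chr19", 100, 200), ("chr19", 150, 300), ("chr19", 250, 400),
    ("chr19", 1000, 1100), ("chr22", 100, 200),
]
hotspots = find_hotspots(toy, min_reads=3)
print(hotspots)
```

In practice the input would first be restricted to reads classed as off-target, and the threshold left at the report's value of 10 reads.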

Software versions

Name Version
pysam 0.21.0
fastcat 0.10.2
bedtools v2.31.0

Workflow parameters

Key Value
analyse_unclassified False
fastq test_data/fastq
reference_genome test_data/grch38/grch38_chr19_22.fa.gz
targets test_data/targets.bed
threads 4
out_dir wf-cas9
full_report True
sample None
sample_sheet None
report_name report
store_dir wf-cas9/store_dir