The coverage analysis system processes the reads that were aligned anywhere in the genome, and then calculates metrics on the
reads that overlap the features (i.e. "on target") and on those that do not (i.e. "off target"). Several settings control the
quality filtering and error correction applied to the reads as they are processed.
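
The on-target/off-target split can be illustrated with a minimal Python sketch. It is not Curio's implementation: the tuple layout `(chromosome, start, end)`, the 0-based half-open coordinates, and the function names are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical layout: (chromosome, start, end), 0-based, end exclusive.
Interval = Tuple[str, int, int]


def overlaps(read: Interval, feature: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return read[0] == feature[0] and read[1] < feature[2] and feature[1] < read[2]


def split_on_off_target(
    reads: List[Interval], features: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Separate reads that overlap any feature from reads that overlap none."""
    on_target, off_target = [], []
    for read in reads:
        if any(overlaps(read, feature) for feature in features):
            on_target.append(read)
        else:
            off_target.append(read)
    return on_target, off_target
```

The linear scan over all features is only meant to show the idea; a real tool would typically use an indexed interval structure.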
If you enable Unique Molecular Identifier/Tag (UMI/UMT) processing, the system will first group the reads at each position that
share a UMI/UMT into consensus families before calculating the coverage at that position. You can optionally filter out consensus
families that contain only a small number of reads during the coverage analysis. This keeps reads that have few duplicates with a
matching UMI/UMT at a given position from affecting the results.
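
As a rough illustration of the grouping step, the sketch below collects reads into families keyed by chromosome, position, and UMI/UMT. The read layout `(chrom, pos, umi, sequence)` and the exact-match comparison of identifiers are assumptions made for this example; how Curio compares identifiers internally is not described here.

```python
from collections import defaultdict


def group_into_families(reads):
    """Group reads into families keyed by (chrom, pos, umi).

    `reads` is an iterable of (chrom, pos, umi, sequence) tuples,
    a hypothetical layout chosen for this sketch.
    """
    families = defaultdict(list)
    for chrom, pos, umi, sequence in reads:
        # Reads land in the same family only if position and UMI/UMT match exactly.
        families[(chrom, pos, umi)].append(sequence)
    return dict(families)
```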
Important Note: If you did not enable UMI/UMT processing when aligning the reads (available on the "Pre-Processing"
tab on the "Start Alignment" screen), then Curio automatically hides the UMI/UMT processing options on the coverage analysis screen.
When UMI/UMT processing is in use, the "Minimum Family Size" setting can be used to remove the smaller families once the consensus
families have been calculated at each position. For example, if you set it to "5 reads", any consensus family that contains 4 or
fewer reads is excluded before the coverage metrics are calculated. If you slide this setting all the way to the left (i.e. "Include
all Families"), all consensus read families are included in the analysis, even a family that contains only one read.
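
The effect of the family-size threshold can be sketched as follows. The dictionary layout (family key mapped to a list of reads) and the function name are assumptions for illustration only; a threshold of 1 plays the role of "Include all Families".

```python
def filter_families(families, min_family_size):
    """Keep only families with at least `min_family_size` reads.

    `families` maps a family key to the list of reads in that family
    (a hypothetical layout for this sketch).
    """
    return {
        key: members
        for key, members in families.items()
        if len(members) >= min_family_size
    }
```

With a threshold of 5, a family of 4 reads is dropped and a family of 5 reads is kept, matching the "5 reads" example above.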
As part of a coverage analysis, UMI/UMT processing therefore provides a way to keep reads that carry information from an original
molecule and that would otherwise have been filtered out during de-duplication. It uses the UMI/UMT of each consensus family to
determine the unique information that should be retained at each alignment position, instead of simply removing duplicate reads
that have a matching alignment position and "CIGAR" alignment string. In addition, the "Minimum Family Size" setting lets you
remove reads that are potentially noise, i.e. reads for which no other reads with a corresponding UMI/UMT were found at the same
position.
When the "De-duplication" option is enabled, Curio attempts to remove reads that are potential PCR amplification duplicates. The
algorithm finds all reads that have the same alignment position, orientation, and CIGAR alignment string. For a paired-end read,
those attributes of both the read and its mate are taken into account. The "CIGAR" alignment string is how the aligner describes the
parts of each read that represent potential insertions or soft-clipped regions (i.e. bases present in the read that are not in the
reference) or deletions (i.e. bases in the reference that are missing from the read). The read (or read pair) with the highest
quality is kept, and all other reads (or read pairs) that have a matching alignment position, orientation, and CIGAR string are
removed before the coverage metrics are calculated. The highest-quality read (or read pair) is the one with the highest sum of Phred
base qualities, counting only base qualities greater than or equal to Q15.
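
The de-duplication rule described above can be sketched in Python for single-end reads. This is an illustration under stated assumptions, not Curio's code: the `Read` record, its field names, and the tie-breaking behaviour (the first read seen wins) are hypothetical. Handling paired-end reads would mean extending the grouping key with the mate's position, orientation, and CIGAR string.

```python
from collections import namedtuple

# Hypothetical read record; `quals` is a list of Phred base qualities.
Read = namedtuple("Read", "name chrom pos strand cigar quals")

MIN_BASE_QUALITY = 15  # only base qualities >= Q15 count toward the score


def quality_score(read):
    """Sum of the base qualities that are at least Q15."""
    return sum(q for q in read.quals if q >= MIN_BASE_QUALITY)


def deduplicate(reads):
    """Keep the best-scoring read per (chrom, pos, strand, cigar) group."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.cigar)
        current = best.get(key)
        # Assumption: on a tie, the read seen first is kept.
        if current is None or quality_score(read) > quality_score(current):
            best[key] = read
    return list(best.values())
```

Note that a read with a few very high-quality bases and many bases below Q15 can lose to a read with uniformly moderate qualities, because the low-quality bases contribute nothing to the sum.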
If you disable this option, all reads are included in the analysis, even if they are potential PCR duplicates. When UMI/UMT
processing is enabled, this option appears disabled, because duplicate reads that have the same identifier are consolidated
automatically while UMI/UMT processing is in effect. Note that the option to enable UMI/UMT processing during coverage analysis is
only visible if you enabled UMI/UMT processing when the reads were first aligned.
Some aligners (Bowtie, etc.) report a Phred-like quality score that represents how likely it is that the position chosen for the
read alignment is correct. If this type of alignment quality score is available for the reads, this setting can be used to exclude
reads whose alignment quality is below the selected value.
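
For reference, a Phred-scaled score Q corresponds to an estimated error probability of 10^(-Q/10), so a score of 20 means roughly a 1 in 100 chance that the alignment position is wrong. The sketch below shows that conversion and a simple threshold filter. The `(name, alignment_quality)` tuple layout and the function names are assumptions for illustration, and the sketch assumes every read carries a score.

```python
def error_probability(phred_score):
    """Convert a Phred-scaled score to the estimated probability of an error."""
    return 10 ** (-phred_score / 10)


def filter_by_alignment_quality(reads, min_quality):
    """Drop reads whose alignment quality is below `min_quality`.

    `reads` is a list of (name, alignment_quality) tuples,
    a hypothetical layout for this sketch.
    """
    return [read for read in reads if read[1] >= min_quality]
```

A read whose score equals the selected value is kept, since only reads below the threshold are excluded.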