After processing the data of the qualified reads available at each position, the variant detection system needs to determine whether one or
more potential alleles exist at each position. To make this determination, Curio allows you to configure several different thresholds
that affect the specificity and sensitivity of the variants included in the analysis output.
By default, Curio will not make a variant call at any position where fewer than 10 reads are present. If you want to make calls at
positions where only a few reads are present, decrease this setting; increase it to require a higher depth of coverage before
making a variant call at any position. Note that when PCR read de-duplication is also enabled, this coverage filter is applied after duplicate
reads have been filtered out.
When UMT/UMI (Unique Molecular Tag/Id) processing is also enabled, this setting instead sets the minimum number of families that need to be available
at the position before a variant will be called. So, for example, if there are 90 individual reads present at a given position but after UMT/UMI
processing is applied only 8 unique families are found to exist at that position, then with the default setting of a minimum coverage of "10" no
variants would be called at that position. Note that the option to enable UMT/UMI processing during variant analysis is only visible if you enabled
UMT/UMI processing when first aligning the sequence alignment file.
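The coverage check described above can be sketched as follows. This is only an illustration of the rule, not Curio's implementation: the function names and parameters (passes_min_coverage, read_count, family_count, min_coverage) are hypothetical, and read_count is assumed to be the depth that remains after any duplicate removal.

```python
def effective_depth(read_count, family_count=None):
    """Return the depth that the coverage filter is compared against.

    read_count:   reads at the position, counted after duplicate removal.
    family_count: UMT/UMI families at the position, or None when
                  UMT/UMI processing is disabled.
    """
    return read_count if family_count is None else family_count


def passes_min_coverage(read_count, family_count=None, min_coverage=10):
    """True when the position has enough reads (or families) to be called."""
    return effective_depth(read_count, family_count) >= min_coverage
```

With the example from the text, 90 reads that collapse into 8 families fail the default threshold of 10, while the same 90 reads pass when UMT/UMI processing is off.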
When Curio finds a potential variant at any given position in the genome it then needs to decide if the signal for the variant is strong enough to
consider including in the list of detected variants for the analysis. To determine this, it looks at the number of reads at the position that show the
variant and divides that by the total number of reads at the same position. This division results in the "variant frequency", which is then compared
to the minimum value you set for this option in order to decide if the variant will be included in the output report. For example, if there are 30 reads at
some position that show a potential SNP and 70 other reads at the same position that simply match the reference, then the variant's
frequency (whether homozygous or heterozygous) would be calculated as 30% (30 of 100 reads).
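As a minimal sketch of this calculation (the function names are hypothetical, and treating the minimum as an inclusive bound is an assumption, since the text does not say whether a frequency exactly equal to the minimum passes):

```python
def variant_frequency(variant_reads, total_reads):
    """Fraction of reads at a position that show the variant."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def passes_min_frequency(variant_reads, total_reads, min_frequency):
    """Compare the variant frequency to the configured minimum.

    Assumption: the bound is inclusive (frequency >= min_frequency).
    """
    return variant_frequency(variant_reads, total_reads) >= min_frequency
```

For the example above, variant_frequency(30, 30 + 70) gives 0.3, so the variant would be reported with a minimum of 0.25 but not with a minimum of 0.35.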
When UMT/UMI (Unique Molecular Tag/Id) processing is also enabled, this setting instead sets the minimum fraction of consensus families
that must show the variant at the position before a variant will be called. Note that the option to enable UMT/UMI processing during variant
analysis is only visible if you enabled UMT/UMI processing when first aligning the sequence alignment file.
Note that when the "Include rare allele detection in the analysis" option is also enabled, this setting has no effect on the rare alleles detected. Instead, use
the "Min Rare Allele Frequency" setting to exclusively control the lower threshold of the frequency of rare alleles that you are attempting to detect.
When Curio finds a location within the genome that has enough reads (or UMT/UMI families) at that position which show a nucleotide different from the
reference, it will make a call that there is either a homozygous or a heterozygous variant at that position. It first attempts to determine if the
variant is potentially heterozygous, and if not, the position will be called as a homozygous SNP. Two different algorithms are available to control how
Curio determines if the position is potentially heterozygous: (1) a basic nucleotide frequency algorithm, or (2) a more complex statistical algorithm
that takes into account the quality of each base pair as reported by the sequencer.
If you choose the "Use nucleotide frequencies only in heterozygous calls" option, then Curio will use the following method for making the consensus
call (which is taken from Cavener, Nucleic Acids Res. 15, 1353-1361, 1987):
If the frequency of a single nucleotide at a specific position is greater than 50% and greater than twice the frequency of the second most frequent
nucleotide, it is assigned as the consensus nucleotide (i.e. homozygous).
If the sum of the frequencies of two nucleotides is greater than 75% (but neither meets the criteria for a single nucleotide assignment), they are
assigned as co-consensus nucleotides (i.e. heterozygous).
If no single nucleotide or pair of nucleotides meets the above criteria, then the position is not called as a variant.
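The three rules above can be sketched as a small function. This is an illustration of the rules as written here, not Curio's code: the name cavener_call and the dictionary input are hypothetical, and integer comparisons are used so that the 50% and 75% cut-offs are tested exactly.

```python
def cavener_call(counts):
    """Apply the frequency-only consensus rules to one position.

    counts: dict mapping nucleotide -> number of reads (or families).
    Returns a 1-tuple (single consensus nucleotide), a 2-tuple
    (co-consensus nucleotides), or None when neither rule is met.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    top, top_n = ranked[0]
    second, second_n = ranked[1] if len(ranked) > 1 else (None, 0)

    # Rule 1: more than 50% of the total and more than twice the runner-up.
    if 2 * top_n > total and top_n > 2 * second_n:
        return (top,)
    # Rule 2: the two most frequent nucleotides together exceed 75%.
    if second is not None and 4 * (top_n + second_n) > 3 * total:
        return (top, second)
    # Rule 3: no call.
    return None
```

For instance, counts of A=55 and G=45 give the co-consensus pair ("A", "G"), while A=40, C=30, G=30 give no call. Whether a consensus result is then reported as a variant still depends on comparing it with the reference, which this sketch leaves out.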
If you choose the "Use read and base quality data in heterozygous calls" option, then Curio will take into account (a) the frequency of the
individual nucleotides, (b) the quality data of each base pair as reported by the sequencer, and (c) the quality of the read alignment that the base
pair is a part of (as reported by the aligner) in order to determine if the position should be called as heterozygous or homozygous. The algorithm
used is a statistical algorithm which is based on the approach used in the MAQ (Mapping and Assembly with Qualities) program, and described in detail
in the paper titled "Mapping short DNA sequencing reads and calling variants using mapping quality scores" (Heng Li, Jue Ruan, and Richard Durbin,
Genome Res. 2008 Nov; 18(11): 1851–1858.) The algorithm essentially attempts to calculate the statistical probability that a given set of nucleotides
could have occurred in the original DNA in a heterozygous form given the quality of each base pair and alignment, and then compares that to the
statistical probability that all of the base pairs were originally the same nucleotide.
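To make the idea of comparing the two probabilities concrete, here is a deliberately simplified sketch. It is not the MAQ model and not Curio's implementation: it assumes independent base errors, ignores any prior on heterozygosity, and caps each base quality at the read's mapping quality (an assumption of this sketch). All function names are hypothetical.

```python
import math


def error_prob(base_q, map_q):
    """Phred-scaled quality -> error probability.

    Assumption: the effective quality is the smaller of the base quality
    and the mapping quality. The result is clamped so that the logarithms
    below are always defined.
    """
    q = min(base_q, map_q)
    return min(max(10 ** (-q / 10), 1e-10), 0.75)


def log_likelihood(observations, genotype):
    """Log-probability of the observed bases given a two-allele genotype.

    observations: list of (base, base_quality, mapping_quality) tuples.
    genotype:     tuple of two alleles, e.g. ("A", "A") or ("A", "G").
    """
    total = 0.0
    for base, base_q, map_q in observations:
        e = error_prob(base_q, map_q)
        # Each allele is sampled with probability 1/2; a wrong base is
        # assumed equally likely to be any of the other three nucleotides.
        p = sum((1 - e) if base == allele else e / 3 for allele in genotype) / 2
        total += math.log(p)
    return total


def is_heterozygous(observations, a, b):
    """True when the heterozygous a/b model explains the data better
    than either homozygous model."""
    het = log_likelihood(observations, (a, b))
    hom = max(log_likelihood(observations, (a, a)),
              log_likelihood(observations, (b, b)))
    return het > hom
```

In this sketch, 15 reads of 'A' and 5 reads of 'G' that all have high base quality favour the heterozygous model, whereas the same counts with very low quality on the 'G' bases favour the homozygous model, which is the kind of distinction a frequency-only rule cannot make.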
If you enable rare allele detection then Curio will perform an additional analysis step on each position that is being analyzed. Specifically, it will
attempt to find whatever the minor nucleotide type is at any given position, and then determine if the frequency of that nucleotide falls within the
requested range. For example, if there are 3 reads at a position that show a nucleotide of 'C' and 97 that show a nucleotide of 'A', then the
nucleotide of 'A' would be considered the major call at that position and the nucleotide of 'C' would be considered the minor or rare call (which in
this case would have a frequency of 3%). Note that, unlike SNPs, it makes no difference what the reference shows at any given position when
determining if a potential rare allele exists at a position. So, in this example the nucleotide of 'C' would still be called as a rare allele, even if
the reference also showed a 'C' at that position.
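A minimal sketch of this step follows. The function name and parameters are hypothetical; the sketch assumes that "minor" means the second most frequent nucleotide at the position and that both ends of the requested range are inclusive. Note that the reference base is not an input at all.

```python
def find_rare_allele(counts, min_freq, max_freq):
    """Return (nucleotide, frequency) for the minor nucleotide when its
    frequency lies within [min_freq, max_freq], otherwise None.

    counts: dict mapping nucleotide -> number of reads (or families).
    """
    total = sum(counts.values())
    # Only nucleotides that were actually observed can be the minor call.
    observed = sorted(
        ((nuc, n) for nuc, n in counts.items() if n > 0),
        key=lambda kv: kv[1],
        reverse=True,
    )
    if total == 0 or len(observed) < 2:
        return None
    minor, minor_n = observed[1]
    freq = minor_n / total
    if min_freq <= freq <= max_freq:
        return minor, freq
    return None
```

With the example from the text, find_rare_allele({"A": 97, "C": 3}, 0.01, 0.05) reports 'C' at a frequency of 0.03, regardless of what the reference shows at that position.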
When UMT/UMI (Unique Molecular Tag/Id) processing is also enabled, these settings instead set the minimum and maximum fraction of
consensus families that must show the rare allele at the position before it will be called. Note that the option to enable UMT/UMI
processing during variant analysis is only visible if you enabled UMT/UMI processing when first aligning the sequence alignment file.