sci_funcs module
- class TagSupportingReads[source]
Bases:
object
pc8’s Support Assessor
Confirm whether a read supports a variant and optionally mark supporting reads.
Used to tag reads as supporting a variant. This tag can then be used by other processes to include/exclude reads with that tag from their analysis.
- class SupportFlags(*values)[source]
Bases:
Flag
- CLEAR = 0
- SAMFLAG = 1
- FIELDS_MISSING = 2
- NOT_ALIGNED = 4
- NOT_ALT = 8
- SHORT = 16
- BAD_OP = 32
- classmethod check_read_supporting(read: ExtendedRead, mut_type: MutTypes, alt: str, vcf_start: int, vcf_stop: int, mark: bool = True) bool [source]
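The SupportFlags members listed above are powers of two, so several reasons can be held in a single value. The following is a minimal sketch of that pattern using the standard library's enum.Flag; the member values are copied from the listing, but how the module itself combines or reports them is an assumption made for illustration.

```python
from enum import Flag


class SupportFlags(Flag):
    # Values mirror the listing above; this is a stand-in class, not the module's own.
    CLEAR = 0
    SAMFLAG = 1
    FIELDS_MISSING = 2
    NOT_ALIGNED = 4
    NOT_ALT = 8
    SHORT = 16
    BAD_OP = 32


# Combine two hypothetical reasons for a read not supporting a variant.
reasons = SupportFlags.NOT_ALT | SupportFlags.SHORT

# Membership tests work on the combined value.
has_not_alt = SupportFlags.NOT_ALT in reasons
has_bad_op = SupportFlags.BAD_OP in reasons

# CLEAR is the zero value, so it is falsy and signals "no reason recorded".
nothing_recorded = not SupportFlags.CLEAR
```

Because CLEAR is zero, a truthiness check on the flag value is enough to tell whether any reason was set.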
- class TagLowQualReads[source]
Bases:
object
pc8’s Quality Assessor
Determine if a read is of low quality and optionally mark low-quality reads.
The result of this process is used to mark reads as low quality. This tag can then be used by other processes to include/exclude reads with that tag from their analysis. The fraction of reads tagged as low quality, or as duplicates by the duplicate marking function below, is used to call the LQF flag on variants.
- classmethod check_low_qual_read(read: ExtendedRead, vcf_start: int, alt: str, mut_type: MutTypes, min_basequal: int, min_mapqual: int, min_clipqual: int, mark: bool = True) bool [source]
- class TagFragmentReads[source]
Bases:
object
ab63’s Overlap Assessor
Mark one read from a fragment pair if both members of the fragment pair are present in the input reads.
Used to tag reads as an overlapping mate. This tag can then be used by other processes to include/exclude reads with that tag from their analysis.
- static check_for_mates(reads: Iterable[ExtendedRead])[source]
Mark the member of a read pair with the lower mean quality as overlapping.
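As a rough illustration of the rule described for check_for_mates, the sketch below groups reads by name and tags the mate with the lower mean base quality. ToyRead, mark_overlapping_mates and the "OVERLAP" tag name are hypothetical stand-ins; the real method operates on ExtendedRead objects and its tie-breaking behaviour is not documented here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ToyRead:
    # Hypothetical stand-in for ExtendedRead: a name, base qualities, and a tag set.
    name: str
    quals: list
    tags: set = field(default_factory=set)


def mark_overlapping_mates(reads):
    """Tag the lower mean-quality member of each pair sharing a read name."""
    by_name = defaultdict(list)
    for read in reads:
        by_name[read.name].append(read)
    for mates in by_name.values():
        if len(mates) == 2:
            # On a tie, min() keeps the first mate encountered (an arbitrary choice).
            lower = min(mates, key=lambda r: mean(r.quals))
            lower.tags.add("OVERLAP")


reads = [
    ToyRead("frag1", [30, 30, 30]),
    ToyRead("frag1", [20, 25, 30]),
    ToyRead("frag2", [35, 35]),  # mate absent, so left untagged
]
mark_overlapping_mates(reads)
```

Only fragments with both mates present are touched, which matches the description that one read is marked when both members of the pair are in the input.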
- class TagStutterDuplicateReads[source]
Bases:
object
pc8’s Stutter Duplicates Assessor
Tag hidden duplicate reads which have shifted endpoints due to PCR stutter (hence evading normal dupmarking).
The result of this process is used to mark reads as duplicates. This tag can then be used by other processes to include/exclude reads with that tag from their analysis. The fraction of reads tagged as duplicates by this process is used to assess a variant for the DVF flag and, together with the low-quality tag, for the LQF flag.
- class FragmentEndpoints(read: ExtendedRead, endpoints: tuple[int, int, int, int])[source]
Bases:
object
- read: ExtendedRead
- endpoints: tuple[int, int, int, int]
- classmethod check_stutter_duplicates(reads: Iterable[ExtendedRead], duplication_window_size: int = 6)[source]
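One way to picture stutter-duplicate detection is to compare the four fragment endpoints of each read against those of reads already accepted, and to treat a read as a duplicate when every endpoint falls within the window. The sketch below does this on plain tuples; the function name, the inclusive comparison, and the choice to compare only against accepted representatives are assumptions, not a description of check_stutter_duplicates itself.

```python
def tag_stutter_duplicates(endpoints, window=6):
    """Return indices of fragments whose four endpoints all lie within
    `window` bases of an earlier accepted fragment (illustrative rule only)."""
    order = sorted(range(len(endpoints)), key=lambda i: endpoints[i])
    kept = []          # endpoint tuples accepted as distinct fragments
    duplicates = set()
    for i in order:
        current = endpoints[i]
        is_dup = any(
            all(abs(a - b) <= window for a, b in zip(current, other))
            for other in kept
        )
        if is_dup:
            duplicates.add(i)
        else:
            kept.append(current)
    return duplicates


frags = [
    (100, 250, 300, 450),
    (102, 251, 299, 453),  # every endpoint within 6 bases of the first fragment
    (100, 250, 300, 480),  # last endpoint differs by 30, so treated as distinct
    (500, 650, 700, 850),
]
dups = tag_stutter_duplicates(frags)
```

The default of 6 is taken from the duplication_window_size default in the signature above.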
- final class AlignmentScoreTest[source]
Bases:
object
Pass/Fail a variant on the average alignment score of supporting reads.
- class ResultPack(outcome: TestOutcomes, avg_as: float | None, reason: Info)[source]
Bases:
object
- class Info(*values)[source]
Bases:
Flag
- NODATA = 0
- NO_READS = 1
- INSUFFICIENT_AS_TAGS = 2
- ON_THRESHOLD = 4
- outcome: TestOutcomes
- avg_as: float | None
- classmethod test_variant_reads(reads: Iterable[ExtendedRead], avg_AS_threshold: float) ResultPack [source]
pc8’s Alignment Score Assessor
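The test above can be sketched as averaging the AS values of the supporting reads and comparing the mean to a threshold. In this sketch, reads without an AS value are represented by None and skipped, and a mean equal to the threshold is treated as a pass; both choices are assumptions, as is the helper name average_alignment_score.

```python
from statistics import mean


def average_alignment_score(as_values, threshold):
    """Return (passed, average); (None, None) if no AS values are available."""
    present = [v for v in as_values if v is not None]
    if not present:
        return None, None
    avg = mean(present)
    return avg >= threshold, avg


# Three reads carry an AS value, one does not.
passed, avg = average_alignment_score([140, 150, None, 130], threshold=135)
```

The real ResultPack additionally carries a reason flag (for example INSUFFICIENT_AS_TAGS or ON_THRESHOLD), which this sketch leaves out.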
- final class AnomalousDistributionTest[source]
Bases:
object
Pass/Fail a variant on the distribution of the mutant base position on the supporting reads.
Uses the conditions given by Ellis et al. in doi.org/10.1038/s41596-020-00437-6, and one additional condition devised by al35.
The full text of the conditional is reproduced below, with editorial insertions in square brackets. There is a point of ambiguity in the original conditional. The interpretation that this tool has opted for is indicated by the bracketed insertions and is expanded upon subsequently.
For each variant, if the number of variant-supporting reads determined [IN PRIOR STEPS] is low (i.e. 0–1 reads) for one strand, follow Option A. For each variant, if both strands have sufficient variant-supported reads (i.e. ≥2 reads), follow Option B.
- (A) Low number of variant-supporting reads on one strand
(i) For each variant, if one strand had too few variant-supporting reads, the other strand must conform to:
  - Fewer than 90% of variant-supporting reads [ON THE STRAND] have the variant located within the first 15% of the read measured from the alignment start position.
  - MAD >0 and s.d. >4 for that strand.
- (B) Sufficient variant-supporting reads on both strands
(i) For each variant, if both strands have sufficient variant-supporting reads (i.e., ≥2 reads), then one of the following must be true:
  - Fewer than 90% of variant-supporting reads should have the variant located within the first 15% of the read measured from the alignment start position.
  - MAD >2 and s.d. >2 for both strands.
  - MAD >1 and s.d. >10 for one strand (i.e., strong evidence of high variability in variant position in variant-supporting reads).
The point of ambiguity is whether, under Option A, to include the single read from the strand which does not sufficiently support the variant in the test of positional distribution across the supporting reads. The present interpretation is that the test should be applied only to the reads on the “other strand”, since the phrasing “the other strand must conform to” implies the exclusion of the single read on the low-support strand.
The additional condition to be checked is simply whether at least N supporting reads have the variant positioned away from the read edge, where N is given by min_non_edge_reads. This condition is experimental at this time and can be disabled by setting min_non_edge_reads to 0.
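The conditional quoted above can be sketched in a few lines. This is an illustrative reading, not the module's implementation: each read is reduced to a hypothetical (variant_offset, read_length) pair, the thresholds are hard-coded from the quoted text, the population standard deviation is used, both bullet points under Option A are assumed to be required together, and the experimental non-edge-reads condition is omitted.

```python
from statistics import median, pstdev


def mad(values):
    """Median absolute deviation."""
    m = median(values)
    return median([abs(v - m) for v in values])


def edge_fraction(reads, edge=0.15):
    """Fraction of reads with the variant within the first `edge` of the read."""
    return sum(off / length <= edge for off, length in reads) / len(reads)


def strand_ok(reads, min_mad, min_sd):
    offsets = [off for off, _ in reads]
    return mad(offsets) > min_mad and pstdev(offsets) > min_sd


def passes_distribution(fwd, rev, low_n=1):
    if not fwd and not rev:
        return False
    if len(fwd) <= low_n or len(rev) <= low_n:
        # Option A: test only the better-supported strand, excluding
        # any single read on the low-support strand.
        strand = fwd if len(fwd) >= len(rev) else rev
        return edge_fraction(strand) < 0.9 and strand_ok(strand, 0, 4)
    # Option B: any one of the three conditions is sufficient.
    return (
        edge_fraction(fwd + rev) < 0.9
        or (strand_ok(fwd, 2, 2) and strand_ok(rev, 2, 2))
        or strand_ok(fwd, 1, 10)
        or strand_ok(rev, 1, 10)
    )


spread = passes_distribution([(10, 100), (50, 100), (90, 100)], [])
clustered = passes_distribution([(5, 100), (5, 100), (5, 100)], [(5, 100)])
both_tight = passes_distribution([(5, 100), (6, 100)], [(5, 100), (7, 100)])
```

In test_variant_reads these constants correspond to parameters such as edge_definition, edge_clustering_threshold and the min_MAD/min_sd family, so they would be supplied by the caller instead of being fixed.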
- class ResultPack(outcome: TestOutcomes, strand: Strand, reason: Info)[source]
Bases:
object
- class Info(*values)[source]
Bases:
Flag
- NODATA = 0
- NO_READS = 1
- INSUFFICIENT_READS = 2
- EDGE_CLUSTERING = 4
- ONE_STRAND_DISTRIB = 8
- BOTH_STRAND_DISTRIB_BOTH = 16
- BOTH_STRAND_DISTRIB_ONE = 32
- MIN_NON_EDGE = 64
- outcome: TestOutcomes
- classmethod test_variant_reads(reads: Iterable[ExtendedRead], record_start: int, edge_definition: float, edge_clustering_threshold: float, min_MAD_one_strand: float, min_sd_one_strand: float, min_MAD_both_strand_weak: float, min_sd_both_strand_weak: float, min_MAD_both_strand_strong: float, min_sd_both_strand_strong: float, low_n_supporting_reads_boundary: int, min_non_edge_reads: int) ResultPack [source]
- class ProportionBasedTest[source]
Bases:
object
Pass/Fail a variant on a proportion of supporting reads with/without a given property.
Used by the LQF flag test to test a proportion of stutter duplicate and low-quality supporting reads. Used by the DVF flag test to test a proportion of stutter duplicate reads.
- class ResultPack(outcome: TestOutcomes, reason: Info, prop_loss: float)[source]
Bases:
object
- outcome: TestOutcomes
- prop_loss: float
- classmethod test_variant_reads(reads: Sequence[ExtendedRead], tags_to_check: Iterable[Tags], read_loss_threshold: float, min_without: int = 0) ResultPack [source]
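A proportion-based test of this kind can be sketched as counting the reads that carry any of the checked tags and comparing that fraction with a threshold. The version below works on plain sets of tag names; the tag strings, the inclusive comparison against read_loss_threshold, and the handling of empty input are assumptions made for illustration.

```python
def proportion_test(read_tags, tags_to_check, read_loss_threshold, min_without=0):
    """Return (passed, prop_loss) for a list of per-read tag sets."""
    if not read_tags:
        raise ValueError("no reads to test")
    wanted = set(tags_to_check)
    n_with = sum(1 for tags in read_tags if wanted & set(tags))
    n_without = len(read_tags) - n_with
    prop_loss = n_with / len(read_tags)
    # Pass only if few enough reads are lost and enough untagged reads remain.
    passed = prop_loss <= read_loss_threshold and n_without >= min_without
    return passed, prop_loss


# Hypothetical tag names; two of four reads carry a checked tag.
read_tags = [{"LOW_QUAL"}, set(), {"STUTTER_DUP"}, set()]
checked = ["LOW_QUAL", "STUTTER_DUP"]

lenient = proportion_test(read_tags, checked, read_loss_threshold=0.5)
strict = proportion_test(read_tags, checked, read_loss_threshold=0.4)
needs_three = proportion_test(read_tags, checked, 0.5, min_without=3)
```

Passing both tags corresponds to the LQF usage described above, while passing only the stutter-duplicate tag corresponds to the DVF usage.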