sci_funcs module

class TagSupportingReads[source]

Bases: object

pc8’s Support Assessor

Confirm whether a read supports a variant and optionally mark supporting reads.

Used to tag reads as supporting a variant. This tag can then be used by other processes to include/exclude reads with that tag from their analysis.

class SupportFlags(*values)[source]

Bases: Flag

CLEAR = 0

FIELDS_MISSING = 1

NOT_ALIGNED = 2

NOT_ALT = 4

SHORT = 8

BAD_OP = 16

classmethod check_read_supporting(read: ExtendedRead, mut_type: MutTypes, alt: str, vcf_start: int, vcf_stop: int, mark: bool = True) → bool[source]
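Because SupportFlags derives from Flag, more than one reason can be carried in a single value. The sketch below re-declares the enum locally, with member values copied from the listing above, and shows how combined flags might be inspected. How check_read_supporting actually sets these flags is not documented here, so the combination used is purely illustrative.

```python
from enum import Flag


class SupportFlags(Flag):
    # Local re-declaration for illustration; values copied from the listing above.
    CLEAR = 0
    FIELDS_MISSING = 1
    NOT_ALIGNED = 2
    NOT_ALT = 4
    SHORT = 8
    BAD_OP = 16


# Hypothetical outcome: a read carrying both the SHORT and NOT_ALT reasons.
flags = SupportFlags.SHORT | SupportFlags.NOT_ALT

# A read would count as supporting only if no reason was recorded (assumption).
supports_variant = flags == SupportFlags.CLEAR
has_not_alt = bool(flags & SupportFlags.NOT_ALT)
```

Testing membership with `&` keeps each reason independently queryable, which is the usual motivation for a Flag over a plain Enum.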

class TagLowQualReads[source]

Bases: object

pc8’s Quality Assessor

Determine if a read is of low quality and optionally mark low-quality reads.

The result of this process is used to mark reads as low quality. This tag can then be used by other processes to include/exclude reads with that tag from their analysis. The fraction of reads tagged as low quality, or as duplicates by the duplicate marking function below, is used to call the LQF flag on variants.

class QualFlags(*values)[source]

Bases: Flag

CLEAR = 0

MAPQUAL = 1

CLIPQUAL = 2

BASEQUAL = 4

classmethod check_low_qual_read(read: ExtendedRead, vcf_start: int, alt: str, mut_type: MutTypes, min_basequal: int, min_mapqual: int, min_clipqual: int, mark: bool = True) → bool[source]

class TagFragmentReads[source]

Bases: object

ab63’s Overlap Assessor

Mark one read from a fragment pair if both members of the fragment pair are present in the input reads.

Used to tag reads as an overlapping mate. This tag can then be used by other processes to include/exclude reads with that tag from their analysis.

static check_for_mates(reads: Iterable[ExtendedRead])[source]: Mark the member of each read pair with the lower mean quality as overlapping.
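The pairing logic described above can be sketched roughly as follows. ToyRead is a hypothetical stand-in for ExtendedRead, and the field names, the use of mean base quality, and the "OVERLAP" tag name are all assumptions made for the example, not the module's API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ToyRead:
    # Hypothetical stand-in for ExtendedRead; field names are assumptions.
    query_name: str
    base_quals: list
    tags: set = field(default_factory=set)


def mark_overlapping_mates(reads):
    """Tag the lower mean-quality member of each complete pair."""
    by_name = defaultdict(list)
    for read in reads:
        by_name[read.query_name].append(read)
    for pair in by_name.values():
        if len(pair) != 2:
            continue  # mate absent from the input, so nothing to mark
        loser = min(pair, key=lambda r: mean(r.base_quals))
        loser.tags.add("OVERLAP")  # hypothetical tag name


reads = [
    ToyRead("frag1", [30, 30, 30]),
    ToyRead("frag1", [20, 25, 30]),
    ToyRead("frag2", [35, 35, 35]),  # mate not present in the input
]
mark_overlapping_mates(reads)
tagged = [bool(r.tags) for r in reads]
```

Only the second read is tagged here: it shares a fragment name with the first and has the lower mean quality, while the third read has no mate in the input.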

class TagStutterDuplicateReads[source]

Bases: object

pc8’s Stutter Duplicates Assessor

Tag hidden duplicate reads which have shifted endpoints due to PCR stutter (hence evading normal dupmarking).

The result of this process is used to mark reads as duplicates. This tag can then be used by other processes to include/exclude reads with that tag from their analysis. The fraction of reads tagged as duplicates by this process is used to assess a variant for the DVF flag and, partly (with the low-quality tag), the LQF flag.

class FragmentEndpoints(read: ExtendedRead, endpoints: tuple[int, int, int, int])[source]

Bases: object

read: ExtendedRead

endpoints: tuple[int, int, int, int]

classmethod check_stutter_duplicates(reads: Iterable[ExtendedRead], duplication_window_size: int = 6)[source]
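To illustrate the idea of a duplication window, the sketch below treats two fragments as stutter duplicates when each of their four endpoints differs by at most the window size. ToyFragment is a hypothetical stand-in for FragmentEndpoints, and the greedy comparison strategy and inclusive window are assumptions. The actual clustering used by check_stutter_duplicates may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyFragment:
    # Hypothetical stand-in for FragmentEndpoints.
    name: str
    endpoints: tuple


def find_stutter_duplicates(fragments, window=6):
    """Return names of fragments whose four endpoints all lie within
    `window` of an earlier fragment (greedy comparison, an assumption)."""
    kept = []
    duplicates = []
    for frag in sorted(fragments, key=lambda f: f.endpoints):
        is_dup = any(
            all(abs(a - b) <= window for a, b in zip(frag.endpoints, rep.endpoints))
            for rep in kept
        )
        if is_dup:
            duplicates.append(frag.name)
        else:
            kept.append(frag)
    return duplicates


fragments = [
    ToyFragment("a", (100, 250, 300, 450)),
    ToyFragment("b", (102, 250, 300, 447)),  # endpoints shifted by a few bases
    ToyFragment("c", (500, 650, 700, 850)),
]
dups = find_stutter_duplicates(fragments)
```

Fragment "b" is reported because every endpoint is within 6 bases of "a", even though the two would not share exact coordinates and so could evade conventional duplicate marking.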

final class AlignmentScoreTest[source]

Bases: object

Pass/Fail a variant on the average alignment score of supporting reads.

class ResultPack(outcome: TestOutcomes, avg_as: float | None, reason: Info)[source]

Bases: object

class Info(*values)[source]

Bases: Flag

NODATA = 0

NO_READS = 1

INSUFFICIENT_AS_TAGS = 2

ON_THRESHOLD = 4

outcome: TestOutcomes

avg_as: float | None

reason: Info

classmethod test_variant_reads(reads: Iterable[ExtendedRead], avg_AS_threshold: float) → ResultPack[source]: pc8’s Alignment Score Assessor
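A rough sketch of the averaging step, assuming a plain arithmetic mean over the reads that carry an AS value and a pass when the mean is at or above the threshold. Both the kind of average and the comparison direction are assumptions, and the helper names are hypothetical.

```python
from statistics import mean


def average_alignment_score(scores):
    """Mean of the AS values that are present; None if there are none."""
    present = [s for s in scores if s is not None]
    return mean(present) if present else None


def passes_as_test(scores, avg_as_threshold):
    avg = average_alignment_score(scores)
    if avg is None:
        return None  # no usable data (cf. NO_READS / INSUFFICIENT_AS_TAGS)
    return avg >= avg_as_threshold  # assumed comparison direction


scores = [140, 150, None, 130]  # None marks a read without an AS tag
avg = average_alignment_score(scores)
result = passes_as_test(scores, avg_as_threshold=135.0)
```

Returning None for the no-data case keeps "could not be tested" distinct from an outright failure, mirroring the separate Info reasons in ResultPack.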

final class AnomalousDistributionTest[source]

Bases: object

Pass/Fail a variant on the distribution of the mutant base position on the supporting reads.

Uses the conditions given by Ellis et al. in doi.org/10.1038/s41596-020-00437-6, and one additional condition devised by al35.

The full text of the conditional is reproduced below, with editorials in []. There is a point of ambiguity in the original conditional. The interpretation that this tool has opted for is indicated by [] and is expanded upon subsequently.

For each variant, if the number of variant-supporting reads determined [IN PRIOR STEPS] is low (i.e. 0–1 reads) for one strand, follow Option A. For each variant, if both strands have sufficient variant-supported reads (i.e. ≥2 reads), follow Option B.

(A) Low number of variant-supporting reads on one strand

(i) For each variant, if one strand had too few variant-supporting reads, the other strand must conform to:

Fewer than 90% of variant-supporting reads [ON THE STRAND] have the variant located within the first 15% of the read measured from the alignment start position.

MAD >0 and s.d. >4 for that strand.

(B) Sufficient variant-supporting reads on both strands

(i) For each variant, if both strands have sufficient variant-supporting reads (i.e., ≥2 reads), then one of the following must be true:

Fewer than 90% of variant-supporting reads should have the variant located within the first 15% of the read measured from the alignment start position.

MAD >2 and s.d. >2 for both strands.

MAD >1 and s.d. >10 for one strand (i.e., strong evidence of high variability in variant position in variant-supporting reads).

The point of ambiguity is whether, under Option A, the single read from the strand that does not sufficiently support the variant should be included in the test of positional distribution across the supporting reads. The present interpretation is that the test should be applied only to the reads on the “other strand”, since the phrasing “the other strand must conform to” implies the exclusion of the single read on the low-support strand.

The additional condition checks whether at least N supporting reads (N being min_non_edge_reads) carry the variant away from the read edge. This condition is currently experimental and can be disabled by setting min_non_edge_reads to 0.
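The conditional quoted above can be sketched as follows, using the literal thresholds from the text. This is one reading under stated assumptions: all conditions must hold under Option A, any one suffices under Option B, the sample standard deviation is used, and the edge boundary is inclusive. The function names and input layout are hypothetical, and the experimental min_non_edge_reads condition is omitted.

```python
from statistics import median, stdev


def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def near_edge(positions, lengths, edge=0.15):
    """Count reads whose variant lies in the first `edge` of the read."""
    return sum(p / n <= edge for p, n in zip(positions, lengths))


def distribution_ok(fwd, rev, low_n=1):
    """fwd/rev: lists of (variant_position, read_length), one list per strand.

    Returns True/False, or None when neither strand has enough reads.
    """
    strands = [s for s in (fwd, rev) if len(s) > low_n]
    if not strands:
        return None
    stats = []
    for s in strands:
        pos = [p for p, _ in s]
        stats.append((mad(pos), stdev(pos)))
    if len(strands) == 1:
        # Option A: only the well-supported strand is tested.
        pos, lens = zip(*strands[0])
        m, sd = stats[0]
        return near_edge(pos, lens) / len(pos) < 0.9 and m > 0 and sd > 4
    # Option B: any one of the three conditions suffices.
    total = len(fwd) + len(rev)
    edge_frac = sum(near_edge(*zip(*s)) for s in strands) / total
    return (
        edge_frac < 0.9
        or all(m > 2 and sd > 2 for m, sd in stats)
        or any(m > 1 and sd > 10 for m, sd in stats)
    )


clustered = distribution_ok([(5, 100), (6, 100), (5, 100), (7, 100)], [])
spread = distribution_ok([(10, 100), (40, 100), (70, 100), (90, 100)], [(50, 100)])
```

In the second call the lone reverse-strand read is ignored, matching the interpretation described above that only the “other strand” is tested under Option A.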

class ResultPack(outcome: TestOutcomes, strand: Strand, reason: Info)[source]

Bases: object

class Info(*values)[source]

Bases: Flag

NODATA = 0

NO_READS = 1

INSUFFICIENT_READS = 2

EDGE_CLUSTERING = 4

ONE_STRAND_DISTRIB = 8

BOTH_STRAND_DISTRIB_BOTH = 16

BOTH_STRAND_DISTRIB_ONE = 32

MIN_NON_EDGE = 64

outcome: TestOutcomes

strand: Strand

reason: Info

classmethod test_variant_reads(reads: Iterable[ExtendedRead], record_start: int, edge_definition: float, edge_clustering_threshold: float, min_MAD_one_strand: float, min_sd_one_strand: float, min_MAD_both_strand_weak: float, min_sd_both_strand_weak: float, min_MAD_both_strand_strong: float, min_sd_both_strand_strong: float, low_n_supporting_reads_boundary: int, min_non_edge_reads: int) → ResultPack[source]

class ProportionBasedTest[source]

Bases: object

Pass/Fail a variant on a proportion of supporting reads with/without a given property.

Used by the LQF flag test to test a proportion of stutter duplicate and low-quality supporting reads. Used by the DVF flag test to test a proportion of stutter duplicate reads.

class ResultPack(outcome: TestOutcomes, reason: Info, prop_loss: float)[source]

Bases: object

class Info(*values)[source]

Bases: Flag

NODATA = 0

NO_READS = 1

THRESHOLD = 2

MIN_PASS = 4

outcome: TestOutcomes

reason: Info

prop_loss: float

classmethod test_variant_reads(reads: Sequence[ExtendedRead], tags_to_check: Iterable[Tags], read_loss_threshold: float, min_without: int = 0) → ResultPack[source]
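As a rough illustration of a proportion-based test, the sketch below computes the fraction of reads carrying any of the checked tags and compares it with a threshold. The pass rule (loss at or below the threshold, and at least min_without untagged reads remaining), the tag names, and the representation of reads as sets of tags are all assumptions. The real interplay between read_loss_threshold and min_without may differ.

```python
def proportion_test(read_tags, tags_to_check, read_loss_threshold, min_without=0):
    """Return (passed, prop_loss), or None when there are no reads.

    read_tags: one set of tag names per supporting read (hypothetical layout).
    """
    if not read_tags:
        return None
    check = set(tags_to_check)
    lost = sum(1 for tags in read_tags if tags & check)
    prop_loss = lost / len(read_tags)
    remaining = len(read_tags) - lost
    # Assumed rule: fail if too large a share is lost or too few reads remain.
    passed = prop_loss <= read_loss_threshold and remaining >= min_without
    return passed, prop_loss


read_tags = [{"LOW_QUAL"}, set(), {"STUTTER_DUP"}, set()]  # hypothetical tag names
check = {"LOW_QUAL", "STUTTER_DUP"}
strict = proportion_test(read_tags, check, read_loss_threshold=0.49)
lenient = proportion_test(read_tags, check, read_loss_threshold=0.5)
```

Passing both tags corresponds to the LQF use described above, while passing only the stutter duplicate tag would correspond to the DVF use.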