DVF module

class FixedParamsDupmark(*, duplication_window_size: Annotated[int, AfterValidator(func=<lambda>)] = 6)

Bases: FixedParams

duplication_window_size: Annotated[int, AfterValidator(func=<lambda>)]
model_config: ConfigDict = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'strict': True}

Parent dataclass, to be inherited from, that stores the fixed parameters for a particular subclass of FilterTester, or in other words for a particular filtering test. Using subclasses of this class for fixed parameters provides type safety and a consistent interface for implementing filters.

tag_dups(run_params: RunParamsShared, fixed_params: FixedParamsDupmark)
class TaggerDupmark(engine_fixed_params: FixedParams | None, require_marks: Sequence[str], exclude_marks: Sequence[str])

Bases: ReadAwareProcess

AddsMarks: ClassVar[set[str] | None] = {Tags.STUTTER_DUP_TAG}
EngineFactory() → ProcessEngineProtocol[RunParams_T, None]
FixedParamClass

alias of FixedParamsDupmark

ProcessNamespace: ClassVar[str | None] = 'mark-duplicates'
ProcessType: ClassVar[ProcessKindEnum | None] = <class 'hairpin2.infrastructure.process_engines.ReadTaggerEngine'>
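The tagging logic is not documented here. As a hedged sketch of the idea described under FlaggerDVF, assuming a read is treated as a stutter duplicate when both its start and its end lie within `duplication_window_size` bases of an earlier read, a greedy pass could look like this. `Read`, `tag_stutter_dups` and the greedy strategy are hypothetical, not hairpin2's implementation.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal read: a name plus aligned reference start/end."""

    name: str
    start: int
    end: int


def tag_stutter_dups(reads: list[Read], window: int = 6) -> set[str]:
    """Return names of reads judged to duplicate an earlier read.

    A read is a duplicate if its start AND end each differ by at most
    `window` bases from a read already kept as a group representative.
    """
    representatives: list[Read] = []
    dups: set[str] = set()
    for read in sorted(reads, key=lambda r: (r.start, r.end)):
        if any(
            abs(read.start - rep.start) <= window and abs(read.end - rep.end) <= window
            for rep in representatives
        ):
            dups.add(read.name)  # within the window of a kept read: tag it
        else:
            representatives.append(read)  # first read of a new group
    return dups


reads = [Read("a", 100, 250), Read("b", 102, 251), Read("c", 180, 330)]
print(tag_stutter_dups(reads))  # {'b'}
```

Comparing against every representative is quadratic in the number of reads, which is usually acceptable for the modest depth at a single variant position.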
class ResultDVF(variant_flagged: TestOutcomes, info_flag: enum.Flag | None, reads_seen: int, loss_ratio: float)

Bases: FlagResult

reads_seen: int
loss_ratio: float
getinfo(alt: str) → str

Return basic filter info as a string formatted for use in the VCF INFO field: “<flag>|<code>”.

Each filter must return INFO as it should be formatted for the VCF INFO field, or None if not applicable. Subclasses must override this method to return more specific info.

FlagName: ClassVar[str] = 'DVF'
InfoFlags

alias of Info

InfoFlagsAllSet: ClassVar[Flag | None] = 7
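`InfoFlagsAllSet = 7` is consistent with an `enum.Flag` of three single-bit members (1, 2 and 4) OR-ed together. Assuming that is how `Info` is built (its members are not listed here, so the names below are placeholders), the arithmetic works like this:

```python
import enum


class InfoSketch(enum.Flag):
    """Hypothetical three-member info flag; the real member names are not documented here."""

    CONDITION_A = enum.auto()  # value 1
    CONDITION_B = enum.auto()  # value 2
    CONDITION_C = enum.auto()  # value 4


# Combining every member sets all three bits: 1 | 2 | 4 == 7.
ALL_SET = InfoSketch.CONDITION_A | InfoSketch.CONDITION_B | InfoSketch.CONDITION_C
print(ALL_SET.value)  # 7

# Individual conditions can be tested by membership on a combined flag.
observed = InfoSketch.CONDITION_A | InfoSketch.CONDITION_C
print(observed.value, InfoSketch.CONDITION_B in observed)  # 5 False
```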
class FixedParamsDVF(*, read_loss_threshold: Annotated[float, AfterValidator(func=<lambda>)], min_pass_reads: Annotated[int, AfterValidator(func=<lambda>)], nsamples_threshold: int)

Bases: FixedParams

read_loss_threshold - percentage threshold on the number of low-quality (lq) reads relative to the number of input reads, for a given variant and sample, above which the variant is flagged DVF.

min_pass_reads - the absolute minimum number of reads required for a variant not to be flagged DVF.
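A hedged per-sample sketch of how these two thresholds could combine, assuming "passing" reads are the input reads minus the lq reads and that `read_loss_threshold` is expressed as a fraction between 0 and 1 (scale by 100 if it is a true percentage). `dvf_flag` is hypothetical, and `nsamples_threshold` is not modelled because its meaning is not documented here.

```python
def dvf_flag(
    n_input: int, n_lq: int, read_loss_threshold: float, min_pass_reads: int
) -> tuple[bool, float]:
    """Hypothetical per-sample DVF decision; returns (flagged, loss_ratio).

    ASSUMPTION: read_loss_threshold is a fraction in [0, 1].
    """
    # Proportion of input reads lost as low quality (0.0 when there are no reads).
    loss_ratio = n_lq / n_input if n_input else 0.0
    n_pass = n_input - n_lq  # ASSUMPTION: passing reads = input minus lq reads
    flagged = loss_ratio > read_loss_threshold or n_pass < min_pass_reads
    return flagged, loss_ratio


print(dvf_flag(20, 10, 0.49, 2))  # (True, 0.5)
print(dvf_flag(20, 5, 0.49, 2))   # (False, 0.25)
```

Returning the ratio alongside the decision loosely mirrors `ResultDVF`, which carries `loss_ratio` next to the flag outcome.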

read_loss_threshold: Annotated[float, AfterValidator(func=<lambda>)]
min_pass_reads: Annotated[int, AfterValidator(func=<lambda>)]
nsamples_threshold: int
model_config: ConfigDict = {'arbitrary_types_allowed': True, 'extra': 'forbid', 'frozen': True, 'strict': True}

Parent dataclass, to be inherited from, that stores the fixed parameters for a particular subclass of FilterTester, or in other words for a particular filtering test. Using subclasses of this class for fixed parameters provides type safety and a consistent interface for implementing filters.

test_DVF(run_params: RunParamsShared, fixed_params: FixedParamsDVF)
class FlaggerDVF(engine_fixed_params: FixedParams | None, require_marks: Sequence[str], exclude_marks: Sequence[str])

Bases: ReadAwareProcess

Duplication variant filter: a portion of the reads supporting the variant are suspected to arise from duplicated reads that have escaped dupmarking.

In regions of low complexity, short repeats and homopolymer tracts can cause PCR stuttering, leading, for example, to an additional A on the read when amplifying a tract of As. If duplicated reads contain stutter, their read lengths and alignments to the reference can vary even though they are in fact duplicates. As a result, these duplicates evade dupmarking and give rise to spurious variants during calling.

min_boundary_deviation sets the minimum deviation in start/end coordinates above which reads are assumed not to be duplicates. read_number_difference_threshold sets the threshold for the absolute difference between the number of reads supporting the variant with and without duplicates removed; if this threshold is exceeded, the flag is set.
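The read_number_difference_threshold rule can be sketched as below, assuming suspected duplicates have already been identified (for instance by boundary comparison as described above) and are supplied as a set of read names. `flag_by_dup_difference` is a hypothetical helper that illustrates the described rule, not hairpin2's interface.

```python
def flag_by_dup_difference(
    supporting: set[str], dups: set[str], read_number_difference_threshold: int
) -> bool:
    """Flag when removing suspected duplicates changes support by more than the threshold."""
    with_dups = len(supporting)
    without_dups = len(supporting - dups)  # drop reads tagged as duplicates
    return abs(with_dups - without_dups) > read_number_difference_threshold


support = {"r1", "r2", "r3", "r4"}
print(flag_by_dup_difference(support, {"r2", "r3"}, 1))  # True
print(flag_by_dup_difference(support, {"r2", "r3"}, 2))  # False
```

Using set difference means duplicate names that do not support the variant are ignored, so only supporting reads contribute to the difference.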

AddsMarks: ClassVar[set[str] | None] = None
EngineFactory() → ProcessEngineProtocol[RunParams, FlagResult]
FixedParamClass

alias of FixedParamsDVF

ProcessNamespace: ClassVar[str | None] = 'DVF'
ProcessType: ClassVar[ProcessKindEnum | None] = <class 'hairpin2.infrastructure.process_engines.VariantFlaggerEngine'>