Variant Filtration SOPs

This chapter contains SOPs directly related to the filtration, prioritization, and interpretation of variants. The first SOPs cover the filtration of variants for singleton and trio exomes in various modes of inheritance. When dealing with different case structures (e.g., siblings or only having one parent present), they can be handled with adjusted trio SOPs. This is followed with SOPs for assessing variants for pathogenicity and suitability as candidate variants.

Contents

SOP: Filtering Singletons for Autosomal Variants
SOP: Filtering Singletons for X-chromosomal Variants
SOP: Filtering Trios for Autosomal Variants
SOP: Filtering Trios for X-chromosomal variants
SOP: Prioritization with Phenotype and Pathogenicity Scores
SOP: Variant Assessment
SOP: Using Variant Link-Outs
SOP: Using Gene Link-Outs

SOP: Filtering Singletons for Autosomal Variants 

Aims and Scope

The aim of this SOP is the filtration of singleton data for variants on the autosomal chromosomes. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.

Filtration for variants on the X chromosomes is described in SOP: Filtering Singletons for X-chromosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

de novo	dominant	hom. rec.	comp. rec.
0-80	100-500	0-30	TODO

Steps

Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).
Configure the Genotype according to the table below.
setting

de novo

dominant

hom. rec.

comp. rec.

presets

De Novo

Strict

Recessive

Recessive

genotype

0/1

0/1

1/1

c/h index
- For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode.
Click Filter & Display.
Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed1.
Handle unexpected high and low number of variants.
- In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2.
- Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
- The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.

1(1,2,3,4): Check the First N of M records label on above the results table, potentially adjust the Result row limit setting you can find in the More … ‣ Miscellaneous tab.

Thresholds

SOP: Filtering Singletons for X-chromosomal Variants 

Aims and Scope

The aim of this SOP is the filtration of singleton data for variants on the X chromosome. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.

Filtration for variants on the autosomes is described in SOP: Filtering Singletons for Autosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

X de novo	X dominant	X hom. rec.	X comp. rec.
TODO	TODO	TODO	TODO

Steps

Note

The following needs work by a geneticists, also in terms of practicability

Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).
Configure the Genotype according to the table below.
setting

X de novo

X dominant

X hom. rec.

X comp. rec.

presets

De Novo

Strict

Recessive

Recessive

genotype (M)

1/1

1/1

N/A

N/A

genotype (F)

0/1

0/1

1/1

c/h index
- The genotype of the index is chosen based on its sex (male M, female F).
- For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the daughter.
Enter chrX into the field Gene Lists & Regions ‣ Genomic Region.
Click Filter & Display.
Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed1.
Handle unexpected high and low number of variants.
- In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2.
- Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
- The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.

Thresholds

SOP: Filtering Trios for Autosomal Variants 

Aims and Scope

The aim of this SOP is the filtration of trio data for variants on the autosomal chromosomes. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for de novo, dominant, homozygous recessive, and compound recessive variants.

Filtration for variants on the X chromosomes is described in SOP: Filtering Trios for X-chromosomal variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

de novo	dominant	hom. rec.	comp. rec.
0-3	50-150	2-75	2-20

Steps

Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).
Configure the Genotype according to the table below.
setting

de novo

dominant

hom. rec.

comp. rec.

presets

Strict

Strict

Recessive

Recessive

genotype

index

0/1

0/1

1/1

c/h index

parents

0/0, 0/0

0/0, 0/1

0/1, 0/1

–
- For dominant mode of inheritance, set the genotypes of the affected parent to 0/1 and the unaffected parent to 0/0.
- For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode and the parents’ genotype does have to be selected.
Click Filter & Display.
Compare the resulting variant count with the numbers from the table above. Also check that all query result records1.
Handle unexpected high and low number of variants.
- Too many de novo and too few variants in the other modes of inheritance can be an indicator of issues with the sample relatedness (cf. SOP: Quality Control).
- In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2. In the case of too few de novo variants, try setting the max AD setting of the parents to 2.
- Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
- The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.

Thresholds

TODO

SOP: Filtering Trios for X-chromosomal variants 

Aims and Scope

The aim of this SOP is the filtration of trio data for variants on the X chromosome. Depending on the hypothesis on the mode of inheritance the steps differ slightly. Alternative actions are given for X-linked de novo, dominant, recessive.

Filtration for variants on the autosomes is described in SOP: Filtering Trios for Autosomal Variants. The evaluation of variants is described in SOP: Variant Assessment, the use of phenotype and pathogenicity scores is described in SOP: Prioritization with Phenotype and Pathogenicity Scores.

Result

The result is a list of variants in compatible mode of inheritance with appropriate population frequency. These can then be assessed as described in SOP: Variant Assessment. A typical WES data set yields the following variant counts (numbers will vary depending on the enrichment kit):

X de novo	X dominant	X hom. rec.	X comp. rec.
TODO	TODO	TODO	TODO

Steps

Note

The following needs work by a geneticists, also in terms of practicability

Use the Load Preset button to load filter presets (according to the table below and your mode of inheritance).

Configure the Genotype according to the table below.

setting

X de novo

X dominant

X hom. rec.

X comp. rec.

presets

Strict

Strict

Recessive

Recessive

genotype

index (M)

1/1

1/1

N/A

c/h index

index (F)

0/1

0/1

1/1

c/h index

mother

0/0

0/1 or 0/0

0/1

–

father

0/0

1/1 or 0/0

1/1

–

The genotype of the index is chosen based on its sex (male M, female F).

For dominant mode of inheritance, set the genotypes of the affected parent to variant (0/1 or 1/1 according to the table) and of the unaffected to 0/0.

For compound recessive mode of inheritance, selecting “c/h index” as mode of inheritance for the child enables the comp. het. mode and the parents’ genotype does have to be selected.

Enter chrX into the field Gene Lists & Regions ‣ Genomic Region.
Click Filter & Display.
Compare the resulting variant count with the numbers from the table above. Also check that all query result records are displayed (check the First N of M records label on above the results table, potentially adjust the Result row limit setting you can find in the More … ‣ Miscellaneous tab).
Handle unexpected high and low number of variants.
- Too many de novo and too few variants in the other modes of inheritance can be an indicator of issues with the sample relatedness (cf. SOP: Quality Control).
- In case of too few variants try relaxing the Quality settings, e.g., by setting DP het. to 8 and min AAB to 0.2. In the case of too few de novo variants, try setting the max AD setting of the parents to 2.
- Try adjusting the Frequency settings (keep in mind incidence rates of the case’s disorder).
- The presets Relaxed and Super Strict can be used for non-recessive modes of inheritance to adjust multiple thresholds at once.

Thresholds

SOP: Prioritization with Phenotype and Pathogenicity Scores 

Aims and Scope

The aim of this SOP is to use scores for prioritizing a list of candidate variants. Phenotype scores can be used for ranking variants by their affected gene’s match to the patient’s phenotypes. Pathogenicity scores can be used for estimating the impact of a variant.

The filtration of variants is described in the SOPs above. For guidelines on interpreting the scores see SOP: Phenotype Score Interpretation and SOP: Pathogencity Score Interpretation.

Result

The result is a list of variants annotated with phenotype and/or pathogenicity scores that can be used for sorting and ranking variants. Further, by putting thresholds on the largest rank to consider or thresholds on the scores, the list of variants to be assessed can be shortened.

Steps

Open the More … ‣ Prioritization tab.
For using phenotype-based prioritization
- tick the Enable phenotype-based prioritization box,
- select an appropriate prioritization Algorithms, and
- enter (or paste) the HPO terms into the HPO Terms field.
For using variant pathogenicity prioritization
- tick the Enable variant pathogenicity-based prioritization box, and
- select the scoring method2 to use.
Click Filter & Display to trigger the filtration.
- Also check that all query result records are displayed1. The limit is applied to the variants sent for prioritization. You will not see the N top-ranking records but you will see a ranking of an arbitrary selection of N records in the case that the limit of records to display is smaller than the query result size N.
Click on the score and rank heading below the phenotype, pathogenicity, and/or pheno. & patho. columns to sort the table by phenotype, pathogenicity, or a combination of both scores.
Consider the top variants by one of the sorting methods from above, stop based on the rank or score:
- Rank: Consider the top N (e.g., =20) variants only.
  If you are in a time-limited setting, you should pick the number N in advance of your study to get reproducible results in terms of diagnostic yield.
- Score: (Note that the distribution of the different scores varies significantly).
  Consider the top-scoring variants until the score drops by a factor of 2 from one variant to the next.
  
  Consider the top-scoring variants until the score drops below a threshold T.

See SOP: Phenotype Score Interpretation and SOP: Pathogencity Score Interpretation for more information in score interpretation.

2(1,2): For using the UMD Predictor score you have to obtain a API token from https://umd-predictor.eu/ and enter it in VarFish in your user profile. You can reach the user profile by clicking on the person icon on the top left, then User Profile ‣ Settings ‣ Update ‣ UMD Predictor API Token. Note that UMD Predictor can only score SNVs.

Thresholds

SOP: Variant Assessment 

Aims and Scope

This SOP describes how to assess variants with the information integrated into VarFish. Clicking the little “>” on the left of the result table folds out the details of the given variant.

Result

The result is a better understanding of the variant and gene.

Steps

Note

The following needs refinement. Actually, it does not read like a SOP but rather an extended manual.

Consider the Gene information box.
- The Name, Gene Family, and NCBI Summary give a first impression about the gene and its molecular functional and implication in diseases. Genes with missing or very short NCBI Summary are often not well-characterized and such genes are hard to link to diseases.
- ClinVar for Gene gives the number of pathogenic and likely pathogenic variants in the gene and shows how often the gene has been implicated in disease in ClinVar.
- HPO Terms displays all HPO terms associated with a gene and, if present, the annotated modes of inheritance of diseases linked to this gene.
- OMIM Phenotypes gives the OMIM diseases linked to the gene.
- Gene RIFs displays short “reference into function” notes on PubMed articles that report on the gene.
- Constraints shows gene contraint scores from ExAc and gnomAD for this gene.
- The remaining fields provide link-outs into NCBI Entrez, ENSEMBL, and OMIM.
The ClinVar for Variant table shows ClinVar annotations for the given variant, if any.
The Frequency Details table provides detailed information about the frequency of the variant in different populations given in the different population databases.
The Transcript Information table shows the impact of the variant on all transcripts of the gene.
The Genotype and Call Infos provides detailed information about the variant call.
The UCSC 100 Vertebrate Conservation box shows the alignment of the corresponding amino acid in the UCSC 100 vertebrate alignment (the evoluationary distance to human decreases from left to right), if available. This information can be used for getting a feeling on how conserved the location is in the gene.

SOP: Using Variant Link-Outs 

Aims and Scope

This SOP describes how to use the most relevant link-out features of VarFish for estimating the pathogenicity and relevance of a given variant for a case’s disorder. Note that this is an non-comprehensive list of pragmatic points that fit on two pages of paper. The ACMG and ACGS guidlines.

Result

The result is a better understanding of the variant’s pathogenic potential.

Steps

Use the IGV button on the right of the variant result table row. If IGV is running and configured properly then IGV will jump to the given position such that you can inspect the variant in the raw data.
Use the MT button on the right of the variant result table row. This will run MutationTaster (MT) on your variant. The result page displays the analysis summary for each affected transcript and then details for each affected transcript.
- The prediction disease causing (automatic) and polymorphism (automatic) ist most important, followed by the probability given by the MT classifier.
- The splice sites analysis gives interesting information about whether splicing is predicted to be affected.
- The conservation provides information about conservation.

The following link-outs are shown when clicking on the little downward arrow next to IGV.

Use Locus @UCSC to consider the locus in UCSC genome browser.
Use Human Splicing Finder (HSF) for estimating the effect of a variant on the splicing of a gene’s transcripts. The link-out will open a new tab showing the results of the HSF (which will also give predictions for deepl intronic variants).
Use Query varSEAK Splicing for also estimating the effect of a variant on splicing of a gene’s transcripts. varSEAK does not show results of deep intronic variants.
Use Query PolyPhen 2 for obtaining PolyPhen 2 scores of missense variants.
Use Query UMD Predictor2 for querying the UMD Predictor (note that this only works for SNVs.
Use: Query Varsome for looking up the variant in Varsome

SOP: Using Gene Link-Outs 

Aims and Scope

This SOP describes how to use the most relevant link-out features of VarFish for estimating the relevance of a given for a case’s disorder. Note that this is a highly non-comprehensive list that only highlights selected aspects of some databases that fits on two pages of paper.

Result

The result is a better understanding on whether a defect in the gene can be responsible for the case’s disorder.

Steps

Note

The following needs to be done.