Medical Biology - Theses

    Detecting individuals with rare disease variants by identifying shared haplotypes using SNP genotyping data
    Robertson, Erandee Kasunjalee (2023-06)
    Identifying disease-causing variants (DCVs) provides a genetic diagnosis for patients. A diagnosis creates a chance for prevention, or for treatment to start while the patient is still asymptomatic. Subsequent genetic counselling may lead to opportunities for personalised patient care and to longer survival. Many approaches are available to detect DCVs in individuals. DCVs can be directly sequenced using next-generation sequencing or long-read sequencing, but these types of sequencing are expensive and are not affordable for everyone. The most commonly available genetic data for patient cohorts come from single nucleotide polymorphism (SNP) genotyping arrays. Millions of individuals have been and continue to be genotyped with SNP genotyping arrays, mainly driven by genome-wide association studies, making them a ubiquitous platform for analysis. However, rare variants are generally not captured by SNP genotyping arrays and so cannot be identified from them directly. Rare variants that are not captured can be recovered with established genome-wide imputation and phasing algorithms, but imputation accuracy is known to be low for rare variants. Individuals who inherit the same genetic mutation from a common ancestor or founder also share the genomic regions, or haplotypes, surrounding the shared DCV. Such genomic regions inherited from a common ancestor are referred to as identity by descent (IBD) segments. IBD tracts can therefore be used as surrogates for the presence of DCVs. Bioinformatic algorithms to detect IBD tracts have previously been developed but do not use knowledge of specific DCV haplotypes. This thesis describes a novel bioinformatic screening algorithm called FoundHaplo, an IBD-based hidden Markov model, that infers the presence of rare inherited DCVs in individuals by identifying the inheritance of the associated disease-causing haplotype.
The FoundHaplo algorithm uses commonly available SNP genotyping data by leveraging known disease-causing haplotypes with founder effects. The algorithm is designed to accommodate genotype, imputation, and phasing errors and is available as an R package from https://www.github.com/bahlolab/FoundHaplo. Statistical and bioinformatics analyses were carried out to assess FoundHaplo’s performance. The FoundHaplo algorithm performs best when the disease-case individual pairs have a more recent common ancestor, because a larger IBD segment of the ancestral disease haplotype is then preserved. The algorithm also displays increased power when more unique disease haplotypes are included in predicting a single DCV, as this allows it to accumulate evidence from all the available known disease haplotypes. A database schema was developed and is described in the thesis to efficiently accumulate known disease haplotypes for use with FoundHaplo while maintaining the confidentiality of one’s own data. Using the accumulated disease haplotypes, the algorithm was applied to large cohorts to search for individuals with rare variants associated with epilepsy. In summary, FoundHaplo enables the use of ubiquitous SNP genotyping array data to screen patients for known DCVs. Individuals predicted to carry a DCV can then be validated with standard sequencing approaches. FoundHaplo should increase the diagnostic rate of rare diseases with strong founder effects. Lastly, even though the thesis details the development and application of the FoundHaplo algorithm in the context of human diseases and cohorts, the algorithm can be applied to predict inherited genetic variants in any recombining genome.
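
The idea of an IBD-based hidden Markov model can be illustrated with a small sketch. The code below is not the FoundHaplo implementation and does not use the R package's API. It is a generic two-state (IBD / non-IBD) HMM scored with the forward algorithm, and it compares one known disease haplotype with one phased test haplotype. The function name, the parameters (`error_rate`, `meioses`, `prior_ibd`) and their defaults, and the simple transition model are all assumptions made for illustration. Alleles are coded 0/1, allele frequencies are assumed to lie strictly between 0 and 1, and marker positions are given in centimorgans.

```python
import math


def founder_haplotype_llr(disease_hap, test_hap, alt_freqs, pos_cm,
                          error_rate=0.01, meioses=10, prior_ibd=0.5):
    """Log-likelihood ratio of a toy IBD/non-IBD HMM versus "no sharing".

    Hypothetical helper: names and defaults are illustrative assumptions,
    not values taken from FoundHaplo.
    """
    n = len(disease_hap)
    if n == 0 or not (n == len(test_hap) == len(alt_freqs) == len(pos_cm)):
        raise ValueError("inputs must be non-empty and of equal length")

    def emissions(i):
        # Non-IBD: the test allele is drawn from the population frequency.
        e_non = alt_freqs[i] if test_hap[i] == 1 else 1.0 - alt_freqs[i]
        # IBD: the test allele copies the disease haplotype, up to an error.
        matches = disease_hap[i] == test_hap[i]
        e_ibd = 1.0 - error_rate if matches else error_rate
        return e_ibd, e_non

    # Initialise the forward variables at the first marker.
    e_ibd, e_non = emissions(0)
    f_ibd, f_non = prior_ibd * e_ibd, (1.0 - prior_ibd) * e_non
    scale = f_ibd + f_non
    log_lik = math.log(scale)
    f_ibd, f_non = f_ibd / scale, f_non / scale
    null_log_lik = math.log(e_non)  # model with no IBD at any marker

    for i in range(1, n):
        # Chance of no recombination between adjacent markers over
        # `meioses` meioses (distance converted from cM to Morgans).
        dist_morgans = (pos_cm[i] - pos_cm[i - 1]) / 100.0
        stay = math.exp(-meioses * dist_morgans)
        switch = 1.0 - stay
        # After a recombination the state is redrawn from the prior.
        p_ibd = f_ibd * (stay + switch * prior_ibd) + f_non * switch * prior_ibd
        p_non = (f_ibd * switch * (1.0 - prior_ibd)
                 + f_non * (stay + switch * (1.0 - prior_ibd)))
        e_ibd, e_non = emissions(i)
        f_ibd, f_non = p_ibd * e_ibd, p_non * e_non
        # Rescale at every marker to avoid numerical underflow.
        scale = f_ibd + f_non
        log_lik += math.log(scale)
        f_ibd, f_non = f_ibd / scale, f_non / scale
        null_log_lik += math.log(e_non)

    return log_lik - null_log_lik
```

A higher score means the test haplotype is better explained by sharing the disease haplotype than by population allele frequencies alone. In this toy model, a larger `meioses` value (a more distant common ancestor) makes long shared tracts less likely, which mirrors the observation above that performance is best for recent common ancestors. Scores from several known disease haplotypes could be combined in a similar spirit, although how FoundHaplo does this is not specified here.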