Biochemistry and Pharmacology - Theses

  • Item
    Measuring intolerance to missense variation within the human genome and proteome
    Silk, Michael Aanand (2021)
    This thesis summarises an investigation into the measurement and application of intolerance to missense variation in the human genome, its use in predicting the functional consequences of variants, and its ability to identify novel, functionally relevant protein features. Using gnomAD, currently the largest publicly available dataset of sequenced human exomes, together with the UK Biobank and DiscovEHR population variation databases, I systematically measured the proportion of missense variation across more than 18,000 human genes and 80,000 gene transcripts over a sliding window of 31 codons. With this measure, named the Missense Tolerance Ratio (MTR), I observed that known pathogenic variants in epilepsy patients preferentially occur in regions estimated to be intolerant. We further validated the MTR using the set of known pathogenic variants in ClinVar and observed a significant difference in the MTR distribution between these variants and novel control missense variant datasets. Intolerant regions within a gene's sequence were also observed to cluster within the protein tertiary structure. We therefore anticipate that the MTR has considerable potential for identifying important functional domains within protein structures. Current methods of estimating the functional importance of regions within structures rely largely on conservation. Conservation, however, depends heavily on the depth and appropriateness of the alignment, and functionality is often not fully preserved between species. Missense intolerance within tertiary structures was measured using the MTR3D and shown to provide information complementary to the MTR. By combining the MTR and MTR3D with additional structural properties such as residue depth, we also developed the MTRX, a combined measure of intolerance that incorporates the predictions from the different scores. The MTRX was shown to have high predictive power for known pathogenic variants.
    To assist in the prediction of variant consequences and to inform research in drug design, protein biochemistry and gene analyses, we are making these intolerance estimates freely available, in both their sequence-based and structure-based formulations, as user-friendly web servers.
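The sliding-window measurement described in the abstract can be illustrated with a minimal sketch. The abstract does not state the formula, so the code below makes two assumptions: that the score for a codon is the observed missense fraction within the window divided by the missense fraction expected from all possible single-nucleotide changes in that window, and that windows are simply truncated at the ends of the transcript. The function name `sliding_mtr` and the per-codon count lists are hypothetical inputs chosen for illustration; this is not the thesis implementation.

```python
from typing import List, Optional, Sequence


def sliding_mtr(
    obs_missense: Sequence[int],
    obs_synonymous: Sequence[int],
    pos_missense: Sequence[int],
    pos_synonymous: Sequence[int],
    window: int = 31,
) -> List[Optional[float]]:
    """Return one ratio per codon, or None where the ratio is undefined.

    Assumed definition (not taken from the abstract):
        ratio = [obs_mis / (obs_mis + obs_syn)] / [pos_mis / (pos_mis + pos_syn)]
    with all counts summed over a window centred on the codon.
    """
    n = len(obs_missense)
    half = window // 2
    scores: List[Optional[float]] = []
    for i in range(n):
        # Window is truncated at the transcript ends (an assumption).
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        om = sum(obs_missense[lo:hi])
        osyn = sum(obs_synonymous[lo:hi])
        pm = sum(pos_missense[lo:hi])
        psyn = sum(pos_synonymous[lo:hi])
        if om + osyn == 0 or pm == 0:
            # No observed variation, or no possible missense change: undefined.
            scores.append(None)
            continue
        observed_fraction = om / (om + osyn)
        expected_fraction = pm / (pm + psyn)
        scores.append(observed_fraction / expected_fraction)
    return scores
```

Under this assumed definition, values below 1 indicate fewer missense variants than expected in the window, which corresponds to the intolerant regions the abstract refers to.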