Medical Biology - Theses

  • Identification of molecular pathways associated with susceptibility and immunity to severe dengue and malaria
    Studniberg, Stephanie Irene (2023-12)
    In this era of increasing globalisation, urbanisation, and worsening climate change, the geographical range of transmission-competent mosquito vectors is shifting. Mosquito-borne diseases such as malaria and dengue are gradually emerging in previously unaffected areas and re-emerging in areas where they had previously subsided. Given alarming increases in dengue case incidence, and the inclusion, for the first time, of a chapter on the influence of climate change on malaria transmission in the World Malaria Report, it is clear that these two vector-borne diseases are of utmost global relevance. Under World Health Organization (WHO) guidelines, individuals presenting with warning signs of progression to severe dengue must remain under hospital observation. However, these warning signs appear late in the disease course and are non-specific. Consequently, hospitals become overwhelmed with patients admitted for in-patient observation, many of whom do not progress to severe dengue. Biomarkers that detect progression to severe dengue at hospital presentation are much needed to improve patient triage and resource allocation. In malaria, despite the major achievement of the WHO recommending the RTS,S and R21/Matrix-M vaccines for children living in endemic areas, progress in reducing malaria case incidence has stalled for a prolonged period. Efficacious vaccines approved for use in both children and adults are clearly required to reduce the global malaria burden. Achieving these outcomes requires further elucidation of the molecular mechanisms underlying the immune response to dengue and malaria. To address these outstanding concerns, an integrative systems immunology approach was used to identify molecular pathways associated with susceptibility and immunity to severe dengue and malaria.
The studies within this thesis integrated data from single-cell mass cytometry, serology, and transcriptional profiling of peripheral blood mononuclear cells from individuals progressing to either dengue fever (DF) or dengue haemorrhagic fever (DHF). They also included individuals living in a malaria-endemic region of Indonesia with either symptomatic or asymptomatic Plasmodium falciparum or Plasmodium vivax malaria. Integrative data analysis identified the frequencies and transcriptional profiles of effector CD4+ and CD8+ T cells as important components of dengue immunity in individuals progressing to DF. Furthermore, high frequencies of defined populations of CD4+ non-classical monocytes were associated with increased odds of developing DHF. Our approach discovered a strong transcriptional phenotype of immunosuppression underlying asymptomatic P. falciparum malaria, suggesting that carriage of these infections could preclude complete parasite clearance. Lastly, whereas symptomatic P. falciparum malaria induced a highly inflammatory response, clinical P. vivax infection featured upregulation of anti-inflammatory pathways and checkpoint receptors, providing a feedback loop that ameliorates symptomatic infection. In addition, gene set enrichment analysis revealed profound dysfunction of the blood monocyte compartment in both symptomatic and asymptomatic P. vivax malaria. Together, these findings have critical implications for the deployment and efficacy of malaria vaccines. They also bear on the development of point-of-care diagnostic tools to predict disease outcomes in dengue patients.
  • Statistical models for pre-processing and simulating single cell RNA-seq data
    Wang, Jianan (2023-10)
    Since the first protocol was published in 2009, single-cell RNA sequencing (scRNA-seq) has become one of the most popular technologies in the omics world. scRNA-seq has been broadly applied in different areas. These include understanding the tumor microenvironment, inferring embryonic development, and discovering regulatory pathways involved in plant seeding. These diverse applications have driven the development of scRNA-seq, and new protocols aim to sequence large numbers of cells at low cost. With the rapid growth of scRNA-seq technology, many issues in processing the data remain to be resolved, such as removing batch effects, addressing dropout events, and annotating cell populations. Tackling these issues, however, creates opportunities to apply statistical methods and devise new computational tools for analysing the data. In this thesis, we focus on the pre-processing and simulation of single-cell data. In the first part of the thesis, we discuss demultiplexing methods for the pre-processing step. The first method, CMDdemux, uses Mahalanobis distance to distinguish cells from different samples. It performs well on low-quality data with hashtag contamination arising during the cell hashing experiment. The second method, LCADemux, uses latent class analysis (LCA) to combine cell hashing- and single nucleotide polymorphism (SNP)-based demultiplexing results. This hybrid framework is advantageous for analysing low-input cell hashing data. The third method, LCAdoublet, applies LCA to combine doublet information from cell hashing and VDJ data. LCAdoublet is better at identifying inter-sample doublets and T-cell doublets than methods that use transcriptomic data alone. In summary, our three pre-processing methods enable accurate identification of doublets and assignment of cells to their samples of origin, which yields cleaner data for downstream analysis.
In the second part of the thesis, we present a novel single-cell simulation method and its applications for data analysis. Our method, GLMsim, applies a generalized linear model to simulate batch and biological effects simultaneously. Compared with other methods, GLMsim simulates single-cell data that resemble the original data, especially for data collected under complex conditions. Our simulator has multiple applications, such as benchmarking single-cell integration methods, guiding differential expression analysis, and checking model assumptions. In short, our simulation method should help researchers develop better tools for downstream analysis. Overall, we applied multiple statistical methods to fill existing gaps in handling single-cell RNA-seq data, and we demonstrated that our methods overcome challenges in analysing such data. We hope that they can be applied in the medical field to extend our knowledge of human health and disease.
  • Detecting individuals with rare disease variants by identifying shared haplotypes using SNP genotyping data
    Robertson, Erandee Kasunjalee (2023-06)
    Identifying disease-causing variants (DCVs) provides a genetic diagnosis for patients. This creates an opportunity for prevention, or for treatment to begin while the patient is still asymptomatic. Subsequent genetic counselling may lead to opportunities for personalised patient care and longer survival. Many approaches are available to detect DCVs in individuals. DCVs can be directly sequenced using next-generation sequencing or long-read sequencing, but these types of sequencing are expensive and may be affordable only to some people. The most commonly available genetic data for patient cohorts come from single nucleotide polymorphism (SNP) genotyping arrays. Millions of individuals have been, and continue to be, genotyped with SNP genotyping arrays, mainly driven by genome-wide association studies, making these arrays a ubiquitous platform for analysis. However, rare variants are generally not identifiable from SNP genotyping arrays because the arrays do not capture them. Such variants can be recovered with established genome-wide imputation and phasing algorithms, but imputation accuracy is known to be low for rare variants. Individuals who inherit the same genetic mutation from a common ancestor, or founder, also share the genomic regions, or haplotypes, surrounding the shared DCV. Such genomic regions inherited from a common ancestor are referred to as identity by descent (IBD) segments, and IBD segments can be used as surrogates for the presence of DCVs. Bioinformatic algorithms to detect IBD segments have previously been developed, but they do not use knowledge of specific DCV haplotypes. This thesis describes FoundHaplo, a novel bioinformatic screening algorithm based on an IBD hidden Markov model. FoundHaplo infers the presence of rare inherited DCVs in individuals by identifying inheritance of the associated disease-causing haplotype.
The FoundHaplo algorithm uses commonly available SNP genotyping data and leverages known disease-causing haplotypes with founder effects. The algorithm is designed to accommodate genotyping, imputation, and phasing errors and is available as an R package from https://www.github.com/bahlolab/FoundHaplo. Statistical and bioinformatic analyses were carried out to assess FoundHaplo’s performance. The algorithm performs best when the disease-case individual pairs share a more recent common ancestor, because a larger IBD segment of the ancestral disease haplotype is then preserved. It also shows increased power when more unique disease haplotypes are included in predicting a single DCV, because it can accumulate evidence from all available known disease haplotypes. The thesis also develops and describes a database schema that efficiently accumulates known disease haplotypes for use with FoundHaplo while maintaining the confidentiality of one’s own data. Using the accumulated disease haplotypes, the algorithm was applied to large cohorts to search for individuals with rare variants associated with epilepsy. In summary, FoundHaplo enables the use of ubiquitous SNP genotyping array data to screen patients for known DCVs. Individuals identified in this way can then be validated with standard sequencing approaches. FoundHaplo should increase the diagnostic rate of rare diseases with strong founder effects. Lastly, although the thesis details the development and application of FoundHaplo in the context of human diseases and cohorts, the algorithm can be applied to predict inherited genetic variants in any recombining genome.
  • Computational tools for long-read DNA methylation analysis and benchmarking complex single-cell genomics pipelines
    Su, Shian (2023-03)
    Developing new high-throughput assay techniques necessitates novel bioinformatics software. Such software must not only extract insight from newly generated data types but also evaluate the efficacy of newly developed tools. To this end, I created the NanoMethViz package, which provides data management and visualisation tools for exploratory analysis of DNA methylation data from Oxford Nanopore Technologies (ONT) long-read sequencing. Applying this software to female mouse placenta and neural stem cell samples enabled the study of methylation patterns in the context of X-inactivation. Additionally, the proliferation of single-cell analysis techniques and the need for comprehensive pipeline-level benchmarking led me to create the CellBench package. CellBench automatically executes all combinations of methods to fully characterise the performance of single-cell analysis pipelines. This package establishes a benchmarking framework for combinations of methods that promotes modular code without duplication, resulting in readable, reproducible, and extensible pipeline benchmarking code. Both packages are open source and available through the R/Bioconductor repository, supporting researchers who work with these emerging and rapidly advancing genomic technologies.
  • Understanding Information Flow in Signalling Pathways Using Network-Based Analysis of Phosphoproteomic Data
    Huckstep, Hannah (2022)
    Protein phosphorylation is a post-translational modification that plays a key role in regulating almost all cellular processes. It is currently estimated that up to a third of the human proteome is phosphorylated at any one time, and over 100,000 distinct human phosphosites have been recorded to date. However, a bottleneck is forming: phosphorylation sites are continually being discovered and recorded, yet fewer than 5% have functional annotations to date. Typically, analysis of phosphoproteomics experimental results involves literature searching, pathway or ontology enrichment analysis, and database mining. However, no previous research had evaluated the popular public databases that provide these analysis capabilities. To address this, I evaluated seven knowledgebases: four literature-derived pathway databases, two protein-protein interaction databases, and one phosphoproteomics-focused database. I first compared each database's coverage of the human proteome and phosphoproteome, then assessed the consistency of each database's phosphorylation annotations against a global standard. Finally, I compared each database's coverage of six experimental datasets. This enabled me to identify the strengths and weaknesses of the most common knowledgebases for the analysis of phosphoproteomics results. My research identified which knowledgebases are best suited to analysing phosphoproteomics data. Even so, the methods they currently employ still discard fundamental information by summarising multiple phosphopeptides into single protein-level entities. The lack of dedicated downstream phosphoproteomics analysis tools capable of assigning mechanistic insight to phosphorylations is a major contributor to the phosphoproteomic bottleneck described above. I therefore developed maph, a network-based command-line tool built specifically to leverage prior knowledge in the downstream analysis of phosphoproteomic data.
maph employs a novel scoring method that calculates how strongly an experimental phosphopeptide supports a protein being in one phosphorylation state over another in the database. This guides hypothesis generation about which phosphorylated proteins are involved in signalling cascades. In addition to three other analysis functions, maph contains a novel neighbourhood analysis method. Because phosphoproteomics experiments measure only phosphorylated proteins, this analysis can highlight proteins and/or complexes that may have a higher-than-expected number of measured nodes close to them in the signalling network. I demonstrate the power of this method for understanding signalling through both canonical and novel mechanisms. To do so, I used maph to analyse a novel phosphoproteomics dataset characterising a time course of MPL signalling in HPC7 cells. I analysed the raw data using MaxQuant, performed normalisation, variance removal, and differential abundance analysis using limma, and imputed missing data using msImpute. I then used maph to generate novel insights into MPL signalling and to highlight multiple previously unexplored signalling mechanisms supported by this network analysis. In summary, my PhD research developed methodology that enables detailed exploration of phosphoproteomic data without sacrificing peptide-level phosphosite information. This was achieved by first evaluating which literature-derived knowledgebases gave the best coverage of data, then developing an integrated network analysis tool and novel methodologies to analyse data on the integrated network. This work highlights the need for downstream phosphoproteomics analyses that address the current interpretation bottleneck in phosphoproteomics. It also underlines the importance of retaining phosphosite information for analysis beyond the more limited kinase-substrate framework.
  • The Purification, Identification, and Measurement Of RNA-Binding Proteins
    Smith, Jeffrey Michael (2021)
    RNA-binding proteins (RBPs) are classically regarded as facilitators of gene expression. In recent years, however, RNA-protein interactions have also emerged as a pervasive force in the regulation of homeostasis. Through the partnership of mass spectrometry (MS)-based proteomics and RNA sequencing, the compendium of proteins with demonstrable RNA-binding function has swelled from hundreds to thousands. Underpinning these advances is the adaptation of RNA-centric capture methods that extract proteins crosslinked in their native environment. These methods provide snapshots in time that reveal an extensive regulatory network. They also yield a wealth of data that can be used to discover both RNA-binding function and the molecular interfaces at which these interactions occur. This thesis describes the development of an extraction method that purifies RBP-RNA complexes. This method differs from other RBP-discovery protocols in that it:
    1) purifies these complexes so completely that RBPs can be identified qualitatively, without differential abundance analysis;
    2) permits transcript-targeted capture with sequence-specific oligos;
    3) permits global, sequence-agnostic capture;
    4) isolates both the RBP and its bound RNA intact; and
    5) can reliably interrogate RBPs at depths exceeding those of present methods without metabolic or molecular labelling.
    The method's performance is first assessed with a census of proteins that directly interact with global or targeted RNA transcripts in model cell lines. These efforts are then extended to investigate how protein-RNA interactions change as primary murine CD8+ T cells transition from quiescence to proliferation and then contraction. Finally, these studies demonstrate how cellular responses provoke different proteins to moonlight as RNA binders, and they shed light on a network of complex, co-evolved molecular machines.