Clinical Pathology - Theses

  • Item
    Statistical and Functional Genomics of Host-Microbial Interactions
    Fachrul, Muhamad (2022)
    The continuous refinement of sequencing technologies has allowed bioinformatics to evolve as a field, as new data types enable analysis across multiple levels of biology. Yet an issue shared across these “omic” data types persists: unwanted variation introduced by technical or biological confounders, which may be conflated with true biological signals from the condition of interest and result in misleading conclusions. Technical confounders arise from inconsistencies during experimental design, while biological confounders are inherent and a product of the genetic structure of individuals. This thesis aims to describe host-microbial dynamics in a manner that accurately reflects the biological variance of interest, presenting separate approaches to minimizing the impact of unwanted variation from technical and biological confounders in metagenomic and transcriptomic data, respectively. It addresses confounders in two aspects of host-microbial dynamics: the presence of microbial life in the host, and the host molecular response to microbial life in the body. Chapter 2 covers the first aspect, examining the removal of variation introduced by technical batches in metagenomic data. Using RUV-III-NB, a method originally designed for single-cell RNAseq, unwanted technical variation was assessed and removed from microbiome profiles of pig faecal samples that underwent various storage options and sample treatments. The study identified storage conditions and freeze-thaw cycles as among the highest contributors to unwanted variation in microbiome abundance, particularly affecting high-abundance bacterial taxa. RUV-III-NB also showed consistently robust corrective performance when benchmarked against other popular batch correction methods. This chapter highlights the importance of preventative measures during experimental design while offering a robust corrective measure in silico. In Chapter 3, biological confounding in the form of population stratification is addressed in transcriptomic data analysis using a dataset from a Salmonella Typhi infection study from Nepal. A bioinformatics pipeline named RGStraP (RNAseq-based Genetic Stratification PCs) was developed for this study to capture genetic structure based solely on RNAseq samples. This chapter demonstrates RGStraP’s capability as a robust alternative for capturing genetic structure, based on its performance when compared to paired array genotypes, as shown by SNP-level genetic concordance and canonical correlations between the two sets of genetic principal components. The effect of population stratification on gene expression data is also shown, as a lack of adjustment for genetic structure in downstream RNAseq analysis may result in exaggerated significant associations. Using the same RNAseq dataset from Nepal, Chapter 4 profiles the host gene expression signature in response to S. Typhi infection. Technical and biological confounders were addressed in downstream RNAseq analyses using results from RUVg and RGStraP, respectively. This chapter presents a distinct gene expression signature between confirmed cases and healthy controls, from which subclades of samples could be determined using unsupervised hierarchical clustering. A typhoid disease classifier was constructed from the S. Typhi-specific gene signature and tested on external validation sets, showing promising potential for diagnosing S. Typhi infection from patients’ gene expression. Finally, Chapter 5 discusses the overarching takeaways from the studies, their current limitations, and future directions.
  • Item
    Clinical outcome prediction using biomedical data and machine learning approaches
    Liu, Yang (2022)
    Identifying asymptomatic individuals with increased susceptibility to disease provides substantial opportunities for preventative interventions. Over the past few years, advances in sequencing and computing technologies have enabled omics-driven disease prediction modelling, which may aid in the exploration of new biomarkers and future clinical utility. Recent studies have revealed evidence linking the human gut microbiota with the pathogenesis of various complex diseases. However, previous studies have been limited by cross-sectional study design, and there are limited data regarding the longitudinal association between the baseline gut microbiome and incident diseases. In addition, there are few published studies on incident disease prediction combining genetic risk and gut microbial risk factors. To address this, we designed a longitudinal study to examine the predictive utility of clinical metadata, gut metagenomic data and genomic data for a series of complex diseases, using statistical and machine learning approaches in a large population-based cohort with ~15 years of electronic health record follow-up. Chapter 1 provides a comprehensive review of advances and challenges in complex disease prediction. Emerging prediction methods and novel biomarkers are highlighted, including polygenic risk scores, gut metagenomics, and machine learning approaches in the context of disease prediction. Recent progress in the clinical utility of multi-omics-based prediction, along with future challenges and potential opportunities for clinical translation, is discussed. In Chapter 2, the potential of the gut microbiota for prospective risk prediction of liver disease was investigated using machine learning approaches. The predictive capacity of the baseline gut microbiota was evaluated individually and in combination with conventional risk factors. The results demonstrated that augmenting conventional risk factors with the microbiome using gradient boosting classifiers significantly improved prediction performance. Investigation of predictive microbial signatures revealed previously unknown bacterial taxa for incident liver disease, as well as those previously associated with hepatic function and disease. In Chapter 3, associations with the baseline gut microbiome were tested for incident respiratory diseases, including COPD and adult-onset asthma. Gut microbial alterations and variations at each taxonomic level were compared between disease cases and non-cases. Machine learning models demonstrated moderate predictive capacity of the baseline gut microbiome for incident asthma/COPD. Subgroup analyses indicated gut microbiome was significantly associated with incident COPD in both current smokers and non-smokers, as well as in individuals who reported never smoking. In Chapter 4, the predictive utility of genetic, gut microbial, and lifestyle risk factors was investigated for multiple complex diseases, including myocardial infarction, coronary heart disease, prostate cancer, Type 2 diabetes and Alzheimer’s disease. Since the gut microbiome is involved in numerous host physiological processes and linked to all vital organs, it was hypothesized that the gut microbiome can reflect host environmental risk factors for relevant diseases. It was also hypothesized that the inclusion of genetic susceptibility could improve prediction performance over clinical risk factors for complex diseases. The findings demonstrated the individual and combined impact of polygenic predisposition and variations in baseline gut microbiota on disease incidence. This thesis presents a comprehensive investigation of the integrative use of clinical metadata and multi-omics data, human gut metagenomics in particular, for incident disease prediction. The findings of this work provide an evidence base for the translation of omics and machine learning to risk prediction of multiple diseases, and support further investigation into the identification of new biomarkers for disease risk assessment and prevention.