Clinical Pathology - Theses

    Statistical and Functional Genomics of Host-Microbial Interactions
    Fachrul, Muhamad (2022)
    The continuous refinement of sequencing technologies has allowed bioinformatics to evolve as a field, as new data types enable the analysis of multiple biological dimensions. Yet an issue shared across these “omic” data types persists: unwanted variation introduced by technical or biological confounders, which can obscure the true biological signal associated with the condition of interest and lead to misleading conclusions. Technical confounders arise from inconsistencies in experimental design, whereas biological confounders are inherent, arising from the genetic structure of individuals. This thesis aims to characterize the complexities of host-microbial dynamics while accurately capturing the biological variance of interest, describing separate approaches to minimizing unwanted variation from technical and biological confounders in metagenomic and transcriptomic data, respectively. It addresses confounders in two aspects of host-microbial dynamics: the presence of microbial life in the host, and the host’s molecular response to that microbial life.

    Chapter 2 addresses the first aspect by removing variation introduced by technical batches in metagenomic data. Using RUV-III-NB, a method originally designed for single-cell RNAseq, unwanted technical variation was assessed and removed from the microbiome profiles of pig faecal samples subjected to various storage conditions and sample treatments. The study identified storage conditions and freeze-thaw cycles as among the largest contributors to unwanted variation in microbial abundance, particularly affecting high-abundance bacterial taxa. RUV-III-NB also showed consistently robust corrective performance when benchmarked against other popular batch correction methods.
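    RUV-III-NB extends the RUV-III approach, which removes unwanted variation by combining replicate samples with negative-control features, to raw counts under a negative binomial model. The sketch below illustrates only the simpler Gaussian RUV-III step on log-scale data; it is not RUV-III-NB and not the code used in the thesis. It assumes Y is a samples × features matrix of log abundances, M a 0/1 replicate-design matrix with full column rank, ctl a boolean mask of negative-control features carrying no biological signal, and k the number of unwanted factors. The function name ruv3 is hypothetical.

    ```python
    import numpy as np


    def ruv3(Y, M, ctl, k):
        """Simplified Gaussian RUV-III (hypothetical helper, not RUV-III-NB).

        Y   : (samples x features) log-scale abundance matrix
        M   : (samples x replicate groups) 0/1 design; 1 marks a sample's group
        ctl : boolean mask over features marking negative controls
        k   : number of unwanted factors to remove
        """
        # Residual operator R = I - M (M'M)^-1 M' (pinv equals (M'M)^-1 M' when M
        # has full column rank). It removes replicate-group means, so what is
        # left in R @ Y should be unwanted, non-biological variation.
        R = np.eye(Y.shape[0]) - M @ np.linalg.pinv(M)
        Y0 = R @ Y

        # Projecting the residuals onto their top-k left singular vectors
        # estimates the unwanted loadings alpha for every feature.
        U, _, _ = np.linalg.svd(Y0, full_matrices=False)
        alpha = U[:, :k].T @ Y0  # (k x features)

        # Negative controls carry no biology, so regressing them on alpha
        # estimates the per-sample unwanted factors W.
        a_c = alpha[:, ctl]
        W = Y[:, ctl] @ a_c.T @ np.linalg.inv(a_c @ a_c.T)  # (samples x k)

        # Subtract the estimated unwanted component from every feature.
        return Y - W @ alpha
    ```

    The design relies on replicates spanning the batches: if each biological group appears in every batch, the replicate residuals isolate the batch effect, and subtracting W @ alpha leaves the biological signal. In real data the choice of k and of control features matters, and count data are better served by a count model, which is what RUV-III-NB provides.
    
    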
    Overall, Chapter 2 emphasizes the importance of preventative measures during experimental design while also offering a robust in-silico corrective measure.

    Chapter 3 addresses biological confounding in the form of population stratification in transcriptomic data, using a dataset from a Salmonella Typhi infection study in Nepal. A bioinformatics pipeline named RGStraP (RNAseq-based Genetic Stratification PCs) was developed for this study to capture genetic structure using RNAseq samples alone. The chapter demonstrates RGStraP’s capability as a robust alternative for capturing genetic structure by comparing it with paired array genotypes, assessed through SNP-level genetic concordance and canonical correlations between the two sets of genetic principal components. The chapter also shows the effect of population stratification on gene expression data: failing to adjust for genetic structure in downstream RNAseq analysis may exaggerate significant associations.

    Using the same RNAseq dataset from Nepal, Chapter 4 profiles the host gene expression signature in response to S. Typhi infection. Technical and biological confounders were addressed in downstream RNAseq analyses using results from RUVg and RGStraP, respectively. The chapter presents a gene expression signature that distinguishes confirmed cases from healthy controls, from which subclades of samples could be identified using unsupervised hierarchical clustering. A typhoid classifier was constructed from the S. Typhi-specific gene signature and tested on external validation sets, showing promise for diagnosing S. Typhi infection from patients’ gene expression.

    Finally, Chapter 5 discusses the overarching conclusions of these studies, their current limitations, and future directions.
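    Chapter 3 compares RGStraP’s genetic principal components with those derived from paired array genotypes using canonical correlations. A minimal NumPy sketch of that comparison follows, assuming both PC sets are samples × components matrices over the same samples in the same row order, each with full column rank. The function name canonical_correlations is hypothetical and not part of RGStraP. It uses the standard QR/SVD formulation, in which the canonical correlations are the singular values of the product of orthonormal bases for the two centred column spaces.

    ```python
    import numpy as np


    def canonical_correlations(A, B):
        """Canonical correlations between the columns of A (n x p) and B (n x q).

        Rows must correspond to the same samples in both matrices, and each
        matrix is assumed to have full column rank after centring.
        """
        # Centre each column so the result measures correlation, not raw overlap.
        A = A - A.mean(axis=0)
        B = B - B.mean(axis=0)

        # Thin (reduced) QR gives orthonormal bases for the two column spaces.
        Qa, _ = np.linalg.qr(A)
        Qb, _ = np.linalg.qr(B)

        # Singular values of Qa' Qb are the canonical correlations, in
        # decreasing order; clip away tiny floating-point overshoot above 1.
        return np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), 0.0, 1.0)
    ```

    Because canonical correlations are invariant to any invertible linear transformation within each set, the comparison does not require the two PCAs to return components in the same order, sign, or scale; it only asks whether both sets span similar axes of genetic variation.
    
    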