Clinical Pathology - Theses

  • Evaluating the clinical applicability of tumour mutational signatures in colorectal cancer and related syndromes
    Georgeson, Peter (2022)
    Colorectal cancer (CRC) poses a major health burden. It is the second most common cause of cancer death, and the impact of CRC incidence and mortality continues to grow worldwide. Early detection of CRC substantially improves outcomes, motivating the adoption of screening programs aimed at identifying high-risk individuals for ongoing surveillance. However, efforts to identify individuals predisposed to developing CRC have been hampered by the complexity and heterogeneity of the disease. Recent advances in DNA sequencing technology enable the genome to be studied at high resolution, making it possible to detect a wide array of somatic mutations and rearrangements in the DNA of cancer cells. Certain mutagenic processes leave identifiable mutational patterns in cancer genomes. The advent of cost-effective large-scale DNA sequencing enables systematic detection of these patterns, known collectively as tumour mutational signatures. To date, mutational signatures have mainly been applied in research, where they have been used to determine cancer subtypes and to categorise the underlying DNA changes associated with those subtypes. However, their clinical applicability has not been sufficiently explored. One barrier to clinical adoption is that clinical samples are typically formalin-fixed paraffin-embedded (FFPE) tissue analysed with whole-exome or panel sequencing, whereas research settings typically use fresh-frozen tissue with whole-genome sequencing. Formalin is mutagenic and can introduce artefactual variants, and whole-exome and panel sequencing detect at least an order of magnitude fewer mutations than whole-genome sequencing. We assess the utility of mutational signatures generated from both whole-exome and panel-sequenced data derived from FFPE-preserved tissue.
    Specifically, we show that inherited predispositions to CRC, including Lynch syndrome and MUTYH-associated polyposis, can be accurately identified from whole-exome sequencing data of FFPE-preserved tumour tissue, and that, with the correct methodology, biallelic MUTYH carriers can be identified from panel-sequenced FFPE-preserved tumour tissue. Understanding the relationship between environmental exposures and CRC development has implications for both prevention and screening. We consider the ability of mutational signatures to detect mutation patterns arising from exposure to colibactin, the genotoxic compound synthesised by pathogenic E. coli and a potential cause of sporadic (non-inherited) CRC. Based on distinct genomic, clinicopathological and epidemiological characteristics, we show that tumours carrying the colibactin-associated mutational signature may represent a distinct subtype of CRC. The effectiveness of mutational signatures depends on the analytical setting in which they are calculated. We analyse the impact of key analytical parameters and recommend specific filtering settings for variant allele fraction and sequencing depth. We identify situations where mutational signatures are less effective and recommend a minimum mutation count and a maximum signature reconstruction error, so that confidence in mutational signature results can be matched to their specific application. The results presented in this thesis have clinical applications. We show that applying mutational signatures to individual tumours provides direct evidence suggesting a particular aetiology. More broadly, mutational signatures provide evidence indicating the likely pathogenicity of co-occurring mutations. Mutational signatures are an important technique for extracting information from sequencing data, and this thesis demonstrates their clinical applications in CRC and related syndromes.
  • Statistical and Functional Genomics of Host-Microbial Interactions
    Fachrul, Muhamad (2022)
    The continuous refinement of sequencing technologies has allowed bioinformatics to evolve as a field, as new data types enable analysis across multiple biological levels. Yet a problem shared by these “omic” data types persists: unwanted variation introduced by technical or biological confounders, which can be conflated with true biological signals from the condition of interest and lead to misleading conclusions. Technical confounders arise from inconsistencies in experimental design, while biological confounders are inherent, arising from the genetic structure of individuals. This thesis aims to characterise the complexities of host-microbial dynamics while accurately capturing the biological variance of interest, describing separate approaches to minimizing the impact of unwanted variation from technical and biological confounders in metagenomic and transcriptomic data, respectively. It addresses confounders in two aspects of host-microbial dynamics: the presence of microbial life in the host, and the host's molecular response to microbial life in the body. Chapter 2 addresses the first aspect by removing variation introduced by technical batches in metagenomic data. Using RUV-III-NB, a method originally designed for single-cell RNAseq, unwanted technical variation was assessed and removed from microbiome profiles of pig faecal samples subjected to various storage options and sample treatments. The study identified storage conditions and freeze-thaw cycles as among the largest contributors to unwanted variation in microbiome abundance, particularly affecting high-abundance bacterial taxa. When benchmarked against other popular batch correction methods, RUV-III-NB also showed consistently robust corrective performance.
    This chapter highlights the importance of preventative measures during experimental design while also offering a robust in-silico correction. In Chapter 3, biological confounding in the form of population stratification is addressed in transcriptomic data analysis using a dataset from a Salmonella Typhi infection study in Nepal. A bioinformatics pipeline named RGStraP (RNAseq-based Genetic Stratification PCs) was developed to capture genetic structure solely from RNAseq samples. Compared with paired array genotypes, RGStraP proved a robust alternative for capturing genetic structure, as shown by SNP-level genetic concordance and by canonical correlations between the two sets of genetic principal components. The chapter also shows the effect of population stratification on gene expression data: without adjustment for genetic structure, downstream RNAseq analysis may exaggerate significant associations. Using the same RNAseq dataset from Nepal, Chapter 4 profiles the host gene expression signature in response to S. Typhi infection. Technical and biological confounders were addressed in downstream RNAseq analyses using results from RUVg and RGStraP, respectively. This chapter presents a gene expression signature distinguishing confirmed cases from healthy controls, from which subclades of samples could be identified using unsupervised hierarchical clustering. A typhoid disease classifier was constructed from the S. Typhi-specific gene signature and tested on external validation sets, showing promising potential for diagnosing S. Typhi infection from patients' gene expression. Finally, Chapter 5 discusses the overarching conclusions of the studies, their current limitations, and future directions.
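The Georgeson (2022) abstract mentions filtering variants on variant allele fraction and sequencing depth, and judging signature results by a minimum mutation count and a maximum reconstruction error, but it gives neither the values nor the fitting method. The Python sketch below illustrates those checks under stated assumptions: variants arrive as hypothetical `(category_index, vaf, depth)` tuples, exposures are fitted by non-negative least squares using multiplicative updates (a common choice, not necessarily the thesis's method), reconstruction error is taken as the relative Euclidean norm of the residual, and every threshold (`MIN_VAF`, `MIN_DEPTH`, `MIN_MUTATIONS`, `MAX_RECONSTRUCTION_ERROR`) is a hypothetical placeholder.

```python
import numpy as np

# Hypothetical thresholds for illustration only; the abstract says the thesis
# recommends values for these parameters but does not state them.
MIN_VAF = 0.1
MIN_DEPTH = 20
MIN_MUTATIONS = 20
MAX_RECONSTRUCTION_ERROR = 0.3


def build_catalogue(variants, n_categories):
    """Count variants per mutation category after VAF and depth filtering.

    Each variant is a hypothetical (category_index, vaf, depth) tuple whose
    category (e.g. one of the 96 SBS trinucleotide contexts) is already assigned.
    """
    counts = np.zeros(n_categories)
    for category, vaf, depth in variants:
        if vaf >= MIN_VAF and depth >= MIN_DEPTH:
            counts[category] += 1
    return counts


def fit_exposures(catalogue, signatures, n_iter=2000, eps=1e-12):
    """Non-negative least-squares fit of catalogue ~ signatures @ h.

    Uses multiplicative updates with the signature matrix held fixed; these
    keep h non-negative and do not increase the squared residual.
    """
    W = np.asarray(signatures, dtype=float)
    v = np.asarray(catalogue, dtype=float)
    # Positive starting point: split the mutation total evenly across signatures.
    h = np.full(W.shape[1], max(v.sum(), 1.0) / W.shape[1])
    WtV, WtW = W.T @ v, W.T @ W
    for _ in range(n_iter):
        h *= WtV / (WtW @ h + eps)
    return h


def assess(catalogue, signatures):
    """Return exposures, relative reconstruction error and a reliability flag."""
    v = np.asarray(catalogue, dtype=float)
    W = np.asarray(signatures, dtype=float)
    h = fit_exposures(v, W)
    norm = np.linalg.norm(v)
    error = np.linalg.norm(v - W @ h) / norm if norm > 0 else float("inf")
    reliable = v.sum() >= MIN_MUTATIONS and error <= MAX_RECONSTRUCTION_ERROR
    return h, error, bool(reliable)
```

With a toy matrix of two signatures over four categories (columns `[0.5, 0.3, 0.1, 0.1]` and `[0.1, 0.1, 0.4, 0.4]`, not real COSMIC signatures), the catalogue `[16, 10, 7, 7]`, which equals 30 times the first signature plus 10 times the second, is fitted back to exposures of about 30 and 10 with near-zero error and passes both checks, whereas a four-mutation catalogue fails the minimum-count check however well it fits.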
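The Fachrul (2022) abstract compares RGStraP's genetic principal components with those from paired array genotypes using canonical correlations. One standard way to compute canonical correlations between two sets of PCs measured on the same individuals is sketched below in Python, assuming NumPy and full-column-rank inputs; the function name is hypothetical and this is not RGStraP's code. After centring, each matrix is reduced to an orthonormal basis by QR decomposition, and the singular values of the product of the two bases are the canonical correlations.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between the column spaces of X and Y.

    X: (n_samples, p) and Y: (n_samples, q), e.g. genetic PCs for the same
    individuals from two sources. Columns are centred, orthonormal bases are
    taken by QR, and the singular values of Qx.T @ Qy are the canonical
    correlations (principal-angle approach of Bjorck and Golub).
    Assumes both centred matrices have full column rank.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Qx, _ = np.linalg.qr(X - X.mean(axis=0))  # reduced QR: Qx is n x p
    Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))  # reduced QR: Qy is n x q
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # sorted in descending order
    return np.clip(s, 0.0, 1.0)  # guard against rounding just above 1
```

Values near 1 indicate that the two PC sets span nearly the same axes of variation. Because canonical correlations are invariant to invertible linear transformations within each set, they tolerate PCs that differ in sign, order or scale between the two sources.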