Minerva Elements Records

  • Item
    The evolutionary history of genes and transcriptional networks reveals fundamental properties of cancer associated with the breakdown of multicellularity
    Trigos Gomez, Anna Sofia (2018)
    All biological systems follow the rules and constraints imposed during their evolution. Current-day gene phenotypes such as gene expression, gene essentiality, gene function and protein localization are linked with the time of evolutionary emergence of genes. In cancer, tumours rely on cellular processes that date back to unicellular ancestors (e.g., cell replication, glycolysis), while dysregulating key pathways linked to the emergence of multicellularity, suggesting that the transition from unicellularity to multicellularity left vulnerabilities in cells that act as guiding principles during cancer development. Therefore, in this thesis I integrate genomics, systems biology and evolutionary biology to investigate fundamental principles of tumourigenesis related to the evolutionary history of genes using gene expression and somatic mutation information across multiple tumour types. First, I coupled the evolutionary age of genes and cellular processes with their expression levels in tumour and normal samples, and found that tumours consistently activate genes from unicellular ancestors while switching off genes related to multicellularity. These consistent patterns were supported by a mutual exclusivity between the activity of genes and transcriptional networks of unicellular and multicellular ancestors, which promoted convergent evolution towards a state of loss of multicellularity. Second, I investigated how somatic mutations disrupted gene regulatory networks. Genes that emerged together with early metazoans were enriched in point mutations and copy-number alterations, indicating that gene innovations that took place at the onset of multicellularity play a fundamental role in cancer development. Importantly, the uncoupling of regulatory networks of unicellular and multicellular ancestors was mostly due to point mutations in gene regulators linking these networks.
On the other hand, copy-number aberrations were directly involved in the activation and inactivation of unicellular and multicellular genes, suggesting point mutations and copy-number aberrations play complementary roles in the loss of regulation between unicellular and multicellular transcriptional networks in cancer. Third, I focused on novel transcriptional associations formed during tumourigenesis using gene co-expression module analysis. Significant levels of rewiring between unicellular and multicellular genes were found across tumours. This rewiring was mostly driven by gene amplifications, which promoted the formation of tumour-specific modules composed of novel transcriptional associations between unicellular and multicellular genes, once more linking the genes and regulatory associations evolved at the onset of multicellularity to cancer development. The findings of this work reveal fundamental principles driving cancer development associated with genes and transcriptional networks evolved during the transition from unicellularity to multicellularity. I propose a model whereby activation of programs that date back to unicellular ancestors and the deactivation of multicellular programs is driven by an inherent mutual exclusivity of these genes together with the breakage of regulation between unicellular and multicellular genes by point mutations, whereas the formation of novel transcriptional associations between these genes in tumours is driven by copy-number changes. Finally, I identify potential novel drivers based on their key role in uncoupling unicellular and multicellular transcriptional networks across tumours and suggest novel treatment strategies derived from this evolutionary approach. 
The results presented in this thesis contribute to our understanding of how past evolutionary events led to vulnerabilities in transcriptional networks that influence cancer development, and highlight the benefits of the integration of evolutionary concepts with genomics and network biology to identify fundamental principles of cancer.
  • Item
    Investigating the evolution of structural variation in cancer
    Cmero, Marek (2017)
    Cancers arise from single progenitor cells that acquire mutations, eventually dividing into mixed populations with distinct genotypes. These populations can be estimated by identifying common mutational profiles, using computational techniques applied to sequencing data from tumour tissue samples. Existing methods have largely focused on single nucleotide variants (SNVs), despite growing evidence of the importance of structural variation (SV) as a driver in certain subtypes of cancer. While some approaches use copy-number aberrant SVs, no method has incorporated balanced rearrangements. To address this, I developed a Bayesian inference approach for estimating SV cancer cell fraction called SVclone. I validated SVclone using in silico mixtures of real samples in known proportions and found that clonal deconvolution using SV breakpoints can yield comparable results to SNV-based clustering. I then applied the method to 2,778 whole genomes across 39 distinct tumour types, uncovering a subclonal copy-number neutral rearrangement phenotype with decreased overall survival. This clinically relevant finding could not have been found using existing methods. To further expand the methodology, and demonstrate its application to low data quality contexts, I developed a novel statistical approach to test for clonal differences in high-variance, formalin-fixed, paraffin-embedded (FFPE) samples. Together with variant curation strategies to minimise FFPE artefacts, I applied the approach to longitudinal samples from a cohort of neo-adjuvant treated prostate cancer patients to investigate whether clonal differences can be inferred in highly noisy data. This thesis demonstrates that characterising the evolution of structural variation, particularly balanced rearrangements, results in clinically relevant insights.
Identifying the patterns and dynamics of structural variation in the context of tumour evolution will ultimately help improve understanding of common pathways of tumour progression. Through this knowledge, cancers driven by SVs will have clearer prognoses and clinical treatment decisions will ultimately be improved, leading to better patient outcomes.
  • Item
    Effective integration of diverse biological datasets for better understanding of cancer
    Gaire, Raj Kumar (2012)
    Cancer is a disease of malfunctioning cells. Experiments in cancer research now produce a large number of datasets that measure various aspects of cancer. Similarly, datasets in cellular biology are becoming better organised and increasingly available. Integrating these datasets effectively to understand the mechanisms of cancer is a challenging task. In this research, we develop novel integration methods and apply them to diverse cancer datasets. Our analysis finds that subtypes of cancers share common features that may help direct cancer biologists towards better cures for cancers. As our first contribution, we developed MIRAGAA, a statistical approach to assess the coordinated changes of genome copy numbers and microRNA (miRNA) expression. Genetic diseases like cancer evolve through microevolution, where random lesions that provide the biggest advantage to the disease can stand out through their frequent occurrence in multiple samples. At the same time, a gene's function can be changed by aberration of the corresponding gene or by modification of the expression level of a microRNA that attenuates the gene. In a large number of disease samples, these two mechanisms might be distributed in a coordinated and almost mutually exclusive manner. Understanding this coordination may assist in identifying changes that produce the same functional impact on the cancer phenotype, and further identify genes that are universally required for cancer. MIRAGAA has been evaluated on The Cancer Genome Atlas (TCGA) glioblastoma multiforme datasets. In these datasets, a number of genome regions coordinating with different miRNAs are identified. Although well known for their biological significance, these genes and miRNAs would remain undetected, because they are not significant enough, if the two datasets were analysed individually.
Genes can show significant changes in their expression levels when genetically diseased cells are compared with non-diseased cells. Biological networks are often used to analyse gene expression profiles to identify active subnetworks (ASNs) in diseases. Existing methodologies for discovering ASNs mostly use node-centric approaches and undirected protein-protein interaction (PPI) networks. This can limit their ability to find the most meaningful ASNs. As our second contribution, we developed Bionet, which aims to identify better ASNs by using (i) integrated regulatory networks, (ii) directions of regulations of genes, and (iii) combined node and edge scores. We simplify and extend previous methodologies to incorporate edge evaluations and lessen their sensitivity to significance thresholds. We formulate our objective functions using mixed integer linear programming (MIP) and show that optimal solutions may be obtained. As our third contribution, we integrated and analysed the disease datasets of glioma, glioblastoma and breast cancer with pathways and biological networks. Our analysis of two independent breast cancer datasets finds that the basal subtype of this cancer contains positive feedback loops across seven genes (AR, ESR1, MYC, E2F2, PGR, BCL2 and CCND1) that could potentially explain the aggressive nature of this cancer subtype. A comparison of the ASNs of the basal subtype of breast cancer and the mesenchymal subtype of glioblastoma shows that an ASN in the vicinity of IL6 is conserved across the two subtypes. CD44 is found to be the gene most predictive of outcome in both glioblastoma and breast cancer and may be used as a biomarker. Our analysis suggests that cancer subtypes from two different cancers can show molecular similarities that are identifiable by using integrated biological networks.
  • Item
    Detecting gene-cancer associations by analysing gene-expression microarrays
    Shi, Fan (2011)
    Modern bioinformatics studies have shown that many physiological and behavioral characteristics in organisms are influenced by the information encoded in genes. Cancer genomics is a subject that specifically studies the genetic mechanisms of the formation and progression of cancer in order to improve the diagnosis and prognosis of cancer. An important task in cancer genomics is to detect gene-cancer associations, which is the biological focus of this thesis. DNA microarrays are a high-throughput technique for assaying the expression levels of thousands of genes simultaneously. By comparing the gene expression levels between different cancer types or subtypes, we may discover potential gene-cancer associations. However, the large scale of gene expression microarrays requires effective and efficient computational approaches for their analysis. Thus, from a computational perspective, this thesis focuses on developing computational approaches to detect gene-cancer associations based on gene expression microarrays. We summarise three key problems and the corresponding traditional methods in the analysis of gene expression microarrays. First, statistical tests can be used to identify differentially expressed genes, which show significantly different expression patterns between different cancer types or subtypes. Second, unsupervised clustering methods can be used to identify co-expressed genes, which are groups of differentially expressed genes that show similar expression levels in the same cancer types. Third, classification methods can be used to predict the types or subtypes of cancer based on differentially expressed genes. However, several challenges, such as the high dimensionality of microarrays and the small sample sizes in cancer studies, may lead to low accuracy and poor efficiency for analysis based on these traditional methods. In this thesis, we have proposed three computational approaches to address the above problems.
First, we have developed a meta-analysis method, called Incomplete Gene Meta-analysis (IGM), to identify differentially expressed genes by integrating multiple studies. The IGM method is able to integrate datasets from different microarray platforms and incorporate the genes that are not measured in all datasets, which we refer to as incomplete genes. In our evaluation, we verify the importance of including incomplete genes, and the experimental results demonstrate that, by imputing the statistical significance of incomplete genes, IGM identifies more significant genes than traditional methods. Second, we have proposed an unsupervised Bi-ordering Analysis (BOA) method to detect local patterns in microarrays, called biclusters, in which a subset of genes is co-expressed under a subset of samples. The BOA method uses an iterative process to identify consistently over- or under-expressed gene groups in specific samples. This approach addresses several challenges for detecting biclusters, including the identification of biologically meaningful patterns, the efficiency of biclustering algorithms and the stability of biclusters. Our statistical assessments demonstrate both the statistical and biological significance of the biclusters produced by our method. Third, we have proposed a method for making multiple predictions with an associated confidence level to classify cancers of unknown primary origin (CUP). This classification method is able to identify a set of the most likely cancer types and assign a confidence level to the predictions for CUP samples. Our method for making multiple predictions takes into account the biological similarity in different cancer types at a gene expression level, and is thus more suitable for classifying multi-class cancer samples than making a single class prediction. Our evaluation verifies the importance of making multiple predictions and validates our method.
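To make the idea of integrating studies with incomplete genes more concrete, the following is a minimal sketch of a conventional baseline, not the IGM method itself. It assumes each study has already produced a per-gene p-value, combines them with Fisher's method, and simply skips studies in which a gene was not measured (IGM, by contrast, is described as imputing the significance of such genes). The function names and gene labels are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null, where k = len(p_values). For even
    degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


def combine_across_studies(studies):
    """Combine per-gene p-values over studies, tolerating incomplete genes.

    `studies` is a list of dicts mapping gene label -> p-value. A gene missing
    from a study is skipped for that study (an assumption of this sketch).
    Returns gene -> (combined p-value, number of studies that measured it).
    """
    genes = set().union(*studies)
    combined = {}
    for gene in sorted(genes):
        observed = [study[gene] for study in studies if gene in study]
        combined[gene] = (fisher_combined_p(observed), len(observed))
    return combined


# Hypothetical example: GENE_B is an incomplete gene, measured in one study only.
example = combine_across_studies([
    {"GENE_A": 0.05, "GENE_B": 0.2},
    {"GENE_A": 0.05},
])
for gene, (p_value, n_studies) in example.items():
    print(gene, round(p_value, 4), n_studies)
```

In this baseline a gene measured in only one study keeps its original p-value, which illustrates why incomplete genes tend to be under-powered when no imputation is applied.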