Computing and Information Systems - Theses

  • Network Architecture for Prediction of Emergence in Complex Biological Systems
    Ghosh Roy, Gourab (2022)
    Emergence of properties at the system level that are not observed at the level of individual entities is an important feature of complex systems. In biological systems, emergent properties play critical roles both in the normal functioning of organisms and in disruptions to it, and they are relevant to the treatment of diseases such as cancer. Complex biological systems can be modeled abstractly as molecular networks, such as gene regulatory networks (GRNs) and signaling networks, in which nodes represent molecules such as genes and edges represent molecular interactions. This thesis explores how the architecture of these networks can be used to predict the emergence of system properties. First, to infer network architecture together with the aspects that are useful for predicting emergence, we propose a novel algorithm, Polynomial Lasso Bagging (PoLoBag), for signed GRN inference from gene expression data. The sign of a GRN edge represents the nature of the regulatory relationship: activating or inhibitory. Our algorithm infers signed edges more accurately than state-of-the-art algorithms and overcomes their weaknesses by also inferring edge directions and cycles. We also show that combining signed GRN architecture with dynamical information in our proposed dynamical K-core method effectively predicts emergent states of the gene regulatory system. Second, we investigate whether the bow-tie architectural organization exists in the GRNs of species of widely varying complexity. Prior work has shown this bow-tie feature in the GRNs of only some eukaryotes; our investigation covers GRNs ranging from prokaryotes to unicellular and multicellular eukaryotes. We find that the bow-tie architecture is a characteristic feature of GRNs. Based on the differences we observe in bow-tie architectures across species, we predict a trend in the emergence of controllability, a dynamical property of the gene regulatory system, as species complexity varies. Third, we predict an emergent phenotype at the organism level, cancer-specific survival risk, from input genotype data. We propose a novel Mutated Pathway Visible Neural Network (MPVNN), designed using prior knowledge of signaling network architecture together with edge randomization based on mutation data. This randomization models how the known signaling network architecture changes for a particular cancer type, which state-of-the-art visible neural networks do not model. We suggest that MPVNN predicts cancer-specific risk better than other similar-sized neural network (NN) and non-NN survival analysis methods, while also providing reliable interpretations of its predictions. Taken together, these three contributions make significant advances towards our goal of using molecular network architecture to better predict emergence. Such predictions can inform treatment decisions and lead to novel therapeutic approaches, and are of value to computational biologists and clinicians.
  • Understanding role of provenance in bioinformatics workflows and enabling interoperable computational analysis sharing
    Khan, Farah Zaib (2018)
    The automation of computational analyses through scientific workflows is now widely adopted in many data-intensive research domains, such as genomics. Data-intensive computational experiments using workflows enable Automation, Scaling, Adaption and Provenance support (ASAP). Provenance data collection is essential for any workflow-centric computational research to achieve reproducibility and transparency and to support trust in published results. At present, the capture of provenance information is not well supported across the plethora of workflow management systems and custom software platforms in the bioinformatics domain; as a result, there are numerous challenges in the effective sharing, publication, understandability, reproducibility and repeatability of scientific workflows. This thesis provides a unified, interoperable and systematised view of provenance, with a specific focus on workflow environments in the bioinformatics domain. We identify and overcome the current disconnect between different workflow systems and their existing provenance representations. Through empirical analysis of complex genomic data analysis workflows in three exemplar workflow systems, we identify the implicit assumptions that arise. These assumptions produce an incomplete view of provenance, lacking details that affect workflow enactment requirements and ultimately the reproducibility of the given analysis. We propose a set of recommendations to mitigate such assumptions and to enable workflow systems to document and capture complete provenance information, which can subsequently be used to re-enact workflows in other contexts and potentially on other workflow platforms. Based on this empirical case study and a pragmatic analysis of related literature, we define a hierarchical provenance framework offering "Levels of Provenance and Resource Sharing". Each level of this framework addresses specific provenance recommendations and supports the capture of rich provenance information, with the topmost level enabling the sharing of comprehensive, executable workflows using retrospective provenance. To realise this framework, we leverage community-driven, domain-neutral, platform-independent and open-source standards to implement "CWLProv", a format for the methodical representation of provenance that supports workflow enactment by aggregating the resources specific to a given enactment together with the associated workflow configuration settings. We realise CWLProv using the Common Workflow Language (CWL) for workflow definition, Research Objects (ROs) for resource aggregation, and the PROV Data Model (PROV-DM) to capture the retrospective provenance information required for subsequent workflow enactments. To demonstrate the applicability of CWLProv, we extend an existing workflow executor (cwltool) to provide a reference implementation that generates interoperable, workflow-centric ROs rich in metadata and provenance. This approach aggregates and preserves the data and methods needed to support the coherent sharing of computational analyses and experiments. We evaluate CWLProv on real-life bioinformatics pipelines, highlighting the utility of the approach, the interoperability of workflow analyses, and the benefits to research reproducibility more generally.
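The bow-tie organization discussed in the first abstract (network architecture for prediction of emergence) is commonly defined around the largest strongly connected component of a directed network (the core), with an IN set of nodes that can reach the core and an OUT set of nodes reachable from it. The Python sketch below computes that standard decomposition for a small directed graph. It is an illustration under that common definition only, not the thesis's own pipeline; the exact definitions used there may differ, and the gene names in the toy example are hypothetical.

```python
from collections import deque


def reachable(adj: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: every node reachable from `start`, itself included."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def bow_tie(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Split a directed graph into core, IN, OUT and OTHER node sets."""
    forward: dict[str, set[str]] = {}
    backward: dict[str, set[str]] = {}
    nodes: set[str] = set()
    for src, dst in edges:
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)
        nodes.update((src, dst))
    if not nodes:
        return {"core": set(), "in": set(), "out": set(), "other": set()}

    # The strongly connected component (SCC) containing v is the set of nodes
    # that v can reach AND that can reach v. The core is the largest SCC
    # (among equal-sized SCCs, the one met first in sorted node order wins).
    core: set[str] = set()
    for v in sorted(nodes):
        scc = reachable(forward, v) & reachable(backward, v)
        if len(scc) > len(core):
            core = scc

    # Because the core is strongly connected, any member can stand in for it.
    member = next(iter(core))
    in_set = reachable(backward, member) - core   # upstream: can reach the core
    out_set = reachable(forward, member) - core   # downstream: reachable from core
    other = nodes - core - in_set - out_set       # tendrils, tubes, disconnected parts
    return {"core": core, "in": in_set, "out": out_set, "other": other}


# Hypothetical toy network: TF1 feeds a three-gene feedback loop that drives G1.
parts = bow_tie([("TF1", "A"), ("A", "B"), ("B", "C"), ("C", "A"),
                 ("C", "G1"), ("X", "Y")])
print(sorted(parts["core"]), sorted(parts["in"]), sorted(parts["out"]), sorted(parts["other"]))
# ['A', 'B', 'C'] ['TF1'] ['G1'] ['X', 'Y']
```

Computing each SCC as the intersection of forward and backward reachability keeps the sketch short but is quadratic in the number of nodes; for genome-scale GRNs a linear-time SCC algorithm such as Tarjan's would be the usual choice. In an acyclic network every SCC is a single node, so this sketch would report a one-node core, a case a real analysis would need to handle explicitly.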
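Retrospective provenance in PROV-DM records which activities ran and when, which entities they used, and which entities they generated. As a rough illustration of the kind of record the second abstract (provenance in bioinformatics workflows) refers to, the sketch below serialises executed workflow steps into W3C PROV-N notation using the standard `entity`, `activity`, `used` and `wasGeneratedBy` statements. The `StepRun` class, the `ex` prefix and all step and file names are hypothetical; the sketch does not reproduce CWLProv's actual research object layout, file naming or identifier scheme.

```python
from dataclasses import dataclass, field


@dataclass
class StepRun:
    """Hypothetical record of one executed workflow step."""
    run_id: str
    start: str  # xsd:dateTime, e.g. "2018-05-01T10:00:00"
    end: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def to_provn(runs: list[StepRun], prefix: str = "ex",
             namespace: str = "http://example.org/") -> str:
    """Serialise step runs as a PROV-N document of retrospective provenance."""
    lines = ["document", f"  prefix {prefix} <{namespace}>"]
    # Declare each data artefact once as a PROV entity.
    for name in sorted({n for r in runs for n in r.inputs + r.outputs}):
        lines.append(f"  entity({prefix}:{name})")
    for r in runs:
        # Each step execution is an activity with its start and end times.
        lines.append(f"  activity({prefix}:{r.run_id}, {r.start}, {r.end})")
        # "-" marks the optional usage/generation time as unspecified.
        for name in r.inputs:
            lines.append(f"  used({prefix}:{r.run_id}, {prefix}:{name}, -)")
        for name in r.outputs:
            lines.append(f"  wasGeneratedBy({prefix}:{name}, {prefix}:{r.run_id}, -)")
    lines.append("endDocument")
    return "\n".join(lines)


# Hypothetical alignment step that read two inputs and produced one output.
run = StepRun("align_run1", "2018-05-01T10:00:00", "2018-05-01T10:05:00",
              inputs=["reads_fastq", "reference_fa"], outputs=["aligned_bam"])
print(to_provn([run]))
```

Chaining several `StepRun` records, where one step's output name appears as the next step's input, yields a provenance graph in which each intermediate file links the activity that generated it to the activity that used it, which is the lineage a re-enactment would need to follow.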