School of Mathematics and Statistics - Research Publications

  • Item
    A field guide to cultivating computational biology
    Way, GP ; Greene, CS ; Carninci, P ; Carvalho, BS ; de Hoon, M ; Finley, S ; Gosline, SJC ; Le Cao, K-A ; Lee, JSH ; Marchionni, L ; Robine, N ; Sindi, SS ; Theis, FJ ; Yang, JYH ; Carpenter, AE ; Fertig, EJ (PUBLIC LIBRARY SCIENCE, 2021-10)
    Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.
  • Item
    An integrated metagenomics and metabolomics approach implicates the microbiota-gut-brain axis in the pathogenesis of Huntington's disease
    Kong, G ; Ellul, S ; Narayana, VK ; Kanojia, K ; Ha, HTT ; Li, S ; Renoir, T ; Le Cao, K-A ; Hannan, AJ (ACADEMIC PRESS INC ELSEVIER SCIENCE, 2021-01)
    BACKGROUND: Huntington's disease (HD) is an autosomal dominant neurodegenerative disorder with onset and severity of symptoms influenced by various environmental factors. Recent discoveries have highlighted the importance of the gastrointestinal microbiome in mediating the gut-brain-axis bidirectional communication via circulating factors. Using shotgun sequencing, we investigated the gut microbiome composition in the R6/1 transgenic mouse model of HD from 4 to 12 weeks of age (early adolescent through to adult stages). Targeted metabolomics was also performed on the blood plasma of these mice (n = 9 per group) at 12 weeks of age to investigate potential effects of gut dysbiosis on the plasma metabolome profile. RESULTS: Modelled time profiles of each species, KEGG Orthologs and bacterial genes revealed heightened volatility in the R6/1 mice, indicating potential early effects of the HD mutation in the gut. In addition to gut dysbiosis in R6/1 mice at 12 weeks of age, gut microbiome function was perturbed. In particular, the butanoate metabolism pathway was elevated, suggesting increased production of the protective SCFA, butyrate, in the gut. No significant alterations were found in the plasma butyrate and propionate levels in the R6/1 mice at 12 weeks of age. The statistical integration of the metagenomics and metabolomics unraveled several Bacteroides species that were negatively correlated with ATP and pipecolic acid in the plasma. CONCLUSIONS: The present study revealed the instability of the HD gut microbiome during the pre-motor symptomatic stage of the disease, which may have dire consequences on the host's health. Perturbation of the HD gut microbiome function prior to significant cognitive and motor dysfunction suggests the potential role of the gut in modulating the pathogenesis of HD, potentially via specific altered plasma metabolites which mediate gut-brain signaling.
  • Item
    Community-wide hackathons to identify central themes in single-cell multi-omics (vol 22, 220, 2021)
    Cao, K-AL ; Abadi, AJ ; Davis-Marcisak, EF ; Hsu, L ; Arora, A ; Coullomb, A ; Deshpande, A ; Feng, Y ; Jeganathan, P ; Loth, M ; Meng, C ; Mu, W ; Pancaldi, V ; Sankaran, K ; Righelli, D ; Singh, A ; Sodicoff, JS ; Stein-O'Brien, GL ; Subramanian, A ; Welch, JD ; You, Y ; Argelaguet, R ; Carey, VJ ; Dries, R ; Greene, CS ; Holmes, S ; Love, MI ; Ritchie, ME ; Yuan, G-C ; Culhane, AC ; Fertig, E (BMC, 2021-08-25)
  • Item
    multiomics: A user-friendly multi-omics data harmonisation R pipeline
    Chen, T ; Abadi, A ; Lê Cao, K-A ; Tyagi, S (F1000 Research Ltd, 2021)
    Data from multiple omics layers of a biological system is growing in quantity, heterogeneity and dimensionality. Simultaneous multi-omics data integration is a growing field of research as it has strong potential to unlock information on previously hidden biological relationships leading to early diagnosis, prognosis and expedited treatments. Many tools for multi-omics data integration are being developed. However, these tools are often restricted to highly specific experimental designs and types of omics data. While some general methods do exist, they require specific data formats and experimental conditions. A major limitation in the field is the lack of a single or multi-omics pipeline which can accept data in an unrefined, information-rich form pre-integration and subsequently generate output for further investigation. There is an increasing demand for a generic multi-omics pipeline to facilitate general-purpose data exploration and analysis of heterogeneous data. Therefore, we present our R multiomics pipeline as an easy-to-use and flexible pipeline that takes unrefined multi-omics data, sample information and user-specified parameters as input to generate a list of output plots and data tables for quality control and downstream analysis. We have demonstrated application of the pipeline on two separate COVID-19 case studies. The pipeline supports limited checkpointing, in which intermediate output is staged to allow continuation after errors or interruptions, and it generates a script for reproducing the analysis. A seamless integration with the mixOmics R package is achieved, as the R data object can be loaded and manipulated with mixOmics functions. Our pipeline can be installed as an R package or from the git repository, and is accompanied by detailed documentation with walkthroughs on two case studies. The pipeline is also available as Docker and Singularity containers.
  • Item
    An integrated analysis of human myeloid cells identifies gaps in in vitro models of in vivo biology
    Rajab, N ; Angel, PW ; Deng, Y ; Gu, J ; Jameson, V ; Kurowska-Stolarska, M ; Milling, S ; Pacheco, CM ; Rutar, M ; Laslett, AL ; Cao, K-AL ; Choi, J ; Wells, CA (CELL PRESS, 2021-06-08)
    The Stemformatics myeloid atlas is an integrated transcriptome atlas of human macrophages and dendritic cells that systematically compares freshly isolated tissue-resident, cultured, and pluripotent stem cell-derived myeloid cells. Three classes of tissue-resident macrophage were identified: Kupffer cells and microglia; monocyte-associated; and tumor-associated macrophages. Culture had a major impact on all primary cell phenotypes. Pluripotent stem cell-derived macrophages were characterized by atypical expression of collagen and a highly efferocytotic phenotype. Myeloid subsets, and phenotypes associated with derivation, were reproducible across experimental series including data projected from single-cell studies, demonstrating that the atlas provides a robust reference for myeloid phenotypes. Implementation in Stemformatics.org allows users to visualize patterns of sample grouping or gene expression for user-selected conditions and supports temporary upload of your own microarray or RNA sequencing samples, including single-cell data, to benchmark against the atlas.
  • Item
    A multi-modal data harmonisation approach for discovery of COVID-19 drug targets
    Chen, T ; Philip, M ; Cao, K-AL ; Tyagi, S (OXFORD UNIV PRESS, 2021-11)
    Despite the volume of experiments performed and data available, the complex biology of coronavirus SARS-CoV-2 is not yet fully understood. Existing molecular profiling studies have focused on analysing functional omics data of a single type, which captures changes in a small subset of the molecular perturbations caused by the virus. As the logical next step, results from multiple such omics analyses may be aggregated to comprehensively interpret the molecular mechanisms of SARS-CoV-2. An alternative approach is to integrate data simultaneously in a parallel fashion to highlight the inter-relationships of disease-driving biomolecules, in contrast to comparing processed information from each omics level separately. We demonstrate that valuable information may be masked by using the former fragmented views in analysis, and biomarkers resulting from such an approach cannot provide a systematic understanding of the disease aetiology. Hence, we present a generic, reproducible and flexible open-access data harmonisation framework that can be scaled out to future multi-omics analysis to study a phenotype in a holistic manner. The pipeline source code, detailed documentation and an automated version as an R package are accessible. To demonstrate the effectiveness of our pipeline, we applied it to a drug screening task. We integrated multi-omics data to find the lowest level of statistical associations between data features in two case studies. Strongly correlated features within each of these two datasets were used for drug-target analysis, resulting in a list of 84 drug-target candidates. Further computational docking and toxicity analyses revealed seven high-confidence targets, amsacrine, bosutinib, ceritinib, crizotinib, nintedanib and sunitinib as potential starting points for drug therapy and development.
  • Item
    Variable selection in microbiome compositional data analysis
    Susin, A ; Wang, Y ; Cao, K-AL ; Calle, ML (Oxford University Press, 2020-06-01)
    Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.
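    The remark above that the centered log-ratio (clr) transformation is not subcompositionally consistent can be illustrated with a small sketch. This is not the authors' vignette: the four-taxon composition, the helper names `close` and `clr`, and the choice of which taxon to drop are all hypothetical, chosen only to show that clr values shift when a taxon is removed while pairwise log-ratios do not.

```python
import math

def close(parts):
    """Rescale positive parts so they sum to 1 (closure of a composition)."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(composition):
    """Centered log-ratio: log of each part minus the mean of all logs."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical relative abundances for four taxa (illustrative values only).
full = close([10.0, 20.0, 30.0, 40.0])
# Subcomposition: the same sample with the last taxon dropped, then re-closed.
sub = close(full[:3])

clr_full = clr(full)
clr_sub = clr(sub)

# The clr coordinate of taxon 0 depends on which other taxa are present ...
shift_taxon0 = clr_sub[0] - clr_full[0]
# ... whereas the log-ratio between taxa 0 and 1 is the same in both.
ratio_full = clr_full[0] - clr_full[1]
ratio_sub = clr_sub[0] - clr_sub[1]

print("clr shift for taxon 0:", shift_taxon0)
print("log-ratio taxa 0 vs 1 (full, sub):", ratio_full, ratio_sub)
```

    Because each clr coordinate is measured against the geometric mean of every taxon in the table, a coefficient fitted on clr-transformed data is tied to that particular set of taxa, which is one way to read the abstract's caution about transferring clr-lasso signatures.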
  • Item
    Model-based joint visualization of multiple compositional omics datasets
    Hawinkel, S ; Bijnens, L ; Cao, K-AL ; Thas, O (Oxford University Press, 2020-09-01)
    The integration of multiple omics datasets measured on the same samples is a challenging task: data come from heterogeneous sources and vary in signal quality. In addition, some omics data are inherently compositional, e.g. sequence count data. Most integrative methods are limited in their ability to handle covariates, missing values, compositional structure and heteroscedasticity. In this article we introduce a flexible model-based approach to data integration to address these current limitations: COMBI. We combine concepts, such as compositional biplots and log-ratio link functions with latent variable models, and propose an attractive visualization through multiplots to improve interpretation. Using real data examples and simulations, we illustrate and compare our method with other data integration techniques. Our algorithm is available in the R-package combi.
  • Item
    Altered Repertoire Diversity and Disease-Associated Clonal Expansions Revealed by T Cell Receptor Immunosequencing in Ankylosing Spondylitis Patients
    Hanson, AL ; Nel, HJ ; Bradbury, L ; Phipps, J ; Thomas, R ; Cao, K-AL ; Kenna, TJ ; Brown, MA (WILEY, 2020-08)
    Objective: Ankylosing spondylitis (AS) is a common spondyloarthropathy primarily affecting the axial skeleton and strongly associated with HLA–B*27 carriage. Genetic evidence implicates both autoinflammatory processes and autoimmunity against an HLA–B*27–restricted autoantigen in immunopathology. In addition to articular symptoms, up to 70% of AS patients present with concurrent bowel inflammation, suggesting that adverse interactions between a genetically primed host immune system and the gut microbiome contribute to the disease. Accordingly, this study aimed to characterize adaptive immune responses to antigenic stimuli in AS. Methods: The peripheral CD4 and CD8 T cell receptor (TCR) repertoire was profiled in AS patients (n = 47) and HLA–B*27–matched healthy controls (n = 38). Repertoire diversity was estimated using the Normalized Shannon Diversity Entropy (NSDE) index, and univariate and multivariate statistical analyses were performed to characterize AS‐associated clonal signatures. Furthermore, T cell proliferation and cytokine production in response to immunogenic antigen exposure were investigated in vitro in peripheral blood mononuclear cells from AS patients (n = 19) and HLA–B*27–matched healthy controls (n = 14). Results: Based on the NSDE measure of sample diversity across CD4 and CD8 T cell repertoires, AS patients showed increased TCR diversity compared to healthy controls (for CD4 T cells, P = 7.8 × 10⁻⁶; for CD8 T cells, P = 9.3 × 10⁻⁴), which was attributed to a significant reduction in the magnitude of peripheral T cell expansions globally. Upon in vitro stimulation, fewer T cells from AS patients than from healthy controls expressed interferon‐γ (for CD8 T cells, P = 0.03) and tumor necrosis factor (for CD4 T cells, P = 0.01; for CD8 T cells, P = 0.002). In addition, the CD8 TCR signature was altered in HLA–B*27+ AS patients compared to healthy controls, with significantly expanded Epstein‐Barr virus–specific clonotypes (P = 0.03) and cytomegalovirus‐specific clonotypes (P = 0.02). HLA–B*27+ AS patients also showed an increased incidence of “public” CD8 TCRs, representing identical clonotypes emerging in response to common antigen encounters, including homologous clonotypes matching those previously isolated from individuals with bacterial‐induced reactive arthritis. Conclusion: The dynamics of peripheral T cell responses in AS patients are altered, suggesting that differential antigen exposure and disrupted adaptive immunity are underlying features of the disease.
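    As a rough illustration of how a normalized Shannon entropy relates repertoire diversity to clonal expansion, the sketch below divides the Shannon entropy of clonotype frequencies by its maximum, the log of the number of distinct clonotypes. The abstract does not give the NSDE formula, so this normalization is an assumption (a common one), and the function name and clone counts are hypothetical rather than taken from the study.

```python
import math

def normalized_shannon_entropy(clone_counts):
    """Shannon entropy of clonotype frequencies, scaled to the range [0, 1].

    Assumed normalization: H / log(N), where N is the number of distinct
    clonotypes observed. The study's exact NSDE definition may differ.
    """
    counts = [c for c in clone_counts if c > 0]
    if len(counts) < 2:
        return 0.0  # a single clonotype carries no diversity
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(len(counts))

# Hypothetical repertoires with four clonotypes each (illustrative counts).
even_repertoire = [25, 25, 25, 25]     # no dominant expansion
expanded_repertoire = [97, 1, 1, 1]    # one heavily expanded clone

even_score = normalized_shannon_entropy(even_repertoire)
expanded_score = normalized_shannon_entropy(expanded_repertoire)

print("even:", even_score)
print("expanded:", expanded_score)
```

    Under this assumed definition, a repertoire dominated by a few large expansions scores lower than an even one, which matches the abstract's reading that higher diversity in AS patients reflects smaller peripheral T cell expansions.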
  • Item
    Multi-Omic Data Integration Allows Baseline Immune Signatures to Predict Hepatitis B Vaccine Response in a Small Cohort
    Shannon, CP ; Blimkie, TM ; Ben-Othman, R ; Gladish, N ; Amenyogbe, N ; Drissler, S ; Edgar, RD ; Chan, Q ; Krajden, M ; Foster, LJ ; Kobor, MS ; Mohn, WW ; Brinkman, RR ; Le Cao, K-A ; Scheuermann, RH ; Tebbutt, SJ ; Hancock, RE ; Koff, WC ; Kollmann, TR ; Sadarangani, M ; Lee, AH-Y (FRONTIERS MEDIA SA, 2020-11-30)
    BACKGROUND: Vaccination remains one of the most effective means of reducing the burden of infectious diseases globally. Improving our understanding of the molecular basis for effective vaccine response is of paramount importance if we are to ensure the success of future vaccine development efforts. METHODS: We applied cutting-edge multi-omics approaches to extensively characterize temporal molecular responses following vaccination with hepatitis B virus (HBV) vaccine. Data were integrated across cellular, epigenomic, transcriptomic, proteomic, and fecal microbiome profiles, and correlated to final HBV antibody titres. RESULTS: Using both an unsupervised molecular-interaction network integration method (NetworkAnalyst) and a data-driven integration approach (DIABLO), we uncovered baseline molecular patterns and pathways associated with more effective vaccine responses to HBV. Biological associations were unravelled, with signalling pathways such as JAK-STAT and interleukin signalling, Toll-like receptor cascades, interferon signalling, and Th17 cell differentiation emerging as important pre-vaccination modulators of response. CONCLUSION: This study provides further evidence that baseline cellular and molecular characteristics of an individual's immune system influence vaccine responses, and highlights the utility of integrating information across many parallel molecular datasets.