School of Mathematics and Statistics - Research Publications

Permanent URI for this collection

Search Results

Now showing 1 - 3 of 3
  • Item
    Thumbnail Image
    multiomics: A user-friendly multi-omics data harmonisation R pipeline
    Chen, T ; Abadi, A ; Lê Cao, K-A ; Tyagi, S (F1000 Research Ltd, 2021)
    Data from multiple omics layers of a biological system is growing in quantity, heterogeneity and dimensionality. Simultaneous multi-omics data integration is a growing field of research as it has strong potential to unlock information on previously hidden biological relationships leading to early diagnosis, prognosis and expedited treatments. Many tools for multi-omics data integration are being developed. However, these tools are often restricted to highly specific experimental designs, and types of omics data. While some general methods do exist, they require specific data formats and experimental conditions. A major limitation in the field is a lack of a single or multi-omics pipeline which can accept data in an unrefined, information-rich form pre-integration and subsequently generate output for further investigation. There is an increasing demand for a generic multi-omics pipeline to facilitate general-purpose data exploration and analysis of heterogeneous data. Therefore, we present our R multiomics pipeline as an easy to use and flexible pipeline that takes unrefined multi-omics data as input, sample information and user-specified parameters to generate a list of output plots and data tables for quality control and downstream analysis. We have demonstrated application of the pipeline on two separate COVID-19 case studies. We enabled limited checkpointing where intermediate output is staged to allow continuation after errors or interruptions in the pipeline and generate a script for reproducing the analysis to improve reproducibility. A seamless integration with the mixOmics R package is achieved, as the R data object can be loaded and manipulated with mixOmics functions. Our pipeline can be installed as an R package or from the git repository, and is accompanied by detailed documentation with walkthroughs on two case studies. The pipeline is also available as Docker and Singularity containers.
  • Item
    No Preview Available
    A multi-modal data harmonisation approach for discovery of COVID-19 drug targets
    Chen, T ; Philip, M ; Cao, K-AL ; Tyagi, S (OXFORD UNIV PRESS, 2021-11)
    Despite the volume of experiments performed and data available, the complex biology of coronavirus SARS-COV-2 is not yet fully understood. Existing molecular profiling studies have focused on analysing functional omics data of a single type, which captures changes in a small subset of the molecular perturbations caused by the virus. As the logical next step, results from multiple such omics analysis may be aggregated to comprehensively interpret the molecular mechanisms of SARS-CoV-2. An alternative approach is to integrate data simultaneously in a parallel fashion to highlight the inter-relationships of disease-driving biomolecules, in contrast to comparing processed information from each omics level separately. We demonstrate that valuable information may be masked by using the former fragmented views in analysis, and biomarkers resulting from such an approach cannot provide a systematic understanding of the disease aetiology. Hence, we present a generic, reproducible and flexible open-access data harmonisation framework that can be scaled out to future multi-omics analysis to study a phenotype in a holistic manner. The pipeline source code, detailed documentation and automated version as a R package are accessible. To demonstrate the effectiveness of our pipeline, we applied it to a drug screening task. We integrated multi-omics data to find the lowest level of statistical associations between data features in two case studies. Strongly correlated features within each of these two datasets were used for drug-target analysis, resulting in a list of 84 drug-target candidates. Further computational docking and toxicity analyses revealed seven high-confidence targets, amsacrine, bosutinib, ceritinib, crizotinib, nintedanib and sunitinib as potential starting points for drug therapy and development.
  • Item
    Thumbnail Image
    A simple, scalable approach to building a cross-platform transcriptome atlas
    Angel, PW ; Rajab, N ; Deng, Y ; Pacheco, CM ; Chen, T ; Le Cao, K-A ; Choi, J ; Wells, CA ; Fertig, EJ (PUBLIC LIBRARY SCIENCE, 2020-09)
    Gene expression atlases have transformed our understanding of the development, composition and function of human tissues. New technologies promise improved cellular or molecular resolution, and have led to the identification of new cell types, or better defined cell states. But as new technologies emerge, information derived on old platforms becomes obsolete. We demonstrate that it is possible to combine a large number of different profiling experiments summarised from dozens of laboratories and representing hundreds of donors, to create an integrated molecular map of human tissue. As an example, we combine 850 samples from 38 platforms to build an integrated atlas of human blood cells. We achieve robust and unbiased cell type clustering using a variance partitioning method, selecting genes with low platform bias relative to biological variation. Other than an initial rescaling, no other transformation to the primary data is applied through batch correction or renormalisation. Additional data, including single-cell datasets, can be projected for comparison, classification and annotation. The resulting atlas provides a multi-scaled approach to visualise and analyse the relationships between sets of genes and blood cell lineages, including the maturation and activation of leukocytes in vivo and in vitro. In allowing for data integration across hundreds of studies, we address a key reproduciblity challenge which is faced by any new technology. This allows us to draw on the deep phenotypes and functional annotations that accompany traditional profiling methods, and provide important context to the high cellular resolution of single cell profiling. Here, we have implemented the blood atlas in the open access Stemformatics.org platform, drawing on its extensive collection of curated transcriptome data. The method is simple, scalable and amenable for rapid deployment in other biological systems or computational workflows.