School of Mathematics and Statistics - Research Publications

  • Item
    Visualising associations between paired 'omics' data sets
    Gonzalez, I ; Le Cao, K-A ; Davis, MJ ; Dejean, S (BMC, 2012-11-13)
    BACKGROUND: Each omics platform is now able to generate a large amount of data. Genomics, proteomics, metabolomics and interactomics data are compiled at an ever increasing pace and now form a core part of the fundamental systems biology framework. Recently, several integrative approaches have been proposed to extract meaningful information. However, these approaches lack visualisation outputs to fully unravel the complex associations between different biological entities. RESULTS: The multivariate statistical approaches 'regularized Canonical Correlation Analysis' and 'sparse Partial Least Squares regression' were recently developed to integrate two types of highly dimensional 'omics' data and to select relevant information. Using the results of these methods, we propose to revisit a few graphical outputs to better understand the relationships between two 'omics' data sets and to better visualise the correlation structure between the different biological entities. These graphical outputs include Correlation Circle plots, Relevance Networks and Clustered Image Maps. We demonstrate the usefulness of such graphical outputs on several biological data sets and further assess their biological relevance using gene ontology analysis. CONCLUSIONS: Such graphical outputs are undoubtedly useful to aid the interpretation of these promising integrative analysis tools and will certainly help in addressing fundamental biological questions and understanding systems as a whole. AVAILABILITY: The graphical tools described in this paper are implemented in the freely available R package mixOmics and in its associated web application.
  • Item
    Integrative mixture of experts to combine clinical factors and gene markers
    Le Cao, K-A ; Meugnier, E ; McLachlan, GJ (OXFORD UNIV PRESS, 2010-05-01)
    MOTIVATION: Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. Several variants are proposed in integrative ME as well as the inclusion of various gene selection methods to select a hybrid signature. RESULTS: We show on three cancer studies that prediction accuracy can be improved when combining both types of variables. Furthermore, the selected genes were found to be of high relevance and can be considered as potential biomarkers for the prognostic selection of cancer therapy. AVAILABILITY: Integrative ME is implemented in the R package integrativeME (http://cran.r-project.org/).
  • Item
    A novel approach for biomarker selection and the integration of repeated measures experiments from two assays
    Liquet, B ; Le Cao, K-A ; Hocini, H ; Thiebaut, R (BMC, 2012-12-06)
    BACKGROUND: High throughput 'omics' experiments are usually designed to compare changes observed between different conditions (or interventions) and to identify biomarkers capable of characterizing each condition. We consider the complex structure of repeated measurements from different assays where different conditions are applied to the same subjects. RESULTS: We propose a two-step analysis combining a multilevel approach and a multivariate approach to reveal separately the effects of conditions within subjects from the biological variation between subjects. The approach is extended to two-factor designs and to the integration of two matched data sets. It allows internal variable selection to highlight genes able to discriminate the net condition effect within subjects. A simulation study was performed to demonstrate the good performance of the multilevel multivariate approach compared to a classical multivariate method. The multilevel multivariate approach outperformed the classical multivariate approach with respect to the classification error rate and the selection of relevant genes. The approach was applied to an HIV-vaccine trial evaluating the response with gene expression and cytokine secretion. The discriminant multilevel analysis selected a relevant subset of genes while the integrative multilevel analysis highlighted clusters of genes and cytokines that were highly correlated across the samples. CONCLUSIONS: Our combined multilevel multivariate approach may help in finding signatures of vaccine effect and allows for a better understanding of immunological mechanisms activated by the intervention. The integrative analysis revealed clusters of genes that were associated with cytokine secretion. These clusters can be seen as gene signatures to predict future cytokine response. The approach is implemented in the R package mixOmics (http://cran.r-project.org/) with associated tutorials to perform the analysis.
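The first (multilevel) step described in the abstract above separates within-subject effects from between-subject variation. A minimal Python sketch of one common way to do this, assuming the within-subject part is obtained by subtracting each subject's own mean from that subject's measurements; the function name, the data, and the centring rule itself are illustrative assumptions, not the mixOmics implementation:

```python
from statistics import mean


def within_subject_deviations(values, subjects):
    """Subtract each subject's own mean from that subject's measurements.

    Assumption: this mean-centring stands in for the multilevel step;
    the abstract does not spell out the exact decomposition used.
    """
    by_subject = {}
    for value, subject in zip(values, subjects):
        by_subject.setdefault(subject, []).append(value)
    subject_means = {s: mean(vs) for s, vs in by_subject.items()}
    return [v - subject_means[s] for v, s in zip(values, subjects)]


# Hypothetical data: one gene, two subjects, each measured under two conditions.
expression = [10.0, 12.0, 20.0, 22.0]
subject_ids = ["s1", "s1", "s2", "s2"]

within = within_subject_deviations(expression, subject_ids)
print(within)  # [-1.0, 1.0, -1.0, 1.0]
```

In this toy example the two subjects differ by 10 units at baseline, yet their within-subject deviations are identical, so the condition effect is no longer masked by between-subject variation. A multivariate method would then be applied to such deviations in the second step.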
  • Item
    Uncoupled Embryonic and Extra-Embryonic Tissues Compromise Blastocyst Development after Somatic Cell Nuclear Transfer
    Degrelle, SA ; Jaffrezic, F ; Campion, E ; Le Cao, K-A ; Le Bourhis, D ; Richard, C ; Rodde, N ; Fleurot, R ; Everts, RE ; Lecardonnel, J ; Heyman, Y ; Vignon, X ; Yang, X ; Tian, XC ; Lewin, HA ; Renard, J-P ; Hue, I ; Akagi, T (PUBLIC LIBRARY SCIENCE, 2012-06-06)
    Somatic cell nuclear transfer (SCNT) is the most efficient cell reprogramming technique available, especially when working with bovine species. Although SCNT blastocysts performed equally well or better than controls in the weeks following embryo transfer at Day 7, elongation and gastrulation defects were observed prior to implantation. To understand the developmental implications of embryonic/extra-embryonic interactions, the morphological and molecular features of elongating and gastrulating tissues were analysed. At Day 18, 30 SCNT conceptuses were compared to 20 controls (AI and IVP: 10 conceptuses each); one-half of the SCNT conceptuses appeared normal while the other half showed signs of atypical elongation and gastrulation. SCNT was also associated with a high incidence of discordance in embryonic and extra-embryonic patterns, as evidenced by morphological and molecular "uncoupling". Elongation appeared to be secondarily affected; only 3 of 30 conceptuses had abnormally elongated shapes and there were very few differences in gene expression when they were compared to the controls. However, some of these differences could be linked to defects in microvilli formation or extracellular matrix composition and could thus impact extra-embryonic functions. In contrast to elongation, gastrulation stages included embryonic defects that likely affected the hypoblast, the epiblast, or the early stages of their differentiation. When taking into account SCNT conceptus somatic origin, i.e. the reprogramming efficiency of each bovine ear fibroblast (Low: 0029, Med: 7711, High: 5538), we found that embryonic abnormalities or severe embryonic/extra-embryonic uncoupling were more tightly correlated to embryo loss at implantation than were elongation defects. Alternatively, extra-embryonic differences between SCNT and control conceptuses at Day 18 were related to molecular plasticity (high efficiency/high plasticity) and subsequent pregnancy loss. Finally, because it alters re-differentiation processes in vivo, SCNT reprogramming highlights temporally and spatially restricted interactions among cells and tissues in a unique way.
  • Item
    Determinants of Body Fat in Infants of Women With Gestational Diabetes Mellitus Differ With Fetal Sex
    Lingwood, BE ; Henry, AM ; d'Emden, MC ; Fullerton, A-M ; Mortimer, RH ; Colditz, PB ; Le Cao, K-A ; Callaway, LK (AMER DIABETES ASSOC, 2011-12)
    OBJECTIVE: Neonatal adiposity is a well-recognized complication of gestational diabetes mellitus (GDM). This study aimed to identify factors influencing adiposity in male and female infants of women treated for GDM. RESEARCH DESIGN AND METHODS: This was a prospective study of 84 women with GDM. Daily blood glucose levels (BGLs) were retrieved from glucose meters, and overall mean fasting and mean 2-h postprandial BGLs were calculated for each woman. Infant body composition was measured at birth, and regression analysis was used to identify significant predictors of infant body fat separately in male and female infants. RESULTS: Maternal fasting BGL was the major predictor of adiposity in male infants but had little relationship to adiposity in female infants. In male infants, percent fat was increased by 0.44% for each 0.1 mmol/L increase in mean maternal fasting BGL. Maternal BMI was the primary predictor in female infants but had little effect in males. In female infants, percent fat was increased by 0.11% for each 1 kg/m² increase in maternal prepregnancy BMI. CONCLUSIONS: Fetal sex may influence the impact that treatment strategies for GDM have on infant adiposity.
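The two regression slopes reported in the abstract above can be read as simple linear rates. The following Python sketch works through that arithmetic under two assumptions that the abstract does not state: that the reported percentages are percentage points of body fat, and that the linear relationship holds over the hypothetical increments chosen here. It illustrates how to read the slopes and is not a result from the study:

```python
def predicted_fat_change(slope, per_unit, delta):
    """Change in percent body fat for a predictor change of `delta`,
    given a slope expressed as `slope` per `per_unit` of the predictor."""
    return slope * (delta / per_unit)


# Male infants: 0.44 per 0.1 mmol/L of mean maternal fasting BGL (from the abstract).
# The 0.5 mmol/L increment is a hypothetical value chosen for illustration.
male_change = predicted_fat_change(slope=0.44, per_unit=0.1, delta=0.5)

# Female infants: 0.11 per 1 kg/m^2 of maternal prepregnancy BMI (from the abstract).
# The 5 kg/m^2 increment is likewise hypothetical.
female_change = predicted_fat_change(slope=0.11, per_unit=1.0, delta=5.0)

print(round(male_change, 2))    # 2.2
print(round(female_change, 2))  # 0.55
```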
  • Item
    Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets
    Yao, F ; Coquery, J ; Le Cao, K-A (BMC, 2012-02-03)
    BACKGROUND: A key question when analyzing high throughput data is whether the information provided by the measured biological entities (gene, metabolite expression for example) is related to the experimental conditions, or, rather, to some interfering signals, such as experimental bias or artefacts. Visualization tools are therefore useful to better understand the underlying structure of the data in a 'blind' (unsupervised) way. A well-established technique to do so is Principal Component Analysis (PCA). PCA is particularly powerful if the biological question is related to the highest variance. Independent Component Analysis (ICA) has been proposed as an alternative to PCA as it optimizes an independence condition to give more meaningful components. However, neither PCA nor ICA can overcome both the high dimensionality and noisy characteristics of biological data. RESULTS: We propose Independent Principal Component Analysis (IPCA) that combines the advantages of both PCA and ICA. It uses ICA as a denoising process of the loading vectors produced by PCA to better highlight the important biological entities and reveal insightful patterns in the data. The result is a better clustering of the biological samples on graphical representations. In addition, a sparse version is proposed that performs an internal variable selection to identify biologically relevant features (sIPCA). CONCLUSIONS: On simulation studies and real data sets, we showed that IPCA offers a better visualization of the data than ICA and with a smaller number of components than PCA. Furthermore, a preliminary investigation of the list of genes selected with sIPCA demonstrates that the approach is well able to highlight relevant genes in the data with respect to the biological experiment. IPCA and sIPCA are both implemented in the R package mixOmics dedicated to the analysis and exploration of high dimensional biological data sets, and on mixOmics' web-interface.
  • Item
    Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems
    Cao, K-AL ; Boitard, S ; Besse, P (BMC, 2011-06-22)
    BACKGROUND: Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. RESULTS: A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. CONCLUSIONS: sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
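The abstract above does not say how the sparse approach discards variables. One widely used device for obtaining sparse loading vectors is soft-thresholding, which shrinks every weight towards zero and sets small weights exactly to zero. The Python sketch below assumes that mechanism purely for illustration; the function name, the threshold, and the loading values are hypothetical and are not taken from sPLS-DA or mixOmics:

```python
def soft_threshold(loadings, penalty):
    """Shrink each loading towards zero by `penalty`;
    loadings with absolute value at or below `penalty` become exactly 0.0."""
    shrunk = []
    for weight in loadings:
        magnitude = max(abs(weight) - penalty, 0.0)
        shrunk.append(magnitude if weight >= 0 else -magnitude)
    return shrunk


# Hypothetical loading vector for four variables (e.g. four genes).
loadings = [0.9, -0.2, 0.05, -0.7]
sparse_loadings = soft_threshold(loadings, penalty=0.3)

# Variables whose loading survives the threshold count as "selected".
selected = [i for i, w in enumerate(sparse_loadings) if w != 0.0]
print(selected)  # [0, 3]
```

A larger penalty zeroes more loadings and so selects fewer variables, which is the trade-off any such tuning parameter controls.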