Anatomy and Neuroscience - Research Publications

Permanent URI for this collection

Search Results

Now showing 1 - 2 of 2
  • Item
    Thumbnail Image
    Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network.
    Grapotte, M ; Saraswat, M ; Bessière, C ; Menichelli, C ; Ramilowski, JA ; Severin, J ; Hayashizaki, Y ; Itoh, M ; Tagami, M ; Murata, M ; Kojima-Ishiyama, M ; Noma, S ; Noguchi, S ; Kasukawa, T ; Hasegawa, A ; Suzuki, H ; Nishiyori-Sueki, H ; Frith, MC ; FANTOM consortium, ; Chatelain, C ; Carninci, P ; de Hoon, MJL ; Wasserman, WW ; Bréhélin, L ; Lecellier, C-H (Springer Science and Business Media LLC, 2021-06-02)
    Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.
  • Item
    Thumbnail Image
    A molecular classification of human mesenchymal stromal cells
    Rohart, F ; Mason, EA ; Matigian, N ; Mosbergen, R ; Korn, O ; Chen, T ; Butcher, S ; Patel, J ; Atkinson, K ; Khosrotehrani, K ; Fisk, NM ; Le Cao, K-A ; Wells, CA (PEERJ INC, 2016-03-24)
    Mesenchymal stromal cells (MSC) are widely used for the study of mesenchymal tissue repair, and increasingly adopted for cell therapy, despite the lack of consensus on the identity of these cells. In part this is due to the lack of specificity of MSC markers. Distinguishing MSC from other stromal cells such as fibroblasts is particularly difficult using standard analysis of surface proteins, and there is an urgent need for improved classification approaches. Transcriptome profiling is commonly used to describe and compare different cell types; however, efforts to identify specific markers of rare cellular subsets may be confounded by the small sample sizes of most studies. Consequently, it is difficult to derive reproducible, and therefore useful markers. We addressed the question of MSC classification with a large integrative analysis of many public MSC datasets. We derived a sparse classifier (The Rohart MSC test) that accurately distinguished MSC from non-MSC samples with >97% accuracy on an internal training set of 635 samples from 41 studies derived on 10 different microarray platforms. The classifier was validated on an external test set of 1,291 samples from 65 studies derived on 15 different platforms, with >95% accuracy. The genes that contribute to the MSC classifier formed a protein-interaction network that included known MSC markers. Further evidence of the relevance of this new MSC panel came from the high number of Mendelian disorders associated with mutations in more than 65% of the network. These result in mesenchymal defects, particularly impacting on skeletal growth and function. The Rohart MSC test is a simple in silico test that accurately discriminates MSC from fibroblasts, other adult stem/progenitor cell types or differentiated stromal cells. It has been implemented in the www.stemformatics.org resource, to assist researchers wishing to benchmark their own MSC datasets or data from the public domain. The code is available from the CRAN repository and all data used to generate the MSC test is available to download via the Gene Expression Omnibus or the Stemformatics resource.