School of Mathematics and Statistics - Research Publications

  • Item
    Removing unwanted variation from large-scale RNA sequencing data with PRPS
    Molania, R ; Foroutan, M ; Gagnon-Bartsch, JA ; Gandolfo, LC ; Jain, A ; Sinha, A ; Olshansky, G ; Dobrovic, A ; Papenfuss, AT ; Speed, TP (NATURE PORTFOLIO, 2023-01)
    Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
  • Item
    Evolution and comparative analysis of the MHC Class III inflammatory region
    Deakin, JE ; Papenfuss, AT ; Belov, K ; Cross, JGR ; Coggill, P ; Palmer, S ; Sims, S ; Speed, TP ; Beck, S ; Graves, JAM (BMC, 2006-11-02)
    BACKGROUND: The Major Histocompatibility Complex (MHC) is essential for immune function. Historically, it has been subdivided into three regions (Class I, II, and III), but a cluster of functionally related genes within the Class III region has also been referred to as the Class IV region or "inflammatory region". This group of genes is involved in the inflammatory response, and includes members of the tumour necrosis family. Here we report the sequencing, annotation and comparative analysis of a tammar wallaby BAC containing the inflammatory region. We also discuss the extent of sequence conservation across the entire region and identify elements conserved in evolution. RESULTS: Fourteen Class III genes from the tammar wallaby inflammatory region were characterised and compared to their orthologues in other vertebrates. The organisation and sequence of genes in the inflammatory region of both the wallaby and South American opossum are highly conserved compared to known genes from eutherian ("placental") mammals. Some minor differences separate the two marsupial species. Eight genes within the inflammatory region have remained tightly clustered for at least 360 million years, predating the divergence of the amphibian lineage. Analysis of sequence conservation identified 354 elements that are conserved. These range in size from 7 to 431 bases and cover 15.6% of the inflammatory region, representing approximately a 4-fold increase compared to the average for vertebrate genomes. About 5.5% of this conserved sequence is marsupial-specific, including three cases of marsupial-specific repeats. Highly Conserved Elements were also characterised. CONCLUSION: Using comparative analysis, we show that a cluster of MHC genes involved in inflammation, including TNF, LTA (or its putative teleost homolog TNF-N), APOM, and BAT3 have remained together for over 450 million years, predating the divergence of mammals from fish. The observed enrichment in conserved sequences within the inflammatory region suggests conservation at the transcriptional regulatory level, in addition to the functional level.
  • Item
    A statistical framework for analyzing deep mutational scanning data
    Rubin, AF ; Gelman, H ; Lucas, N ; Bajjalieh, SM ; Papenfuss, AT ; Speed, TP ; Fowler, DM (BMC, 2017-08-07)
    Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.
  • Item
    A statistical framework for analyzing deep mutational scanning data (vol 18, 150, 2017)
    Rubin, AF ; Gelman, H ; Lucas, N ; Bajjalieh, SM ; Papenfuss, AT ; Speed, TP ; Fowler, DM (BIOMED CENTRAL LTD, 2018-02-07)
    After publication of our article [1] it was brought to our attention that a line of code was missing from our program to combine the within-replicate variance and between-replicate variance. This led to an overestimation of the standard errors calculated using the Enrich2 random-effects model.
  • Item
    Analysis of the platypus genome suggests a transposon origin for mammalian imprinting
    Pask, AJ ; Papenfuss, AT ; Ager, EI ; Mccoll, KA ; Speed, TP ; Renfree, MB (BIOMED CENTRAL LTD, 2009)
    BACKGROUND: Genomic imprinting is an epigenetic phenomenon that results in monoallelic gene expression. Many hypotheses have been advanced to explain why genomic imprinting evolved in mammals, but few have examined how it arose. The host defence hypothesis suggests that imprinting evolved from existing mechanisms within the cell that act to silence foreign DNA elements that insert into the genome. However, the changes to the mammalian genome that accompanied the evolution of imprinting have been hard to define due to the absence of large scale genomic resources between all extant classes. The recent release of the platypus genome has provided the first opportunity to perform comparisons between prototherian (monotreme; which appear to lack imprinting) and therian (marsupial and eutherian; which have imprinting) mammals. RESULTS: We compared the distribution of repeat elements known to attract epigenetic silencing across the entire genome from monotremes and therian mammals, particularly focusing on the orthologous imprinted regions. There is a significant accumulation of certain repeat elements within imprinted regions of therian mammals compared to the platypus. CONCLUSIONS: Our analyses show that the platypus has significantly fewer repeats of certain classes in the regions of the genome that have become imprinted in therian mammals. The accumulation of repeats, especially long terminal repeats and DNA elements, in therian imprinted genes and gene clusters is coincident with, and may have been a potential driving force in, the development of mammalian genomic imprinting. These data provide strong support for the host defence hypothesis.