School of Mathematics and Statistics - Research Publications

  • Item
    Removing unwanted variation from large-scale RNA sequencing data with PRPS
    Molania, R ; Foroutan, M ; Gagnon-Bartsch, JA ; Gandolfo, LC ; Jain, A ; Sinha, A ; Olshansky, G ; Dobrovic, A ; Papenfuss, AT ; Speed, TP (NATURE PORTFOLIO, 2023-01)
    Accurate identification and effective removal of unwanted variation is essential to derive meaningful biological results from RNA sequencing (RNA-seq) data, especially when the data come from large and complex studies. Using RNA-seq data from The Cancer Genome Atlas (TCGA), we examined several sources of unwanted variation and demonstrate here how these can significantly compromise various downstream analyses, including cancer subtype identification, association between gene expression and survival outcomes and gene co-expression analysis. We propose a strategy, called pseudo-replicates of pseudo-samples (PRPS), for deploying our recently developed normalization method, called removing unwanted variation III (RUV-III), to remove the variation caused by library size, tumor purity and batch effects in TCGA RNA-seq data. We illustrate the value of our approach by comparing it to the standard TCGA normalizations on several TCGA RNA-seq datasets. RUV-III with PRPS can be used to integrate and normalize other large transcriptomic datasets coming from multiple laboratories or platforms.
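    A minimal NumPy sketch of the idea, under simplifying assumptions: `Y` is a samples-by-genes matrix on a log scale, `ctl` marks negative-control genes, and each pseudo-sample is the plain average of the samples sharing a (biology, batch) label. The RUV-III step follows the residual-operator and control-gene construction in simplified form; the function names `ruv_iii` and `prps_ruv_iii` are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def residual_operator(Y, M):
    """Return R_M Y = (I - M (M^T M)^{-1} M^T) Y: the variation left within replicate sets."""
    return Y - M @ np.linalg.solve(M.T @ M, M.T @ Y)


def ruv_iii(Y, M, ctl, k):
    """Simplified RUV-III.

    Y   : samples-by-genes matrix (e.g. log counts)
    M   : samples-by-replicate-sets 0/1 matrix, exactly one 1 per row
    ctl : boolean mask of negative-control genes
    k   : number of unwanted factors to remove
    """
    Y0 = residual_operator(Y, M)
    # Leading left singular vectors of Y0 span the within-replicate (unwanted) directions.
    U, _, _ = np.linalg.svd(Y0, full_matrices=False)
    alpha = U[:, :k].T @ Y                        # k-by-genes unwanted-factor loadings
    ac = alpha[:, ctl]
    # Estimate each sample's unwanted factors by regressing control genes on their loadings.
    W = Y[:, ctl] @ ac.T @ np.linalg.inv(ac @ ac.T)
    return Y - W @ alpha


def prps_ruv_iii(Y, biology, batch, ctl, k):
    """Hypothetical PRPS wrapper around ruv_iii.

    Samples sharing a (biology, batch) label are averaged into a pseudo-sample;
    pseudo-samples with the same biology label form one pseudo-replicate set.
    Original samples are singleton sets. Returns only the original samples.
    """
    biology, batch = np.asarray(biology), np.asarray(batch)
    cells = sorted(set(zip(biology.tolist(), batch.tolist())))
    pseudo = np.array([Y[(biology == b) & (batch == t)].mean(axis=0) for b, t in cells])
    groups = sorted({b for b, _ in cells})
    n, p = Y.shape[0], len(cells)
    M = np.zeros((n + p, n + len(groups)))
    M[np.arange(n), np.arange(n)] = 1.0            # each real sample is its own set
    for i, (b, _) in enumerate(cells):
        M[n + i, n + groups.index(b)] = 1.0       # pseudo-replicates share a set
    corrected = ruv_iii(np.vstack([Y, pseudo]), M, ctl, k)
    return corrected[:n]
```

    With noise-free simulated data `Y = Z + np.outer(batch, a)`, where `Z` depends only on the biology label and is zero at the control genes, `prps_ruv_iii(Y, biology, batch, ctl, k=1)` returns `Z` up to rounding error; on real data the choice of `k` and of the control genes matters.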
  • Item
    Targeting histone acetylation dynamics and oncogenic transcription by catalytic P300/CBP inhibition
    Hogg, SJ ; Motorna, O ; Cluse, LA ; Johanson, TM ; Coughlan, HD ; Raviram, R ; Myers, RM ; Costacurta, M ; Todorovski, I ; Pijpers, L ; Bjelosevic, S ; Williams, T ; Huskins, SN ; Kearney, CJ ; Devlin, JR ; Fan, Z ; Jabbari, JS ; Martin, BP ; Fareh, M ; Kelly, MJ ; Dupere-Richer, D ; Sandow, JJ ; Feran, B ; Knight, D ; Khong, T ; Spencer, A ; Harrison, SJ ; Gregory, G ; Wickramasinghe, VO ; Webb, A ; Taberlay, PC ; Bromberg, KD ; Lai, A ; Papenfuss, AT ; Smyth, GK ; Allan, RS ; Licht, JD ; Landau, DA ; Abdel-Wahab, O ; Shortt, J ; Vervoort, SJ ; Johnstone, RW (CELL PRESS, 2021-05-20)
    To separate causal effects of histone acetylation on chromatin accessibility and transcriptional output, we used integrated epigenomic and transcriptomic analyses following acute inhibition of major cellular lysine acetyltransferases P300 and CBP in hematological malignancies. We found that catalytic P300/CBP inhibition dynamically perturbs steady-state acetylation kinetics and suppresses oncogenic transcriptional networks in the absence of changes to chromatin accessibility. CRISPR-Cas9 screening identified NCOR1 and HDAC3 transcriptional co-repressors as the principal antagonists of P300/CBP by counteracting acetylation turnover kinetics. Finally, deacetylation of H3K27 provides nucleation sites for reciprocal methylation switching, a feature that can be exploited therapeutically by concomitant KDM6A and P300/CBP inhibition. Overall, this study indicates that the steady-state histone acetylation-methylation equilibrium functions as a molecular rheostat governing cellular transcription that is amenable to therapeutic exploitation as an anti-cancer regimen.
  • Item
    Evolutionary analyses of the major variant surface antigen-encoding genes reveal population structure of Plasmodium falciparum within and between continents
    Tonkin-Hill, G ; Ruybal-Pesantez, S ; Tiedje, KE ; Rougeron, V ; Duffy, MF ; Zakeri, S ; Pumpaibool, T ; Harnyuttanakorn, P ; Branch, OH ; Ruiz-Mesia, L ; Rask, TS ; Prugnolle, F ; Papenfuss, AT ; Chan, Y-B ; Day, KP ; Buchrieser, C (PUBLIC LIBRARY SCIENCE, 2021-02)
    Malaria remains a major public health problem in many countries. Unlike influenza and HIV, where diversity in immunodominant surface antigens is understood geographically to inform disease surveillance, relatively little is known about the global population structure of PfEMP1, the major variant surface antigen of the malaria parasite Plasmodium falciparum. The complexity of the var multigene family, which encodes PfEMP1 and diversifies by recombination, has so far precluded its use in malaria surveillance. Recent studies have demonstrated that cost-effective deep sequencing of the region of var genes encoding the PfEMP1 DBLα domain, and subsequent classification of within-host sequences at 96% identity to define unique DBLα types, can reveal structure and strain dynamics within countries. However, to date there has not been a comprehensive comparison of these DBLα types between countries. By leveraging a bioinformatic approach (jumping hidden Markov model) designed specifically for the analysis of recombination within var genes and applying it to a dataset of DBLα types from 10 countries, we are able to describe population structure of DBLα types at the global scale. The sensitivity of the approach allows for the comparison of the global dataset to ape samples of Plasmodium Laverania species. Our analyses show that the evolution of the parasite population emerging out of Africa underlies current patterns of DBLα type diversity. Most importantly, we can distinguish geographic population structure within Africa between Gabon and Ghana in West Africa and Uganda in East Africa. Our evolutionary findings have translational implications in the context of globalization. Firstly, DBLα type diversity can provide a simple diagnostic framework for geographic surveillance of the rapidly evolving transmission dynamics of P. falciparum. It can also inform efforts to understand the presence or absence of global, regional and local population immunity to major surface antigen variants. Additionally, we identify a number of highly conserved DBLα types that are present globally that may be of biological significance and warrant further characterization.
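    Classifying sequences at 96% identity can be illustrated with a greedy centroid clustering sketch. It assumes the sequences are already aligned to equal length and uses plain per-position identity; `identity` and `greedy_cluster` are hypothetical names, and production pipelines typically rely on dedicated clustering tools that work on unaligned reads, so results would differ.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.96):
    """Greedy centroid clustering: each sequence joins the first centroid it matches
    at >= threshold identity, otherwise it founds a new cluster (a new 'type')."""
    centroids, labels = [], []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                labels.append(i)
                break
        else:
            centroids.append(s)
            labels.append(len(centroids) - 1)
    return centroids, labels
```

    Greedy clustering depends on input order, which is why clustering tools commonly sort sequences (for example by abundance) before assigning centroids.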
  • Item
    Complementarity and redundancy of IL-22-producing innate lymphoid cells
    Rankin, LC ; Girard-Madoux, MJH ; Seillet, C ; Mielke, LA ; Kerdiles, Y ; Fenis, A ; Wieduwild, E ; Putoczki, T ; Mondot, S ; Lantz, O ; Demon, D ; Papenfuss, AT ; Smyth, GK ; Lamkanfi, M ; Carotta, S ; Renauld, J-C ; Shi, W ; Carpentier, S ; Soos, T ; Arendt, C ; Ugolini, S ; Huntington, ND ; Belz, GT ; Vivier, E (NATURE PUBLISHING GROUP, 2016-02)
    Intestinal T cells and group 3 innate lymphoid cells (ILC3 cells) control the composition of the microbiota and gut immune responses. Within the gut, ILC3 subsets coexist that either express or lack the natural cytotoxicity receptor (NCR) NKp46. We identified here the transcriptional signature associated with the transcription factor T-bet-dependent differentiation of NCR(-) ILC3 cells into NCR(+) ILC3 cells. Contrary to the prevailing view, we found by conditional deletion of the key ILC3 genes Stat3, Il22, Tbx21 and Mcl1 that NCR(+) ILC3 cells were redundant for the control of mouse colonic infection with Citrobacter rodentium in the presence of T cells. However, NCR(+) ILC3 cells were essential for cecal homeostasis. Our data show that interplay between intestinal ILC3 cells and adaptive lymphocytes results in robust complementary failsafe mechanisms that ensure gut homeostasis.
  • Item
    Evolution and comparative analysis of the MHC Class III inflammatory region
    Deakin, JE ; Papenfuss, AT ; Belov, K ; Cross, JGR ; Coggill, P ; Palmer, S ; Sims, S ; Speed, TP ; Beck, S ; Graves, JAM (BMC, 2006-11-02)
    BACKGROUND: The Major Histocompatibility Complex (MHC) is essential for immune function. Historically, it has been subdivided into three regions (Class I, II, and III), but a cluster of functionally related genes within the Class III region has also been referred to as the Class IV region or "inflammatory region". This group of genes is involved in the inflammatory response and includes members of the tumour necrosis factor family. Here we report the sequencing, annotation and comparative analysis of a tammar wallaby BAC containing the inflammatory region. We also discuss the extent of sequence conservation across the entire region and identify elements conserved in evolution. RESULTS: Fourteen Class III genes from the tammar wallaby inflammatory region were characterised and compared to their orthologues in other vertebrates. The organisation and sequence of genes in the inflammatory region of both the wallaby and South American opossum are highly conserved compared to known genes from eutherian ("placental") mammals. Some minor differences separate the two marsupial species. Eight genes within the inflammatory region have remained tightly clustered for at least 360 million years, predating the divergence of the amphibian lineage. Analysis of sequence conservation identified 354 conserved elements. These range in size from 7 to 431 bases and cover 15.6% of the inflammatory region, approximately a 4-fold increase over the average for vertebrate genomes. About 5.5% of this conserved sequence is marsupial-specific, including three cases of marsupial-specific repeats. Highly Conserved Elements were also characterised. CONCLUSION: Using comparative analysis, we show that a cluster of MHC genes involved in inflammation, including TNF, LTA (or its putative teleost homolog TNF-N), APOM, and BAT3, has remained together for over 450 million years, predating the divergence of mammals from fish. The observed enrichment in conserved sequences within the inflammatory region suggests conservation at the transcriptional regulatory level, in addition to the functional level.
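    A coverage figure of this kind (the share of a region's bases that fall inside at least one conserved element) and its fold change over a background can be computed from element coordinates. A sketch assuming half-open [start, end) coordinates already clipped to the region; the function names are hypothetical, and the background fraction is an input rather than a value taken from the abstract.

```python
def covered_bases(intervals):
    """Bases covered by the union of half-open [start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def coverage_fraction(intervals, region_length):
    """Fraction of the region covered by at least one element (intervals assumed inside the region)."""
    return covered_bases(intervals) / region_length


def fold_enrichment(intervals, region_length, background_fraction):
    """Coverage in the region relative to a genome-wide background coverage fraction."""
    return coverage_fraction(intervals, region_length) / background_fraction
```

    Merging overlaps before summing matters: overlapping elements would otherwise be counted twice and inflate the coverage.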
  • Item
    A statistical framework for analyzing deep mutational scanning data
    Rubin, AF ; Gelman, H ; Lucas, N ; Bajjalieh, SM ; Papenfuss, AT ; Speed, TP ; Fowler, DM (BMC, 2017-08-07)
    Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.
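    For a selection sampled at two time points, a log-ratio score with a Poisson-style sampling variance gives a per-measurement error estimate of the kind described above. A sketch under that assumption, with a pseudocount of 0.5; `ratio_score` is a hypothetical name, and Enrich2's own scoring options (including regression over several time points) may differ in detail.

```python
import math


def ratio_score(variant_counts, wildtype_counts, pseudocount=0.5):
    """Two-time-point log-ratio score and its sampling variance.

    variant_counts  = (count before selection, count after selection) for the variant
    wildtype_counts = the same pair for wild type
    The variance treats each count as Poisson: var(ln(c + 0.5)) ~ 1 / (c + 0.5).
    """
    (v0, v1), (w0, w1) = variant_counts, wildtype_counts
    score = (math.log((v1 + pseudocount) / (w1 + pseudocount))
             - math.log((v0 + pseudocount) / (w0 + pseudocount)))
    variance = sum(1.0 / (c + pseudocount) for c in (v0, v1, w0, w1))
    return score, variance
```

    Low counts produce large variances, so noisy variants can be down-weighted or filtered rather than trusted on their point score alone.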
  • Item
    Barcoding reveals complex clonal behavior in patient-derived xenografts of metastatic triple negative breast cancer
    Merino, D ; Weber, TS ; Serrano, A ; Vaillant, F ; Liu, K ; Pal, B ; Di Stefano, L ; Schreuder, J ; Lin, D ; Chen, Y ; Asselin-Labat, ML ; Schumacher, TN ; Cameron, D ; Smyth, GK ; Papenfuss, AT ; Lindeman, GJ ; Visvader, JE ; Naik, SH (NATURE PORTFOLIO, 2019-02-15)
    Primary triple negative breast cancers (TNBC) are prone to dissemination, but sub-clonal relationships between tumors and resulting metastases are poorly understood. Here we use cellular barcoding of two treatment-naïve TNBC patient-derived xenografts (PDXs) to track the spatio-temporal fate of thousands of barcoded clones in primary tumors and their metastases. Tumor resection had a major impact on reducing clonal diversity in secondary sites, indicating that most disseminated tumor cells lacked the capacity to 'seed' and hence originated from 'shedders' that did not persist. The few clones that continued to grow after resection, i.e. 'seeders', did not correlate in frequency with their parental clones in primary tumors. Cisplatin treatment of one BRCA1-mutated PDX model to non-palpable levels had a surprisingly minor impact on clonal diversity in the relapsed tumor yet purged 50% of distal clones. Therefore, clonal features of shedding, seeding and drug resistance are important factors to consider for the design of therapeutic strategies.
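    Clonal diversity in barcoding data is usually summarised from per-barcode read counts. A sketch with hypothetical helpers: richness (barcodes detected), Shannon diversity, and the fraction of barcodes detected in one sample that are absent from another, the kind of quantity behind a statement such as "purged 50% of distal clones". The metrics used in the study itself may differ.

```python
import math


def richness(counts):
    """Number of barcodes (clones) detected with a non-zero count."""
    return sum(1 for c in counts.values() if c > 0)


def shannon(counts):
    """Shannon diversity H = -sum p_i ln p_i over detected barcodes."""
    total = sum(c for c in counts.values() if c > 0)
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def fraction_lost(before, after):
    """Fraction of barcodes detected in `before` that are absent from `after`."""
    present = [b for b, c in before.items() if c > 0]
    if not present:
        return 0.0
    lost = [b for b in present if after.get(b, 0) == 0]
    return len(lost) / len(present)
```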
  • Item
    A statistical framework for analyzing deep mutational scanning data (vol 18, 150, 2017)
    Rubin, AF ; Gelman, H ; Lucas, N ; Bajjalieh, SM ; Papenfuss, AT ; Speed, TP ; Fowler, DM (BIOMED CENTRAL LTD, 2018-02-07)
    After publication of our article [1] it was brought to our attention that a line of code was missing from our program to combine the within-replicate variance and between-replicate variance. This led to an overestimation of the standard errors calculated using the Enrich2 random-effects model.
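    Combining within-replicate and between-replicate variance can be sketched with a generic random-effects model. This uses the DerSimonian-Laird moment estimator for the between-replicate variance as a stand-in; it is not Enrich2's code, and Enrich2 may estimate that variance differently. Each replicate is weighted by 1/(σ_i² + τ²), so the reported standard error depends directly on how the two variance components are combined.

```python
import math


def random_effects(scores, variances):
    """Combine replicate scores with within-replicate variances under a random-effects model.

    Between-replicate variance tau^2 uses the DerSimonian-Laird moment estimator
    (a stand-in; not necessarily the estimator Enrich2 uses).
    Returns (combined score, standard error, tau^2).
    """
    k = len(scores)
    w = [1.0 / v for v in variances]              # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, scores)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, scores))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Each replicate's total variance = within-replicate + between-replicate.
    w_star = [1.0 / (v + tau2) for v in variances]
    estimate = sum(wi * yi for wi, yi in zip(w_star, scores)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return estimate, se, tau2
```

    When replicates agree closely, τ² is estimated as zero and the result reduces to an inverse-variance weighted mean.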
  • Item
    Mitochondrial Genome Sequence of the Scabies Mite Provides Insight into the Genetic Diversity of Individual Scabies Infections
    Mofiz, E ; Seemann, T ; Bahlo, M ; Holt, D ; Currie, BJ ; Fischer, K ; Papenfuss, AT ; Vinetz, JM (PUBLIC LIBRARY SCIENCE, 2016-02)
    The scabies mite, Sarcoptes scabiei, is an obligate parasite of the skin that infects humans and other animal species, causing scabies, a contagious disease characterized by extreme itching. Scabies infections are a major health problem, particularly in remote Indigenous communities in Australia, where co-infection of epidermal scabies lesions by Group A Streptococci or Staphylococcus aureus is thought to be responsible for the high rate of rheumatic heart disease and chronic kidney disease. We collected and separately sequenced mite DNA from several pools of thousands of whole mites from a porcine model of scabies (S. scabiei var. suis) and two human patients (S. scabiei var. hominis) living in different regions of northern Australia. Our sequencing samples the mite and its metagenome, including the mite gut flora and the wound micro-environment. Here, we describe the mitochondrial genome of the scabies mite. We developed a new de novo assembly pipeline based on a bait-and-reassemble strategy, which produced a 14-kilobase mitochondrial genome sequence assembly. We also annotated 35 genes and compared these to those of other Acari mites. We identified single nucleotide polymorphisms (SNPs) and used these to infer the presence of six haplogroups in our samples. Remarkably, these fall into two closely related clades, with one clade including both human and pig varieties. This supports earlier findings that only limited genetic differences may separate some human and animal varieties, and raises the possibility of cross-host infections. Finally, we used these mitochondrial haplotypes to show that the genetic diversity of individual infections is typically small, with 1-3 distinct haplotypes per infestation.
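    The iterative recruitment at the heart of a bait-and-reassemble strategy can be illustrated with shared k-mers. In this simplified sketch the recruited reads themselves extend the bait; in an actual pipeline they would be re-assembled and the resulting contigs used as the next round's bait. `recruit_reads`, `k` and `max_rounds` are hypothetical, and no assembler is invoked.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(bait, reads, k=21, max_rounds=10):
    """Iteratively recruit reads that share at least one k-mer with the current bait.

    Simplification: recruited reads are added to the bait directly instead of
    being re-assembled into contigs. Returns indices of recruited reads.
    """
    bait_kmers = kmers(bait, k)
    recruited = set()
    for _ in range(max_rounds):
        new = {i for i, r in enumerate(reads)
               if i not in recruited and kmers(r, k) & bait_kmers}
        if not new:
            break  # no further extension: converged
        recruited |= new
        for i in new:
            bait_kmers |= kmers(reads[i], k)
    return sorted(recruited)
```

    Each round can reach reads that overlap only the previous round's recruits, which is how a short seed sequence can grow toward a complete circular mitochondrial genome.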
  • Item
    Analysis of the platypus genome suggests a transposon origin for mammalian imprinting
    Pask, AJ ; Papenfuss, AT ; Ager, EI ; McColl, KA ; Speed, TP ; Renfree, MB (BIOMED CENTRAL LTD, 2009)
    BACKGROUND: Genomic imprinting is an epigenetic phenomenon that results in monoallelic gene expression. Many hypotheses have been advanced to explain why genomic imprinting evolved in mammals, but few have examined how it arose. The host defence hypothesis suggests that imprinting evolved from existing mechanisms within the cell that act to silence foreign DNA elements that insert into the genome. However, the changes to the mammalian genome that accompanied the evolution of imprinting have been hard to define due to the absence of large-scale genomic resources spanning all extant mammalian classes. The recent release of the platypus genome has provided the first opportunity to perform comparisons between prototherian (monotreme) mammals, which appear to lack imprinting, and therian (marsupial and eutherian) mammals, which have imprinting. RESULTS: We compared the genome-wide distribution of repeat elements known to attract epigenetic silencing in monotremes and therian mammals, focusing particularly on the orthologous imprinted regions. There is a significant accumulation of certain repeat elements within imprinted regions of therian mammals compared to the platypus. CONCLUSIONS: Our analyses show that the platypus has significantly fewer repeats of certain classes in the regions of the genome that have become imprinted in therian mammals. The accumulation of repeats, especially long terminal repeats and DNA elements, in therian imprinted genes and gene clusters is coincident with, and may have been a driving force in, the development of mammalian genomic imprinting. These data provide strong support for the host defence hypothesis.