School of Mathematics and Statistics - Research Publications

  • Item
    Sincast: a computational framework to predict cell identities in single-cell transcriptomes using bulk atlases as references
    Deng, Y ; Choi, J ; Cao, K-AL (OXFORD UNIV PRESS, 2022-03-31)
    Characterizing the molecular identity of a cell is an essential step in single-cell RNA sequencing (scRNA-seq) data analysis. Numerous tools exist for predicting cell identity using single-cell reference atlases. However, many challenges remain, including correcting for inherent batch effects between reference and query data and insufficient phenotype data from the reference. One solution is to project single-cell data onto established bulk reference atlases to leverage their rich phenotype information. Sincast is a computational framework to query scRNA-seq data by projection onto bulk reference atlases. Prior to projection, single-cell data are transformed to be directly comparable to bulk data, either with pseudo-bulk aggregation or graph-based imputation to address sparse single-cell expression profiles. Sincast avoids batch effect correction, and cell identity is predicted along a continuum to highlight new cell states not found in the reference atlas. In several case study scenarios, we show that Sincast projects single cells into the correct biological niches in the expression space of the bulk reference atlas. We demonstrate the effectiveness of our imputation approach that was specifically developed for querying scRNA-seq data based on bulk reference atlases. We show that Sincast is an efficient and powerful tool for single-cell profiling that will facilitate downstream analysis of scRNA-seq data.
  • Item
    Alterations in the Gut Fungal Community in a Mouse Model of Huntington's Disease
    Kong, G ; Cao, K-AL ; Hannan, AJ ; Shapiro, RS (AMER SOC MICROBIOLOGY, 2022-04-01)
    Huntington's disease (HD) is a neurodegenerative disorder caused by a trinucleotide expansion in the HTT gene, which is expressed throughout the brain and body, including the gut epithelium and enteric nervous system. Afflicted individuals suffer from progressive impairments in motor, psychiatric, and cognitive faculties, as well as peripheral deficits, including the alteration of the gut microbiome. However, studies characterizing the gut microbiome in HD have focused entirely on the bacterial component, while the fungal community (mycobiome) has been overlooked. The gut mycobiome has gained recognition for its role in host homeostasis and maintenance of the gut epithelial barrier. We aimed to characterize the gut mycobiome profile in HD using fecal samples collected from the R6/1 transgenic mouse model (and wild-type littermate controls) from 4 to 12 weeks of age, corresponding to presymptomatic through to early disease stages. Shotgun sequencing was performed on fecal DNA samples, followed by metagenomic analyses. The HD gut mycobiome beta diversity was significantly different from that of wild-type littermates at 12 weeks of age, while no genotype differences were observed at the earlier time points. Similarly, greater alpha diversity was observed in the HD mice by 12 weeks of age. Key taxa, including Malassezia restricta, Yarrowia lipolytica, and Aspergillus species, were identified as having a negative association with HD. Furthermore, integration of the bacterial and fungal data sets at 12 weeks of age identified negative correlations between the HD-associated fungal species and Lactobacillus reuteri. These findings provide new insights into gut microbiome alterations in HD and may help identify novel therapeutic targets. IMPORTANCE Huntington's disease (HD) is a fatal neurodegenerative disorder affecting both the mind and body. We have recently discovered that gut bacteria are disrupted in HD. The present study provides the first evidence of an altered gut fungal community (mycobiome) in HD. The genomes of many thousands of gut microbes were sequenced and used to assess "metagenomics" in particular the different types of fungal species in the HD versus control gut, in a mouse model. At an early disease stage, before the onset of symptoms, the overall gut mycobiome structure (array of fungi) in HD mice was distinct from that of their wild-type littermates. Alterations of multiple key fungi species were identified as being associated with the onset of disease symptoms, some of which showed strong correlations with the gut bacterial community. This study highlights the potential role of gut fungi in HD and may facilitate the development of novel therapeutic approaches.
  • Item
    Host Traits and Phylogeny Contribute to Shaping Coral-Bacterial Symbioses
    Ricci, F ; Tandon, K ; Black, JR ; Cao, K-AL ; Blackall, LL ; Verbruggen, H ; Raina, J-B (AMER SOC MICROBIOLOGY, 2022-03-07)
    The success of tropical scleractinian corals depends on their ability to establish symbioses with microbial partners. Host phylogeny and traits are known to shape the coral microbiome, but to what extent they affect its composition remains unclear. Here, by using 12 coral species representing the complex and robust clades, we explored the influence of host phylogeny, skeletal architecture, and reproductive mode on the microbiome composition, and further investigated the structure of the tissue and skeleton bacterial communities. Our results show that host phylogeny and traits explained 14% of the tissue and 13% of the skeletal microbiome composition, providing evidence that these predictors contributed to shaping the holobiont in terms of presence and relative abundance of bacterial symbionts. Based on our data, we conclude that host phylogeny affects the presence of specific microbial lineages, reproductive mode predictably influences the microbiome composition, and skeletal architecture works like a filter that affects bacterial relative abundance. We show that the β-diversity of coral tissue and skeleton microbiomes differed, but we found that a large overlapping fraction of bacterial sequences were recovered from both anatomical compartments, supporting the hypothesis that the skeleton can function as a microbial reservoir. Additionally, our analysis of the microbiome structure shows that 99.6% of tissue and 99.7% of skeletal amplicon sequence variants (ASVs) were not consistently present in at least 30% of the samples, suggesting that the coral tissue and skeleton are dominated by rare bacteria. Together, these results provide novel insights into the processes driving coral-bacterial symbioses, along with an improved understanding of the scleractinian microbiome.
  • Item
    Co-designing and building an expert-elicited non-parametric Bayesian network model: demonstrating a methodology using a Bonamia Ostreae spread risk case study
    Hanea, AM ; Hilton, Z ; Ben, K ; Robinson, AP (WILEY, 2022-02-20)
    The development and use of probabilistic models, particularly Bayesian networks (BN), to support risk-based decision making is well established. Striking an efficient balance between satisfying model complexity and ease of development requires continuous compromise. Codesign, wherein the structural content of the model is developed hand-in-hand with the experts who will be accountable for the parameter estimates, shows promise, as do so-called nonparametric Bayesian networks (NPBNs), which provide a light-touch approach to capturing complex relationships among nodes. We describe and demonstrate the process of codesigning, building, quantifying, and validating an NPBN model for emerging risks and the consequences of potential management decisions using structured expert judgment (SEJ). We develop a case study of the local spread of a marine pathogen, namely, Bonamia ostreae. The BN was developed through a series of semistructured workshops that incorporated extensive feedback from many experts. The model was then quantified with a combination of field and expert-elicited data. The IDEA protocol for SEJ was used in its hybrid (remote and face-to-face) form to elicit information about more than 100 parameters. This article focuses on the modeling and quantification process, the methodological challenges, and the way these were addressed.
  • Item
    An accurate method for identifying recent recombinants from unaligned sequences
    Feng, Q ; Tiedje, KE ; Ruybal-Pesantez, S ; Tonkin-Hill, G ; Duffy, MF ; Day, KP ; Shim, H ; Chan, Y-B ; Alkan, C (OXFORD UNIV PRESS, 2022-03-28)
    MOTIVATION: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus they struggle with analyses of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination. RESULTS: We introduce an algorithm to detect recent recombinant sequences from a dataset without a full multiple alignment. Our algorithm can handle thousands of gene-length sequences without the need for a reference panel. We demonstrate the accuracy of our algorithm through extensive numerical simulations; in particular, it maintains its effectiveness in the presence of insertions and deletions. We apply our algorithm to a dataset of 17,335 DBLα types in var genes from Ghana, observing that sequences belonging to the same ups group or domain subclass recombine amongst themselves more frequently, and that non-recombinant DBLα types are more conserved than recombinant ones. AVAILABILITY: Source code is freely available at https://github.com/qianfeng2/detREC_program. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • Item
    Asymptotics for the critical level and a strong invariance principle for high intensity shot noise fields
    Lachieze-Rey, R ; Muirhead, S (2021-11-17)
    We study fine properties of the convergence of a high intensity shot noise field towards the Gaussian field with the same covariance structure. In particular we (i) establish a strong invariance principle, i.e. a quantitative coupling between a high intensity shot noise field and the Gaussian limit such that they are uniformly close on large domains with high probability, and (ii) use this to derive an asymptotic expansion for the critical level above which the excursion sets of the shot noise field percolate.
  • Item
    A model for analyzing clustered occurrence data
    Hwang, W-H ; Huggins, R ; Stoklosa, J (WILEY, 2021-02-15)
    Spatial or temporal clustering commonly arises in various biological and ecological applications, for example, species or communities may cluster in groups. In this paper, we develop a new clustered occurrence data model where presence-absence data are modeled under a multivariate negative binomial framework. We account for spatial or temporal clustering by introducing a community parameter in the model that controls the strength of dependence between observations thereby enhancing the estimation of the mean and dispersion parameters. We provide conditions to show the existence of maximum likelihood estimates when cluster sizes are homogeneous and equal to 2 or 3 and consider a composite likelihood approach that allows for additional robustness and flexibility in fitting for clustered occurrence data. The proposed method is evaluated in a simulation study and demonstrated using forest plot data from the Center for Tropical Forest Science. Finally, we present several examples using multiple visit occupancy data to illustrate the difference between the proposed model and those of N-mixture models.
  • Item
    Information content of stepped wedge designs with unequal cluster-period sizes in linear mixed models: Informing incomplete designs.
    Kasza, J ; Bowden, R ; Forbes, AB (Wiley, 2021-03-30)
    In practice, stepped wedge trials frequently include clusters of differing sizes. However, investigations into the theoretical aspects of stepped wedge designs have, until recently, typically assumed equal numbers of subjects in each cluster and in each period. The information content of the cluster-period cells, clusters, and periods of stepped wedge designs has previously been investigated assuming equal cluster-period sizes, and has shown that incomplete stepped wedge designs may be efficient alternatives to the full stepped wedge. How this changes when cluster-period sizes are not equal is unknown, and we investigate this here. Working within the linear mixed model framework, we show that the information contributed by design components (clusters, sequences, and periods) does depend on the sizes of each cluster-period. Using a particular trial that assessed the impact of an individual education intervention on log-length of stay in rehabilitation units, we demonstrate how strongly the efficiency of incomplete designs depends on which cells are excluded: smaller incomplete designs may be more powerful than alternative incomplete designs that include a greater total number of participants. This also serves to demonstrate how the pattern of information content can be used to inform a set of incomplete designs to be considered as alternatives to the complete stepped wedge design. Our theoretical results for the information content can be extended to a broad class of longitudinal (ie, multiple period) cluster randomized trial designs.
  • Item
    NanoSplicer: accurate identification of splice junctions using Oxford Nanopore sequencing
    You, Y ; Clark, MB ; Shim, H ; Mathelier, A (OXFORD UNIV PRESS, 2022-05-27)
    MOTIVATION: Long read sequencing methods have considerable advantages for characterising RNA isoforms. Oxford nanopore sequencing records changes in electrical current when nucleic acid traverses through a pore. However, basecalling of this raw signal (known as a squiggle) is error prone, making it challenging to accurately identify splice junctions. Existing strategies include utilising matched short-read data and/or annotated splice junctions to correct nanopore reads but add expense or limit junctions to known (incomplete) annotations. Therefore, a method that could accurately identify splice junctions solely from nanopore data would have numerous advantages. RESULTS: We developed "NanoSplicer" to identify splice junctions using raw nanopore signal (squiggles). For each splice junction the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (1) synthetic mRNAs with known splice junctions and (2) biological mRNAs from a lung-cancer cell-line. The results from both datasets demonstrate NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated. AVAILABILITY AND IMPLEMENTATION: NanoSplicer is freely available at https://github.com/shimlab/NanoSplicer and has been deposited in archived format at https://doi.org/10.5281/zenodo.6403849. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
  • Item
    Sex-Dependent Shared and Nonshared Genetic Architecture Across Mood and Psychotic Disorders
    Blokland, GAM ; Grove, J ; Chen, C-Y ; Cotsapas, C ; Tobet, S ; Handa, R ; St Clair, D ; Lencz, T ; Mowry, BJ ; Periyasamy, S ; Cairns, MJ ; Tooney, PA ; Wu, JQ ; Kelly, B ; Kirov, G ; Sullivan, PF ; Corvin, A ; Riley, BP ; Esko, T ; Milani, L ; Jonsson, EG ; Palotie, A ; Ehrenreich, H ; Begemann, M ; Steixner-Kumar, A ; Sham, PC ; Iwata, N ; Weinberger, DR ; Gejman, P ; Sanders, AR ; Buxbaum, JD ; Rujescu, D ; Giegling, I ; Konte, B ; Hartmann, AM ; Bramon, E ; Murray, RM ; Pato, MT ; Lee, J ; Melle, I ; Molden, E ; Ophoff, RA ; McQuillin, A ; Bass, NJ ; Adolfsson, R ; Malhotra, AK ; Martin, NG ; Fullerton, JM ; Mitchell, PB ; Schofield, PR ; Forstner, AJ ; Degenhardt, F ; Schaupp, S ; Comes, AL ; Kogevinas, M ; Guzman-Parra, J ; Reif, A ; Streit, F ; Sirignano, L ; Cichon, S ; Grigoroiu-Serbanescu, M ; Hauser, J ; Lissowska, J ; Mayoral, F ; Muller-Myhsok, B ; Schulze, TG ; Nothen, MM ; Rietschel, M ; Kelsoe, J ; Leboyer, M ; Jamain, S ; Etain, B ; Bellivier, F ; Vincent, JB ; Alda, M ; O'Donovan, C ; Cervantes, P ; Biernacka, JM ; Frye, M ; McElroy, SL ; Scott, LJ ; Stahl, EA ; Landen, M ; Hamshere, ML ; Smeland, OB ; Djurovic, S ; Vaaler, AE ; Andreassen, OA ; Baune, BT ; Air, T ; Preisig, M ; Uher, R ; Levinson, DF ; Weissman, MM ; Potash, JB ; Shi, J ; Knowles, JA ; Perlis, RH ; Lucae, S ; Boomsma, D ; Penninx, BWJH ; Hottenga, J-J ; de Geus, EJC ; Willemsen, G ; Milaneschi, Y ; Tiemeier, H ; Grabe, HJ ; Teumer, A ; Van der Auwera, S ; Volker, U ; Hamilton, SP ; Magnusson, PKE ; Viktorin, A ; Mehta, D ; Mullins, N ; Adams, MJ ; Breen, G ; McIntosh, AM ; Lewis, CM ; Hougaard, DM ; Nordentoft, M ; Mors, O ; Mortensen, PB ; Werge, T ; Als, TD ; Borglum, AD ; Petryshen, TL ; Smoller, JW ; Goldstein, JM (ELSEVIER SCIENCE INC, 2021-11-29)
    BACKGROUND: Sex differences in incidence and/or presentation of schizophrenia (SCZ), major depressive disorder (MDD), and bipolar disorder (BIP) are pervasive. Previous evidence for shared genetic risk and sex differences in brain abnormalities across disorders suggests possible shared sex-dependent genetic risk. METHODS: We conducted the largest to date genome-wide genotype-by-sex (G×S) interaction of risk for these disorders using 85,735 cases (33,403 SCZ, 19,924 BIP, and 32,408 MDD) and 109,946 controls from the PGC (Psychiatric Genomics Consortium) and iPSYCH. RESULTS: Across disorders, genome-wide significant single nucleotide polymorphism-by-sex interaction was detected for a locus encompassing NKAIN2 (rs117780815, p = 3.2 × 10⁻⁸), which interacts with sodium/potassium-transporting ATPase (adenosine triphosphatase) enzymes, implicating neuronal excitability. Three additional loci showed evidence (p < 1 × 10⁻⁶) for cross-disorder G×S interaction (rs7302529, p = 1.6 × 10⁻⁷; rs73033497, p = 8.8 × 10⁻⁷; rs7914279, p = 6.4 × 10⁻⁷), implicating various functions. Gene-based analyses identified G×S interaction across disorders (p = 8.97 × 10⁻⁷) with transcriptional inhibitor SLTM. Most significant in SCZ was a MOCOS gene locus (rs11665282, p = 1.5 × 10⁻⁷), implicating vascular endothelial cells. Secondary analysis of the PGC-SCZ dataset detected an interaction (rs13265509, p = 1.1 × 10⁻⁷) in a locus containing IDO2, a kynurenine pathway enzyme with immunoregulatory functions implicated in SCZ, BIP, and MDD. Pathway enrichment analysis detected significant G×S interaction of genes regulating vascular endothelial growth factor receptor signaling in MDD (false discovery rate-corrected p < .05). CONCLUSIONS: In the largest genome-wide G×S analysis of mood and psychotic disorders to date, there was substantial genetic overlap between the sexes. However, significant sex-dependent effects were enriched for genes related to neuronal development and immune and vascular functions across and within SCZ, BIP, and MDD at the variant, gene, and pathway levels.