School of Mathematics and Statistics - Research Publications

  • Item
    SSNIP-seq: A simple and rapid method for isolation of single-sperm nucleic acid for high-throughput sequencing
    Novakovic, S ; Tsui, V ; Semple, T ; Martelotto, L ; McCarthy, DJ ; Crismani, W ; Drevet, JR (PUBLIC LIBRARY SCIENCE, 2022-09-29)
    We developed a simple and reliable method for the isolation of haploid nuclei from fresh and frozen testes. The described protocol uses readily available reagents in combination with flow cytometry to separate haploid and diploid nuclei. The protocol can be completed within 1 hour and the resulting individual haploid nuclei have intact morphology. The isolated nuclei are suitable for library preparation for high-throughput DNA and RNA sequencing using bulk or single nuclei. The protocol was optimised with mouse testes and we anticipate that it can be applied for the isolation of mature sperm from other mammals including humans.
  • Item
    Trade-off between conservation of biological variation and batch effect removal in deep generative modeling for single-cell transcriptomics
    Li, H ; McCarthy, DJ ; Shim, H ; Wei, S (BMC, 2022-11-03)
    BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low-dimensional latent representations to better understand the overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird's eye view of the trade-offs between these two conflicting objectives. Specifically, using the well-established concept of the Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. RESULTS: A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network-based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard maximum mean discrepancy measure. CONCLUSION: The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.
  • Item
    sgcocaller and comapr: personalised haplotype assembly and comparative crossover map analysis using single-gamete sequencing data
    Lyu, R ; Tsui, V ; Crismani, W ; Liu, R ; Shim, H ; McCarthy, DJ (OXFORD UNIV PRESS, 2022-11-11)
    Profiling gametes of an individual enables the construction of personalised haplotypes and meiotic crossover landscapes, now achievable at larger scale than ever through the availability of high-throughput single-cell sequencing technologies. However, high-throughput single-gamete data commonly have low depth of coverage per gamete, which challenges existing gamete-based haplotype phasing methods. In addition, haplotyping a large number of single gametes from high-throughput single-cell DNA sequencing data and constructing meiotic crossover profiles using existing methods requires intensive processing. Here, we introduce efficient software tools for the essential tasks of generating personalised haplotypes and calling crossovers in gametes from single-gamete DNA sequencing data (sgcocaller), and constructing, visualising, and comparing individualised crossover landscapes from single gametes (comapr). With additional data pre-processing, the tools can also be applied to bulk-sequenced samples. We demonstrate that sgcocaller is able to generate impeccable phasing results for high-coverage datasets, on which it is more accurate and stable than existing methods, and also performs well on low-coverage single-gamete sequencing datasets for which current methods fail. Our tools achieve highly accurate results with user-friendly installation, comprehensive documentation, efficient computation times and minimal memory usage.
  • Item
    Case Report: Hypoglycemia Due to a Novel Activating Glucokinase Variant in an Adult - a Molecular Approach
    Koneshamoorthy, A ; Seniveratne-Epa, D ; Calder, G ; Sawyer, M ; Kay, TWH ; Farrell, S ; Loudovaris, T ; Mariana, L ; McCarthy, D ; Lyu, R ; Liu, X ; Thorn, P ; Tong, J ; Chin, LK ; Zacharin, M ; Trainer, A ; Taylor, S ; MacIsaac, RJ ; Sachithanandan, N ; Thomas, HE ; Krishnamurthy, B (FRONTIERS MEDIA SA, 2022-03-17)
    We present a case of an obese 22-year-old man with an activating GCK variant who had neonatal hypoglycemia, re-emerging with hypoglycemia later in life. We investigated him for asymptomatic hypoglycemia with a family history of hypoglycemia. Genetic testing yielded a novel GCK missense class 3 variant that was subsequently found in his mother, sister and nephew and reclassified as a class 4 likely pathogenic variant. Glucokinase enables phosphorylation of glucose, the rate-limiting step of glycolysis in the liver and pancreatic β cells. It plays a crucial role in the regulation of insulin secretion. Inactivating variants in GCK cause hyperglycemia and activating variants cause hypoglycemia. Spleen-preserving distal pancreatectomy revealed diffuse hyperplastic islets, nuclear pleomorphism and periductular islets. Glucose-stimulated insulin secretion revealed increased insulin secretion in response to glucose. Cytoplasmic calcium, which triggers exocytosis of insulin-containing granules, revealed normal basal but increased glucose-stimulated level. Unbiased gene expression analysis using 10X single cell sequencing revealed upregulated INS and CKB genes and downregulated DLK1 and NPY genes in β-cells. Further studies are required to see if alteration in expression of these genes plays a role in the metabolic and histological phenotype associated with the glucokinase pathogenic variant. There were more large islets in the patient's pancreas than in control subjects but there was no difference in the proportion of β cells in the islets. His hypoglycemia was persistent after pancreatectomy, was refractory to diazoxide and improved with pasireotide. This case highlights the variable phenotype of GCK mutations. In-depth molecular analyses in the islets have revealed possible mechanisms for hyperplastic islets and insulin hypersecretion.
  • Item
    splatPop: simulating population scale single-cell RNA sequencing data
    Azodi, CB ; Zappia, L ; Oshlack, A ; McCarthy, DJ (BMC, 2021-12-15)
    Population-scale single-cell RNA sequencing (scRNA-seq) is now viable, enabling finer resolution functional genomics studies and leading to a rush to adapt bulk methods and develop new single-cell-specific methods to perform these studies. Simulations are useful for developing, testing, and benchmarking methods but current scRNA-seq simulation frameworks do not simulate population-scale data with genetic effects. Here, we present splatPop, a model for flexible, reproducible, and well-documented simulation of population-scale scRNA-seq data with known expression quantitative trait loci. splatPop can also simulate complex batch, cell group, and conditional effects between individuals from different cohorts as well as genetically-driven co-expression.
  • Item
    Key signaling networks are dysregulated in patients with the adipose tissue disorder, lipedema
    Ishaq, M ; Bandara, N ; Morgan, S ; Nowell, C ; Mehdi, AM ; Lyu, R ; McCarthy, D ; Anderson, D ; Creek, DJ ; Achen, MG ; Shayan, R ; Karnezis, T (SPRINGERNATURE, 2022-03)
    OBJECTIVES: Lipedema, a poorly understood chronic disease of adipose hyper-deposition, is often mistaken for obesity and causes significant impairment to mobility and quality-of-life. To identify molecular mechanisms underpinning lipedema, we employed comprehensive omics-based comparative analyses of whole tissue, adipocyte precursors (adipose-derived stem cells (ADSCs)), and adipocytes from patients with or without lipedema. METHODS: We compared whole-tissues, ADSCs, and adipocytes from body mass index-matched lipedema (n = 14) and unaffected (n = 10) patients using comprehensive global lipidomic and metabolomic analyses, transcriptional profiling, and functional assays. RESULTS: Transcriptional profiling revealed >4400 significant differences in lipedema tissue, with altered levels of mRNAs involved in critical signaling and cell function-regulating pathways (e.g., lipid metabolism and cell-cycle/proliferation). Functional assays showed accelerated ADSC proliferation and differentiation in lipedema. Profiling lipedema adipocytes revealed >900 changes in lipid composition and >600 differentially altered metabolites. Transcriptional profiling of lipedema ADSCs and non-lipedema ADSCs revealed significant differential expression of >3400 genes including some involved in extracellular matrix and cell-cycle/proliferation signaling pathways. One upregulated gene in lipedema ADSCs, Bub1, encodes a cell-cycle regulator, central to the kinetochore complex, which regulates several histone proteins involved in cell proliferation. Downstream signaling analysis of lipedema ADSCs demonstrated enhanced activation of histone H2A, a key cell proliferation driver and Bub1 target. Critically, hyperproliferation exhibited by lipedema ADSCs was inhibited by the small molecule Bub1 inhibitor 2OH-BNPP1 and by CRISPR/Cas9-mediated Bub1 gene depletion. CONCLUSION: We found significant differences in gene expression, and lipid and metabolite profiles, in tissue, ADSCs, and adipocytes from lipedema patients compared to non-affected controls. Functional assays demonstrated that dysregulated Bub1 signaling drives increased proliferation of lipedema ADSCs, suggesting a potential mechanism for enhanced adipogenesis in lipedema. Importantly, our characterization of signaling networks driving lipedema identifies potential molecular targets, including Bub1, for novel lipedema therapeutics.
  • Item
    Optimizing expression quantitative trait locus mapping workflows for single-cell studies
    Cuomo, ASE ; Alvari, G ; Azodi, CB ; McCarthy, DJ ; Bonder, MJ (BMC, 2021-06-24)
    BACKGROUND: Single-cell RNA sequencing (scRNA-seq) has enabled the unbiased, high-throughput quantification of gene expression specific to cell types and states. With the cost of scRNA-seq decreasing and techniques for sample multiplexing improving, population-scale scRNA-seq, and thus single-cell expression quantitative trait locus (sc-eQTL) mapping, is increasingly feasible. Mapping of sc-eQTL provides additional resolution to study the regulatory role of common genetic variants on gene expression across a plethora of cell types and states and promises to improve our understanding of genetic regulation across tissues in both health and disease. RESULTS: While previously established methods for bulk eQTL mapping can, in principle, be applied to sc-eQTL mapping, there are a number of open questions about how best to process scRNA-seq data and adapt bulk methods to optimize sc-eQTL mapping. Here, we evaluate the role of different normalization and aggregation strategies, covariate adjustment techniques, and multiple testing correction methods to establish best practice guidelines. We use both real and simulated datasets across single-cell technologies to systematically assess the impact of these different statistical approaches. CONCLUSION: We provide recommendations for future single-cell eQTL studies that can yield up to twice as many eQTL discoveries as default approaches ported from bulk studies.
  • Item
    Personalized genome structure via single gamete sequencing
    Lyu, R ; Tsui, V ; McCarthy, DJ ; Crismani, W (BMC, 2021-04-19)
    Genetic maps have been fundamental to building our understanding of disease genetics and evolutionary processes. The gametes of an individual contain all of the information required to perform a de novo chromosome-scale assembly of an individual's genome, which historically has been performed with populations and pedigrees. Here, we discuss how single-cell gamete sequencing offers the potential to merge the advantages of short-read sequencing with the ability to build personalized genetic maps and open up an entirely new space in personalized genetics.
  • Item
    Properties of structural variants and short tandem repeats associated with gene expression and complex traits.
    Jakubosky, D ; D'Antonio, M ; Bonder, MJ ; Smail, C ; Donovan, MKR ; Young Greenwald, WW ; Matsui, H ; i2QTL Consortium ; D'Antonio-Chronowska, A ; Stegle, O ; Smith, EN ; Montgomery, SB ; DeBoever, C ; Frazer, KA (Nature Research (part of Springer Nature), 2020-06-10)
    Structural variants (SVs) and short tandem repeats (STRs) comprise a broad group of diverse DNA variants which vastly differ in their sizes and distributions across the genome. Here, we identify genomic features of SV classes and STRs that are associated with gene expression and complex traits, including their locations relative to eGenes, likelihood of being associated with multiple eGenes, associated eGene types (e.g., coding, noncoding, level of evolutionary constraint), effect sizes, linkage disequilibrium with tagging single nucleotide variants used in GWAS, and likelihood of being associated with GWAS traits. We identify a set of high-impact SVs/STRs associated with the expression of three or more eGenes via chromatin loops and show that they are highly enriched for being associated with GWAS traits. Our study provides insights into the genomic properties of structural variant classes and short tandem repeats that are associated with gene expression and human traits.
  • Item
    Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats.
    Jakubosky, D ; Smith, EN ; D'Antonio, M ; Jan Bonder, M ; Young Greenwald, WW ; D'Antonio-Chronowska, A ; Matsui, H ; i2QTL Consortium ; Stegle, O ; Montgomery, SB ; DeBoever, C ; Frazer, KA (Nature Research (part of Springer Nature), 2020-06-10)
    Structural variants (SVs) and short tandem repeats (STRs) are important sources of genetic diversity but are not routinely analyzed in genetic studies because they are difficult to accurately identify and genotype. Because SVs and STRs range in size and type, it is necessary to apply multiple algorithms that incorporate different types of evidence from sequencing data and employ complex filtering strategies to discover a comprehensive set of high-quality and reproducible variants. Here we assemble a set of 719 deep whole genome sequencing (WGS) samples (mean 42×) from 477 distinct individuals which we use to discover and genotype a wide spectrum of SV and STR variants using five algorithms. We use 177 unique pairs of genetic replicates to identify factors that affect variant call reproducibility and develop a systematic filtering strategy to create one of the most complete and well-characterized maps of SVs and STRs to date.