School of BioSciences - Theses

  • Item
    Computational biology methods for identifying leaderless secretory proteins in Arabidopsis thaliana and other plant species
    Lonsdale, Andrew Gregory (2019)
    Leaderless secretory proteins (LSPs) are proteins that are secreted yet lack a classical signal peptide sequence and therefore, by definition, undergo unconventional protein secretion (UPS). Not all such proteins found in plant secretomes can be assumed to be LSPs, because of the high possibility of cellular contamination arising from cellular disruption, which leads to a mix of LSPs and contaminant proteins that both lack signal peptides. The aim of this thesis is to use computational biology methods to identify LSPs. Chapter 1 reviews the current knowledge of secretory pathways (LSPs and UPS) in plants, motifs for secretion, the difficulties of isolating the plant secretome (cell wall/apoplastic fluids) without contamination, and the lack of appropriate bioinformatics tools for plants to distinguish contaminant proteins from LSPs.
    Chapter 2 evaluates SecretomeP, which was trained on mammalian data and is the most widely used prediction tool for LSPs, including in plant studies. Exploring whether this tool is applicable to plants required using conventionally secreted proteins as a proxy, and SecretomeP's premise that conventional and unconventional secretory proteins share properties was evaluated. By removing the signal peptide from sequence data, the research shows a bias in scores due to the signal peptide, and a true positive rate only marginally higher than the false positive rate. Use of the tool in further plant studies was not recommended, and it was suggested that previous inferences of plant LSP status based solely on SecretomeP predictions be re-evaluated. This work was published as “Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants” (Lonsdale et al., 2016).
    Chapter 3 details the creation of a putative LSP database for Arabidopsis thaliana by taking the entire proteome and applying a workflow that collates annotation, literature observation and experimental relationships between proteins. This results in a framework to identify proteins without signal peptides that have been observed (unclassified set) or unobserved (non-secreted set) and to compare them to conventionally secreted proteins that have been observed (secreted set) or unobserved (theoretically secreted set). Protein-protein interactions, GO term and PFAM distributions that are similar between the secreted and unclassified protein sets are used to create confidence lists of putative LSPs.
    In Chapter 4 new prediction tools were created. Firstly, a SecretomeP-like tool was created using the properties of the observed secreted set, with signal peptides removed. Lessons from Chapter 2 on SecretomeP allowed the bias to be minimized. Secondly, tools were created based on the candidate list from the putative LSP database. Each tool is based on a Random Forest (RF) using protein-derived features and trained on subsets of the LSP database. Consensus predictions between them on new data were used to identify further LSP candidates. Additional reporting tools provide a convenient way to map data from the LSPDB onto new sequences, and the results of these tools on LSPs from plants and other organisms are shown.
    In Chapter 5 additional putative candidates for LSPs in Arabidopsis and other species were identified by applying the tools and databases to additional data excluded from the original database created in Chapter 3, including recent cell wall proteomes, secretory pathway experiments and analysis of extracellular vesicles. Chapter 6 concludes by discussing the limitations of the computational methods developed and suggesting improvements for predicting LSPs in plants, for further experimental investigation and confirmation of location.
  • Item
    Tools and techniques for single-cell RNA sequencing data
    Zappia, Luke (2019)
    RNA sequencing of individual cells allows us to take a snapshot of the dynamic processes within a cell and explore differences between cell types. As this technology has developed over the last few years it has been rapidly adopted by researchers in areas such as developmental biology, and many single-cell RNA sequencing datasets are now available. Alongside the development of protocols for producing single-cell RNA sequencing data there has been a burst in the development of computational analysis methods. My thesis explores the computational tools and techniques for analysing single-cell RNA sequencing data. I present a database that charts the release of analysis software, where it has been published and what it can be used for, as well as a website that makes this information publicly available. I also present two of my own tools and techniques: Splatter, a software package for easily simulating single-cell datasets from multiple models, and clustering trees, a visualisation approach for inspecting clustering at multiple resolutions. In the final part of my thesis I analyse a dataset from kidney organoids to demonstrate and compare some current analysis methods. Taken together, my thesis covers many aspects of the tools and techniques for single-cell RNA sequencing by describing the approaches that are available, presenting software that can help in developing and evaluating methods, introducing an approach for aiding one of the most common analysis tasks, and showing how tools can be used to extract meaning from a real dataset.