School of BioSciences - Theses

Computational biology methods for identifying leaderless secretory proteins in Arabidopsis thaliana and other plant species
Lonsdale, Andrew Gregory (2019)
Leaderless secretory proteins (LSPs) are proteins that are secreted yet lack a canonical signal peptide and therefore, by definition, undergo unconventional protein secretion (UPS). Not every protein without a signal peptide that is found in a plant secretome can be assumed to be an LSP. Cellular disruption during sample preparation makes cellular contamination highly likely, so secretome data contain a mix of LSPs and contaminant proteins, both characterised by the absence of a signal peptide. The aim of this Thesis is to use computational biology methods to identify LSPs.
Chapter 1 reviews current knowledge of secretory pathways in plants (including LSPs and UPS), motifs for secretion, the difficulty of isolating the plant secretome (cell wall and apoplastic fluids) without contamination, and the lack of bioinformatics tools suited to plants for distinguishing contaminants from LSPs.
Chapter 2 evaluates SecretomeP, a tool trained on mammalian data that is nonetheless the most widely used predictor of LSPs, including in plant studies. Testing whether the tool applies to plants required using conventionally secreted proteins as a proxy, relying on SecretomeP's premise that conventionally and unconventionally secreted proteins share properties. Removing the signal peptide from the sequence data revealed a bias in scores caused by the signal peptide, and a true positive rate only marginally higher than the false positive rate. The tool was therefore not recommended for further plant studies, and previous inferences of LSP status in plants based solely on SecretomeP predictions need to be re-evaluated. This work was published as "Better Than Nothing? Limitations of the Prediction Tool SecretomeP in the Search for Leaderless Secretory Proteins (LSPs) in Plants" (Lonsdale et al., 2016).
Chapter 3 details the creation of a putative LSP database (LSPDB) for Arabidopsis thaliana, built by taking the entire proteome and applying a workflow that collates annotation, observations in the literature and experimental relationships between proteins. The result is a framework that divides proteins without signal peptides into those that have been observed in secretome data (unclassified set) and those that have not (non-secreted set), and compares them with conventionally secreted proteins that have been observed (secreted set) or not (theoretically secreted set). Protein-protein interactions and GO term and PFAM distributions that are similar between the secreted and unclassified sets are used to create confidence lists of putative LSPs.
Chapter 4 creates new prediction tools. Firstly, a SecretomeP-like tool is built from the properties of the observed secreted set with signal peptides removed; lessons from the Chapter 2 evaluation of SecretomeP allowed the signal peptide bias to be minimised. Secondly, further tools are built from the candidate list in the putative LSP database. Each tool is based on a Random Forest (RF) using protein-derived features and trained on subsets of the LSP database, and consensus predictions between the tools on new data were used to identify further LSP candidates. Additional reporting tools provide a convenient way to map data from the LSPDB onto new sequences, and the results of these tools on LSPs from plants and other organisms are shown.
Chapter 5 identifies additional putative LSP candidates in Arabidopsis and other species by applying the tools and databases to data excluded from the original database created in Chapter 3, including recent cell wall proteomes, secretory pathway experiments and analyses of extracellular vesicles.
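The four-way partition used in Chapter 3 amounts to a decision on two properties of each protein: whether it has a signal peptide, and whether it has been observed in secretome data. The sketch below illustrates only that decision. The `Protein` record, its two boolean flags and the protein IDs are hypothetical placeholders standing in for the annotation, signal peptide predictions and literature evidence the thesis collates; they are not data from the LSPDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    """Hypothetical record: the two flags stand in for the collated
    signal peptide predictions and secretome observations."""
    protein_id: str
    has_signal_peptide: bool
    observed_in_secretome: bool


def assign_set(p: Protein) -> str:
    """Place one protein in one of the four sets named in Chapter 3."""
    if p.has_signal_peptide:
        # Conventionally secreted: either observed, or only predicted (theoretical).
        return "secreted" if p.observed_in_secretome else "theoretically secreted"
    # No signal peptide: observed proteins are a mix of LSPs and contaminants.
    return "unclassified" if p.observed_in_secretome else "non-secreted"


def partition(proteins: list[Protein]) -> dict[str, list[str]]:
    """Group protein IDs by set name, preserving input order."""
    groups: dict[str, list[str]] = {
        "secreted": [],
        "theoretically secreted": [],
        "unclassified": [],
        "non-secreted": [],
    }
    for p in proteins:
        groups[assign_set(p)].append(p.protein_id)
    return groups


if __name__ == "__main__":
    demo = [
        Protein("P1", True, True),    # -> secreted
        Protein("P2", True, False),   # -> theoretically secreted
        Protein("P3", False, True),   # -> unclassified (LSP candidate or contaminant)
        Protein("P4", False, False),  # -> non-secreted
    ]
    for name, ids in partition(demo).items():
        print(name, ids)
```

In this framing, the unclassified set is where LSPs and contaminants are mixed, which is why the thesis compares it against the secreted set rather than treating every member as an LSP.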
In conclusion, Chapter 6 discusses the limitations of the computational methods developed and suggests improvements to LSP prediction in plants, so that predicted candidates can guide further experimental investigation and confirmation of their location.