Computing and Information Systems - Research Publications

  • Item
    Characterizing and predicting ccRCC-causing missense mutations in Von Hippel-Lindau disease
    Serghini, A ; Portelli, S ; Troadec, G ; Song, C ; Pan, Q ; Pires, DE ; Ascher, DB (OXFORD UNIV PRESS, 2024-01-20)
    BACKGROUND: Mutations within the Von Hippel-Lindau (VHL) tumor suppressor gene are known to cause VHL disease, which is characterized by the formation of cysts and tumors in multiple organs of the body, particularly clear cell renal cell carcinoma (ccRCC). A major challenge in clinical practice is determining tumor risk from a given mutation in the VHL gene. Previous efforts have been hindered by limited available clinical data and technological constraints. METHODS: To overcome this, we initially manually curated the largest set of clinically validated VHL mutations to date, enabling a robust assessment of existing predictive tools on an independent test set. Additionally, we comprehensively characterized the effects of mutations within VHL using in silico biophysical tools describing changes in protein stability, dynamics and affinity to binding partners to provide insights into the structure-phenotype relationship. These descriptive properties were used as molecular features for the construction of a machine learning model, designed to predict the risk of ccRCC development as a result of a VHL missense mutation. RESULTS: Analysis of our model showed an accuracy of 0.81 in the identification of ccRCC-causing missense mutations, and a Matthews correlation coefficient of 0.44 on a non-redundant blind test, a significant improvement over previously available approaches. CONCLUSION: This work highlights the power of using protein 3D structure to fully explore the range of molecular and functional consequences of genomic variants. We believe this optimized model will better enable its clinical implementation and assist in guiding patient risk stratification and management.
  • Item
    A Broad-Spectrum α-Glucosidase of Glycoside Hydrolase Family 13 from Marinovum sp., a Member of the Roseobacter Clade
    Li, J ; Mui, JW-Y ; da Silva, BM ; Pires, DEV ; Ascher, DB ; Soler, NM ; Goddard-Borger, ED ; Williams, SJ (SPRINGER, 2024-01-05)
    Glycoside hydrolases (GHs) are a diverse group of enzymes that catalyze the hydrolysis of glycosidic bonds. The Carbohydrate-Active enZymes (CAZy) classification organizes GHs into families based on sequence data and function, with fewer than 1% of the predicted proteins characterized biochemically. Consideration of genomic context can provide clues to infer possible enzyme activities for proteins of unknown function. We used the MultiGeneBLAST tool to discover a gene cluster in Marinovum sp., a member of the marine Roseobacter clade, that encodes homologues of enzymes belonging to the sulfoquinovose monooxygenase pathway for sulfosugar catabolism. This cluster lacks a gene encoding a classical family GH31 sulfoquinovosidase candidate but instead includes an uncharacterized family GH13 protein (MsGH13) that we hypothesized could be a non-classical sulfoquinovosidase. Surprisingly, recombinant MsGH13 lacks sulfoquinovosidase activity and is a broad-spectrum α-glucosidase that is active on a diverse array of α-linked disaccharides, including maltose, sucrose, nigerose, trehalose, isomaltose, and kojibiose. Using AlphaFold, a 3D model for the MsGH13 enzyme was constructed that predicted its active site shared close similarity with an α-glucosidase from Halomonas sp. H11 of the same GH13 subfamily that shows narrower substrate specificity.
  • Item
    AI-driven GPCR analysis, engineering, and targeting
    Velloso, JPL ; Kovacs, AS ; Pires, DEV ; Ascher, DB (ELSEVIER SCI LTD, 2024-02)
    This article investigates the role of recent advances in Artificial Intelligence (AI) in revolutionising the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including the application of machine learning (ML) in GPCR classification, prediction of GPCR activation levels, modelling GPCR 3D structures and interactions, understanding G-protein selectivity, aiding elucidation of GPCR structures, and drug design. Despite progress, challenges in predicting GPCR structures and addressing the complex nature of GPCRs remain, providing avenues for future research and development.
  • Item
    Understanding the complementarity and plasticity of antibody-antigen interfaces.
    Myung, Y ; Pires, DEV ; Ascher, DB ; Valencia, A (Oxford University Press (OUP), 2023-07-01)
    MOTIVATION: While antibodies have been ground-breaking therapeutic agents, the structural determinants for antibody binding specificity remain to be fully elucidated, which is compounded by the virtually unlimited repertoire of antigens they can recognize. Here, we have explored the structural landscapes of antibody-antigen interfaces to identify the structural determinants driving target recognition by assessing concavity and interatomic interactions. RESULTS: We found that complementarity-determining regions utilized deeper concavity with their longer H3 loops, with the H3 loops of nanobodies showing the deepest use of concavity. Of all amino acid residues found in complementarity-determining regions, tryptophan used deeper concavity, especially in nanobodies, making it suitable for leveraging concave antigen surfaces. Similarly, antigens utilized arginine to bind to deeper pockets of the antibody surface. Our findings fill a gap in knowledge about antibody specificity, binding affinity, and the nature of antibody-antigen interface features, which will lead to a better understanding of how antibodies can more effectively target druggable sites on antigen surfaces. AVAILABILITY AND IMPLEMENTATION: The data and scripts are available at: https://github.com/YoochanMyung/scripts.
  • Item
    LEGO-CSM: a tool for functional characterization of proteins
    Nguyen, TB ; de Sa, AGC ; Rodrigues, CHM ; Pires, DE ; Ascher, DB ; Valencia, A (OXFORD UNIV PRESS, 2023-07-01)
    MOTIVATION: With the development of sequencing techniques, the discovery of new proteins significantly exceeds the human capacity and resources for experimentally characterizing protein functions. Localization, EC numbers, and GO terms with the structure-based Cutoff Scanning Matrix (LEGO-CSM) is a comprehensive web-based resource that fills this gap by applying well-established and robust graph-based signatures to supervised learning models, using both protein sequence and structure information to accurately model protein function in terms of Subcellular Localization, Enzyme Commission (EC) numbers, and Gene Ontology (GO) terms. RESULTS: We show our models perform as well as or better than alternative approaches, achieving area under the receiver operating characteristic curve of up to 0.93 for subcellular localization, up to 0.93 for EC, and up to 0.81 for GO terms on independent blind tests. AVAILABILITY AND IMPLEMENTATION: LEGO-CSM's web server is freely available at https://biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM's models can be downloaded at https://biosig.lab.uq.edu.au/lego_csm/data.
  • Item
    Identifying the molecular drivers of ALS-implicated missense mutations
    Portelli, S ; Albanaz, A ; Pires, DEV ; Ascher, DB (BMJ PUBLISHING GROUP, 2023-05)
    BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a progressively fatal, neurodegenerative disease associated with both motor and non-motor symptoms, including frontotemporal dementia. Approximately 10% of cases are genetically inherited (familial ALS), while the majority are sporadic. Mutations across a wide range of genes have been associated; however, the underlying molecular effects of these mutations and their relation to phenotypes remain poorly explored. METHODS: We initially curated an extensive list (n=1343) of missense mutations identified in the clinical literature, which spanned 111 unique genes. Of these, mutations in genes SOD1, FUS and TDP-43 were analysed using in silico biophysical tools, which characterised changes in protein stability, interactions, localisation and function. The effects of pathogenic and non-pathogenic mutations within these genes were statistically compared to highlight underlying molecular drivers. RESULTS: Compared with previous ALS-dedicated databases, we have curated the most extensive missense mutation database to date and observed a twofold increase in unique implicated genes, and almost a threefold increase in the number of mutations. Our gene-specific analysis identified distinct molecular drivers across the different proteins, where SOD1 mutations primarily reduced protein stability and dimer formation, and those in FUS and TDP-43 were present within disordered regions, suggesting different mechanisms of aggregate formation. CONCLUSION: Using our three genes as case studies, we identified distinct insights which can drive further research to better understand ALS. The information curated in our database can serve as a resource for similar gene-specific analyses, further improving the current understanding of disease, crucial for the development of treatment strategies.
  • Item
    DDMut: predicting effects of mutations on protein stability using deep learning
    Zhou, Y ; Pan, Q ; Pires, DE ; Rodrigues, CHM ; Ascher, DB (OXFORD UNIV PRESS, 2023-07-05)
    Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.
  • Item
    toxCSM: comprehensive prediction of small molecule toxicity profiles
    de Sa, AGC ; Long, Y ; Portelli, S ; Pires, DE ; Ascher, DB (OXFORD UNIV PRESS, 2022-09-20)
    Drug discovery is a lengthy, costly and high-risk endeavour that is further convoluted by high attrition rates in later development stages. Toxicity has been one of the main causes of failure during clinical trials, increasing drug development time and costs. To facilitate early identification and optimisation of toxicity profiles, several computational tools have emerged that aim to improve success rates through timely pre-screening of drug candidates. Despite these efforts, there is an increasing demand for platforms capable of assessing both environmental and human-based toxicity properties at large scale. Here, we present toxCSM, a comprehensive computational platform for the study and optimisation of toxicity profiles of small molecules. toxCSM leverages the well-established concepts of graph-based signatures, molecular descriptors and similarity scores to develop 36 models for predicting a range of toxicity properties, which can assist in developing safer drugs and agrochemicals. toxCSM achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of up to 0.99 and Pearson's correlation coefficients of up to 0.94 on 10-fold cross-validation, with comparable performance on blind test sets, outperforming all alternative methods. toxCSM is freely available as a user-friendly web server and API at http://biosig.lab.uq.edu.au/toxcsm.
  • Item
    epitope3D: a machine learning method for conformational B-cell epitope prediction
    da Silva, BM ; Myung, Y ; Ascher, DB ; Pires, DE (OXFORD UNIV PRESS, 2022-01-17)
    The ability to identify antigenic determinants of pathogens, or epitopes, is fundamental to guide rational vaccine development and immunotherapies, which are particularly relevant for rapid pandemic response. A range of computational tools has been developed over the past two decades to assist in epitope prediction; however, they have presented limited performance and generalization, particularly for the identification of conformational B-cell epitopes. Here, we present epitope3D, a novel scalable machine learning method capable of accurately identifying conformational epitopes trained and evaluated on the largest curated epitope data set to date. Our method uses the concept of graph-based signatures to model epitope and non-epitope regions as graphs and extract distance patterns that are used as evidence to train and test predictive models. We show epitope3D outperforms available alternative approaches, achieving Matthews Correlation Coefficient and F1-scores of 0.55 and 0.57 on cross-validation and 0.45 and 0.36 during independent blind tests, respectively.
  • Item
    CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function
    Thanh, BN ; Pires, DE ; Ascher, DB (OXFORD UNIV PRESS, 2022-01-17)
    Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to biologically characterise. To improve our understanding and ability to model these molecular interactions, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data in order to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, using machine learning algorithms to accurately predict their binding affinity and rank docking poses as a scoring function. Information on both protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. Across both training and independent test sets, we achieved comparable Pearson's correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show CSM-carbohydrate significantly outperformed previous approaches and have implemented our method and made all data freely available through both a user-friendly web interface and application programming interface, to facilitate programmatic access at http://biosig.unimelb.edu.au/csm_carbohydrate/. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.
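
The abstracts in this listing report a recurring set of evaluation metrics: Matthews correlation coefficient and F1-score for classifiers, and Pearson's correlation with RMSE for regression models such as binding affinity or stability predictors. As a reading aid, the sketch below shows the standard textbook definitions of these four metrics in plain Python. It is not taken from any of the listed papers or their web servers; the function names (`mcc`, `f1_score`, `pearson_r`, `rmse`) are hypothetical, and returning 0.0 for an undefined MCC or F1 is one common convention assumed here, not something the abstracts specify.

```python
import math
from typing import Sequence


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1-score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    # Assumed convention: report 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation between predicted and observed values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error, in the units of the inputs (e.g. kcal/mol)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two equal-length, non-empty sequences")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))
```

MCC ranges from -1 to 1 and stays informative on imbalanced classes, which is presumably why it appears alongside accuracy and F1 in the classification abstracts, while RMSE keeps the units of the predicted quantity and so complements the unitless Pearson's correlation.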