Computing and Information Systems - Research Publications

Permanent URI for this collection

http://hdl.handle.net/11343/350

Characterizing and predicting ccRCC-causing missense mutations in Von Hippel-Lindau disease

Serghini, A ; Portelli, S ; Troadec, G ; Song, C ; Pan, Q ; Pires, DE ; Ascher, DB (OXFORD UNIV PRESS, 2024-01-20)

BACKGROUND: Mutations within the Von Hippel-Lindau (VHL) tumor suppressor gene are known to cause VHL disease, which is characterized by the formation of cysts and tumors in multiple organs of the body, particularly clear cell renal cell carcinoma (ccRCC). A major challenge in clinical practice is determining tumor risk from a given mutation in the VHL gene. Previous efforts have been hindered by limited available clinical data and technological constraints. METHODS: To overcome this, we first manually curated the largest set of clinically validated VHL mutations to date, enabling a robust assessment of existing predictive tools on an independent test set. Additionally, we comprehensively characterized the effects of mutations within VHL using in silico biophysical tools describing changes in protein stability, dynamics and affinity to binding partners to provide insights into the structure-phenotype relationship. These descriptive properties were used as molecular features to build a machine learning model designed to predict the risk of ccRCC development resulting from a VHL missense mutation. RESULTS: Our model achieved an accuracy of 0.81 in identifying ccRCC-causing missense mutations and a Matthews Correlation Coefficient (MCC) of 0.44 on a non-redundant blind test, a significant improvement over previously available approaches. CONCLUSION: This work highlights the power of using protein 3D structure to fully explore the range of molecular and functional consequences of genomic variants. We believe this optimized model will facilitate clinical implementation and assist in guiding patient risk stratification and management.
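Accuracy and the Matthews Correlation Coefficient (MCC) can diverge sharply on imbalanced data, which is why the two are usually reported together. A minimal Python sketch computing both from binary labels follows; the function names and the example labels are illustrative assumptions, not the study's code or data.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = ccRCC-causing, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row/column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical imbalanced set: 9 benign variants, 1 pathogenic; the "model" predicts benign for all.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(accuracy(y_true, y_pred))           # 0.9
print(matthews_corrcoef(y_true, y_pred))  # 0.0
```

A predictor that always outputs the majority class reaches high accuracy here but an MCC of zero, so MCC is the more conservative summary when pathogenic and benign variants are unevenly represented.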
A Broad-Spectrum α-Glucosidase of Glycoside Hydrolase Family 13 from Marinovum sp., a Member of the Roseobacter Clade

Li, J ; Mui, JW-Y ; da Silva, BM ; Pires, DEV ; Ascher, DB ; Soler, NM ; Goddard-Borger, ED ; Williams, SJ (SPRINGER, 2024-01-05)

Glycoside hydrolases (GHs) are a diverse group of enzymes that catalyze the hydrolysis of glycosidic bonds. The Carbohydrate-Active enZymes (CAZy) classification organizes GHs into families based on sequence data and function, with fewer than 1% of the predicted proteins characterized biochemically. Consideration of genomic context can provide clues to infer possible enzyme activities for proteins of unknown function. We used the MultiGeneBLAST tool to discover a gene cluster in Marinovum sp., a member of the marine Roseobacter clade, that encodes homologues of enzymes belonging to the sulfoquinovose monooxygenase pathway for sulfosugar catabolism. This cluster lacks a gene encoding a classical family GH31 sulfoquinovosidase candidate but instead includes an uncharacterized family GH13 protein (MsGH13), which we hypothesized could be a non-classical sulfoquinovosidase. Surprisingly, recombinant MsGH13 lacks sulfoquinovosidase activity; instead, it is a broad-spectrum α-glucosidase active on a diverse array of α-linked disaccharides, including maltose, sucrose, nigerose, trehalose, isomaltose, and kojibiose. Using AlphaFold, we constructed a 3D model of MsGH13, which predicted that its active site closely resembles that of an α-glucosidase from Halomonas sp. H11, an enzyme of the same GH13 subfamily with narrower substrate specificity.

AI-driven GPCR analysis, engineering, and targeting

Velloso, JPL ; Kovacs, AS ; Pires, DEV ; Ascher, DB (ELSEVIER SCI LTD, 2024-02)

This article investigates the role of recent advances in Artificial Intelligence (AI) in revolutionising the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including the use of machine learning (ML) for GPCR classification, prediction of GPCR activation levels, modelling of GPCR 3D structures and interactions, understanding G-protein selectivity, aiding the elucidation of GPCR structures, and drug design. Despite this progress, challenges remain in predicting GPCR structures and addressing the complex nature of GPCRs, providing avenues for future research and development.

Developing a deep learning natural language processing algorithm for automated reporting of adverse drug reactions

McMaster, C ; Chan, J ; Liew, DFL ; Su, E ; Frauman, AG ; Chapman, WW ; Pires, DEV (ACADEMIC PRESS INC ELSEVIER SCIENCE, 2023-01)

The detection of adverse drug reactions (ADRs) is critical to our understanding of the safety and risk-benefit profile of medications. With an incidence that has not changed over the last 30 years, ADRs are a significant source of patient morbidity, responsible for 5%-10% of acute care hospital admissions worldwide. Spontaneous reporting of ADRs has long been the standard method of reporting; however, this approach is known to have high rates of under-reporting, a problem that limits pharmacovigilance efforts. Automated ADR reporting presents an alternative pathway to increase reporting rates, although this may be limited by over-reporting of other drug-related adverse events. We developed a deep learning natural language processing algorithm to identify ADRs in discharge summaries at a single academic hospital centre. Our model was developed in two stages: first, a pre-trained model (DeBERTa) was further pre-trained on 1.1 million unlabelled clinical documents; second, this model was fine-tuned to detect ADR mentions in a corpus of 861 annotated discharge summaries. This model was compared to a version without the further pre-training step and to a previously published RoBERTa model pre-trained on MIMIC-III, which has demonstrated strong performance on other pharmacovigilance tasks. To ensure that our algorithm could differentiate ADRs from other drug-related adverse events, the annotated corpus was enriched for both validated ADR reports and confounding drug-related adverse events. The final model demonstrated good performance, with a ROC-AUC of 0.955 (95% CI 0.933-0.978) for identifying discharge summaries containing ADR mentions, significantly outperforming the two comparator models.

Understanding the complementarity and plasticity of antibody-antigen interfaces

Myung, Y ; Pires, DEV ; Ascher, DB ; Valencia, A (Oxford University Press (OUP), 2023-07-01)

MOTIVATION: While antibodies have been ground-breaking therapeutic agents, the structural determinants of antibody binding specificity remain to be fully elucidated, a task compounded by the virtually unlimited repertoire of antigens they can recognize. Here, we explored the structural landscapes of antibody-antigen interfaces to identify the structural determinants driving target recognition by assessing concavity and interatomic interactions. RESULTS: We found that complementarity-determining regions utilized deeper concavity through their longer H3 loops, with the H3 loops of nanobodies showing the deepest use of concavity. Of all amino acid residues found in complementarity-determining regions, tryptophan used deeper concavity, especially in nanobodies, making it suitable for leveraging concave antigen surfaces. Similarly, antigens utilized arginine to bind to deeper pockets of the antibody surface. Our findings fill a gap in knowledge about antibody specificity, binding affinity, and the nature of antibody-antigen interface features, which will lead to a better understanding of how antibodies can more effectively target druggable sites on antigen surfaces. AVAILABILITY AND IMPLEMENTATION: The data and scripts are available at: https://github.com/YoochanMyung/scripts.

LEGO-CSM: a tool for functional characterization of proteins

Nguyen, TB ; de Sa, AGC ; Rodrigues, CHM ; Pires, DE ; Ascher, DB ; Valencia, A (OXFORD UNIV PRESS, 2023-07-01)

MOTIVATION: With the development of sequencing techniques, the discovery of new proteins far exceeds the human capacity and resources available for experimentally characterizing protein functions. Localization, EC numbers, and GO terms with the structure-based Cutoff Scanning Matrix (LEGO-CSM) is a comprehensive web-based resource that fills this gap by applying well-established and robust graph-based signatures in supervised learning models that use both protein sequence and structure information to accurately model protein function in terms of Subcellular Localization, Enzyme Commission (EC) numbers, and Gene Ontology (GO) terms. RESULTS: We show that our models perform as well as or better than alternative approaches, achieving an area under the receiver operating characteristic curve of up to 0.93 for subcellular localization, up to 0.93 for EC numbers, and up to 0.81 for GO terms on independent blind tests. AVAILABILITY AND IMPLEMENTATION: LEGO-CSM's web server is freely available at https://biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM's models can be downloaded at https://biosig.lab.uq.edu.au/lego_csm/data.
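Area under the ROC curve, reported above and in several other abstracts on this page, has a direct probabilistic reading: it is the chance that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as half (equivalently, the normalized Mann-Whitney U statistic). The Python sketch below computes it by comparing every positive-negative pair; the function name and scores are made-up illustrations, not LEGO-CSM's evaluation code.

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Runs in time proportional to (#positives x #negatives), fine for small sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Hypothetical scores: one positive (0.4) is ranked below one negative (0.6).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Three of the four positive-negative pairs are ordered correctly, giving 0.75; a perfect ranking gives 1.0 and a random one about 0.5.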
Data-driven overdiagnosis definitions: A scoping review

Senevirathna, P ; Pires, DEV ; Capurro, D (ACADEMIC PRESS INC ELSEVIER SCIENCE, 2023-11)

INTRODUCTION: Adequate methods to promptly translate digital health innovations into improved patient care are essential. Advances in Artificial Intelligence (AI) and Machine Learning (ML) have been sources of digital innovation and hold the promise to revolutionize the way we treat, manage and diagnose patients. It is essential to understand both the benefits and the potential adverse effects of digital health innovations, particularly when these are made available or applied to healthier segments of the population. One such adverse effect is overdiagnosis. OBJECTIVE: To comprehensively analyze quantification strategies and data-driven definitions for overdiagnosis reported in the literature. METHODS: We conducted a scoping systematic review of manuscripts describing quantitative methods to estimate the proportion of overdiagnosed patients. RESULTS: We identified 46 studies that met our inclusion criteria. They covered a variety of clinical conditions, primarily breast and prostate cancer. Methods to quantify overdiagnosis included both prospective and retrospective approaches, such as randomized clinical trials and simulations. CONCLUSION: A variety of methods to quantify overdiagnosis have been published, producing widely diverging results. A standard method to quantify overdiagnosis is needed to allow its mitigation during the rapidly increasing development of new digital diagnostic tools.

Identifying the molecular drivers of ALS-implicated missense mutations

Portelli, S ; Albanaz, A ; Pires, DEV ; Ascher, DB (BMJ PUBLISHING GROUP, 2023-05)

BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a progressive, fatal neurodegenerative disease associated with both motor and non-motor symptoms, including frontotemporal dementia. Approximately 10% of cases are genetically inherited (familial ALS), while the majority are sporadic. Mutations across a wide range of genes have been associated with ALS; however, the underlying molecular effects of these mutations and their relation to phenotypes remain poorly explored. METHODS: We first curated an extensive list (n=1343) of missense mutations identified in the clinical literature, spanning 111 unique genes. Of these, mutations in SOD1, FUS and TDP-43 were analysed using in silico biophysical tools, which characterised changes in protein stability, interactions, localisation and function. The effects of pathogenic and non-pathogenic mutations within these genes were statistically compared to highlight underlying molecular drivers. RESULTS: Compared with previous ALS-dedicated databases, we have curated the most extensive missense mutation database to date, with a twofold increase in unique implicated genes and almost a threefold increase in the number of mutations. Our gene-specific analysis identified distinct molecular drivers across the different proteins: SOD1 mutations primarily reduced protein stability and dimer formation, whereas those in FUS and TDP-43 were located within disordered regions, suggesting different mechanisms of aggregate formation. CONCLUSION: Using these three genes as case studies, we identified distinct insights that can drive further research to better understand ALS. The information curated in our database can serve as a resource for similar gene-specific analyses, further improving the current understanding of the disease, which is crucial for the development of treatment strategies.

DDMut: predicting effects of mutations on protein stability using deep learning

Zhou, Y ; Pan, Q ; Pires, DE ; Rodrigues, CHM ; Ascher, DB (OXFORD UNIV PRESS, 2023-07-05)

Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and predictions biased towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs free energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.
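The abstract reports Pearson's correlation and RMSE and stresses anti-symmetry: a stability predictor is anti-symmetric when its prediction for the reverse mutation is exactly the negation of its prediction for the forward mutation. The Python sketch below computes both metrics and shows one generic way to enforce anti-symmetry by combining forward and reverse predictions. The `antisymmetrize` helper, the toy scorer, and its hydropathy table are hypothetical illustrations; this is not DDMut's siamese architecture or its API.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rmse(pred, obs):
    """Root-mean-square error, in the same units as the inputs (e.g. kcal/mol)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))

def antisymmetrize(raw_score):
    """Wrap a raw scorer raw_score(wild, mutant) -> float so the result satisfies
    f(wild, mutant) == -f(mutant, wild), by averaging the forward prediction
    with the negated reverse prediction. Any constant offset cancels out."""
    def f(wild, mutant):
        return 0.5 * (raw_score(wild, mutant) - raw_score(mutant, wild))
    return f

# Hypothetical toy scorer: Kyte-Doolittle hydropathy difference plus a constant
# offset that mimics a bias towards one sign of prediction. Not a real stability model.
KD = {"A": 1.8, "L": 3.8, "G": -0.4, "D": -3.5}

def biased_scorer(wild, mutant):
    return 0.3 * (KD[mutant] - KD[wild]) - 0.5

model = antisymmetrize(biased_scorer)
pairs = [("L", "A"), ("A", "G"), ("G", "D"), ("L", "D")]
forward = [model(w, m) for w, m in pairs]
reverse = [model(m, w) for w, m in pairs]
print(pearson_r(forward, reverse))  # approximately -1.0 for an anti-symmetric model
```

Averaging the forward score with the negated reverse score removes the constant bias of the raw scorer, so forward and reverse predictions correlate perfectly negatively; a real evaluation would instead compare predictions against experimental measurements using `pearson_r` and `rmse`.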
toxCSM: comprehensive prediction of small molecule toxicity profiles

de Sa, AGC ; Long, Y ; Portelli, S ; Pires, DE ; Ascher, DB (OXFORD UNIV PRESS, 2022-09-20)

Drug discovery is a lengthy, costly and high-risk endeavour that is further complicated by high attrition rates in later development stages. Toxicity has been one of the main causes of failure during clinical trials, increasing drug development time and costs. To facilitate early identification and optimisation of toxicity profiles, several computational tools have emerged that aim to improve success rates through timely pre-screening of drug candidates. Despite these efforts, there is an increasing demand for platforms capable of assessing both environmental and human-based toxicity properties at large scale. Here, we present toxCSM, a comprehensive computational platform for the study and optimisation of toxicity profiles of small molecules. toxCSM leverages the well-established concepts of graph-based signatures, molecular descriptors and similarity scores to develop 36 models for predicting a range of toxicity properties, which can assist in developing safer drugs and agrochemicals. toxCSM achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of up to 0.99 and Pearson's correlation coefficients of up to 0.94 on 10-fold cross-validation, with comparable performance on blind test sets, outperforming all alternative methods. toxCSM is freely available as a user-friendly web server and API at http://biosig.lab.uq.edu.au/toxcsm.
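In 10-fold cross-validation, the data are split into ten folds and each fold serves once as the held-out test set while a model is trained on the remaining nine; the reported metric is then aggregated across folds. The Python sketch below generates such splits. `k_fold_indices` is a hypothetical helper, not part of toxCSM; in practice a library routine such as scikit-learn's `KFold` serves the same purpose.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed; every sample lands in exactly
    one test fold, and fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

# Hypothetical dataset of 25 molecules split into 10 folds.
for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

Fixing the shuffle seed keeps the splits reproducible, so different models can be compared on identical folds.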
