Computing and Information Systems - Research Publications

  • Item
    Overcoming challenges in extracting prescribing habits from veterinary clinics using big data and deep learning
    Hur, B ; Hardefeldt, LY ; Verspoor, K ; Baldwin, T ; Gilkerson, JR (WILEY, 2022-05)
    Understanding antimicrobial usage patterns and encouraging appropriate antimicrobial usage is a critical component of antimicrobial stewardship. Studies using VetCompass Australia and Natural Language Processing (NLP) have demonstrated antimicrobial usage patterns in companion animal practices across Australia. Doing so has highlighted the many obstacles and barriers to the task of converting raw clinical notes into a format that can be readily queried and analysed. We developed NLP systems using rules-based algorithms and machine learning to automate the extraction of data describing the key elements to assess appropriate antimicrobial use. These included the clinical indication, antimicrobial agent selection, dose and duration of therapy. Our methods were applied to over 4.4 million companion animal clinical records across Australia on all consultations with antimicrobial use to help us understand what antibiotics are being given and why on a population level. Of these, only approximately 40% recorded the reason why antimicrobials were prescribed, along with the dose and duration of treatment. NLP and deep learning might be able to overcome the difficulties of harvesting free text data from clinical records, but when the essential data are not recorded in the clinical records, then this becomes an insurmountable obstacle.
  • Item
    Predicting Publication of Clinical Trials Using Structured and Unstructured Data: Model Development and Validation Study.
    Wang, S ; Šuster, S ; Baldwin, T ; Verspoor, K (JMIR Publications, 2022-12-23)
    BACKGROUND: Publication of registered clinical trials is a critical step in the timely dissemination of trial findings. However, a significant proportion of completed clinical trials are never published, motivating the need to analyze the factors behind success or failure to publish. This could inform study design, help regulatory decision-making, and improve resource allocation. It could also enhance our understanding of bias in the publication of trials and publication trends based on the research direction or strength of the findings. Although the publication of clinical trials has been addressed in several descriptive studies at an aggregate level, there is a lack of research on the predictive analysis of a trial's publishability given an individual (planned) clinical trial description.
    OBJECTIVE: We aimed to conduct a study that combined structured and unstructured features relevant to publication status in a single predictive approach. Established natural language processing techniques as well as recent pretrained language models enabled us to incorporate information from the textual descriptions of clinical trials into a machine learning approach. We were particularly interested in whether and which textual features could improve the classification accuracy for publication outcomes.
    METHODS: In this study, we used metadata from ClinicalTrials.gov (a registry of clinical trials) and MEDLINE (a database of academic journal articles) to build a data set of clinical trials (N=76,950) that contained the description of a registered trial and its publication outcome (27,702/76,950, 36% published and 49,248/76,950, 64% unpublished). This is the largest data set of its kind, which we released as part of this work. The publication outcome in the data set was identified from MEDLINE based on clinical trial identifiers. We carried out a descriptive analysis and predicted the publication outcome using 2 approaches: a neural network with a large domain-specific language model and a random forest classifier using a weighted bag-of-words representation of text.
    RESULTS: First, our analysis of the newly created data set corroborates several findings from the existing literature regarding attributes associated with a higher publication rate. Second, a crucial observation from our predictive modeling was that the addition of textual features (eg, eligibility criteria) offers consistent improvements over using only structured data (F1-score=0.62-0.64 vs F1-score=0.61 without textual features). Both pretrained language models and more basic word-based representations provide high-utility text representations, with no significant empirical difference between the two.
    CONCLUSIONS: Different factors affect the publication of a registered clinical trial. Our approach to predictive modeling combines heterogeneous features, both structured and unstructured. We show that methods from natural language processing can provide effective textual features to enable more accurate prediction of publication success, which has not been explored for this task previously.
  • Item
    Overview of ChEMU 2022 Evaluation Campaign: Information Extraction in Chemical Patents
    Li, Y ; Fang, B ; He, J ; Yoshikawa, H ; Akhondi, SA ; Druckenbrodt, C ; Thorne, C ; Afzal, Z ; Zhai, Z ; Baldwin, T ; Verspoor, K ; Barron-Cedeno, A ; DaSanMartino, G ; Esposti, MD ; Sebastiani, F ; Macdonald, C ; Pasi, G ; Hanbury, A ; Potthast, M ; Faggioli, G ; Ferro, N (SPRINGER INTERNATIONAL PUBLISHING AG, 2022)
  • Item
    Cloze Evaluation for Deeper Understanding of Commonsense Stories in Indonesian
    Koto, F ; Baldwin, T ; Lau, JH (ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022-01-01)
  • Item
    One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia
    Aji, AF ; Winata, GI ; Koto, F ; Cahyawijaya, S ; Romadhony, A ; Mahendra, R ; Kurniawan, K ; Moeljadi, D ; Prasojo, RE ; Baldwin, T ; Lau, JH ; Ruder, S (ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022)
  • Item
    The patient is more dead than alive: exploring the current state of the multi-document summarization of the biomedical literature
    Otmakhova, Y ; Verspoor, K ; Baldwin, T ; Lau, JH (ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022)
  • Item
    Can Pretrained Language Models Generate Persuasive, Faithful, and Informative Ad Text for Product Descriptions?
    Koto, F ; Lau, JH ; Baldwin, T (ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022)
  • Item
    Optimising Equal Opportunity Fairness in Model Training
    Shen, A ; Han, X ; Cohn, T ; Baldwin, T ; Frermann, L (ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022)
  • Item
    MultiSpanQA: A Dataset for Multi-Span Question Answering
    Li, H ; Vasardani, M ; Tomko, M ; Baldwin, T (ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022)
  • Item
    What does it take to bake a cake? The RecipeRef corpus and anaphora resolution in procedural text
    Fang, B ; Baldwin, T ; Verspoor, K (ASSOC COMPUTATIONAL LINGUISTICS-ACL, 2022)