Minerva Elements Records

  • Item
    Enhancing Predictive Modeling in Emergency Departments
    Kouhounestani, M ; Song, L ; Luo, L ; Aickelin, U (SCITEPRESS - Science and Technology Publications, 2024)
    Increasing global Emergency Department (ED) visits, exacerbated by COVID-19, have presented multiple challenges in recent years. Electronic Health Records (EHRs), as comprehensive digital repositories of patient health information, offer a pathway to construct prediction systems to address these issues. However, the heterogeneity of EHRs complicates accurate predictions. A notable challenge is the prevalence of high-cardinality nominal features (NFs) in EHRs. Due to their numerous distinct values, these features are often excluded from the analysis, risking information loss and reduced accuracy and interpretability. This study proposes a framework that integrates a preprocessing technique with target encoding (TE-PrepNet) into machine learning (ML) models to address the challenges of NFs from MIMIC-IV-ED. We evaluate the performance of TE-PrepNet in two specific ED-based prediction tasks: triage-based hospital admissions and ED reattendance within 72 hours at discharge time. Incorporating three NFs, our approach demonstrates improvements compared to the baseline and outperforms previous research that overlooked NFs. A random forest model with TE-PrepNet achieved an AUROC of 0.8458 in the prediction of hospitalisation, compared to the baseline AUROC of 0.7520. For the prediction of ED reattendance within 72 hours, XGBoost attained an AUROC of 0.6975, outperforming the baseline AUROC of 0.6166.
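    As an illustration of the general idea of target encoding for a high-cardinality nominal feature, the following is a minimal sketch and not the TE-PrepNet pipeline itself, whose details the abstract does not give. The smoothing scheme, the function names, and the toy data are assumptions made for this example.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Learn a smoothed mean-target value for every category seen in training."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Shrink each category mean toward the global mean; rare categories shrink most.
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_target_encoding(categories, encoding, global_mean):
    """Replace each category by its learned value; unseen categories get the global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]


# Hypothetical toy data: a nominal "chief complaint" feature and a 0/1 admission label.
complaints = ["chest pain", "chest pain", "fever", "fever", "fever", "laceration"]
admitted = [1, 1, 0, 1, 0, 0]

encoding, prior = fit_target_encoding(complaints, admitted, smoothing=1.0)
encoded = apply_target_encoding(["chest pain", "laceration", "dizziness"], encoding, prior)
```

    In practice the encoding would be fitted on training folds only, so that the label of a record never leaks into its own encoded value.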
  • Item
    Identification of Patient Ventilator Asynchrony in Physiological Data Through Integrating Machine-Learning
    Stell, A ; Caparo, E ; Wang, Z ; Wang, C ; Berlowitz, D ; Howard, M ; Sinnott, R ; Aickelin, U (SCITEPRESS - Science and Technology Publications, 2024)
    Patient Ventilator Asynchrony (PVA) occurs when a mechanical ventilator aiding a patient's breathing falls out of synchronisation with their breathing pattern. This de-synchronisation may cause patient distress and can lead to long-term negative clinical outcomes. Research into the causes and possible mitigations of PVA is currently conducted by clinical domain experts using manual methods, such as parsing entire sleep hypnograms visually, and identifying and tagging instances of PVA that they find. This process is very labour-intensive and can be error-prone. This project aims to make this analysis more efficient by using machine-learning approaches to automatically parse, classify, and suggest instances of PVA for ultimate confirmation by domain experts. The solution has been developed using a retrospective dataset of intervention and control patients who were recruited to a non-invasive ventilation study, and it achieves a specificity of over 90%. This paper describes the process of integrating the output of the machine learning into the bedside clinical monitoring system for production use in anticipation of a future clinical trial.
  • Item
    Uncertainty in Selective Bagging: A Dynamic Bi-objective Optimization Model
    Maadi, M ; Akbarzadeh Khorshidi, H ; Aickelin, U (SIAM, 2023)
    Bagging is a common approach in ensemble learning that generates a group of classifiers through bootstrapping for classification tasks. Despite its wide applications, generating redundant classifiers remains a central challenge in bagging. In recent years, many selective bagging models have been presented to deal with this challenge. These models mostly focus on the accuracy of classifiers and the diversity among them. Despite the importance of uncertainty in the performance of ensemble classifiers, this criterion has been neglected in selective bagging models. In this paper, we propose a two-stage selective bagging model. In the first stage, we formalize the selective bagging problem as a bi-objective optimization model considering both the uncertainty and accuracy of classifiers. We propose an adaptive evolutionary Two-Arch2 algorithm, named Diverse-Two-Arch2, to solve the bi-objective model. The output of this stage is a subset of classifiers that are diverse, certain about correct predictions, and uncertain about incorrect predictions. While most selective bagging models select a fixed subset of classifiers for all test samples (static approach), our proposed model takes a dynamic approach to the selection process. In the second stage of the model, we therefore select only certain classifiers to make an ensemble prediction for each test sample. Experimental results on twenty data sets, compared against two ensemble models and five state-of-the-art dynamic selective bagging models, show that the proposed model outperforms them. We also compare the performance of the proposed Diverse-Two-Arch2 to alternative evolutionary computation methods.
  • Item
    Capturing prediction uncertainty in upstream cell culture models using conformal prediction and Gaussian processes
    Pham, TD ; Aickelin, U ; Bassett, R ; Papadopoulos, H ; Nguyen, KA ; Boström, H ; Carlsson, L (ML Research Press, 2023)
    This extended abstract compares the efficacy of Gaussian process and conformal XGBoost regressions in capturing prediction uncertainty in simulated and industrial cell culture data.
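    The abstract does not say which conformal variant was used. As a rough illustration of how conformal prediction turns point predictions into intervals, the sketch below uses split conformal regression with absolute residuals. The calibration values and function names are hypothetical, and any point predictor (for example XGBoost) could supply the predictions.

```python
import math


def conformal_quantile(residuals, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest absolute residual."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return sorted(residuals)[rank - 1]


def conformal_interval(prediction, q):
    """Symmetric prediction interval around a point prediction."""
    return prediction - q, prediction + q


# Hypothetical calibration set: true values and the model's point predictions.
cal_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
cal_pred = [1.1, 1.8, 3.3, 3.6, 5.5, 5.4, 7.7, 7.2, 9.9]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

q = conformal_quantile(residuals, alpha=0.2)   # target coverage of 80%
low, high = conformal_interval(4.2, q)         # interval for a new prediction of 4.2
```

    Unlike a Gaussian process, this construction makes no distributional assumption beyond exchangeability of calibration and test points, but the interval width is the same for every input.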
  • Item
    An Uncertainty-Accuracy-Based Score Function for Wrapper Methods in Feature Selection
    Maadi, M ; Khorshidi, HA ; Aickelin, U (Institute of Electrical and Electronics Engineers, 2023)
    Feature Selection (FS) is an effective preprocessing method to deal with the curse of dimensionality in machine learning. Redundant features in datasets decrease the classification performance and increase the computational complexity. Wrapper methods are an important category of FS methods that evaluate various feature subsets and select the best one using performance measures related to a classifier. In these methods, the accuracy of classifiers is the most common performance measure for FS. Although the performance of classifiers depends on their uncertainty, this important criterion is neglected in these methods. In this paper, we present a new performance measure called Uncertainty-Accuracy-based Performance Measure for Feature Selection (UAPMFS) that uses an ensemble approach to measure both the accuracy and uncertainty of classifiers. UAPMFS uses bagging and an uncertainty confusion matrix. This performance measure can be used in all wrapper methods to improve FS performance. We design two experiments to evaluate the performance of UAPMFS in wrapper methods. In the experiments, we use the leave-one-variable-out strategy, a common strategy in wrapper methods, to evaluate features. We also define a feature score function based on UAPMFS to rank and select features. In the first experiment, we investigate the importance of considering uncertainty in the FS process and show how neglecting uncertainty affects FS performance. In the second experiment, we compare the performance of the UAPMFS-based feature score function with the most common feature score functions for FS. Experimental results show the effectiveness of the proposed performance measure on different datasets.
  • Item
    Fast Rate Generalization Error Bounds: Variations on a Theme
    Wu, X ; Manton, JH ; Aickelin, U ; Zhu, J (IEEE, 2022)
    A recent line of work, initiated by [1] and [2], has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form O(\sqrt{λI/n}), where λ is an assumption-dependent coefficient and I is an information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered to be "slow" compared to a "fast rate" of O(1/n) in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and a fast rate result can still be obtained using this bound by evaluating λ under an appropriate assumption. Furthermore, we identify the key conditions needed for the fast rate generalization error, which we call the (η, c)-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of O(1/n) for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.
  • Item
    Cluster-based Diversity Over-sampling: A Density and Diversity Oriented Synthetic Over-sampling for Imbalanced Data
    Yang, Y ; Khorshidi, H ; Aickelin, U (SCITEPRESS - Science and Technology Publications, 2022)
    In many real-life classification tasks, the issue of imbalanced data is commonly observed. Mainstream machine learning algorithms typically assume that the classes in the underlying datasets are relatively well balanced. The failure of this assumption can lead to a biased representation of the models’ performance. This has encouraged the incorporation of re-sampling techniques to generate more balanced datasets. However, mainstream re-sampling methods fail to account for the distribution of minority data and the diversity within generated instances. Therefore, in this paper, we propose a data-generation algorithm, Cluster-based Diversity Over-sampling (CDO), to consider minority instance distribution during the process of data generation. Diversity optimisation is utilised to promote diversity within the generated data. We have conducted extensive experiments on synthetic and real-world datasets to evaluate the performance of CDO in comparison with SMOTE-based and diversity-based methods (DADO, DIWO, BL-SMOTE, DB-SMOTE, and MAHAKIL). The experiments show the superiority of CDO.
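    For context, the SMOTE family that CDO is compared against generates synthetic minority instances by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that baseline idea only; it is not the CDO algorithm, and the function name, parameters, and toy points are assumptions made for this example.

```python
import random


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward nearby minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical two-dimensional minority class.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.3), (3.0, 3.1)]
new_points = smote_like_samples(minority_points, n_new=5)
```

    Because every synthetic point lies on a segment between two existing points, plain interpolation ignores both the density of the minority class and how spread out the generated instances are, which are the two gaps the abstract says CDO targets.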
  • Item
    Run or Pat: Using Deep Learning to Classify the Species Type and Emotion of Pets
    Sinnott, RO ; Aickelin, U ; Jia, Y ; Sun, PY ; Susanto, R (IEEE, 2021-01-01)
    Deep learning has been applied in many contexts. In this paper we present a novel application area: detecting the species type and emotion of pets, with a focus on a diverse set of dog and cat collections comprising 52 dog and 23 cat species. Building on an extensive collection of labelled images with over 300 images per species type, we explore a range of deep learning models to develop a classifier for species type and the associated emotion. We outline the realization of the technical solution delivered through a mobile application (iPhone/Android) and present results based on feedback from real-world adoption and utilisation by the broader mobile application community.
  • Item
    Collaborative Human-ML Decision Making Using Experts' Privileged Information under Uncertainty
    Maadi, M ; Khorshidi, HA ; Aickelin, U (2021-01-01)
    Machine Learning (ML) models have been widely applied for clinical decision making. However, in this critical decision making field, human decision making is still prevalent, because clinical experts are more skilled at working with unstructured data, especially when dealing with uncommon situations. In this paper, we use clinical experts' privileged information as an information source for clinical decision making, alongside the information provided by ML models, and introduce a collaborative human-ML decision making model. In the proposed model, two groups of decision makers, ML models and clinical experts, collaborate to make a consensus decision. As decision making always comes with uncertainty, we present an interval modelling approach to capture uncertainty in the proposed collaborative model. For this purpose, clinical experts are asked to give their opinions as intervals, and we generate prediction intervals as the outputs of ML models. Using the Interval Agreement Approach (IAA) as an aggregation function in our proposed collaborative model paves the way to minimizing loss of information by aggregating intervals into a fuzzy set. The proposed model can not only improve the accuracy and reliability of decision making, but can also be more interpretable, especially when it comes to critical decisions. Experimental results on synthetic data show the power of the proposed collaborative decision making model in some scenarios.
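    To make the aggregation step concrete, the sketch below uses one simple formulation of interval agreement for crisp intervals, in which the membership of a value is the proportion of intervals that contain it. This is an assumption about the basic form of IAA rather than the exact aggregation used in the paper, and the intervals and function name are hypothetical.

```python
def interval_agreement(intervals, x):
    """Membership of x in the aggregated fuzzy set: the share of intervals containing x."""
    return sum(1 for low, high in intervals if low <= x <= high) / len(intervals)


# Hypothetical intervals: two expert opinions and one ML prediction interval
# for the same quantity (for example, a risk score between 0 and 1).
intervals = [(0.2, 0.6), (0.4, 0.8), (0.5, 0.7)]

membership = {x: interval_agreement(intervals, x) for x in (0.3, 0.45, 0.55, 0.9)}
```

    Values covered by every interval receive full membership, while values covered by only some keep partial membership instead of being discarded, which is how the aggregation limits loss of information.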
  • Item
    A Robust Mathematical Model for Blood Supply Chain Network using Game Theory
    Valizadeh, J ; Aickelin, U ; Khorshidi, HA (IEEE, 2021-12)
    No alternative to human blood has been found so far, and the only source is blood donation by donors. This study presents a blood supply chain optimization model focusing on the location and inventory management of different centers. The main purpose of this model is to reduce total costs, including hospital construction costs, patient allocation costs, patient service costs, expected time-out fines, non-absorbed blood fines, and outsourcing process costs. We then calculate the cost savings of collaborating in each hospital coalition to determine a fair allocation of cost savings across hospitals. The proposed model is developed using data for the city of Tehran and previous studies of the blood supply chain. Four Cooperative Game Theory (CGT) methods (Shapley value, τ-value, core-center and least core) are evaluated for reducing the total cost and sharing the profit fairly between hospitals.
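    Of the four allocation methods named, the Shapley value is the simplest to illustrate: each hospital receives its marginal contribution to the coalition's savings, averaged over every order in which the coalition could form. The sketch below is a generic implementation with hypothetical hospitals and savings figures, none of which come from the paper.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


# Hypothetical cost savings for every coalition of three hospitals A, B, and C.
savings = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.0,
    frozenset({"B"}): 0.0,
    frozenset({"C"}): 0.0,
    frozenset({"A", "B"}): 30.0,
    frozenset({"A", "C"}): 40.0,
    frozenset({"B", "C"}): 50.0,
    frozenset({"A", "B", "C"}): 90.0,
}

allocation = shapley_values(["A", "B", "C"], savings.__getitem__)
# allocation is {"A": 25.0, "B": 30.0, "C": 35.0}, which sums to the full saving of 90.
```

    Enumerating all orderings grows factorially with the number of hospitals, so this direct form only suits small coalitions; larger games usually rely on sampling.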