Engineering and Information Technology Collected Works - Research Publications

  • Item
    Capturing prediction uncertainty in upstream cell culture models using conformal prediction and Gaussian processes
    Pham, TD ; Aickelin, U ; Bassett, R ; Papadopoulos, H ; Nguyen, KA ; Boström, H ; Carlsson, L (ML Research Press, 2023)
    This extended abstract compares the efficacy of Gaussian process and conformal XGBoost regressions in capturing prediction uncertainty in simulated and industrial cell culture data.
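As a rough illustration of how conformal prediction attaches an interval to a point prediction, the sketch below implements generic split conformal regression with absolute-residual scores. It is not the authors' conformal XGBoost pipeline: the function names, the calibration values, and the miscoverage level `alpha` are all assumptions chosen for the example, and the underlying regressor is abstracted away as a list of predictions.

```python
import math

def conformal_halfwidth(cal_true, cal_pred, alpha):
    """Half-width of a split conformal interval from held-out calibration data.

    Uses absolute residuals as nonconformity scores and returns the
    ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage.
        return math.inf
    return scores[k - 1]

def conformal_interval(prediction, halfwidth):
    """Symmetric prediction interval around a point prediction."""
    return (prediction - halfwidth, prediction + halfwidth)

# Hypothetical calibration set: targets and model predictions (illustrative only).
cal_true = [3.0, 1.0, 7.0, 5.0, 2.0, 6.0, 4.0]
cal_pred = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

q = conformal_halfwidth(cal_true, cal_pred, alpha=0.25)
interval = conformal_interval(10.0, q)
```

Under the usual exchangeability assumption, intervals built this way cover the true value with probability at least `1 - alpha` on average; a Gaussian process would instead derive its interval from the posterior predictive variance.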
  • Item
    An Uncertainty-Accuracy-Based Score Function for Wrapper Methods in Feature Selection
    Maadi, M ; Khorshidi, HA ; Aickelin, U (IEEE, 2023-08-13)
    Feature Selection (FS) is an effective preprocessing method to deal with the curse of dimensionality in machine learning. Redundant features in datasets decrease classification performance and increase computational complexity. Wrapper methods are an important category of FS methods that evaluate various feature subsets and select the best one using performance measures related to a classifier. In these methods, classifier accuracy is the most common performance measure for FS. Although the performance of classifiers depends on their uncertainty, this important criterion is neglected in these methods. In this paper, we present a new performance measure called Uncertainty-Accuracy-based Performance Measure for Feature Selection (UAPMFS) that uses an ensemble approach to measure both the accuracy and uncertainty of classifiers. UAPMFS uses bagging and an uncertainty confusion matrix. This performance measure can be used in all wrapper methods to improve FS performance. We design two experiments to evaluate the performance of UAPMFS in wrapper methods. In the experiments, we use the leave-one-variable-out strategy, a common strategy in wrapper methods, to evaluate features. We also define a feature score function based on UAPMFS to rank and select features. In the first experiment, we investigate the importance of considering uncertainty in the FS process and show how neglecting uncertainty affects FS performance. In the second experiment, we compare the performance of the UAPMFS-based feature score function with the most common feature score functions for FS. Experimental results show the effectiveness of the proposed performance measure on different datasets.
  • Item
    Cluster-based Diversity Over-sampling: A Density and Diversity Oriented Synthetic Over-sampling for Imbalanced Data
    Yang, Y ; Khorshidi, H ; Aickelin, U (SCITEPRESS - Science and Technology Publications, 2022)
    In many real-life classification tasks, the issue of imbalanced data is commonly observed. The workings of mainstream machine learning algorithms typically assume the classes amongst underlying datasets are relatively well-balanced. The failure of this assumption can lead to a biased representation of the models’ performance. This has encouraged the incorporation of re-sampling techniques to generate more balanced datasets. However, mainstream re-sampling methods fail to account for the distribution of minority data and the diversity within generated instances. Therefore, in this paper, we propose a data-generation algorithm, Cluster-based Diversity Over-sampling (CDO), to consider minority instance distribution during the process of data generation. Diversity optimisation is utilised to promote diversity within the generated data. We have conducted extensive experiments on synthetic and real-world datasets to evaluate the performance of CDO in comparison with SMOTE-based and diversity-based methods (DADO, DIWO, BL-SMOTE, DB-SMOTE, and MAHAKIL). The experiments show the superiority of CDO.
  • Item
    Collaborative Human-ML Decision Making Using Experts' Privileged Information under Uncertainty
    Maadi, M ; Khorshidi, HA ; Aickelin, U (2021-01-01)
    Machine Learning (ML) models have been widely applied for clinical decision making. However, in this critical decision making field, human decision making is still prevalent, because clinical experts are more skilled at working with unstructured data, especially in uncommon situations. In this paper, we use clinical experts' privileged information as an information source for clinical decision making alongside the information provided by ML models, and introduce a collaborative human-ML decision making model. In the proposed model, two groups of decision makers, ML models and clinical experts, collaborate to make a consensus decision. As decision making always comes with uncertainty, we present an interval modelling approach to capture uncertainty in the proposed collaborative model. For this purpose, clinical experts are asked to give their opinions as intervals, and we generate prediction intervals as the outputs of ML models. Using the Interval Agreement Approach (IAA) as the aggregation function in our proposed collaborative model minimizes loss of information by aggregating intervals into a fuzzy set. The proposed model not only can improve the accuracy and reliability of decision making, but also can be more interpretable, especially when it comes to critical decisions. Experimental results on synthetic data show the power of the proposed collaborative decision making model in some scenarios.
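The idea of aggregating intervals into a fuzzy set can be sketched as follows. This is a simplified reading of interval agreement, in which the membership of a value is the fraction of supplied intervals that contain it; it is not taken from the paper, and the function name and the example intervals are hypothetical.

```python
def interval_agreement(intervals, x):
    """Membership of x: the fraction of closed intervals [lo, hi] containing x."""
    if not intervals:
        raise ValueError("at least one interval is required")
    hits = sum(1 for lo, hi in intervals if lo <= x <= hi)
    return hits / len(intervals)

# Hypothetical inputs: two expert opinions and one ML prediction interval,
# all expressed on the same 0-1 risk scale.
intervals = [(0.2, 0.6), (0.4, 0.8), (0.5, 0.7)]

# Evaluate the aggregated fuzzy set at a few sample points.
membership = {x: interval_agreement(intervals, x) for x in (0.3, 0.55, 0.75, 0.9)}
```

Values on which every decision maker agrees receive membership 1, while values supported by only some intervals receive partial membership, so the disagreement between sources is kept rather than averaged away.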
  • Item
    A Robust Mathematical Model for Blood Supply Chain Network using Game Theory
    Valizadeh, J ; Aickelin, U ; Khorshidi, HA (IEEE, 2021-12)
    No alternative to human blood has been found so far, and the only source is blood donation by donors. This study presents a blood supply chain optimization model focusing on the location and inventory management of different centers. The main purpose of this model is to reduce total costs, including hospital construction costs, patient allocation costs, patient service costs, expected time-out fines, non-absorbed blood fines, and outsourcing process costs. We then calculate the cost savings from collaboration in each hospital coalition in order to allocate those savings fairly across hospitals. The proposed model is developed using data for the city of Tehran and previous studies of the blood supply chain. Four Cooperative Game Theory (CGT) methods, namely the Shapley value, τ-value, core-center, and least core, are used to evaluate the reduction in total cost and the fair sharing of profit between hospitals.
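To make the fair-allocation step concrete, the sketch below computes the Shapley value, one of the four CGT methods named above, by averaging each hospital's marginal contribution over all joining orders. The three hospitals and their coalition savings are invented for illustration and do not come from the study.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, savings):
    """Shapley allocation of coalition savings.

    `savings` maps each frozenset of players to the cost saving that
    coalition achieves; the empty coalition must map to 0.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            joined = coalition | {p}
            # Marginal saving contributed by p when joining this coalition.
            totals[p] += savings[joined] - savings[coalition]
            coalition = joined
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}

# Hypothetical cost savings for every coalition of three hospitals.
hospitals = ["A", "B", "C"]
savings = {
    frozenset(): 0.0,
    frozenset("A"): 0.0, frozenset("B"): 0.0, frozenset("C"): 0.0,
    frozenset("AB"): 30.0, frozenset("AC"): 20.0, frozenset("BC"): 10.0,
    frozenset("ABC"): 60.0,
}

allocation = shapley_values(hospitals, savings)
# allocation is approximately {"A": 25.0, "B": 20.0, "C": 15.0}
```

The shares sum to the grand-coalition saving of 60. Enumerating every ordering is only practical for a handful of players; larger games typically rely on sampling.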
  • Item
    A survey on Optimisation-based Semi-supervised Clustering Methods
    Ghasemi, Z ; Khorshidi, HA ; Aickelin, U (IEEE, 2021-12)
    Clustering methods are developed for categorizing data points into different groups so that data points within each group have high similarities. Classic clustering algorithms are unsupervised, meaning that no complementary information is available to improve the clustering results. However, in some clustering problems, one may have supplementary information which can be employed for guiding the clustering process. In the presence of such information, the problem is semi-supervised clustering. In some articles, the problem of semi-supervised clustering is modeled as an optimization problem. In this research, optimization-based semi-supervised clustering papers from 2013 to 2020 are reviewed. This review is conducted based on a four-step procedure. It explores the objective functions and optimization algorithms used in these articles, as well as their application domains and types of supervised information.
  • Item
    Online Transfer Learning: Negative Transfer and Effect of Prior Knowledge
    Wu, X ; Manton, JH ; Aickelin, U ; Zhu, J (IEEE, 2021-07-12)
    Transfer learning is a machine learning paradigm where the knowledge from one task is utilized to resolve the problem in a related task. On the one hand, it is conceivable that knowledge from one task could be useful for solving a related problem. On the other hand, it is also recognized that if not executed properly, transfer learning algorithms could in fact impair the learning performance instead of improving it - commonly known as negative transfer. In this paper, we study the online transfer learning problems where the source samples are given in an off-line way while the target samples arrive sequentially. We define the expected regret of the online transfer learning problem, and provide upper bounds on the regret using information-theoretic quantities. We also obtain exact expressions for the bounds when the sample size becomes large. Examples show that the derived bounds are accurate even for small sample sizes. Furthermore, the obtained bounds give valuable insight on the effect of prior knowledge for transfer learning in our formulation. In particular, we formally characterize the conditions under which negative transfer occurs.
  • Item
    Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming
    Liu, J ; Bai, R ; Lu, Z ; Ge, P ; Aickelin, U ; Liu, D (IEEE, 2020-07-01)
    In medical fields, text classification is one of the most important tasks that can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, it is hard for humans to understand or manually fine-tune the classification for better precision and recall, due to the black box nature of learning. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) approaches to evolve regular expressions that can satisfactorily classify a given medical text inquiry. Given a seed population of regular expressions (randomly initialized or manually constructed by experts), our method evolves a population of regular expressions, using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated with real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that can be fully understood, checked and updated by medical doctors, which is crucial for medical practice.
  • Item
    Detect adverse drug reactions for the drug Pravastatin
    Liu, Y ; Aickelin, U (IEEE, 2012)
    Adverse drug reactions (ADRs) are a widely recognized public health concern. ADRs are one of the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting adverse drug reactions. The main problem with this method is how to automatically extract medical events or side effects from high-throughput medical data collected in day-to-day clinical practice. In this study we propose an original approach to detecting ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Pravastatin. Major side effects of the drug are detected. Because the detected ADRs are based on a computerized method, further investigation is needed.
  • Item
    Detect adverse drug reactions for drug Alendronate
    Liu, Y ; Aickelin, U (IEEE Control Chapter, 2012)
    Adverse drug reactions (ADRs) are a widely recognized public health concern. In this study we propose an original approach to detecting ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Alendronate. Major side effects of the drug are detected, and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on a computerized method, further investigation is needed.