Engineering and Information Technology Collected Works - Research Publications

  • Item
    A Synthetic Over-sampling method with Minority and Majority classes for imbalance problems
    Khorshidi, HA ; Aickelin, U (2020-11-08)
    Class imbalance is a substantial challenge in classifying many real-world cases. Synthetic over-sampling methods have been effective in improving the performance of classifiers for imbalance problems. However, most synthetic over-sampling methods generate non-diverse synthetic instances within the convex hull formed by the existing minority instances, because they concentrate only on the minority class and ignore the vast information provided by the majority class. They also often perform poorly on extremely imbalanced data: the fewer the minority instances, the less information is available to generate synthetic instances. Moreover, existing methods that generate synthetic instances using the distributional information of the majority class cannot perform effectively when the majority class has a multimodal distribution. We propose a new method, Synthetic Over-sampling with Minority and Majority classes (SOMM), to generate diverse and adaptable synthetic instances. SOMM generates synthetic instances diversely within the minority data space. It then updates the generated instances adaptively according to a neighbourhood that includes both classes. Thus, SOMM performs well for both binary and multiclass imbalance problems. We examine the performance of SOMM for binary and multiclass problems using benchmark data sets with different imbalance levels. The empirical results show the superiority of SOMM compared to other existing methods.
  • Item
    Preface
    Siuly, S ; Huang, Z ; Aickelin, U ; Zhou, R ; Wang, H ; Zhang, Y ; Klimenko, SV (2017-01-01)
  • Item
    Towards the development of a simulator for investigating the impact of people management practices on retail performance
    Siebers, PO ; Aickelin, U ; Celia, H ; Clegg, CW ; JE Taylor, S (Palgrave Macmillan, 2014)
    Models to understand the impact of management practices on retail performance are often simplistic and assume low levels of noise and linearity. In real life, however, retail operations are dynamic, nonlinear and complex. To overcome these limitations, we investigate discrete-event and agent-based modeling and simulation approaches. The joint application of both approaches allows us to develop simulation models that are heterogeneous and more life-like, though it poses a new research question. When developing such simulation models, one still has to abstract from the real world, ideally in such a way that the ‘essence’ of the system is still captured. The question is how much detail is needed to capture this essence, as simulation models can be developed at different levels of abstraction. In the literature, choosing the appropriate level of abstraction for a particular case study is often treated as more of an art than a science. In this paper, we aim to study this question more systematically by using a retail branch simulation model to investigate which level of model accuracy yields meaningful results for practitioners. Our results show the effects of adding different levels of detail, and we conclude that this type of study is very valuable for gaining insight into what is really important in a model.
  • Item
    A Method for Evaluating Options for Motif Detection in Electricity Meter Data
    Dent, I ; Craig, T ; Aickelin, U ; Rodden, T (School of Statistics, Renmin University of China, 2018)
    Investigating household electricity usage patterns, and matching those patterns to behaviours, is an important area of research given the centrality of such patterns in addressing the needs of the electricity industry. Additional knowledge of household behaviours will allow more effective targeting of demand side management (DSM) techniques. This paper addresses the question of whether a reasonable number of meaningful motifs, each representing a regular activity within a domestic household, can be identified solely from household-level electricity meter data. Using UK data collected from several hundred households in Spring 2011, monitored at five-minute intervals, a process for finding repeating short patterns (motifs) is defined. Different ways of representing the motifs exist, and a qualitative approach is presented that allows for choosing between the options based on the number of regular behaviours detected (neither too few nor too many).
  • Item
    A hybrid pricing and cutting approach for the multi-shift full truckload vehicle routing problem
    Xue, N ; Bai, R ; Qu, R ; Aickelin, U (Elsevier, 2020-01-01)
    Full truckload transportation (FTL) in the form of freight containers represents one of the most important transportation modes in international trade. Because of its large volume and scale, delivery time in FTL is often less critical, but cost and service quality are crucial. Therefore, efficiently solving large-scale multi-shift FTL problems is becoming increasingly important and requires further research. In one of our earlier studies, a set covering model and a three-stage solution method were developed for a multi-shift FTL problem. This paper extends the previous work and presents a significantly more efficient approach by hybridising pricing and cutting strategies with metaheuristics (a variable neighbourhood search and a genetic algorithm). The metaheuristics are used to find promising columns (vehicle routes) guided by pricing, and cuts are dynamically generated to eliminate infeasible flow assignments caused by incompatible commodities. Computational experiments on real-life and artificial benchmark FTL problems showed superior performance in terms of both computational time and solution quality when compared with previous MIP-based three-stage methods and two existing metaheuristics. The proposed cutting and heuristic pricing approach can efficiently solve large-scale real-life FTL problems.
  • Item
    Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming
    Liu, J ; Bai, R ; Lu, Z ; Ge, P ; Aickelin, U ; Liu, D (IEEE, 2020-07-01)
    In medical fields, text classification is one of the most important tasks and can significantly reduce human workload through structured information digitization and intelligent decision support. Despite the popularity of learning-based text classification techniques, it is hard for humans to understand or manually fine-tune the classification for better precision and recall, owing to the black-box nature of learning. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) to evolve regular expressions capable of classifying a given medical text inquiry satisfactorily. Given a seed population of regular expressions (randomly initialized or manually constructed by experts), our method evolves a population of regular expressions using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated on real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that can be fully understood, checked and updated by medical doctors, which is crucial for medical practice.
  • Item
    Detect adverse drug reactions for the drug Pravastatin
    Liu, Y ; Aickelin, U (IEEE, 2012)
    Adverse drug reactions (ADRs) are a widely recognized public health concern. ADRs are one of the most common reasons for withdrawing drugs from the market. Prescription event monitoring (PEM) is an important approach to detecting adverse drug reactions. The main problem with this method is how to automatically extract medical events or side effects from high-throughput medical data collected in day-to-day clinical practice. In this study we propose an original approach to detecting ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Pravastatin. Major side effects for the drug are detected. Because the detected ADRs are based on a computerized method, further investigation is needed.
  • Item
    Detect adverse drug reactions for drug Alendronate
    Liu, Y ; Aickelin, U (IEEE Control Chapter, 2012)
    Adverse drug reactions (ADRs) are a widely recognized public health concern. In this study we propose an original approach to detecting ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Alendronate. Major side effects for the drug are detected, and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on a computerized method, further investigation is needed.
  • Item
    Detect adverse drug reactions for drug Simvastatin
    Liu, Y ; Aickelin, U (IEEE, 2012)
    Adverse drug reactions (ADRs) are a widely recognized public health concern. In this study we propose an original approach to detecting ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Simvastatin. Major side effects for the drug are detected, and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on a computerized method, further investigation is needed.
  • Item
    Detect adverse drug reactions for drug Pioglitazone
    Liu, Y ; Aickelin, U ; Baozong, Y ; Qiuqi, R ; Xiaofang, T (IEEE, 2012)
    Adverse drug reactions (ADRs) are a widely recognized public health concern. In this study we propose an original approach to detecting ADRs using a feature matrix and feature selection. The experiments are carried out on the drug Pioglitazone. Major side effects for the drug are detected, and better performance is achieved compared to other computerized methods. Because the detected ADRs are based on a computerized method, further investigation is needed.