Melbourne School of Population and Global Health - Research Publications

Permanent URI for this collection

http://hdl.handle.net/11343/247

Editorial: Special issue on operations research and machine learning

Khorshidi, HA ; Soltanolkottabi, M ; Allmendinger, R ; Aickelin, U (Taylor and Francis Group, 2024)

Many machine learning techniques work by optimizing specific objective functions. Supervised learning techniques minimize a prediction error, such as mean squared error (MSE) or misclassification rate, or maximize a quantity such as the conditional likelihood or posterior probability. Unsupervised learning techniques usually group instances into clusters so that instances within each group are as similar as possible while being distant from instances in other groups. In reinforcement learning, the goal of an agent is to maximize its cumulative reward. However, there is still room to exploit optimization and operations research (OR) in machine learning, and vice versa.
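
As a small illustration of the two supervised objectives named above, the following sketch computes mean squared error and misclassification rate in plain Python. The function names and the sample values are hypothetical and are not taken from the editorial.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_rate(labels_true, labels_pred):
    """Fraction of instances whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(labels_true, labels_pred)) / len(labels_true)


# Hypothetical regression targets and class labels, for illustration only.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
error_rate = misclassification_rate([0, 1, 1, 0], [0, 1, 0, 0])
```

A learner that minimizes either quantity over its parameters is solving an optimization problem, which is the connection to OR that the editorial highlights.
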
An Uncertainty-Accuracy-Based Score Function for Wrapper Methods in Feature Selection

Maadi, M ; Khorshidi, HA ; Aickelin, U (Institute of Electrical and Electronics Engineers, 2023)

Feature Selection (FS) is an effective preprocessing method to deal with the curse of dimensionality in machine learning. Redundant features in datasets decrease the classification performance and increase the computational complexity. Wrapper methods are an important category of FS methods that evaluate various feature subsets and select the best one using performance measures related to a classifier. In these methods, the accuracy of classifiers is the most common performance measure for FS. Although the performance of classifiers depends on their uncertainty, this important criterion is neglected in these methods. In this paper, we present a new performance measure called Uncertainty-Accuracy-based Performance Measure for Feature Selection (UAPMFS) that uses an ensemble approach to measure both the accuracy and uncertainty of classifiers. UAPMFS uses bagging and an uncertainty confusion matrix. This performance measure can be used in all wrapper methods to improve FS performance. We design two experiments to evaluate the performance of UAPMFS in wrapper methods. In the experiments, we use the leave-one-variable-out strategy as the common strategy in wrapper methods to evaluate features. We also define a feature score function based on UAPMFS to rank and select features. In the first experiment, we investigate the importance of considering uncertainty in the FS process and show how neglecting uncertainty affects FS performance. In the second experiment, we compare the performance of the UAPMFS-based feature score function with the most common feature score functions for FS. Experimental results show the effectiveness of the proposed performance measure on different datasets.
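
The leave-one-variable-out strategy mentioned above can be sketched as follows. This is a minimal illustration that scores each feature by the drop in plain accuracy when the feature is removed; it is not UAPMFS, whose uncertainty component is not specified in the abstract. The nearest-centroid classifier, the function names, and the toy data are all assumptions made for the example, and the data must have at least two features.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature columns."""
    correct = 0
    for i, (x_i, y_i) in enumerate(zip(X, y)):
        # Build per-class centroids from every row except row i.
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue
            total = sums.setdefault(label, [0.0] * len(features))
            for pos, f in enumerate(features):
                total[pos] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            return sum((x_i[f] - sums[label][pos] / counts[label]) ** 2
                       for pos, f in enumerate(features))

        predicted = min(sums, key=sq_dist)
        correct += predicted == y_i
    return correct / len(X)


def leave_one_variable_out_scores(X, y):
    """Score feature j as (accuracy with all features) minus
    (accuracy with feature j removed). Higher means more useful."""
    all_features = list(range(len(X[0])))
    baseline = loo_accuracy(X, y, all_features)
    return {
        j: baseline - loo_accuracy(X, y, [f for f in all_features if f != j])
        for j in all_features
    }


# Hypothetical toy data: column 0 separates the classes, column 1 does not.
X = [[0.0, 0.3], [0.1, 0.1], [0.2, 0.2], [1.0, 0.2], [1.1, 0.3], [1.2, 0.1]]
y = [0, 0, 0, 1, 1, 1]
scores = leave_one_variable_out_scores(X, y)
ranking = sorted(scores, key=scores.get, reverse=True)
```

A wrapper method would replace the accuracy term with whichever performance measure it uses; the paper's proposal is to use a measure that also accounts for classifier uncertainty.
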
A Diversity-Based Synthetic Oversampling Using Clustering for Handling Extreme Imbalance

Yang, Y ; Akbarzadeh Khorshidi, H ; Aickelin, U (Springer, 2023)

Imbalanced data are typically observed in many real-life classification problems. However, mainstream machine learning algorithms are mostly designed with the underlying assumption of a relatively well-balanced distribution of classes. The mismatch between reality and algorithm assumption results in a deterioration of classification performance. One approach to this problem is re-sampling, although its effectiveness is limited: most re-sampling methods fail to consider the distribution of minority and majority instances and the diversity within synthetically generated data. Diversity becomes increasingly important when minority data become more sparse, as each data point becomes more valuable. All minority data points should be considered during the generation process instead of being regarded as noise. In this paper, we propose a cluster-based diversity re-sampling method, combined with the NOAH algorithm. Neighbourhood-based Clustering Diversity Over-sampling (NBCDO) is introduced with the aim of complementing our previous cluster-based diversity algorithm, Density-based Clustering Diversity Over-sampling (DBCDO). It first uses a neighbourhood-based clustering algorithm to consider the distribution of both minority and majority class instances, before applying the NOAH algorithm to encourage diversity optimisation during the generation of synthetic instances. We demonstrate the implementation of both cluster-based diversity methods by conducting experiments over 10 real-life datasets with ≤ 5% imbalance ratio and show that our proposed cluster-based diversity algorithms (NBCDO, DBCDO) bring performance improvements over comparable methods (DB-SMOTE, MAHAKIL, KMEANS-SMOTE, MC-SMOTE).
Cloud model-based best-worst method for group decision making under uncertainty

Minaei, B ; Akbarzadeh Khorshidi, H ; Aickelin, U ; Geramian, A (Taylor and Francis Group, 2023)

This study aims to enhance computational and analytical aspects of multi-criteria group decision-making (MCGDM) under uncertainty. For this, we use the best-worst method (BWM) and cloud models to develop a more reliable MCGDM algorithm including three stages: first, collecting data through the BWM reference pairwise comparison; second, extracting interval-weights using the BWM bi-level optimisation models and aggregating different opinions via cloud models; and third, using the technique for order of preference by similarity to ideal solution (TOPSIS) to prioritise alternatives. We have also investigated the effectiveness of the proposed approach in a real-life problem of online learning platform selection within the context of the COVID-19 pandemic lockdown. The experimental results demonstrate the superiority of the proposed method over the Bayesian BWM in terms of computational time, by 96%. Moreover, the proposed approach outperforms the BWM and Bayesian BWM techniques by 33% and 25%, respectively, in terms of conformity to the decision-makers’ intuitive judgments. Our findings also have important practical implications. Application of the proposed method led to robustness against the number of decision-makers and significantly increased time efficiency in group decision-making. In addition, the lower inconsistency of the computations enhanced the effectiveness of prioritisation in group decision-making.
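
The third stage above relies on TOPSIS, which can be sketched in its standard crisp form as follows. This is an illustrative sketch only: the paper derives interval weights from BWM and cloud models, whereas here the weights, the decision matrix, and the function name are hypothetical inputs, and the alternatives are assumed not to be all identical (otherwise the closeness ratio is undefined).

```python
import math


def topsis(matrix, weights, benefit):
    """Return the closeness coefficient of each alternative (row).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if it is a cost
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
                for row in matrix]

    # Ideal and anti-ideal points, criterion by criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]

    closeness = []
    for row in weighted:
        d_pos = math.sqrt(sum((v - p) ** 2 for v, p in zip(row, ideal)))
        d_neg = math.sqrt(sum((v - n) ** 2 for v, n in zip(row, anti)))
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness


# Hypothetical example: three alternatives, one benefit and one cost criterion.
scores = topsis([[9, 1], [5, 5], [1, 9]], weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Alternatives are prioritised by descending closeness coefficient, so an alternative that is best on every criterion receives the highest score.
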
A Lightweight Window Portion-Based Multiple Imputation for Extreme Missing Gaps in IoT Systems

Adhikari, D ; Jiang, W ; Zhan, J ; Assefa, M ; Khorshidi, HA ; Aickelin, U ; Rawat, DB (Institute of Electrical and Electronics Engineers (IEEE), 2023)

Intelligent techniques, including artificial intelligence and deep learning, normally operate on complete data with no missing values. Multiple imputation is indispensable for addressing missing data, resulting in unbiased estimates, and for dealing with uncertainty by providing more valid results. Most state-of-the-art techniques focus on high missing rates (around 50%-60%) and short missing gaps, while imputation for extreme missing gaps and missing rates is an important challenge for multivariate time-series data generated through the Internet of Things (IoT). Hence, we propose a Lightweight Window Portion-based Multiple Imputation (LWPMI) based on multivariate variables, correlation, data fusion, regression, and multiple imputations. We conduct extensive experiments by generating extreme missing gaps and high missing rates ranging from 10% to 90% on data generated by sensors. We also investigate different sets of features to examine how LWPMI works when features have high, weak, or a mixture of high and weak correlation. All the obtained results show that LWPMI outperforms baseline techniques in preserving pattern, structure, and trend for both 90% extreme missing gaps and 90% missing rates.
Guest Editorial Special Issue on Multiobjective Evolutionary Optimization in Machine Learning

Aickelin, U ; Khorshidi, HA ; Qu, R ; Charkhgard, H (Institute of Electrical and Electronics Engineers (IEEE), 2023-08)

We are very pleased to introduce this special issue on multiobjective evolutionary optimization for machine learning (MOML). Optimization is at the heart of many machine-learning techniques. However, there is still room to exploit optimization in machine learning. Every machine-learning technique has hyperparameters that can be tuned using evolutionary computation and optimization, considering normally multiple criteria, such as bias, variance, complexity, and fairness in model selection. Multiobjective evolutionary optimization can help meet these criteria for optimizing machine-learning models. Some of the existing approaches address these multiple criteria by transforming the problem into a single-objective optimization problem. However, multiobjective optimization models are able to outperform single-objective ones in contributing to multiple intended objectives (criteria). In recent years, evolutionary computation has been shown to be the premier method for solving multiobjective optimization problems (MOPs), producing both optimal and diverse solutions beyond the capabilities of other heuristics. This is particularly true for very large solution spaces, which is the case in real-world machine-learning problems with many features.
Evaluation of the Early Intervention Physiotherapist Framework for Injured Workers in Victoria, Australia: Data Analysis Follow-Up

Khorshidi, HA ; Aickelin, U ; de Silva, A (MDPI, 2023-08)

PURPOSE: This study evaluates the performance of the Early Intervention Physiotherapist Framework (EIPF) for injured workers. This study provides a proper follow-up period (3 years) to examine the impacts of the EIPF program on injury outcomes such as return to work (RTW) and time to RTW. This study also identifies the factors influencing the outcomes. METHODS: The study was conducted on data collected from compensation claims of people who were injured at work in Victoria, Australia. Injured workers who commenced their compensation claims after the first of January 2010 and had their initial physiotherapy consultation after the first of August 2014 were included. To conduct the comparison, we divided the injured workers into two groups: those who received physiotherapy services from EIPF-trained physiotherapists (EP) and those who received them from regular physiotherapists (RP) over the three-year intervention period. We used three different statistical analysis methods to evaluate the performance of the EIPF program. We used descriptive statistics to compare the two groups based on physiotherapy services and injury outcomes. We also completed survival analysis using Kaplan-Meier curves in terms of time to RTW. We developed univariate and multivariate regression models to investigate whether the difference in outcomes was achieved after adjusting for significantly associated variables. RESULTS: The results showed that physiotherapists in the EP group, on average, dealt with more claims (over twice as many) than those in the RP group. Time to RTW for the injured workers treated by the EP group was significantly lower than for those who were treated by the RP group, as indicated by descriptive, survival, and regression analyses. Earlier intervention by physiotherapists led to earlier RTW. CONCLUSION: This evaluation showed that the EIPF program achieved successful injury outcomes three years after implementation. Motivating physiotherapists to intervene earlier in the recovery process of injured workers through initial consultation helps to improve injury outcomes.
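
The Kaplan-Meier analysis of time to RTW mentioned in the methods can be illustrated with the standard product-limit estimator. The sketch below is an assumption-laden example, not the study's analysis: the function name and the durations are hypothetical, and here "survival" means the estimated probability that a worker has not yet returned to work by a given time. Claims without an observed RTW are treated as right-censored.

```python
def kaplan_meier(durations, observed):
    """Product-limit estimate of the survival function.

    durations : time until the event or until censoring, per subject
    observed  : True if the event (here, RTW) was observed, False if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical weeks to RTW for five claims; two are censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
```

Comparing such curves for two groups shows at each time point which group has the larger share of workers still off work, which is how a difference in time to RTW appears graphically.
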
Multi-objective Semi-supervised Clustering for Finding Predictive Clusters

Ghasemi, Z ; Khorshidi, HA ; Aickelin, U (2022-01-26)
A parametric similarity measure for extended picture fuzzy sets and its application in pattern recognition

Farhadinia, B ; Aickelin, U ; Khorshidi, HA (University of Sistan & Baluchestan, 2022-11-01)

This article advances the idea of the extended picture fuzzy set (E-PFS), which is essentially an augmentation of the generalised spherical fuzzy set (GSFS) obtained by releasing the restricted selection of p in the description of GSFSs. Moreover, the use of a triangular conorm term in the description of E-PFS widens its scope not only compared to the picture fuzzy set (PFS) and spherical fuzzy set (SFS), but also compared to GSFS. Subsequently, a fundamental theorem concerning E-PFS shows that it is better able than these special types to deal with ambiguity and uncertainty. Further, we propose a parametric E-PFS similarity measure, which plays a critical role in information theory. To reveal the advantages and validity of the E-PFS similarity measure, we exhibit its applicability in multiple criteria decision making involving the recognition of building material, the recognition of patterns, and the selection process of mega project(s) in developing countries. Furthermore, through the experimental studies, we demonstrate that E-PFS is able to handle uncertain information in real-life decision procedures with no extra parameter, and that it has a prominent role in decision making whenever the concepts of PFS, SFS and GSFS do not make sense.
Cluster-based Diversity Over-sampling: A Density and Diversity Oriented Synthetic Over-sampling for Imbalanced Data

Yang, Y ; Khorshidi, H ; Aickelin, U (SCITEPRESS - Science and Technology Publications, 2022)

In many real-life classification tasks, the issue of imbalanced data is commonly observed. The workings of mainstream machine learning algorithms typically assume the classes amongst underlying datasets are relatively well-balanced. The failure of this assumption can lead to a biased representation of the models’ performance. This has encouraged the incorporation of re-sampling techniques to generate more balanced datasets. However, mainstream re-sampling methods fail to account for the distribution of minority data and the diversity within generated instances. Therefore, in this paper, we propose a data-generation algorithm, Cluster-based Diversity Over-sampling (CDO), to consider minority instance distribution during the process of data generation. Diversity optimisation is utilised to promote diversity within the generated data. We have conducted extensive experiments on synthetic and real-world datasets to evaluate the performance of CDO in comparison with SMOTE-based and diversity-based methods (DADO, DIWO, BL-SMOTE, DB-SMOTE, and MAHAKIL). The experiments show the superiority of CDO.
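
For context on the SMOTE-based baselines named above, the following sketch shows the generic interpolation idea behind SMOTE-style over-sampling: a synthetic minority instance is placed at a random point on the segment between a minority instance and one of its nearest minority neighbours. This is not CDO, which additionally uses clustering and diversity optimisation; the function name, parameters, and data are hypothetical, and at least two minority instances are assumed.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority class with three instances.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority points, this scheme cannot by itself spread the generated data beyond those segments, which is the kind of limited diversity the abstract refers to.
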
