Computing and Information Systems - Theses

  • Item
    Towards high-dimensional classification using Michigan style genetic based machine learning
    Abedini, Mani (2013)
    High-dimensional classification problems arise frequently in many areas of science and engineering. Examples include disease classification based on gene expression profiles, document classification, image recognition and fraud detection. Machine learning algorithms are widely used to tackle such problems. Typically, statistical techniques or artificial intelligence methods are used to build models that learn from the data, so that new observations can subsequently be classified as belonging to a particular category of the data set. Genetic-based machine learning (GBML), which combines evolutionary algorithms and reinforcement learning, has been successfully applied in a wide range of complex problem-solving and classification tasks. The eXtended Classifier System (XCS), a well-known GBML technique, evolves a population of classifiers in the form of condition-action rules that can be used for classification problems. A key step in XCS’s iterative learning process is the rule discovery component, which creates new classifiers to be added to the bounded population. Unfortunately, population-based approaches are inherently slow when faced with large-scale problems. This may in part be attributed to the additional cost associated with maintaining relatively large populations of classifiers, often encapsulating multiple features. Consequently, few studies have examined the application of XCS in high-dimensional cases. The overarching aim of this thesis is to develop new GBML classification models, based on an underlying XCS architecture, for high-dimensional classification problems. The objective is to improve performance, measured in terms of accuracy, population diversity and execution costs. To do this, three alternative approaches have been proposed and evaluated. In the first approach, we use “feature quality” to guide the XCS rule discovery process. Here, a pre-processing phase is used to extract feature quality information.
Subsequently, the feature quality information is used to bias the evolutionary operators via a “gene mask”. Detailed numerical simulation experiments encapsulating a number of alternative feature ranking methods, parameter settings and benchmark data sets suggest that our proposed model can improve classification performance on data sets with large numbers of features. In the second approach, a hybrid multi-population ensemble classifier inspired by co-evolutionary algorithms and ensemble learning is used to improve the baseline XCS performance. In this model, separate sub-populations are evolved in parallel using a binary gene mask to guide the evolutionary operators. Sub-population outputs are weighted in a typical ensemble style to produce an output. As is typical in parallel evolutionary models of this form, we investigate alternative “migration” methods between sub-populations as well as specific mechanisms to adapt the gene mask within a sub-population. Numerical simulation results have shown that the multi-population model has superior performance to the single-population model. Significantly, by combining feature quality information and the hybrid co-evolutionary XCS, superior performance can be achieved when measured across the benchmark data sets. The focus of the third approach differs from the previous two, in that emphasis is placed on investigating techniques to reduce the execution time of the proposed GBML models for high-dimensional complex classification tasks using alternative parallel and distributed architectures. We start by comparing deployment techniques and the mapping of the multi-population GBML models using parallel or distributed programming libraries, before listing the advantages and disadvantages of each approach. This section concludes with a preliminary study investigating the use of cloud computing infrastructures to run our proposed model.
Large-scale classification problems pose many challenges for traditional machine learning algorithms. In this thesis, we have proposed novel extensions to XCS to meet such challenges. Our contributions are based on the development of informed rule discovery mechanisms that employ gene masks, co-evolving sub-populations, and parallel and distributed architectures. The experimental results have demonstrated that the proposed models perform better in terms of accuracy, execution costs and population diversity across the data sets considered: benchmark low-dimensional data sets, synthetic data sets and high-dimensional gene expression profile data sets. There are further opportunities to extend the proposed models in interesting ways, including examining feature interactions more closely and exploring the effectiveness of alternative evolutionary and statistical models.
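The abstract does not spell out how the gene mask biases the evolutionary operators or how sub-population outputs are weighted, so the following is only a minimal sketch under stated assumptions. It assumes a ternary XCS-style condition string (`0`, `1`, `#` for "don't care"), a per-feature mask weight in [0, 1] that scales the mutation probability, and a simple weighted vote across sub-populations. The names `masked_mutation`, `weighted_vote`, `gene_mask` and `base_rate` are hypothetical and do not come from the thesis.

```python
import random

DONT_CARE = "#"  # assumed ternary alphabet: "0", "1" and "#"


def masked_mutation(condition, state, gene_mask, base_rate, rng):
    """Return a mutated copy of a ternary condition (illustrative only).

    Position i is mutated with probability base_rate * gene_mask[i], so a
    feature with mask weight 0 is never touched and a feature with a high
    weight is mutated more often. This scaling rule is an assumption.
    """
    if not (len(condition) == len(state) == len(gene_mask)):
        raise ValueError("condition, state and gene_mask must have equal length")
    mutated = []
    for symbol, bit, weight in zip(condition, state, gene_mask):
        if rng.random() < base_rate * weight:
            # Toggle between "don't care" and the current input bit.
            symbol = bit if symbol == DONT_CARE else DONT_CARE
        mutated.append(symbol)
    return "".join(mutated)


def weighted_vote(predictions, weights):
    """Combine one predicted label per sub-population by summed weight."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Binary mask: only the first two features may be mutated.
    print(masked_mutation("1#0#", "1100", [1, 1, 0, 0], 1.0, rng))
    print(weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.4]))
```

With a binary mask, as in the multi-population model described above, each sub-population would simply restrict mutation to the features its mask switches on.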
  • Item
    Modelling knowledge for scientific collaboration on the semantic web
    Annamalai, Muthukkaruppan (2006)
    This thesis analyses the modelling of knowledge in relation to knowledge sharing in scientific collaboration on the semantic web. It outlines both general and specific conceptual frameworks, as well as hypotheses for the description of the knowledge models. The key motivation for this work is drawn from the vision of the e-science research agenda: to support effective communication and to enable automated task achievement in scientific research, based on accessible data and information on the web. A scientific collaboration is a widely distributed, networked scientific community. The internet has become the major vehicle for the distributed groups in scientific collaborations to share knowledge related to their research. The next-generation web, called the semantic web, promises to dispense with some of the human effort needed to both conserve and operationalise part of the shared knowledge. Moving towards a semantic-aware web environment fundamentally requires formal and explicit terminologies based on knowledge models. A knowledge model is an extendable conceptual framework of knowledge. We have proposed two types of knowledge models for scientific collaboration, namely domain and task knowledge models. Hence, the key objective of this research work is to characterise the task and domain knowledge models of a scientific collaboration, with the particular intention of increasing the confidence of knowledge sharing and reuse on the semantic web. We claim that a shared terminology must agree with the purpose and ultimate use of the model. Consequently, we advocate the modelling of the directively and paradigmatically shared knowledge in a scientific collaboration. The directively shared knowledge is identified with the ratified and widely disseminated general domain knowledge motivated by general informational needs in a domain.
The paradigmatically shared knowledge alludes to the sharing of purposive domain knowledge motivated by the needs of common tasks of researchers in a scientific collaboration. We highlight that both content and context are important for the sharing of purposive domain knowledge in a scientific collaboration. We have adapted some previous work on the representation of mathematical expressions to arrive at a general framework for representing mathematical relations involving concepts in a domain knowledge model. Similarly, we have adapted and extended an existing set of evaluation criteria to formatively evaluate the knowledge models being built. The concrete implications of the work reported in this thesis are applied to model knowledge within the domain of Experimental High Energy Physics (EHEP), specifically the Belle scientific collaboration. We sum up the key contributions of this thesis as follows:
    • Analysis of the requirements in developing knowledge models for sharing of knowledge in a scientific collaboration on the semantic web.
    • A task model to make explicit the task functionality, which includes the task input-output information that can be relied upon as the context for modelling of purposive knowledge of a domain.
    • A method to identify, analyse, conceptualise and model general and purposive knowledge of a domain.
    • An approach to introduce explicit mathematical notation into a web-portal knowledge model.
    • A detailed study of the knowledge classification specific to the EHEP domain.
    • An adapted and extended criteria-based scheme to formatively evaluate the knowledge models being built.
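The idea of a task model whose input-output information serves as the context for purposive domain knowledge can be pictured roughly as follows. This is a hedged sketch, not the representation used in the thesis: the class `TaskModel`, the function `purposive_concepts` and all concept names in the example are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskModel:
    """A task described only by what it consumes and produces (assumed shape)."""
    name: str
    inputs: frozenset
    outputs: frozenset

    def context(self):
        # The task context is taken to be every concept the task reads or writes.
        return self.inputs | self.outputs


def purposive_concepts(task, domain_concepts):
    """Select the domain concepts that appear in the task's input-output context."""
    return sorted(c for c in domain_concepts if c in task.context())


if __name__ == "__main__":
    # Made-up concept names, used purely for illustration.
    task = TaskModel(
        name="event_selection",
        inputs=frozenset({"raw_event", "selection_cut"}),
        outputs=frozenset({"selected_event"}),
    )
    domain = ["detector", "raw_event", "selected_event", "selection_cut"]
    print(purposive_concepts(task, domain))
```

Here the general domain model is just a list of concept names; the task model narrows it to the subset that is relevant to one shared task.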