Computing and Information Systems - Research Publications

  • Item
    A voting approach to identify a small number of highly predictive genes using multiple classifiers
    Hassan, MR ; Hossain, MM ; Bailey, J ; Macintyre, G ; Ho, JWK ; Ramamohanarao, K (BMC, 2009-01-30)
    BACKGROUND: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets that can serve as the basis for a clinical predictor of cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage.
    RESULTS: By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. From a biological viewpoint, the literature indicates that most of the genes in our sets are strongly related to cancer.
    CONCLUSION: We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.
  • Item
    Protecting SIP Server from CPU-Based DoS Attacks using History-Based IP Filtering
    Zhou, CV ; Leckie, C ; Ramamohanarao, K (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2009-10)
  • Item
    Trust-based robust scheduling and runtime adaptation of scientific workflow
    Wang, M ; Ramamohanarao, K ; Chen, J (WILEY, 2009-11)
  • Item
    Building more robust multi-agent systems using a log-based approach
    Unruh, A ; Bailey, J ; Ramamohanarao, K (IOS Press, 2009-03-23)
  • Item
    Selective Sampling for Approximate Clustering of Very Large Data Sets
    Wang, L ; Bezdek, J ; Leckie, C ; Kotagiri, R (2008)
  • Item
    Automatically Determining the Number of Clusters in Unlabeled Data Sets
    Wang, L ; Leckie, C ; Ramamohanarao, K ; Bezdek, J (Institute of Electrical and Electronics Engineers, 2009-03-01)
    One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data, which is a basic input for most clustering algorithms. In this paper, we investigate a new method called Dark Block Extraction (DBE) for automatically estimating the number of clusters in unlabeled data sets, which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set, using several common image and signal processing techniques. Its basic steps include 1) generating a VAT image of an input dissimilarity matrix, 2) performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering, 3) applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal, and 4) smoothing the projection signal, computing its first-order derivative, and then detecting major peaks and valleys in the resulting signal to decide the number of clusters. Our DBE method is nearly “automatic,” depending on just one easy-to-set parameter. Several numerical and real-world examples are presented to illustrate the effectiveness of DBE.
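A minimal sketch of the reordering behind step 1 of the abstract above, assuming the commonly described VAT procedure: start from an object involved in the largest dissimilarity, then repeatedly append the unordered object closest to any already ordered one. The function name `vat_order` and the use of NumPy are illustrative assumptions, not the authors' code, and the later DBE steps (segmentation, morphological filtering, distance transform, peak detection) are not covered.

```python
import numpy as np

def vat_order(dissim):
    """Return a VAT-style ordering and the reordered dissimilarity matrix.

    Hypothetical helper: assumes `dissim` is a square, symmetric matrix
    of pairwise dissimilarities with a zero diagonal.
    """
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    first = int(np.unravel_index(np.argmax(d), d.shape)[0])
    order = [first]
    remaining = [k for k in range(n) if k != first]
    # min_dist[k]: smallest dissimilarity from k to any ordered object.
    min_dist = d[first].copy()
    while remaining:
        # Append the unordered object nearest to the ordered set.
        nxt = min(remaining, key=lambda k: min_dist[k])
        order.append(nxt)
        remaining.remove(nxt)
        min_dist = np.minimum(min_dist, d[nxt])
    order = np.array(order)
    # Permute rows and columns with the same ordering.
    return order, d[np.ix_(order, order)]
```

Displayed as a grayscale image, the reordered matrix tends to show one dark block along the diagonal per well-separated cluster, which is the structure the remaining DBE steps are described as extracting.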
  • Item
    An Analysis of Latent Semantic Term Self-Correlation
    Park, LAF ; Ramamohanarao, K (ASSOC COMPUTING MACHINERY, 2009)
    Latent semantic analysis (LSA) is a generalized vector space method that uses dimension reduction to generate term correlations for use during the information retrieval process. We hypothesized that even though the dimension reduction establishes correlations between terms, it also degrades the correlation of a term to itself (self-correlation). In this article, we prove that there is a direct relationship between the size of the LSA dimension reduction and the LSA self-correlation. We also show that by altering the LSA term self-correlations we gain a substantial increase in precision, while also reducing the computation required during the information retrieval process.
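The link between reduced dimension and self-correlation can be illustrated with a small sketch. It assumes one common formulation, in which term correlations are the entries of V_k V_k^T for the top k right singular vectors of a document-by-term matrix; the article's exact definition may differ, and `term_self_correlations` is a hypothetical name.

```python
import numpy as np

def term_self_correlations(doc_term, k):
    """Diagonal of V_k V_k^T for a rank-k LSA of a document-by-term matrix.

    Assumption: rows are documents, columns are terms, and term
    correlations are taken from the truncated right singular vectors.
    """
    a = np.asarray(doc_term, dtype=float)
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    v_k = vt[:k].T  # shape: terms x k
    # Each term's correlation with itself is the squared norm of its row.
    return np.sum(v_k ** 2, axis=1)
```

Under this formulation the self-correlations always sum to k, because the columns of V_k are orthonormal, so reducing k necessarily lowers the average self-correlation.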
  • Item
    Efficient storage and retrieval of probabilistic latent semantic information for information retrieval
    Park, LAF ; Ramamohanarao, K (SPRINGER, 2009-01-01)
    Probabilistic latent semantic analysis (PLSA) is a method for computing term and document relationships from a document set. The probabilistic latent semantic index (PLSI) has been used to store PLSA information, but the PLSI uses excessive storage space relative to a simple term frequency index, which causes lengthy query times. To overcome the storage and speed problems of PLSI, we introduce the probabilistic latent semantic thesaurus (PLST), an efficient and effective method of storing the PLSA information. We show that through methods such as document thresholding and term pruning, we are able to maintain the high-precision results found using PLSA while using a very small fraction (0.15%) of the storage space of PLSI.