Computing and Information Systems - Research Publications
-
A voting approach to identify a small number of highly predictive genes using multiple classifiers
Hassan, MR ; Hossain, MM ; Bailey, J ; Macintyre, G ; Ho, JWK ; Ramamohanarao, K (BMC, 2009-01-30)
BACKGROUND: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage.
RESULTS: By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. Taking a biological viewpoint, from the literature, most of the genes in our sets are known to be strongly related to cancer.
CONCLUSION: We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.
-
Protecting SIP Server from CPU-Based DoS Attacks using History-Based IP Filtering
Zhou, CV ; Leckie, C ; Ramamohanarao, K (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2009-10)
-
Trust-based robust scheduling and runtime adaptation of scientific workflow
Wang, M ; Ramamohanarao, K ; Chen, J (WILEY, 2009-11)
-
Building more robust multi-agent systems using a log-based approach
Unruh, A ; Bailey, J ; Ramamohanarao, K (IOS Press, 2009-03-23)
-
Selective Sampling for Approximate Clustering of Very Large Data Sets
WANG, L. ; BEZDEK, J. ; LECKIE, C. ; KOTAGIRI, R. (2008)
-
Automatically Determining the Number of Clusters in Unlabeled Data Sets
Wang, L ; Leckie, C ; Ramamohanarao, K ; Bezdek, J (Institute of Electrical and Electronics Engineers, 2009-03-01)
One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data, which is a basic input for most clustering algorithms. In this paper, we investigate a new method called Dark Block Extraction (DBE) for automatically estimating the number of clusters in unlabeled data sets, which is based on an existing algorithm for Visual Assessment of Cluster Tendency (VAT) of a data set, using several common image and signal processing techniques. Its basic steps include 1) generating a VAT image of an input dissimilarity matrix, 2) performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering, 3) applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal, and 4) smoothing the projection signal, computing its first-order derivative, and then detecting major peaks and valleys in the resulting signal to decide the number of clusters. Our DBE method is nearly “automatic,” depending on just one easy-to-set parameter. Several numerical and real-world examples are presented to illustrate the effectiveness of DBE.
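As a rough illustration of step 4 only, the sketch below smooths a one-dimensional projection signal, takes its first difference, and counts the points where the derivative turns from positive to non-positive. It is a simplified stand-in under stated assumptions, not the authors' implementation: the moving-average filter, the `window` and `min_height` parameters, and all function names are hypothetical, and the abstract does not specify the filter or peak criteria the paper uses.

```python
def moving_average(signal, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def first_difference(signal):
    """Discrete first-order derivative: signal[i + 1] - signal[i]."""
    return [b - a for a, b in zip(signal, signal[1:])]


def count_major_peaks(signal, window=3, min_height=0.0):
    """Count peaks of the smoothed signal (assumed proxy for cluster count).

    A peak is a sample where the derivative changes from positive to
    non-positive and the smoothed value is at least min_height.
    Both parameters are illustrative assumptions.
    """
    smoothed = moving_average(signal, window)
    deriv = first_difference(smoothed)
    peaks = 0
    for i in range(1, len(deriv)):
        if deriv[i - 1] > 0 and deriv[i] <= 0 and smoothed[i] >= min_height:
            peaks += 1
    return peaks


# Hypothetical projection signal with three bumps along the diagonal.
projection = [0, 1, 3, 1, 0, 1, 4, 1, 0, 2, 5, 2, 0]
estimated_clusters = count_major_peaks(projection)
```

In this toy setting each bump in the projection signal would correspond to one dark block on the VAT image diagonal, so the peak count serves as the cluster estimate.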
-
An Analysis of Latent Semantic Term Self-Correlation
Park, LAF ; Ramamohanarao, K (ASSOC COMPUTING MACHINERY, 2009)
Latent semantic analysis (LSA) is a generalized vector space method that uses dimension reduction to generate term correlations for use during the information retrieval process. We hypothesized that even though the dimension reduction establishes correlations between terms, it also degrades the correlation of a term to itself (self-correlation). In this article, we have proven that there is a direct relationship between the size of the LSA dimension reduction and the LSA self-correlation. We have also shown that by altering the LSA term self-correlations we gain a substantial increase in precision, while also reducing the computation required during the information retrieval process.
-
Efficient storage and retrieval of probabilistic latent semantic information for information retrieval
Park, LAF ; Ramamohanarao, K (SPRINGER, 2009-01-01)
Probabilistic latent semantic analysis (PLSA) is a method for computing term and document relationships from a document set. The probabilistic latent semantic index (PLSI) has been used to store PLSA information, but unfortunately the PLSI uses excessive storage space relative to a simple term frequency index, which causes lengthy query times. To overcome the storage and speed problems of PLSI, we introduce the probabilistic latent semantic thesaurus (PLST), an efficient and effective method of storing the PLSA information. We show that through methods such as document thresholding and term pruning, we are able to maintain the high-precision results found using PLSA while using a very small percentage (0.15%) of the storage space of PLSI.