Computing and Information Systems - Research Publications

  • Item
    A fast hybrid short read fragment assembly algorithm
    Schmidt, B ; Sinha, R ; Beresford-Smith, B ; Puglisi, SJ (OXFORD UNIV PRESS, 2009-09-01)
    The shorter and vastly more numerous reads produced by second-generation sequencing technologies require new tools that can assemble massive numbers of reads in reasonable time. Existing short-read assembly tools can be classified into two categories: greedy extension-based and graph-based. While the graph-based approaches are generally superior in terms of assembly quality, the computer resources required for building and storing a huge graph are very high. In this article, we present Taipan, an assembly algorithm which can be viewed as a hybrid of these two approaches. Taipan uses greedy extensions for contig construction but at each step realizes enough of the corresponding read graph to make better decisions as to how assembly should continue. We show that this approach can achieve an assembly quality at least as good as the graph-based approaches used in the popular Edena and Velvet assembly tools using a moderate amount of computing resources.
  • Item
    Comparative analysis of long DNA sequences by per element information content using different contexts
    Dix, TI ; Powell, DR ; Allison, L ; Bernal, J ; Jaeger, S ; Stern, L (BMC, 2007)
    BACKGROUND: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences of the order of millions of bases. Compressing a sequence in isolation will include information on self-repetition, whereas compressing a sequence Y in the context of another sequence X can find what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models. RESULTS: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2. CONCLUSION: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast, and the saved results are self-documenting.
  • Item
    Allocation strategies for utilization of space-shared resources in Bag of Tasks grids
    De Rose, CAF ; Ferreto, T ; Calheiros, RN ; Cirne, W ; Costa, LB ; Fireman, D (ELSEVIER, 2008-05)
  • Item
    Shuffle-Sum: Coercion-Resistant Verifiable Tallying for STV Voting
    Benaloh, J ; Moran, T ; Naish, L ; Ramchen, K ; Teague, V (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2009-12)
  • Item
    An empirical study of the effects of NLP components on Geographic IR performance
    Stokes, N ; Li, Y ; Moffat, A ; Rong, J (TAYLOR & FRANCIS LTD, 2008)
  • Item
    The V*Diagram: A query-dependent approach to moving KNN queries
    Nutanong, S ; Zhang, R ; Tanin, E ; Kulik, L (Association for Computing Machinery (ACM), 2008-01-01)
    The moving k nearest neighbor (MkNN) query finds the k nearest neighbors of a moving query point continuously. The high potential of reducing the query processing cost as well as the large spectrum of associated applications have attracted considerable attention to this query type from the database community. This paper presents an incremental safe-region-based technique for answering MkNN queries, called the V*-Diagram. In general, a safe region is a set of points where the query point can move without changing the query answer. Traditional safe-region approaches compute a safe region based on the data objects but independent of the query location. Our approach exploits the current knowledge of the query point and the search space in addition to the data objects. As a result, the V*-Diagram has much smaller IO and computation costs than existing methods. The experimental results show that the V*-Diagram outperforms the best existing technique by two orders of magnitude.
  • Item
    Managing Outsourcing: The Life Cycle Imperative
    CULLEN, SK ; SEDDON, PB ; WILLCOCKS, L (2005)
  • Item
    Decoding prefix codes
    Liddell, M ; Moffat, A (WILEY-BLACKWELL, 2006-12)
  • Item