Minerva Elements Records


Search Results

Now showing 1 - 10 of 59
  • Item
    A fast hybrid short read fragment assembly algorithm
    Schmidt, B ; Sinha, R ; Beresford-Smith, B ; Puglisi, SJ (OXFORD UNIV PRESS, 2009-09-01)
    The shorter and vastly more numerous reads produced by second-generation sequencing technologies require new tools that can assemble massive numbers of reads in reasonable time. Existing short-read assembly tools can be classified into two categories: greedy extension-based and graph-based. While the graph-based approaches are generally superior in terms of assembly quality, the computer resources required for building and storing a huge graph are very high. In this article, we present Taipan, an assembly algorithm which can be viewed as a hybrid of these two approaches. Taipan uses greedy extensions for contig construction but at each step realizes enough of the corresponding read graph to make better decisions as to how assembly should continue. We show that this approach can achieve an assembly quality at least as good as the graph-based approaches used in the popular Edena and Velvet assembly tools using a moderate amount of computing resources.
  • Item
    Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages
    Li, J ; Halgamuge, SK ; Kells, CI ; Tang, SL (BMC, 2007)
    BACKGROUND: Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function. RESULTS: We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross-validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at http://www.synteny.net/. CONCLUSION: The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.
  • Item
    Comparative analysis of long DNA sequences by per element information content using different contexts
    Dix, TI ; Powell, DR ; Allison, L ; Bernal, J ; Jaeger, S ; Stern, L (BMC, 2007)
    BACKGROUND: Features of a DNA sequence can be found by compressing the sequence under a suitable model; good compression implies low information content. Good DNA compression models consider repetition, differences between repeats, and base distributions. From a linear DNA sequence, a compression model can produce a linear information sequence. Linear space complexity is important when exploring long DNA sequences of the order of millions of bases. Compressing a sequence in isolation will include information on self-repetition, whereas compressing a sequence Y in the context of another sequence X can find what new information X gives about Y. This paper presents a methodology for performing comparative analysis to find features exposed by such models. RESULTS: We apply such a model to find features across chromosomes of Cyanidioschyzon merolae. We present a tool that provides useful linear transformations to investigate and save new sequences. Various examples illustrate the methodology, finding features for sequences alone and in different contexts. We also show how to highlight all sets of self-repetition features, in this case within Plasmodium falciparum chromosome 2. CONCLUSION: The methodology finds features that are significant and that biologists confirm. The exploration of long information sequences in linear time and space is fast and the saved results are self-documenting.
  • Item
    Allocation strategies for utilization of space-shared resources in Bag of Tasks grids
    De Rose, CAF ; Ferreto, T ; Calheiros, RN ; Cirne, W ; Costa, LB ; Fireman, D (ELSEVIER, 2008-05)
  • Item
    Shuffle-Sum: Coercion-Resistant Verifiable Tallying for STV Voting
    Benaloh, J ; Moran, T ; Naish, L ; Ramchen, K ; Teague, V (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2009-12)
  • Item
    An empirical study of the effects of NLP components on Geographic IR performance
    Stokes, N ; Li, Y ; Moffat, A ; Rong, J (TAYLOR & FRANCIS LTD, 2008)
  • Item
    Spatially enabling governments through SDI implementation
    Masser, I ; Rajabifard, A ; Williamson, I (TAYLOR & FRANCIS LTD, 2008)
    Spatially enabled government requires the development of effective SDIs that will support the vast majority of society, who are not spatially aware, in a transparent manner. This paper addresses three strategic challenges arising from the need to create this new environment. The first of these is the challenge for more inclusive models of governance given that SDI formulation and implementation involve a very large number of stakeholders from all levels of government as well as the private sector and academia. The second concerns the promotion of data sharing between different kinds of organisation. In some cases this may require new forms of organisation to carry out these tasks. The third challenge relates to the establishment of enabling platforms to facilitate access to spatial data and the delivery of data related services.
  • Item
    The V*Diagram: A query-dependent approach to moving KNN queries
    Nutanong, S ; Zhang, R ; Tanin, E ; Kulik, L (Association for Computing Machinery (ACM), 2008-01-01)
    The moving k nearest neighbor (MkNN) query finds the k nearest neighbors of a moving query point continuously. The high potential of reducing the query processing cost as well as the large spectrum of associated applications have attracted considerable attention to this query type from the database community. This paper presents an incremental safe-region-based technique for answering MkNN queries, called the V*-Diagram. In general, a safe region is a set of points where the query point can move without changing the query answer. Traditional safe-region approaches compute a safe region based on the data objects but independent of the query location. Our approach exploits the current knowledge of the query point and the search space in addition to the data objects. As a result, the V*-Diagram has much smaller I/O and computation costs than existing methods. The experimental results show that the V*-Diagram outperforms the best existing technique by two orders of magnitude.
  • Item
    Managing Outsourcing: The Life Cycle Imperative
    Cullen, SK ; Seddon, PB ; Willcocks, L (2005)
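The Taipan abstract above contrasts greedy extension-based assembly with graph-based assembly. As a rough illustration of the greedy half only, the sketch below extends a seed read to the right by repeatedly appending the unused read with the longest suffix-prefix overlap. It is not Taipan: it omits the partial read-graph lookahead the abstract describes, extends in one direction only, and ignores sequencing errors and reverse complements. The function names `suffix_prefix_overlap` and `greedy_extend` and the `min_overlap` threshold are hypothetical choices for this example.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`; 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_extend(seed: str, reads: list[str], min_overlap: int = 3) -> str:
    """Hypothetical greedy contig extension: always take the read with the longest overlap."""
    contig = seed
    unused = set(range(len(reads)))
    while True:
        # Reads already contained in the contig add nothing new; drop them.
        unused = {i for i in unused if reads[i] not in contig}
        best_len, best_idx = 0, None
        for i in sorted(unused):  # sorted so ties resolve deterministically
            length = suffix_prefix_overlap(contig, reads[i], min_overlap)
            if length > best_len:
                best_len, best_idx = length, i
        if best_idx is None:  # no read overlaps the contig end by at least min_overlap
            return contig
        # Append only the part of the read that extends past the overlap.
        contig += reads[best_idx][best_len:]
        unused.discard(best_idx)


if __name__ == "__main__":
    reads = ["ATGGCG", "GCGTAC", "TACCTT", "CTTGA"]
    print(greedy_extend(reads[0], reads))  # ATGGCGTACCTTGA
```

The greedy choice is what makes this style of assembler cheap: each step looks only at the current contig end. It is also its weakness near repeats, where the longest overlap can be the wrong one; that is the kind of decision the abstract says Taipan improves by consulting the local read graph.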
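The Dix et al. abstract above relates per-element information content to compression, and describes compressing a sequence Y in the context of another sequence X. A minimal sketch of that idea, assuming a simple adaptive order-k Markov model with add-one (Laplace) estimates rather than the authors' own compression model: each base costs -log2 P(base | previous k bases) bits, and "in the context of X" is approximated by training the model on X before coding Y. The function `information_profile` and its parameters `k` and `context` are hypothetical, and only the letters A, C, G and T are handled.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def information_profile(y: str, k: int = 2, context: str = "") -> list[float]:
    """Per-base information content of `y` in bits under an adaptive order-k Markov model.

    If `context` is non-empty, the model is trained on it first, so the profile
    approximates the information in `y` given that other sequence.
    """
    # counts[ctx][base] = how often `base` has followed the preceding bases `ctx`
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))

    def scan(seq: str, record: bool) -> list[float]:
        bits = []
        for i, base in enumerate(seq):
            table = counts[seq[max(0, i - k):i]]
            # Add-one estimate of P(base | previous k bases)
            p = (table[base] + 1) / (sum(table.values()) + len(ALPHABET))
            if record:
                bits.append(-math.log2(p))
            table[base] += 1  # adapt the model after coding each base
        return bits

    scan(context, record=False)  # prime the model on X without recording costs
    return scan(y, record=True)


if __name__ == "__main__":
    y = "ACGTACGTACGT"
    alone = information_profile(y)
    given = information_profile(y, context=y)
    print(round(sum(alone), 2), round(sum(given), 2))
```

Comparing the two profiles position by position shows where X "explains" Y: bases that are cheap only when X is supplied point to shared content, which is the kind of comparative signal the abstract describes.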
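The V*-Diagram abstract above defines a safe region as the set of points where the query point can move without changing the query answer. The sketch below encodes only that generic definition for an unordered kNN answer in the plane: a location is safe if every current answer object is strictly closer than every other object. It is not the V*-Diagram. The membership test here scans all objects, so it saves no work by itself; practical safe-region methods precompute the region's geometry so that the check is cheap. The names `knn`, `in_safe_region` and `track` are hypothetical.

```python
import math

Point = tuple[float, float]


def knn(q: Point, objects: list[Point], k: int) -> frozenset[int]:
    """Indices of the k objects nearest to q (ties broken by index)."""
    order = sorted(range(len(objects)), key=lambda i: math.dist(q, objects[i]))
    return frozenset(order[:k])


def in_safe_region(q: Point, objects: list[Point], answer: frozenset[int]) -> bool:
    """True if every answer object is strictly closer to q than every other object."""
    farthest_in = max(math.dist(q, objects[i]) for i in answer)
    nearest_out = min(
        (math.dist(q, p) for i, p in enumerate(objects) if i not in answer),
        default=math.inf,
    )
    return farthest_in < nearest_out


def track(trajectory: list[Point], objects: list[Point], k: int):
    """Follow a moving query, recomputing kNN only when it leaves the safe region."""
    answer = knn(trajectory[0], objects, k)
    answers, recomputations = [], 0
    for q in trajectory:
        if not in_safe_region(q, objects, answer):
            answer = knn(q, objects, k)
            recomputations += 1
        answers.append(answer)
    return answers, recomputations


if __name__ == "__main__":
    objects = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    path = [(x + 0.5, 1.0) for x in range(10)]
    answers, recomputations = track(path, objects, k=1)
    print(sorted(answers[0]), sorted(answers[-1]), recomputations)  # [0] [1] 1
```

For k = 1 this safe region is simply the Voronoi cell of the current nearest neighbor, a purely data-based region of the kind the abstract calls traditional. The V*-Diagram's stated improvement is to also use the current query location and search space when deriving the region.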