Computing and Information Systems - Research Publications

Permanent URI for this collection

Search Results

Now showing 1 - 10 of 16
  • Item
    Thumbnail Image
    On the Noncyclic Property of Sylvester Hadamard Matrices
    Tang, X ; Parampalli, U (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2010-09)
  • Item
    Thumbnail Image
    Reference-Free Validation of Short Read Data
    Schroeder, J ; Bailey, J ; Conway, T ; Zobel, J ; Aramayo, R (PUBLIC LIBRARY SCIENCE, 2010-09-22)
    BACKGROUND: High-throughput DNA sequencing techniques offer the ability to rapidly and cheaply sequence material such as whole genomes. However, the short-read data produced by these techniques can be biased or compromised at several stages in the sequencing process; the sources and properties of some of these biases are not always known. Accurate assessment of bias is required for experimental quality control, genome assembly, and interpretation of coverage results. An additional challenge is that, for new genomes or material from an unidentified source, there may be no reference available against which the reads can be checked. RESULTS: We propose analytical methods for identifying biases in a collection of short reads, without recourse to a reference. These, in conjunction with existing approaches, comprise a methodology that can be used to quantify the quality of a set of reads. Our methods involve use of three different measures: analysis of base calls; analysis of k-mers; and analysis of distributions of k-mers. We apply our methodology to wide range of short read data and show that, surprisingly, strong biases appear to be present. These include gross overrepresentation of some poly-base sequences, per-position biases towards some bases, and apparent preferences for some starting positions over others. CONCLUSIONS: The existence of biases in short read data is known, but they appear to be greater and more diverse than identified in previous literature. Statistical analysis of a set of short reads can help identify issues prior to assembly or resequencing, and should help guide chemical or statistical methods for bias rectification.
  • Item
    Thumbnail Image
    MUSTANG-MR Structural Sieving Server: Applications in Protein Structural Analysis and Crystallography
    Konagurthu, AS ; Reboul, CF ; Schmidberger, JW ; Irving, JA ; Lesk, AM ; Stuckey, PJ ; Whisstock, JC ; Buckle, AM ; Fernandez-Fuentes, N (PUBLIC LIBRARY SCIENCE, 2010-04-06)
    BACKGROUND: A central tenet of structural biology is that related proteins of common function share structural similarity. This has key practical consequences for the derivation and analysis of protein structures, and is exploited by the process of "molecular sieving" whereby a common core is progressively distilled from a comparison of two or more protein structures. This paper reports a novel web server for "sieving" of protein structures, based on the multiple structural alignment program MUSTANG. METHODOLOGY/PRINCIPAL FINDINGS: "Sieved" models are generated from MUSTANG-generated multiple alignment and superpositions by iteratively filtering out noisy residue-residue correspondences, until the resultant correspondences in the models are optimally "superposable" under a threshold of RMSD. This residue-level sieving is also accompanied by iterative elimination of the poorly fitting structures from the input ensemble. Therefore, by varying the thresholds of RMSD and the cardinality of the ensemble, multiple sieved models are generated for a given multiple alignment and superposition from MUSTANG. To aid the identification of structurally conserved regions of functional importance in an ensemble of protein structures, Lesk-Hubbard graphs are generated, plotting the number of residue correspondences in a superposition as a function of its corresponding RMSD. The conserved "core" (or typically active site) shows a linear trend, which becomes exponential as divergent parts of the structure are included into the superposition. CONCLUSIONS: The application addresses two fundamental problems in structural biology: first, the identification of common substructures among structurally related proteins--an important problem in characterization and prediction of function; second, generation of sieved models with demonstrated uses in protein crystallographic structure determination using the technique of Molecular Replacement.
  • Item
    Thumbnail Image
    Classifying proteins using gapped Markov feature pairs
    Ji, X ; Bailey, J ; Ramamohanarao, K (ELSEVIER, 2010-08)
  • Item
  • Item
    Thumbnail Image
    A binary decision diagram based approach for mining frequent subsequences
    Loekito, E ; Bailey, J ; Pei, J (SPRINGER LONDON LTD, 2010-08)
  • Item
  • Item
    Thumbnail Image
    Fast Set Bounds Propagation Using a BDD-SAT Hybrid
    Gange, G ; Stuckey, PJ ; Lagoon, V (AI ACCESS FOUNDATION, 2010)
    Binary Decision Diagram (BDD) based set bounds propagation is a powerful approach to solving set-constraint satisfaction problems. However, prior BDD based techniques in- cur the significant overhead of constructing and manipulating graphs during search. We present a set-constraint solver which combines BDD-based set-bounds propagators with the learning abilities of a modern SAT solver. Together with a number of improvements beyond the basic algorithm, this solver is highly competitive with existing propagation based set constraint solvers.
  • Item
    Thumbnail Image
    Incremental Satisfiability and Implication for UTVPI Constraints
    Schutt, A ; Stuckey, PJ (INFORMS, 2010-09-01)
    Unit two-variable-per-inequality (UTVPI) constraints form one of the largest class of integer constraints that are polynomial time solvable (unless P = NP). There is considerable interest in their use for constraint solving, abstract interpretation, spatial database algorithms, and theorem proving. In this paper we develop new incremental algorithms for UTVPI constraint satisfaction and implication checking that require ℴ(m + n log n + p) time and ℴ(n + m + p) space to incrementally check satisfiability of m UTVPI constraints on n variables, and we check the implication of p UTVPI constraints. The algorithms can be straightforwardly extended to create nonincremental implication checking and generation of all (nonredundant) implied constraints, as well as generate minimal unsatisfiable subsets and minimal implicants.
  • Item
    Thumbnail Image
    The Multidimensional Knapsack Problem: Structure and Algorithms
    Puchinger, J ; Raidl, GR ; Pferschy, U (INFORMS, 2010-03-01)
    We study the multidimensional knapsack problem, present some theoretical and empirical results about its structure, and evaluate different integer linear programming (ILP)-based, metaheuristic, and collaborative approaches for it. We start by considering the distances between optimal solutions to the LP relaxation and the original problem and then introduce a new core concept for the multidimensional knapsack problem (MKP), which we study extensively. The empirical analysis is then used to develop new concepts for solving the MKP using ILP-based and memetic algorithms. Different collaborative combinations of the presented methods are discussed and evaluated. Further computational experiments with longer run times are also performed to compare the solutions of our approaches to the best-known solutions of another so-far leading approach for common MKP benchmark instances. The extensive computational experiments show the effectiveness of the proposed methods, which yield highly competitive results in significantly shorter run times than do previously described approaches.