Computing and Information Systems - Research Publications

Permanent URI for this collection

http://hdl.handle.net/11343/350

Search Results

Now showing 1 - 10 of 10

On the reliability and the limits of inference of amino acid sequence alignments

Rajapaksa, S ; Sumanaweera, D ; Lesk, AM ; Allison, L ; Stuckey, PJ ; Garcia de la Banda, M ; Abramson, D ; Konagurthu, AS (OXFORD UNIV PRESS, 2022-06-24)

MOTIVATION: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. RESULTS: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Ballot-Polling Audits of Instant-Runoff Voting Elections with a Dirichlet-Tree Model

Everest, F ; Blom, M ; Stark, PB ; Stuckey, PJ ; Teague, V ; Vukcevic, D (Springer International Publishing, 2023-01-01)
Universal Architectural Concepts Underlying Protein Folding Patterns

Konagurthu, AS ; Subramanian, R ; Allison, L ; Abramson, D ; Stuckey, PJ ; Garcia de la Banda, M ; Lesk, AM (FRONTIERS MEDIA SA, 2021-04-30)

What is the architectural "basis set" of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures-called concepts-typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence-structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.
Lightweight Nontermination Inference with CHCs

Kafle, B ; Gange, G ; Schachte, P ; Sondergaard, H ; Stuckey, PJ ; Calinescu, R ; Pasareanu, CS (SPRINGER INTERNATIONAL PUBLISHING AG, 2021)

Non-termination is an unwanted program property (considered a bug) for some software systems, and a safety property for other systems. In either case, automated discovery of preconditions for non-termination is of interest. We introduce NtHorn, a fast lightweight non-termination analyser, able to deduce non-trivial sufficient conditions for non-termination. Using Constrained Horn Clauses (CHCs) as a vehicle, we show how established techniques for CHC program transformation and abstract interpretation can be exploited for the purpose of non-termination analysis. NtHorn is comparable in power to the state-of-the-art non-termination analysis tools, as measured on standard competition benchmark suites (consisting of integer manipulating programs), while typically solving problems an order of magnitude faster.
Disjunctive Interval Analysis

Gange, G ; Navas, JA ; Schachte, P ; Sondergaard, H ; Stuckey, PJ ; Dragoi, C ; Mukherjee, S ; Namjoshi, K (SPRINGER INTERNATIONAL PUBLISHING AG, 2021)

We revisit disjunctive interval analysis based on the Boxes abstract domain. We propose the use of what we call range decision diagrams (RDDs) to implement Boxes, and we provide algorithms for the necessary RDD operations. RDDs tend to be more compact than the linear decision diagrams (LDDs) that have traditionally been used for Boxes. Representing information more directly, RDDs also allow for the implementation of more accurate abstract operations. This comes at no cost in terms of analysis efficiency, whether LDDs utilise dynamic variable ordering or not. RDD and LDD implementations are available in the Crab analyzer, and our experiments confirm that RDDs are well suited for disjunctive interval analysis.
Abstract interpretation, symbolic execution and constraints

Amadini, R ; Gange, G ; Schachte, P ; Sondergaard, H ; Stuckey, P ; de Boer, F ; Mauro, J (Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2020)

Abstract interpretation is a static analysis framework for sound over-approximation of all possible runtime states of a program. Symbolic execution is a framework for reachability analysis which tries to explore all possible execution paths of a program. A shared feature between abstract interpretation and symbolic execution is that each – implicitly or explicitly – maintains constraints during execution, in the form of invariants or path conditions. We investigate the relations between the worlds of abstract interpretation, symbolic execution and constraint solving, to expose potential synergies.
Dynamic Programming for Predict plus Optimise

Demirovic, E ; Stuckey, PJ ; Bailey, J ; Chan, J ; Leckie, C ; Ramamohanarao, K ; Guns, T (ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE, 2020)

We study the predict+optimise problem, where machine learning and combinatorial optimisation must interact to achieve a common goal. These problems are important when optimisation needs to be performed on input parameters that are not fully observed but must instead be estimated using machine learning. We provide a novel learning technique for predict+optimise to directly reason about the underlying combinatorial optimisation problem, offering a meaningful integration of machine learning and optimisation. This is done by representing the combinatorial problem as a piecewise linear function parameterised by the coefficients of the learning model and then iteratively performing coordinate descent on the learning coefficients. Our approach is applicable to linear learning functions and any optimisation problem solvable by dynamic programming. We illustrate the effectiveness of our approach on benchmarks from the literature.
Dashed Strings and the Replace(-all) Constraint

Amadini, R ; Gange, G ; Stuckey, PJ (Springer International Publishing, 2020)

Dashed strings are a formalism for modelling the domain of string variables when solving combinatorial problems with string constraints. In this work we focus on (variants of) the Replace constraint, which aims to find the first occurrence of a query string in a target string, and (possibly) replaces it with a new string. We define a Replace propagator which can also handle Replace-Last (for replacing the last occurrence) and Replace-All (for replacing all the occurrences). Empirical results clearly show that string constraint solving can draw great benefit from this approach.
Dashed strings for string constraint solving

Amadini, R ; Gange, G ; Stuckey, PJ (Elsevier, 2020-12-01)

String processing is ubiquitous across computer science, and arguably more so in web programming — where it is also a critical part of security issues such as injection attacks. In recent years, a number of string solvers have been developed to solve combinatorial problems involving string variables and constraints. We examine the dashed string approach to string constraint solving, which represents an unknown string as a sequence of blocks of characters with bounds on their cardinalities. The solving approach relies on propagation of information about the blocks of characters that arise from reasoning about the constraints in which they occur. This approach shows promising performance on many benchmarks involving constraints like string length, equality, concatenation, and regular expression membership. In this paper, we formally review the definition, the properties and the use of dashed strings for string constraint solving, and we provide an empirical validation that confirms the effectiveness of this approach.
String constraint solving: past, present and future

Amadini, R ; Gange, G ; Schachte, P ; Sondergaard, H ; Stuckey, PJ ; DeGiacomo, G ; Catala, A ; Dilkina, B ; Milano, M ; Barro, S ; Bugarin, A ; Lang, J (University of Santiago de Compostela, 2020-01-01)

String constraint solving is an important emerging field, given the ubiquity of strings over different fields such as formal analysis, automated testing, database query processing, and cybersecurity. This paper highlights the current state-of-the-art for string constraint solving, and identifies future challenges in this field.

Computing and Information Systems - Research Publications

Permanent URI for this collection

Filters

Date

Author

Type

Settings

Sort By

Results per page

Statistics

Citations

Search Results