Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 2 of 2
  • Item
    Thumbnail Image
    Improving combinatorial optimization
    Chu, Geoffrey G. ( 2011)
    Combinatorial Optimization is an important area of computer science that has many theoretical and practical applications. In this thesis, we present important contributions to several different areas of combinatorial optimization, including nogood learning, symmetry breaking, dominance, relaxations and parallelization. We develop a new nogood learning technique based on constraint projection that allows us to exploit subproblem dominances that arise when two different search paths lead to subproblems which are identical on the remaining unlabeled variables. On appropriate problems, this nogood learning technique provides orders of magnitude speedup compared to a base solver which does not learn nogoods. We present a new symmetry breaking technique called SBDS-1UIP, which is an extension of Symmetry Breaking During Search (SBDS). SBDS-1UIP uses symmetric versions of the 1UIP nogoods derived by Lazy Clause Generation solvers to prune symmetric parts of the search space. We show that SBDS-1UIP can exploit at least as many symmetries as SBDS, and that it is strictly more powerful on some problems, allowing us to exploit types of symmetries that no previous general symmetry breaking technique is capable of exploiting. We present two new general methods for exploiting almost symmetries (symmetries which are broken by a small number of constraints). The first is to treat almost symmetries as conditional symmetries and exploit them via conditional symmetry breaking constraints. The second is to modify SDBS-1UIP to handle almost symmetries. Both techniques are capable of producing exponential speedups on appropriate problems. We examine three reasonably well known problems: the Minimization of Open Stacks Problem, the Talent Scheduling Problem (CSPLib prob039), and the Maximum Density Still Life Problem (CSPLib prob032). By applying various powerful techniques such as nogood learning, dynamic programming, dominance and relaxations, we are able to solve these problems many orders of magnitude faster than the previous state of the art. We identify cache contention as a serious bottleneck that severely limit the amount of speedup achievable when parallelizing SAT and LCG solvers on multi-core systems. We alter the data structures used in the state of the art SAT solver MiniSat to be more cache aware, leading to a sequential SAT solver that is some 80% faster on average and a parallel SAT solver that is some 140% faster on average. We examine an important issue in parallel search that has not been properly addressed in the literature. Many work stealing schemes used for load balancing completely ignore the branching heuristic, and instead try to maximize the granularity of the work stolen. This can result in many of the processors searching in unfruitful parts of the search tree. We analyze this problem and develop a new work stealing scheme called confidence based work stealing, which partitions the work based on how strong the branching heuristic is at each node. The new parallel algorithm produced near linear or super linear speedup on all the problems we tested.
  • Item
    Thumbnail Image
    Algorithms for the study of RNA and protein structure
    Stivala, Alexander David ( 2010)
    The growth in the number of known sequences and structures of RNA and protein molecules has led to the need to solve many computationally demanding problems in the analysis of RNA and protein structure. This thesis describes algorithms for structural comparison of RNA and protein molecules. In the case of proteins, it also describes a technique for automatically generating two-dimensional diagrammatic representations for visual comparison. A general technique for parallelizing dynamic programs in a straightforward way, by means of a shared lock-free hash table implementation and randomization of subproblem ordering is described. This generic approach is applied to several well-known dynamic programs, as well as a dynamic program for structural alignment of RNA molecules by aligning their base pairing probability matrices. Two algorithms for protein structure and substructure searching are described. These algorithms are also capable of finding non-sequential matches, that is, matches between structures where the sequential order of secondary structure elements is not preserved. The first algorithm is based on the relaxation of an earlier quadratic integer problem (QIP) formulation to a quadratic program (QP). The second algorithm uses the same formulation but approximates it using simulated annealing. It is shown that this results in significant increases in speed. This algorithm is also capable of greater accuracy when assessed as a fold recognition method. A parallel implementation of this algorithm on modern graphics processing unit (GPU) hardware is also described. This parallel implementation results in a further significant speedup, and, to the best of our knowledge, is the first use of a GPU for the protein structural search problem. Finally, a system to automatically generate two-dimensional representations of protein structure is described. Such diagrams are particularly useful in analysing complex protein folds. A method for using these diagrams as an interface to the protein substructure search methods is also described.