Computing and Information Systems - Theses

  • Item
    Computational substructure querying and topology prediction of the beta-sheet
    Ho, Hui Kian (2014)
    Studying the three-dimensional structure of proteins is essential to understanding their function and, ultimately, the dysfunction that causes disease. The limitations of experimental protein structure determination present a need for computational approaches to protein structure prediction and analysis. The beta-sheet is a commonly occurring protein substructure that is important to many biological processes and is often implicated in neurological disorders. Targeted experimental studies of beta-sheets are especially difficult due to their general insolubility in isolation. This thesis presents a series of contributions to the computational analysis and prediction of beta-sheet structure, which are useful for knowledge discovery and for directing more detailed experimental work. Approaches for predicting the simplest type of beta-sheet, the beta-hairpin, are first described. Improvements over existing methods are obtained by using the most important beta-hairpin features identified through systematic feature selection. An examination of the most important features provides a physicochemical basis for their usefulness in beta-hairpin prediction. New methods for the more general problem of beta-sheet topology prediction are described. Unlike recent methods, ours are independent of multiple sequence alignments (MSAs) and therefore do not rely on the coverage of reference sequence databases or sequence homology. Our evaluations showed that our methods do not exhibit the same reductions in performance as a state-of-the-art method for sequences with low-quality MSAs. A new method for the indexing and querying of beta-sheet substructures, called BetaSearch, is described. BetaSearch exploits the inherent planar constraints of beta-sheet structure to achieve significant speedups over existing graph indexing and conventional 3D structure search methods.
Case studies are presented that demonstrate the potential of this method for the discovery of biologically interesting beta-sheet substructures. Finally, a purpose-built open-source toolkit for generating 2D protein maps is described, which is useful for the coarse-grained analysis and visualisation of 3D protein structures. It can also be used in existing knowledge discovery pipelines for automated structural analysis and prediction tasks, as a standalone application, or imported into existing experimental applications.
  • Item
    Automatic parallelisation for Mercury
    Bone, Paul (2012)
    Multicore computing is ubiquitous, so programmers need to write parallel programs to take advantage of the full power of modern computer systems. However, the most popular parallel programming methods are difficult and extremely error-prone. Most such errors are intermittent, which means they may be unnoticed until after a product has been shipped; they are also often very difficult to fix. This problem has been addressed by pure declarative languages that support explicit parallelism. However, this does nothing about another problem: it is often difficult for developers to find tasks that are worth parallelising. When they can be found, it is often too easy to create too much parallelism, such that the overheads of parallel execution overwhelm the benefits gained from the parallelism. Also, when parallel tasks depend on other parallel tasks, the dependencies may restrict the amount of parallelism available. This makes it even harder for programmers to estimate the benefit of parallel execution. In this dissertation we describe our profile feedback directed automatic parallelisation system, which aims at solving this problem. We implemented this system for Mercury, a pure declarative logic programming language. We use information gathered from a profile collected from a sequential execution of a program to inform the compiler about how that program can be parallelised. Ours is, as far as we know, the first automatic parallelisation system that can estimate the parallelism available among any number of parallel tasks with any number of (non-cyclic) dependencies. This novel estimation algorithm is supplemented by an efficient exploration of the program's call graph, an analysis that calculates the cost of recursive calls (as this is not provided by the profiler), and an efficient search for the best parallelisation of N computations from among the two to the power of N minus one candidates. 
We found that in some cases where our system parallelised a loop, spawning off virtually all of its iterations, the resulting programs exhibited excessive memory usage and poor performance. We therefore designed and implemented a novel program transformation that fixes this problem. Our transformation allows programs to gain large improvements in performance and, in several cases, almost perfect linear speedups. The transformation also allows recursive calls within the parallelised code to take advantage of tail recursion. Also presented in this dissertation are many changes that improve the performance of Mercury's parallel runtime system, as well as a proposal and partial implementation of a visualisation tool that assists developers with parallelising their programs, and helps researchers develop automatic parallelisation tools and improve the performance of the runtime system. Overall, we have attacked and solved a number of issues that are critical to making automatic parallelism a realistic option for developers.