Computing and Information Systems - Theses

  • Item
    Automatic parallelisation for Mercury
    Bone, Paul ( 2012)
    Multicore computing is ubiquitous, so programmers need to write parallel programs to take advantage of the full power of modern computer systems. However, the most popular parallel programming methods are difficult and extremely error-prone. Most such errors are intermittent, which means they may go unnoticed until after a product has been shipped; they are also often very difficult to fix. This problem has been addressed by pure declarative languages that support explicit parallelism. However, this does nothing about another problem: it is often difficult for developers to find tasks that are worth parallelising. When they can be found, it is often too easy to create too much parallelism, such that the overheads of parallel execution overwhelm the benefits gained from the parallelism. Also, when parallel tasks depend on other parallel tasks, the dependencies may restrict the amount of parallelism available. This makes it even harder for programmers to estimate the benefit of parallel execution. In this dissertation we describe our profile feedback directed automatic parallelisation system, which aims to solve this problem. We implemented this system for Mercury, a pure declarative logic programming language. We use information gathered from a profile collected from a sequential execution of a program to inform the compiler about how that program can be parallelised. Ours is, as far as we know, the first automatic parallelisation system that can estimate the parallelism available among any number of parallel tasks with any number of (non-cyclic) dependencies. This novel estimation algorithm is supplemented by an efficient exploration of the program's call graph, an analysis that calculates the cost of recursive calls (as this is not provided by the profiler), and an efficient search for the best parallelisation of N computations from among the 2^(N-1) candidates.
    We found that in some cases where our system parallelised a loop, spawning off virtually all of its iterations, the resulting programs exhibited excessive memory usage and poor performance. We therefore designed and implemented a novel program transformation that fixes this problem. Our transformation allows programs to gain large improvements in performance and, in several cases, almost perfect linear speedups. The transformation also allows recursive calls within the parallelised code to take advantage of tail recursion. Also presented in this dissertation are many changes that improve the performance of Mercury's parallel runtime system, as well as a proposal and partial implementation of a visualisation tool that assists developers with parallelising their programs, and helps researchers develop automatic parallelisation tools and improve the performance of the runtime system. Overall, we have attacked and solved a number of issues that are critical to making automatic parallelism a realistic option for developers.
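    The search over the 2^(N-1) candidates can be illustrated with a toy cost model. In a conjunction of N computations, each of the N-1 boundaries between adjacent computations can be made sequential or parallel; the sketch below (our own illustration, not the thesis's algorithm — it ignores dependencies and spawn overheads, and assumes parallel tasks overlap perfectly) enumerates every choice and picks the cheapest:

```python
from itertools import product

def best_parallelisation(costs):
    """Toy model: for N task costs, try all 2^(N-1) seq/par choices
    for the N-1 boundaries. Tasks joined by parallel boundaries run
    concurrently (cost = max of the group); sequential groups add up.
    Returns (estimated_runtime, boundary_choices)."""
    n = len(costs)
    best = None
    for mask in product([False, True], repeat=n - 1):  # True = parallel boundary
        total, block_max = 0, costs[0]
        for i, parallel in enumerate(mask):
            if parallel:
                block_max = max(block_max, costs[i + 1])  # overlaps with group
            else:
                total += block_max                        # close the group
                block_max = costs[i + 1]
            # invariant: block_max is the cost of the current parallel group
        total += block_max
        if best is None or total < best[0]:
            best = (total, mask)
    return best
```

    For three tasks of cost 4, 4 and 2, running all three in parallel gives an estimated runtime of 4 instead of the sequential 10. The real system replaces this brute-force loop with a more efficient search and a far richer cost model that accounts for inter-task dependencies.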
  • Item
    A generic framework for the simulation of biologically plausible spiking neural networks on graphics processors
    Abi-Samra, Jad ( 2011)
    The structure and functionality of the brain have been ardently investigated, as the implications of such research may aid in the treatment and diagnosis of mental diseases. This has led to a growing interest in numerical simulation tools that can model its network complexity, in order to achieve a greater understanding of the underlying processes of this complex biological system. The computational requirements of neural modeling make high performance multi-core systems a desirable architecture when simulating large-scale networks. Graphics processing units (GPUs) are an inexpensive, power-efficient, supercomputing alternative for solving compute-intensive scientific applications. However, the irregular communication and execution patterns in realistic spiking neural networks pose a challenge to their implementation on these massively data parallel devices. In this work, we propose a generic framework for simulating large-scale spiking neural networks with biologically realistic connectivity on GPUs. We provide an extensive list of optimization techniques and strategies which target the main issues involved with neural simulation on these devices, such as optimal access patterns, synaptic referencing, current aggregation, firing representation, and task distribution. We succeed in building a GPU-based simulator that preserves the flexibility, accuracy, and biological plausibility of neural simulation, while providing high performance and efficient memory usage. Overall, our implementation achieves speedups of around 35-84 times on a single graphics card over an optimized CPU implementation based on the SPIKESIM simulator. We also provide a comparison with other GPU neural simulators related to this work. Following that, we analyze the communication aspects of migrating the system onto a multi-GPU cluster.
    This is done in an attempt to quantitatively determine the implications of communication overhead with large-scale neural simulation, when employing distributed clusters with GPU devices. We describe a model to determine the arising dependency cost from partitioning a neural network across the different components of the distributed system. We also discuss various techniques for minimizing overhead resulting from frequent messaging and global synchronization. Finally, we provide a theoretical analysis of the suggested communication model in relation to computational and overall performance, as well as a discussion on the relevance of the work.
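    The core of such a dependency-cost model can be sketched very simply: when a network is partitioned across devices, every synapse whose source and target neurons land on different devices implies a spike message between those devices. The function below is our own minimal illustration (the names and the counting model are ours, not the thesis's; a realistic model would also weight edges by firing rate and account for message batching):

```python
from collections import Counter

def cross_partition_traffic(synapses, partition):
    """Count synapses crossing device boundaries under a given partition.
    `synapses` is a list of (src, dst) neuron ids; `partition` maps a
    neuron id to its device id. Returns a Counter keyed by directed
    device pair (src_dev, dst_dev) -> number of crossing synapses."""
    traffic = Counter()
    for src, dst in synapses:
        src_dev, dst_dev = partition[src], partition[dst]
        if src_dev != dst_dev:           # only inter-device edges cost messages
            traffic[(src_dev, dst_dev)] += 1
    return traffic
```

    Comparing the totals produced by this count for alternative partitions gives a first-order estimate of which placement minimises inter-GPU messaging.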
  • Item
    Algorithms for the study of RNA and protein structure
    Stivala, Alexander David ( 2010)
    The growth in the number of known sequences and structures of RNA and protein molecules has led to the need to solve many computationally demanding problems in the analysis of RNA and protein structure. This thesis describes algorithms for structural comparison of RNA and protein molecules. In the case of proteins, it also describes a technique for automatically generating two-dimensional diagrammatic representations for visual comparison. A general technique for parallelizing dynamic programs in a straightforward way, by means of a shared lock-free hash table implementation and randomization of subproblem ordering, is described. This generic approach is applied to several well-known dynamic programs, as well as a dynamic program for structural alignment of RNA molecules by aligning their base pairing probability matrices. Two algorithms for protein structure and substructure searching are described. These algorithms are also capable of finding non-sequential matches, that is, matches between structures where the sequential order of secondary structure elements is not preserved. The first algorithm is based on the relaxation of an earlier quadratic integer programming (QIP) formulation to a quadratic program (QP). The second algorithm uses the same formulation but approximates it using simulated annealing. It is shown that this results in significant increases in speed. This algorithm is also capable of greater accuracy when assessed as a fold recognition method. A parallel implementation of this algorithm on modern graphics processing unit (GPU) hardware is also described. This parallel implementation results in a further significant speedup, and, to the best of our knowledge, is the first use of a GPU for the protein structural search problem. Finally, a system to automatically generate two-dimensional representations of protein structure is described. Such diagrams are particularly useful in analysing complex protein folds. A method for using these diagrams as an interface to the protein substructure search methods is also described.
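    The shared-table idea can be sketched in a few lines: workers share one memoization table and visit subproblems in independent random orders, so as the table fills, workers increasingly find results already computed rather than recomputing them. The sketch below is our own toy version, not the thesis implementation (a plain Python dict stands in for the lock-free hash table, which is safe here under the interpreter lock; the duplicated-write race is benign because every worker computes the same value for a given subproblem):

```python
import random
import threading

def parallel_dp(n_workers, subproblems, solve):
    """Parallelize a dynamic program via a shared memo table plus
    randomized subproblem ordering. `solve(p, rec)` computes the value
    of subproblem `p`, calling `rec` for the subproblems it depends on."""
    memo = {}

    def value(p):
        if p in memo:                 # already solved by some worker
            return memo[p]
        v = solve(p, value)           # recurse through the memo table
        memo[p] = v                   # benign race: last write wins
        return v

    def worker():
        order = list(subproblems)
        random.shuffle(order)         # each worker takes its own random order
        for p in order:
            value(p)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memo

# Example: Fibonacci as a stand-in dynamic program.
fib_table = parallel_dp(
    4, range(31),
    lambda p, rec: p if p < 2 else rec(p - 1) + rec(p - 2))
```

    The randomization is what makes the scheme generic: no per-problem scheduling is needed, because divergent orders statistically keep the workers out of each other's way.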