Computing and Information Systems - Theses

  • Item
    Location proof architectures
    SENEVIRATNE, JANAKA ( 2014)
    Upcoming location-based services such as pay-as-you-drive insurance mandate verified locations. To enable such services, Location Proof Architectures (LPAs) have been proposed in the literature to verify or prove a user's location. Specifically, an LPA allows a user (or a device on behalf of its user) to obtain a proof of its presence at a location from a trusted third party. In addition to guarding against cheating users who may claim false locations, another major concern in an LPA is preserving user location privacy. To achieve this, a user's identity and location data should be maintained separately, in tandem with additional measures that avoid leaking sensitive identity and location data. We identify two types of location proof architectures: 1. sporadic location proofs for specific user locations and 2. continuous location proofs for user routes. In this thesis, we present two sporadic LPAs. First, we propose an LPA in which a user cannot falsely claim a location; this LPA also preserves user privacy by verifying a user's identity and location independently. Second, we propose an LPA that uses pseudonyms. We present a trusted-third-party-free group pseudonym registration system for the LPA and show that our approach can achieve a guaranteed degree of privacy. This thesis also introduces a framework for continuous LPAs. In a continuous LPA, a verifier receives a sequence of location samples along a user route and assigns a degree of confidence to each possible user route. Specifically, we describe a stochastic model which associates a degree of confidence with a user route based on the distribution pattern of location samples.
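    The core transaction of a sporadic LPA can be sketched as follows. This is a hypothetical illustration, not the thesis's constructions: a real LPA would use digital signatures and the privacy machinery described above, whereas here a shared-key HMAC stands in for the trusted third party's attestation, and all names are invented.

```python
import hashlib
import hmac
import json
import time

class ProofAuthority:
    """Toy trusted third party that issues sporadic location proofs.
    (Illustrative only: an HMAC stands in for a digital signature.)"""

    def __init__(self, secret: bytes):
        self._secret = secret

    def issue_proof(self, pseudonym: str, location: str, ts: float) -> dict:
        # Bind pseudonym, location, and time into one attested payload.
        payload = json.dumps({"pseudonym": pseudonym, "loc": location, "ts": ts},
                             sort_keys=True).encode()
        tag = hmac.new(self._secret, payload, hashlib.sha256).hexdigest()
        return {"payload": payload.decode(), "tag": tag}

    def verify_proof(self, proof: dict) -> bool:
        expected = hmac.new(self._secret, proof["payload"].encode(),
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, proof["tag"])

authority = ProofAuthority(secret=b"demo-key")
proof = authority.issue_proof("pseudonym-42", "cell-1703", ts=time.time())
assert authority.verify_proof(proof)       # genuine proof verifies
forged = dict(proof, payload=proof["payload"].replace("cell-1703", "cell-9999"))
assert not authority.verify_proof(forged)  # altered location is rejected
```

    Because the proof names only a pseudonym, the identity and location checks can be separated, which is the privacy property the abstract refers to.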
  • Item
    User centric cellular trajectory inference with partial knowledge
    PERERA, BATUGAHAGE KUSHANI ANURADHA ( 2014)
    The uncertainty associated with cellular network trajectories is a major obstacle to their use in location-based applications such as person tracking. Inferring a person's real trajectory from highly imprecise cellular trajectory data is a challenging task. GPS trajectories are subject to less uncertainty than cellular network trajectories and are preferred for many location-based applications. However, GPS-based location acquisition has limited applicability in certain contexts due to high power consumption and poor coverage; cellular-network-based location acquisition is a good alternative to GPS in such scenarios. Consequently, a cellular trajectory inference method that can handle the uncertainty of cellular trajectories is worth investigating, so that cellular trajectories can be utilised in location-based applications. In this thesis, our main focus is on user-centric trajectory inference approaches, where inference is performed by mobile phone users rather than by the mobile network operator. Many existing cellular trajectory inference methods use knowledge about the cellular network, such as the spatial distribution of neighbouring cell towers and signal strength information. However, such full knowledge of the cellular network is confidential to the mobile network operator, and mobile phone users are not guaranteed access to it. These techniques are therefore not applicable to user-centric cellular trajectory inference with only partial knowledge of the cellular network, which makes user-centric approaches even more challenging. We propose a cellular trajectory inference method that utilises only a user's connected cell tower location sequence and the corresponding timing information, as this is the only type of knowledge guaranteed to be available to a mobile phone user.
    We suggest using a user's speed information as background knowledge, as it is easily accessible to the user compared with knowledge about the cellular network. Furthermore, we suggest exploiting the precision of the time dimension of cellular trajectories to obtain precise handover times. These precise handover times can be combined with speed information to accurately compute the distance a user has travelled within a cell. We propose a method to infer the straight-line segments of a trajectory using this distance information; the inferred straight-line segments are then used to estimate the remaining segments of the trajectory. We show, theoretically and experimentally, that our proposed method achieves higher accuracy than existing cellular trajectory inference methods in cases where the user's trajectory is approximately a combination of straight lines. The intuition behind straight-line inference is that people follow the shortest path to a destination, avoiding unnecessary turns, and therefore often travel along straight-line paths. Additional advantages of our inference method include the ability to run locally on mobile devices and to perform trajectory inference in an unfamiliar environment, since no historical trajectory information or pre-training is required.
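    The distance computation described above reduces to multiplying the assumed user speed by the time between consecutive handovers. A minimal sketch, assuming a constant speed (the function name and example values are hypothetical):

```python
def distances_within_cells(handover_times, speed_mps):
    """handover_times: timestamps (s) at which the phone switched cell towers.
    Returns the distance (m) covered between consecutive handovers,
    i.e. while the user was connected to each intermediate cell."""
    return [speed_mps * (t2 - t1)
            for t1, t2 in zip(handover_times, handover_times[1:])]

# A user moving at 10 m/s with handovers at t = 0, 30 and 75 s
print(distances_within_cells([0.0, 30.0, 75.0], speed_mps=10.0))  # [300.0, 450.0]
```

    These per-cell distances are what constrain the straight-line segments the method infers.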
  • Item
    Rapid de novo methods for genome analysis
    HALL, ROSS STEPHEN ( 2013)
    Next-generation sequencing methodologies have resulted in an exponential increase in the amount of genomic sequence data available to researchers. Valuable tools in the initial analysis of such data for novel features are de novo techniques - methods which employ a minimum of comparative sequence information from known genomes. In this thesis I describe two heuristic algorithms for the rapid de novo analysis of genomic sequence data. The first algorithm applies multiple Fast Fourier Transforms, mapped to a two-dimensional space. The resulting bitmap clearly illustrates periodic features of a genome, including coding density, and its compact representation allows megabase scales of genomic data to be rendered in a single bitmap. The second algorithm, RTASSS (RNA Template Assisted Secondary Structure Search), predicts potential members of RNA gene families that are related by similar secondary structure but not necessarily by conserved sequence. RTASSS can find candidate structures similar to a given template structure without the use of sequence homology. Both algorithms have linear complexity.
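    One standard way an FFT exposes coding density, which the first algorithm builds on, is the period-3 signal of protein-coding DNA: transform binary indicator sequences (one per nucleotide) and inspect the power at frequency 1/3. A sketch of that idea only, not the thesis's bitmap-mapping algorithm; the sequences below are made up:

```python
import cmath

def dft_power_at(seq, period):
    """Combined power of the four nucleotide indicator spectra at
    frequency 1/period (a direct DFT at a single frequency bin)."""
    n = len(seq)
    k = n / period                     # frequency bin for the given period
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total

coding_like = "ATGGCTGCAGCTGCAGCTGCAGCT"   # repeating codon-like pattern
random_like = "ATCGGATTACCGTAGCATTGCAAT"
# A strong period-3 peak hints at coding sequence.
print(dft_power_at(coding_like, 3) > dft_power_at(random_like, 3))  # True
```

    Computed over a sliding window and rendered as pixel intensity, this kind of statistic yields the genome-scale periodicity bitmaps the abstract describes.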
  • Item
    Efficient mixnets with application to electronic voting
    Ramchen, Kim ( 2013)
    Cryptographic mixnets are a fundamental tool in the construction of secure electronic elections. Traditional mixnets rely upon third party mixers to perform vote anonymisation at election time. This approach places inherent limitations on the robustness and efficiency of mixing. In this thesis we show that third party mixers are not required to be active at election time - in fact it is highly feasible for the shuffle to be constructed before the election. A basic primitive used is the public key obfuscator for a re-encryption shuffle. We show that the seminal obfuscator of Paillier shuffles by Adida and Wikström [AW07a] can be extended to generalised Paillier shuffles [DJ01]. The resulting obfuscations are composable, allowing obfuscation of re-encryption permutation networks. This, in turn, implies an obfuscator for a Paillier shuffle with improved efficiency (O(N log^3.5 N) versus O(N^2)). This leads to a very robust and efficient mixnet: when distributed over O(N) nodes the mixnet achieves mixing in polylogarithmic time, independent of the level of privacy or verifiability required. In fact, our mixnet is the first to achieve mixing in time sublinear in the number of inputs, assuming the number of nodes available is bounded by the number of inputs. Although the mixnet may have a biased distribution, we show that using particular networks leads to an acceptable bias-efficiency tradeoff. We additionally show that the mixnet is secure in the sense of indistinguishability of chosen permutations [NSNK04].
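    The primitive underlying such mixnets is the re-encryption shuffle: Paillier ciphertexts can be re-randomised (multiplied by a fresh encryption of 0) and permuted, leaving the plaintext multiset intact but unlinkable. A toy sketch of that primitive with textbook parameters, not the obfuscated, precomputable shuffles constructed in the thesis:

```python
import math
import random

p, q = 467, 479                      # toy primes; real keys use ~2048-bit n
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)

def rand_unit():
    """Random element of Z*_n (gcd check matters only for toy moduli)."""
    while True:
        r = random.randrange(2, n)
        if math.gcd(r, n) == 1:
            return r

def encrypt(m):
    # With g = n + 1, g^m mod n^2 = 1 + m*n
    return ((1 + m * n) * pow(rand_unit(), n, n2)) % n2

def decrypt(c):
    L = lambda x: (x - 1) // n
    mu = pow(L(pow(1 + n, lam, n2)), -1, n)
    return (L(pow(c, lam, n2)) * mu) % n

def reencrypt(c):
    """Multiply by a fresh encryption of 0: same plaintext, new randomness."""
    return (c * pow(rand_unit(), n, n2)) % n2

def mix(ciphertexts):
    """One mixer's step: permute, then re-encrypt every ciphertext."""
    shuffled = list(ciphertexts)
    random.shuffle(shuffled)
    return [reencrypt(c) for c in shuffled]

votes = [3, 1, 4, 1, 5]
mixed = mix([encrypt(v) for v in votes])
print(sorted(decrypt(c) for c in mixed))   # same multiset: [1, 1, 3, 4, 5]
```

    Chaining several such mixers is what a permutation network composes; the thesis's contribution is making the whole composed shuffle publicly obfuscatable ahead of election time.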
  • Item
    Towards small-effort adaptions to off-the-shelf spatial and temporal indexes for modern database applications
    Stradling, Martin James ( 2012)
    Modern database applications demand high throughput of queries on complex data, such as the increasingly common spatial and temporal data types. Many structures have been proposed for indexing spatial and temporal data, and as standalone implementations they often achieve high performance. However, they are generally difficult to implement in an existing DBMS, which makes them very expensive to adopt; for example, Oracle took more than five years to implement the R-tree spatial index structure in their commercial DBMS. This thesis examines the techniques used in off-the-shelf spatial and temporal indexes, with an emphasis on achieving large performance gains with minimal changes. In the domain of spatial data we propose a new index structure called Size Separation Indexing (SSI). This structure builds on the B+-tree, which is present in almost all DBMSs. Through extensive experimentation we show that SSI performs at least as well as all current spatial indexes, and better than most, both on a flat filesystem and on top of a DBMS. In the domain of temporal data we extend the TSB-tree, which is also based on the B+-tree and has recently been integrated into Microsoft SQL Server; we introduce the memory Hierarchy-aware Version tree, which significantly improves on the performance of the TSB-tree for almost all query types and is scalable to huge datasets. Because our structures build upon existing indexes, we argue that they are better suited for implementation in current systems than other structures, reducing costs and increasing performance.
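    The general idea of layering a spatial index on an existing B+-tree (the approach SSI belongs to, though this is not SSI itself) can be illustrated with a Z-order key: interleave the bits of x and y so a 2-D window query becomes a 1-D key-range scan plus an exact filter. A sorted list stands in for the B+-tree here; all names and data are illustrative.

```python
import bisect

def z_order(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y into a single 1-D key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

class ZIndex:
    """Points stored in key order, as a B+-tree would store them."""

    def __init__(self, points):
        self._entries = sorted((z_order(x, y), (x, y)) for x, y in points)
        self._keys = [k for k, _ in self._entries]

    def candidates(self, lo, hi):
        """Points whose key falls in the query window's key range:
        a superset of the true result, since z_order is monotone in
        each coordinate. A refinement step filters exactly."""
        k1, k2 = z_order(*lo), z_order(*hi)
        i = bisect.bisect_left(self._keys, k1)
        j = bisect.bisect_right(self._keys, k2)
        return [pt for _, pt in self._entries[i:j]]

idx = ZIndex([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1)])
# Window query [(2,2)..(6,7)]: one key-range scan, then an exact filter.
hits = [pt for pt in idx.candidates((2, 2), (6, 7))
        if 2 <= pt[0] <= 6 and 2 <= pt[1] <= 7]
print(sorted(hits))   # [(2, 3), (4, 7), (5, 4)]
```

    Because the DBMS's native ordered index does all the work, such schemes avoid the multi-year engineering cost the abstract attributes to implementing a new access method from scratch.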
  • Item
    Online time series approximation and prediction
    Xu, Zhenghua ( 2012)
    In recent years there has been rapidly increasing research interest in the management of time series data, due to its importance in a variety of applications such as network traffic management, telecommunications, finance, sensor networks and location-based services. In this thesis, we focus on two important problems in the management of time series data: online time series approximation and online time series prediction. Time series approximation can reduce the space and computational cost of storing and transmitting time series data, and also reduce the data-processing workload. Segmentation is one of the most commonly used methods to meet this requirement. However, while most current segmentation methods aim to minimise the holistic error between the approximation and the original time series, few works try to represent time series as compactly as possible with an error bound guaranteed on each data point. Moreover, in many real-world situations the patterns of a time series do not follow a constant rule, so using only one type of function may not yield the best compaction. Motivated by these observations, we propose an online segmentation algorithm which approximates time series by a set of different types of candidate functions (polynomials of different orders, exponential functions, etc.) and adaptively chooses the most compact one as the pattern of the time series changes. A challenge in this approach is to determine the approximation function on the fly (“online”). We therefore present a novel method to efficiently generate a compact approximation of a time series in an online fashion for several types of candidate functions. This method incrementally narrows the feasible coefficient spaces of the candidate functions in coefficient coordinate systems, so that each segment can be made as long as possible given an error bound on each data point.
    Extensive experimental results show that our algorithm generates more compact approximations of time series, with lower average errors, than the state-of-the-art algorithm. Time series prediction aims to predict future values of a time series from its previously observed values. In this thesis, we focus on a specific branch of time series prediction: the online locational time series prediction problem, i.e., predicting the final locations of locational time series from their currently observed values on the fly. This problem is also called the destination prediction problem, and the systems used to solve it are called destination prediction systems. However, most current destination prediction systems are based on the travel histories of specific users, so they are too ad hoc to accurately predict other users' destinations. In our work, we propose the first generic destination prediction solution, providing accurate destination prediction to all users based on the given user queries (i.e., the users' current partial travel trajectories) without knowing any of the users' travel histories. Extensive experiments validate that the prediction results of our method are more accurate than those of the competing method.
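    The "narrowing the feasible coefficient space" idea can be sketched for the linear case only (the thesis handles several candidate function types): each incoming point intersects the set of admissible slopes with its own interval, and the segment closes when that set becomes empty. The function name, anchoring choice, and data below are illustrative.

```python
def segment_linear(values, eps):
    """Greedy piecewise-linear approximation with |error| <= eps per point.
    Each segment is anchored at its first value; the feasible coefficient
    space is the interval of slopes keeping every point in the eps band."""
    segments, start = [], 0
    while start < len(values) - 1:
        lo, hi = float("-inf"), float("inf")
        end = start
        for i in range(start + 1, len(values)):
            dx = i - start
            # Intersect with point i's admissible slope interval.
            new_lo = max(lo, (values[i] - values[start] - eps) / dx)
            new_hi = min(hi, (values[i] - values[start] + eps) / dx)
            if new_lo > new_hi:        # feasible space empty: close segment
                break
            lo, hi, end = new_lo, new_hi, i
        segments.append((start, end, (lo + hi) / 2))   # pick a mid slope
        start = end
    return segments

ts = [0.0, 1.1, 1.9, 3.0, 3.1, 3.0, 2.9]
print(segment_linear(ts, eps=0.2))     # two segments: rise, then plateau
```

    Each point is examined once and the interval intersection is O(1), which is what makes this kind of filter suitable for online, per-point error-bounded compression.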
  • Item
    A generic framework for the simulation of biologically plausible spiking neural networks on graphics processors
    Abi-Samra, Jad ( 2011)
    The study of the structure and functionality of the brain has been ardently investigated, as such research may aid the treatment and diagnosis of mental diseases. This has led to growing interest in numerical simulation tools that can model the brain's network complexity, in order to achieve a greater understanding of the underlying processes of this complex biological system. The computational requirements of neural modeling make high-performance multi-core systems a desirable architecture for simulating large-scale networks. Graphics processing units (GPUs) are an inexpensive, power-efficient supercomputing alternative for compute-intensive scientific applications. However, the irregular communication and execution patterns of realistic spiking neural networks pose a challenge to their implementation on these massively data-parallel devices. In this work, we propose a generic framework for simulating large-scale spiking neural networks with biologically realistic connectivity on GPUs. We provide an extensive list of optimization techniques and strategies targeting the main issues in neural simulation on these devices, such as optimal access patterns, synaptic referencing, current aggregation, firing representation, and task distribution. We succeed in building a GPU-based simulator that preserves the flexibility, accuracy, and biological plausibility of neural simulation while providing high performance and efficient memory usage. Overall, our implementation achieves speedups of around 35-84 times on a single graphics card over an optimized CPU implementation based on the SPIKESIM simulator. We also provide a comparison with other GPU neural simulators related to this work. Following that, we analyze the communication aspects of migrating the system onto a multi-GPU cluster.
    This is done to quantitatively determine the communication overhead implied by large-scale neural simulation on distributed clusters of GPU devices. We describe a model for the dependency cost arising from partitioning a neural network across the components of the distributed system, and discuss various techniques for minimizing the overhead of frequent messaging and global synchronization. Finally, we provide a theoretical analysis of the suggested communication model in relation to computational and overall performance, as well as a discussion of the relevance of the work.
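    Of the optimization targets listed above, "current aggregation" is the easiest to picture: spikes from many presynaptic neurons are accumulated into each postsynaptic neuron's input current. A host-side sketch of that step only (not the thesis's GPU kernels; names and weights are invented); on a GPU this scatter-add is the contended operation, typically handled with atomics or per-block reductions.

```python
def aggregate_currents(num_neurons, synapses, fired):
    """synapses: list of (pre, post, weight); fired: set of spiking neurons.
    Returns the total synaptic current delivered to each neuron this step."""
    current = [0.0] * num_neurons
    for pre, post, weight in synapses:
        if pre in fired:
            current[post] += weight    # the scatter-add hot spot
    return current

synapses = [(0, 2, 0.5), (1, 2, 0.25), (0, 3, 1.0), (2, 3, 0.75)]
print(aggregate_currents(4, synapses, fired={0, 1}))  # [0.0, 0.0, 0.75, 1.0]
```

    The irregularity the abstract mentions is visible even here: which entries of `current` are written depends entirely on the spike pattern, defeating the regular access patterns GPUs prefer.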
  • Item
    Multi-resolution indexing method for time series
    Ma, Mei ( 2010)
    Time series datasets are useful in a wide range of real-world applications. Retrieving or querying from a collection of time series is a fundamental task, a key example being the similarity query, which returns all time series in the collection that are similar to a given reference time series. This type of query is particularly useful in prediction and forecasting applications. A key challenge for similarity queries is efficiency, and for large datasets it is important to develop efficient indexing techniques. Existing approaches in this area are mainly based on the Generic Multimedia Indexing Method (GEMINI), a framework that uses spatial indexes such as the R-tree to index reduced time series. To process a similarity query, the index is first used to prune candidate time series using a lower-bounding distance; the remaining time series are then compared under the original similarity measure to derive the query result. Performance within this framework depends on the tightness of the lower-bounding distance with respect to the similarity measure, and indeed much work has focused on representation and dimensionality reduction in order to provide a tighter lower-bounding distance. Existing work, however, has not employed dimensionality reduction in a flexible way, requiring all time series to be reduced to the same dimension. In contrast, in this thesis we investigate the possibility of allowing variable dimensionality reduction. To this end, we develop a new and more flexible tree-based indexing structure called the Multi-Resolution Index (MR-Index), which allows dimensionality to vary across different levels of the tree. We provide efficient algorithms for querying, building and maintaining this structure.
Through an experimental analysis, we show that the MR-Index can deliver improved query efficiency compared to the traditional R-tree index, using both the Euclidean and dynamic time warping similarity measures.
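    The GEMINI pruning principle the abstract builds on can be sketched with piecewise aggregate approximation (PAA) as the reduction, one common choice (the thesis's MR-Index varies the reduced dimension per tree level; this fixed-dimension sketch shows only the lower-bounding contract):

```python
import math

def paa(series, segments):
    """Mean of each of `segments` equal-length chunks (length divisible)."""
    w = len(series) // segments
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(segments)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def paa_lower_bound(ra, rb, w):
    """Distance between reductions, scaled by sqrt(w) so it never
    exceeds the true Euclidean distance between the full series."""
    return math.sqrt(w * sum((x - y) ** 2 for x, y in zip(ra, rb)))

q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
s = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
w = len(q) // 4
lb = paa_lower_bound(paa(q, 4), paa(s, 4), w)
# The index may discard s only if lb already exceeds the query radius;
# since lb never overestimates, no true match is ever pruned.
print(lb <= euclidean(q, s))   # True
```

    The tightness of `lb` relative to the true distance is exactly the quantity the abstract says drives pruning power, and what varying the reduced dimension per level seeks to improve.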