Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 10 of 199
  • Item
    Safe acceptance of zero-confirmation transactions in Bitcoin
    Yang, Renlord (2016)
    Acceptance of zero-confirmation transactions in Bitcoin is inherently unsafe due to the lack of consistency between node states in the network. As a consequence, Bitcoin users must endure a mean wait time of 10 minutes to accept confirmed transactions. Even so, due to the possibility of forks in the blockchain, users who want to avoid invalidation risks entirely may have to wait for up to 6 confirmations, resulting in a mean wait time of 60 minutes. This is untenable and remains a deterrent to the utility of Bitcoin as a payment method for merchants. Our work addresses this problem by introducing a novel insurance scheme that guarantees a deterministic outcome for transaction recipients. The proposed scheme uses standard Bitcoin scripts and transactions to produce inter-dependent transactions that are triggered or invalidated based on the occurrence of potential double-spend attacks. A library to set up the insurance scheme, together with a test suite, was implemented for anyone interested in deploying a fully anonymous and trustless insurance scheme. In our Testnet experiments, the insurance scheme successfully defended against 10 out of 10 double-spend attacks.
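At its core, the conflict such a scheme must react to is simple to state: two transactions attempting to spend the same previous output (outpoint) constitute a double-spend. The minimal Python sketch below is a hypothetical model for illustration, not the thesis's library:

```python
# Illustrative model of double-spend detection: a double-spend is two
# transactions that spend the same outpoint (prev_txid, prev_index).
from typing import NamedTuple, List, Tuple

class TxInput(NamedTuple):
    prev_txid: str   # id of the transaction whose output is being spent
    prev_index: int  # index of that output

class Transaction(NamedTuple):
    txid: str
    inputs: List[TxInput]

def find_double_spends(txs: List[Transaction]) -> List[Tuple[str, str]]:
    """Return pairs of txids that spend the same outpoint."""
    seen = {}        # outpoint -> txid of the first spender observed
    conflicts = []
    for tx in txs:
        for tin in tx.inputs:
            outpoint = (tin.prev_txid, tin.prev_index)
            if outpoint in seen and seen[outpoint] != tx.txid:
                conflicts.append((seen[outpoint], tx.txid))
            else:
                seen.setdefault(outpoint, tx.txid)
    return conflicts
```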
  • Item
    Automatic caloric expenditure estimation with smartphone's built-in sensors
    Cabello Wilson, Nestor Stiven (2016)
    Fitness-tracking systems are technologies commonly used to enhance people's lifestyles. Feedback, usability, and ease of acquisition are fundamental to achieving fitness goals. Users need constant motivation to keep their interest in the fitness system and, consequently, to stay on a healthy lifestyle track. However, although feedback is increasingly being incorporated into many fitness-tracking systems, usability and ease of acquisition remain shortcomings that need to be addressed. Features such as automatic activity identification, low energy consumption, simplicity, and goal-achievement notifications provide a good user experience. Nevertheless, most of these functions require the acquisition of a relatively expensive fitness-tracking device. Smartphones provide a partial solution by giving users easy access to multiple fitness applications, which reduces the need to purchase another gadget. Nonetheless, improvements in the user experience are still necessary. Wearable devices, on the other hand, satisfy the usability requirement, but their cost is an impediment to some users. The system proposed in this research aims to address these issues by combining the benefits of mobile applications, such as feedback and ease of acquisition, with the usability that wearable devices provide, in a smartphone Android application. Data collected from a single user while performing a series of common daily activities, namely walking, jogging, cycling, climbing stairs, and walking downstairs, was used to classify and automatically identify these activities with an overall accuracy of 91%, and to identify the stairs activities with an accuracy of 81%.
Finally, the caloric expenditure, which we considered the most important metric for motivating a user to perform a physical activity, was estimated by following the oxygen consumption equations from the American College of Sports Medicine (ACSM).
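For reference, the ACSM walking equation estimates gross oxygen uptake as VO2 = 0.1·speed + 1.8·speed·grade + 3.5 (in mL·kg⁻¹·min⁻¹, with speed in metres per minute and grade as a fraction), and energy expenditure follows from roughly 5 kcal per litre of oxygen consumed. A minimal sketch of this calculation (function names are illustrative, not the thesis's code):

```python
def walking_vo2(speed_m_min: float, grade: float) -> float:
    """ACSM walking equation: gross VO2 in mL per kg per minute.
    speed in metres/minute; grade as a fraction (e.g. 0.05 for 5%)."""
    return 0.1 * speed_m_min + 1.8 * speed_m_min * grade + 3.5

def kcal_per_minute(vo2_ml_kg_min: float, mass_kg: float) -> float:
    """Convert VO2 to energy using ~5 kcal per litre of O2 consumed."""
    litres_per_min = vo2_ml_kg_min * mass_kg / 1000.0
    return litres_per_min * 5.0
```

For example, level walking at 80 m/min (about 4.8 km/h) by a 70 kg person gives a VO2 of 11.5 mL·kg⁻¹·min⁻¹, or roughly 4 kcal per minute.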
  • Item
    Gossip-based Asynchronous Aggregation Protocols for Unstructured P2P Systems
    Rao, Imran Ahmad (2017)
    The decentralized nature of Peer-to-Peer (P2P) networks has proven efficient and effective in providing scalable solutions for the implementation of large-scale distributed systems. However, this decentralization also poses significant challenges for resource discovery and management. To efficiently deploy and manage P2P networks, system administrators may need to identify the aggregate capabilities of all the nodes in the system from a global perspective. For example, for efficient scheduling of jobs, they may need to locate the least loaded nodes in the system. To execute such global functions, which compute the aggregate capabilities of nodes, P2P systems require decentralized and distributed protocols that work without the coordination of a central mediator. For these reasons, gossip-based protocols have emerged as a popular choice for computing aggregates in large-scale P2P systems. In a gossip-based push-pull aggregation protocol, each node, at a given frequency, exchanges its local information with one of its neighbor nodes. As a result of this exchange, both nodes update their local estimate of the global aggregate. These locally computed estimates at individual nodes asymptotically converge to a constant, provided that the network topology remains connected and the system mass is conserved. In existing aggregation protocols, the accuracy and convergence of the estimated aggregate at local nodes depend heavily on the synchronization of aggregation rounds. Synchronization is not trivial to achieve and maintain in large-scale distributed P2P systems due to a number of factors, such as different process execution speeds, message transmission delays, and clock drift. Moreover, nodes joining and departing the system at random make it even harder to keep aggregation rounds synchronized. In this thesis, we investigate the synchronization behavior of popular existing gossip-based aggregation protocols.
    Through detailed simulations, we evaluate the impact of asynchrony on the accuracy and diffusion speed of these protocols. We propose a number of push-pull aggregation protocols that improve accuracy in the presence of asynchrony, compare them with existing protocols, and list their respective pros and cons. Based on these results, we identify the challenges in efficiently computing aggregates in the presence of communication delays and asynchrony. In particular, we identify the scenarios in a synchronous push-pull protocol that cause loss of system mass, and we measure this loss. We then propose a novel push-pull gossip-style aggregation protocol, called LAP, which addresses the above-mentioned issues and computes the system aggregate efficiently and accurately. The protocol is optimistic in nature and executes recovery procedures after an anomaly is detected. It strives to preserve the system mass in the presence of system asynchrony and dynamics. More precisely, it does not require coordination, so the start and end of an aggregation round can be asynchronous and arbitrarily long. Through detailed simulations, we evaluate the impact of asynchrony on the accuracy and diffusion speed of the LAP protocol.
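The push-pull exchange described above can be sketched in a few lines: each exchange replaces two nodes' estimates with their mean, which conserves the system mass (the sum of all estimates) while driving every estimate toward the global average. A toy simulation for illustration only, not the LAP protocol:

```python
import random

def push_pull_average(values, rounds=5000, seed=1):
    """Toy push-pull gossip averaging: in each exchange a random pair of
    nodes adopts the mean of their two estimates. The sum of estimates
    ("system mass") is conserved, so all estimates converge to the
    global average, provided the pairing keeps the network connected."""
    rng = random.Random(seed)
    est = list(values)
    n = len(est)
    for _ in range(rounds):
        i = rng.randrange(n)
        j = rng.randrange(n)
        if i != j:
            mean = (est[i] + est[j]) / 2.0  # both nodes update their estimate
            est[i] = est[j] = mean
    return est
```

Running this on [0, 10, 20, 30] drives every estimate toward the true average 15 while the total mass stays at 60; the mass-loss scenarios the thesis analyses arise when such exchanges are interleaved asynchronously and messages are delayed or lost.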
  • Item
    Privacy preserving protocols for large distributed systems
    Ramchen, Kim Sasha (2017)
    A fundamental problem in large distributed systems is how to enable parties to communicate securely while maintaining privacy. In this thesis we investigate the construction of privacy-preserving protocols in three problem domains: secure group communications, secure outsourceable computation, and secret sharing. Within these domains, flexible data sharing, low round complexity and the distribution of access control are guiding principles for our constructions. We present a novel construction of attribute-based encryption from correlation-relaxed two-to-one recodings. This construction is based upon the use of noisy cryptographic multilinear maps and entails replacing a correlation-secure encoding function with an indistinguishability property stating that a ciphertext is hard to decrypt without access to a certain input encoding. We construct the first noninteractive protocols for several tasks related to private set intersection. We provide efficient protocols for three related problems, each motivated by a particular kind of genomic testing. Set intersection with labelling hides the intersecting set itself and returns only the labels of the common elements, thus allowing a genomics company to return diagnoses without exposing the intellectual property of its database. Fuzzy matching with labelling extends this to allow matching at a particular Hamming distance, which solves the same problem but incorporates the possibility of genetic variation. Closest matching returns the item in the server's database closest to the client's query; this can be used for taxonomic classification. Our protocols are optimised for the matching of k-mers (sets of k-length strings) rather than individual nucleotides, which is particularly useful for representing the short reads produced by next-generation sequencing technologies. We present a very simple universally verifiable MPC protocol.
    The first component is a threshold somewhat homomorphic cryptosystem that permits an arbitrary number of additions (in the source group), followed by a single multiplication, followed by an arbitrary number of additions in the target group. The second component is a black-box construction of universally verifiable distributed encryption switching between any public-key encryption schemes supporting shared setup and key generation phases, as long as the schemes satisfy some natural additive-homomorphic properties. This allows us to switch back from the target group to the source group, and hence perform an arbitrary number of multiplications. The key generation algorithm of our prototypical cryptosystem, which is based upon concurrent verifiable secret sharing, permits robust reconstruction of powers of a shared secret. We demonstrate the scalability of distributed encryption switching as a viable approach to secure vote tallying by implementing a private verifiable form of Instant Runoff Voting on real Australian election data comprising 40,000 votes. We investigate the construction of algebraic manipulation detection codes which are secure against general algebraic attacks, i.e., error-detecting/correcting codes which are secure against algebraic tampering functions. We prove that such codes exist when the families of tampering functions are point additions and polynomial functions modulo a prime. We prove both positive and negative results concerning the existence of general algebraic manipulation detection codes compared to non-malleable codes.
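Stripped of the cryptography, the functionality of fuzzy matching with labelling reduces to returning the labels of database k-mers within a given Hamming distance of a query. The plaintext sketch below (with hypothetical names) shows only the functionality being protected; the thesis's contribution is performing this computation without revealing the query or the database:

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def fuzzy_match_with_labels(query_kmers, database, max_dist):
    """database maps k-mer -> label (e.g. a diagnosis). Return the labels
    of database k-mers within max_dist of any query k-mer."""
    labels = set()
    for q in query_kmers:
        for kmer, label in database.items():
            if len(kmer) == len(q) and hamming(q, kmer) <= max_dist:
                labels.add(label)
    return labels
```

With max_dist = 0 this degenerates to set intersection with labelling; allowing a small positive distance tolerates point mutations in the query.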
  • Item
    Hybrid optimization of vehicle routing problems
    Lam, Edward (2017)
    Vehicle routing problems are combinatorial optimization problems that aspire to design vehicle routes that minimize some measure of cost, such as the total distance traveled or the time at which the last vehicle returns to a depot, while adhering to various restrictions. Vehicle routing problems are of profound interest in both academia and industry because they are opportunities to study graph structures and algorithms, and because they underpin practical applications in a multitude of industries, but notably, the transportation and logistics industries. This dissertation presents two applications relevant to industry and develops a fully hybrid method for solving a classical vehicle routing problem. The first application combines vehicle routing with crew scheduling. In industry, vehicle routing and crew scheduling are usually performed in stages due to the large search space of integrated models. The first stage finds minimal-cost vehicle routes with little consideration to crew constraints and objectives. The second stage schedules crews on the routes from the first stage. Disregarding crew constraints in the first stage can lead to suboptimality or even infeasibility of the overall problem in the second stage. To quantify the suboptimality of staged optimization models, two formulations of the integrated problem are developed. The first is an ordinary mixed integer programming model, and the second is a constraint programming model containing a linear relaxation global constraint that performs cost-based filtering. The two integrated models are solved using a branch-and-bound search and a highly specialized large neighborhood search. The large neighborhood search exploits the substructures linking the vehicle routing and crew scheduling elements of the problem, and when executed on the constraint programming model, is shown to perform significantly better than the other approaches. 
The second application introduces a number of scheduling constraints to the Vehicle Routing Problem with Pickup and Delivery and Time Windows. The scheduling constraints arise from a lack of loading bays or equipment that unloads and loads incoming vehicles. These constraints limit the number of vehicles present or in service at any particular site by requiring the arrival of vehicles to be scheduled around the availabilities of a scarce resource. A mixed integer programming model, a constraint programming model and a sequential model are implemented for the problem but are shown to be inferior to a branch-and-price-and-check model, which hybridizes column generation and constraint programming with nogood learning. This thesis concludes with a hybrid method, named Branch-and-Check with Explanations, that unifies linear programming, constraint programming and Boolean satisfiability. The method begins with a linear programming model that omits several critical constraints. The solver uses the linear programming model to find objective bounds and candidate solutions, which are checked by a constraint programming model for feasibility of the omitted constraints. A Boolean satisfiability model performs conflict analysis on infeasible candidate solutions to derive nogood cuts, which are placed into the linear programming model and the constraint programming model. The method is implemented in a proof-of-concept solver for the Vehicle Routing Problem with Time Windows and is shown to be competitive against a branch-and-cut model while avoiding the intricacies involved in developing the cutting planes and separation algorithms required in branch-and-cut.
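The Branch-and-Check loop described above can be sketched generically: optimise a relaxed master problem that omits some constraints, check each candidate solution against the omitted constraints, and cut off infeasible candidates with nogoods. The toy sketch below substitutes plain enumeration for the linear programming master and a callback for the constraint programming checker; it illustrates the control flow only, not the thesis's solver:

```python
def branch_and_check(candidates, objective, master_feasible, check_full):
    """Toy branch-and-check loop over a finite candidate set.
    master_feasible: the relaxed master's constraints (some omitted).
    check_full: feasibility check for the omitted constraints.
    Infeasible master optima are excluded via nogoods until the master
    optimum also passes the full check, or no candidate remains."""
    nogoods = set()
    while True:
        feasible = [c for c in candidates
                    if c not in nogoods and master_feasible(c)]
        if not feasible:
            return None                      # master infeasible: no solution
        best = min(feasible, key=objective)  # master optimum (toy: enumeration)
        if check_full(best):
            return best                      # passes the omitted constraints
        nogoods.add(best)                    # nogood cut excluding this candidate
```

In the real method the nogoods are derived by conflict analysis on the infeasibility, so each cut excludes a whole family of candidate solutions rather than a single one.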
  • Item
    Protecting organizational knowledge: a strategic perspective framework
    Dedeche, Ahmed (2014)
    Organisational knowledge is considered a valuable resource for providing competitive advantage. Extensive research has been done on strategies to encourage knowledge creation and sharing. However, limited research has been done on strategies for protecting this valuable resource from the risk of leakage. This research aims to help bridge this gap through two contributions: developing a model that describes knowledge leakage, and providing a framework of strategies for protecting competitive organisational knowledge. The research is grounded in two bodies of literature: knowledge management and information security. It aims to identify security strategies in the literature and adapt them to address knowledge protection needs.
  • Item
    Understanding how cloud computing enables business model innovation in start-up companies
    Alrokayan, Mohammed (2017)
    Start-up companies contribute significantly to the national economies of many countries but their failure rate is notably high. Successful start-ups typically depend on innovative business models to be competitive and maintain profitability. This thesis explores how the new technologies of cloud computing might enable start-ups to create and maintain competitive advantage. A conceptual framework called Cloud-Enabled Business Model Innovation (CEBMI) is presented that identifies three research questions concerning how cloud computing might enable business model innovation, what form this innovation takes, and how the innovation leads to competitive advantage. These questions were then investigated through three empirical studies involving six case studies with start-ups and two qualitative studies involving interviews with 11 business consultants and three cloud service providers. The detailed findings are presented as a set of key propositions that offer answers to the research questions, and together sketch a view of how CEBMI might enable start-ups to achieve competitive advantage.
  • Item
    Investigating the evolution of structural variation in cancer
    Cmero, Marek (2017)
    Cancers arise from single progenitor cells that acquire mutations, eventually dividing into mixed populations with distinct genotypes. These populations can be estimated by identifying common mutational profiles, using computational techniques applied to sequencing data from tumour tissue samples. Existing methods have largely focused on single nucleotide variants (SNVs), despite growing evidence of the importance of structural variation (SV) as a driver in certain subtypes of cancer. While some approaches use copy-number aberrant SVs, no method has incorporated balanced rearrangements. To address this, I developed a Bayesian inference approach for estimating SV cancer cell fraction, called SVclone. I validated SVclone using in silico mixtures of real samples in known proportions and found that clonal deconvolution using SV breakpoints can yield comparable results to SNV-based clustering. I then applied the method to 2,778 whole genomes across 39 distinct tumour types, uncovering a subclonal copy-number neutral rearrangement phenotype with decreased overall survival. This clinically relevant finding could not have been made using existing methods. To further expand the methodology, and to demonstrate its application in low-data-quality contexts, I developed a novel statistical approach to test for clonal differences in high-variance, formalin-fixed, paraffin-embedded (FFPE) samples. Together with variant curation strategies to minimise FFPE artefacts, I applied the approach to longitudinal samples from a cohort of neo-adjuvant treated prostate cancer patients to investigate whether clonal differences can be inferred from highly noisy data. This thesis demonstrates that characterising the evolution of structural variation, particularly balanced rearrangements, yields clinically relevant insights.
Identifying the patterns and dynamics of structural variation in the context of tumour evolution will ultimately help improve understanding of common pathways of tumour progression. Through this knowledge, cancers driven by SVs will have clearer prognoses and clinical treatment decisions will ultimately be improved, leading to better patient outcomes.
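As background to cell-fraction estimation, a commonly used simplified relationship links a variant's observed allele frequency to the fraction of cancer cells carrying it, given tumour purity, local copy numbers, and variant multiplicity. The sketch below is a generic point estimate of this textbook relationship, not SVclone's Bayesian model:

```python
def cancer_cell_fraction(vaf, purity, tumour_cn=2, normal_cn=2, mult=1):
    """Point estimate of cancer cell fraction (CCF) from variant allele
    frequency (VAF), under the standard simplified model:
        vaf = mult * purity * ccf / (purity * tumour_cn + (1 - purity) * normal_cn)
    where mult is the number of variant-bearing copies per tumour cell."""
    return vaf * (purity * tumour_cn + (1 - purity) * normal_cn) / (mult * purity)
```

For a heterozygous variant in a copy-neutral diploid region of a 100%-pure tumour, a VAF of 0.5 corresponds to a CCF of 1.0 (fully clonal); lower purity or subclonality pushes the VAF down proportionally.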
  • Item
    Practical approximation algorithms for clustering and covering
    McClintock, Jessica (2017)
    The aim of this thesis is to provide solutions to computationally difficult network optimisation problems on modern architectures. We study three types of problem, clustering, covering and ordering, all of which are computationally hard to solve and in practice often involve very large data sets. These models are used in a range of real-world applications, and we therefore investigate both the practical and theoretical aspects of solving these problems in big-data contexts. Our main approach for solving such instances is to obtain polynomial-time approximation algorithms, which efficiently solve these problems to within some constant factor of optimality. We also introduce several heuristics for a type of scheduling problem with graph-based constraints and demonstrate their performance in a practical setting, before providing an approximation algorithm and hardness results for a formalised version of the problem. For instances on big data, where the computational bottleneck is the available RAM, we consider models of algorithm design that allow such instances to be solved. For this purpose, we also design clustering algorithms using the MapReduce paradigm for parallelisation, giving experimental evaluations of their performance in comparison to existing approaches.
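A standard pattern for MapReduce clustering of this kind runs a sequential algorithm on each data partition (map) and then again on the union of the local results (reduce). The sketch below illustrates this pattern with the classic greedy farthest-point heuristic for k-center, a well-known 2-approximation; it is an illustrative sketch, not the thesis's algorithms:

```python
import math

def dist(a, b):
    """Euclidean distance between 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def gonzalez_k_center(points, k):
    """Greedy farthest-point heuristic: repeatedly add the point farthest
    from the current centers. Classic 2-approximation for k-center."""
    centers = [points[0]]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(far)
    return centers

def mapreduce_k_center(points, k, n_parts):
    """Map: run the greedy heuristic on each partition.
    Reduce: run it again on the union of the local centers."""
    parts = [points[i::n_parts] for i in range(n_parts)]
    local = [c for part in parts
             for c in gonzalez_k_center(part, min(k, len(part)))]
    return gonzalez_k_center(local, k)
```

Only the small set of local centers crosses machine boundaries, so each worker's memory footprint stays bounded by its partition, which is exactly the RAM-bottleneck setting described above.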
  • Item
    Spectrum-based fault localization using machine learning
    Neelofar (2017)
    Debugging is critical in the production of reliable software. One effective bug localization technique is Spectrum-based Fault Localization (SBFL). This technique locates a buggy statement by applying an evaluation metric to program spectra and ranking program components by the score it computes. Most SBFL research to date has had a strong bias toward single-bug programs, for which many good evaluation metrics are available. The same is true for deterministic bugs (which cause test cases to fail whenever they are executed). However, multiple bugs have now become the norm in large software systems, and debugging metrics that perform well for such software are consequently in high demand. In this thesis, we propose a parametric class of metrics that adapts itself to single, multiple and deterministic bugs. The parameter values are learnt using optimization methods such as Genetic Programming or Simulated Annealing. We name our proposed class of metrics the "Hyperbolic" metric, as its contours resemble hyperbolas. We evaluate the performance of the proposed metric on both real programs and model programs with single bugs, multiple bugs, deterministic bugs and non-deterministic bugs, and find that it performs as well as or better than the previous best-performing metrics over a broad range of data. SBFL is lightweight but has limited accuracy due to the limited amount of information it uses for localizing faults: it depends solely on the execution information of program components in passed and failed test cases. In this thesis, we propose a debugging approach that is a hybrid of SBFL and static analysis, providing SBFL with extra information about the program under test. Program statements are categorized into various types, which are given weights based on how likely each category is to be related to a bug.
    We evaluate the performance of our technique on both small programs from the Siemens Test Suite (STS) and the larger Space program. Results show that our technique improves the performance of a wide variety of fault localization metrics on single- and multiple-bug data. Statement type is one of many program features that can provide valuable clues about the location of a bug; others include statement length, nesting depth, and cyclomatic complexity. However, it is very expensive, and thus impractical, to plug each of these features into our proposed technique to determine how effective it is for fault localization. We devise a statistical method that can evaluate the importance of a program feature for fault localization without using it in a machine learning method. The similarity between the results obtained by our statistical method and those from actually implementing the feature in machine learning techniques demonstrates the accuracy of our method for feature-importance identification. Often, when two or more well-performing techniques are combined, the combination performs no better than the best technique individually. We combined our hyperbolic class of metrics with the statement-weighting technique, and the combined technique further improves the performance of the hyperbolic class of metrics. The improvement in performance is statistically significant for many single- and multi-bug datasets.
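For context, a typical SBFL evaluation metric scores each statement from four spectrum counts: ef and ep (failed and passed tests that execute the statement) and nf and np (those that do not). The sketch below uses the well-known Ochiai metric as a stand-in; the thesis's hyperbolic metric and its learnt parameters are not reproduced here:

```python
import math

def ochiai(ef, ep, nf):
    """Ochiai suspiciousness: ef / sqrt((ef + nf) * (ef + ep)).
    ef/ep: failed/passed tests executing the statement;
    nf: failed tests not executing it."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def rank_statements(spectra, results):
    """spectra[i][s] = 1 if test i executes statement s;
    results[i] = True if test i failed.
    Return statement indices sorted by decreasing suspiciousness."""
    n_stmts = len(spectra[0])
    total_f = sum(results)
    scores = []
    for s in range(n_stmts):
        ef = sum(1 for cov, failed in zip(spectra, results) if cov[s] and failed)
        ep = sum(1 for cov, failed in zip(spectra, results) if cov[s] and not failed)
        scores.append((ochiai(ef, ep, total_f - ef), s))
    return [s for score, s in sorted(scores, reverse=True)]
```

A statement executed by all failing tests and no passing tests scores 1.0 and ranks first; the hybrid approach described above would then reweight these scores using the statement-type categories.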