Computing and Information Systems - Theses

  • Item
    On the effectiveness of removing location information from trajectory data for preserving location privacy
    Hossain, Amina (2017)
    The ubiquity of GPS-enabled smartphones with Internet connectivity has resulted in the widespread development of location-based services (LBSs). People use these services to obtain useful advice for their daily activities. For example, a user can open a navigation app to find the route with the shortest driving time from the current location to a destination. Nevertheless, people have to reveal location information to LBS providers to use such services. Location information is sensitive since it can reveal an individual's habits. LBS providers are aware of this and take measures to protect user privacy. One well-established and simple approach is to remove GPS data from user data, on the assumption that doing so will lead to a high degree of privacy. In this thesis, we challenge the notion that removing location information while retaining other features leads to a high degree of location privacy. We find that it is possible to reconstruct the original routes by analyzing just the turn instructions provided to a user by a navigation service. We evaluate our approach using both synthetic and real road network data and demonstrate the effectiveness of this new attack in a range of realistic scenarios.
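    The idea that turn instructions alone can narrow down a route can be illustrated with a toy sketch. It is not the attack developed in the thesis: the road network, the unit-length segments, the three-word turn vocabulary, and the names `ROADS`, `follow` and `candidate_routes` are all assumptions made for illustration. The sketch tries every start node and initial heading and keeps the routes that stay on the network.

```python
# Toy illustration only: a hypothetical grid-aligned road network made of
# unit-length, two-way segments. None of this comes from the thesis.
ROADS = {
    frozenset(seg)
    for seg in [
        ((0, 0), (1, 0)),
        ((1, 0), (1, 1)),
        ((1, 1), (2, 1)),
        ((2, 1), (2, 2)),
        ((1, 1), (1, 2)),
        ((1, 0), (2, 0)),
    ]
}

# Headings in clockwise order: north, east, south, west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]
# A right turn advances one step clockwise; a left turn goes one step back.
TURN = {"straight": 0, "right": 1, "left": -1}


def follow(roads, start, heading, turns):
    """Drive one segment, then one more segment after each turn.

    Returns the list of visited nodes, or None if a segment is not a road.
    """
    path = [start]
    pos = start
    for turn in [None] + list(turns):
        if turn is not None:
            heading = (heading + TURN[turn]) % 4
        dx, dy = HEADINGS[heading]
        nxt = (pos[0] + dx, pos[1] + dy)
        if frozenset((pos, nxt)) not in roads:
            return None
        path.append(nxt)
        pos = nxt
    return path


def candidate_routes(roads, turns):
    """Return every route on the network that matches the turn sequence."""
    nodes = {point for seg in roads for point in seg}
    found = []
    for start in sorted(nodes):
        for heading in range(4):
            path = follow(roads, start, heading, turns)
            if path is not None:
                found.append(path)
    return found


routes = candidate_routes(ROADS, ["left", "right", "left"])
for route in routes:
    print(route)
```

    On this small network the three turns are consistent with only one route, even though no coordinates were given to the search. Real road networks are far larger and turn instructions are richer, so this only conveys the intuition.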
  • Item
    Supervised algorithms for complex relation extraction
    Khirbat, Gitansh (2017)
    Binary relation extraction is an essential component of information extraction systems, wherein the aim is to extract meaningful relations that might exist between a pair of entities within a sentence. Binary relation extraction systems have improved significantly over the past three decades, progressing from rule-based systems to statistical natural language techniques including supervised, semi-supervised and unsupervised machine learning approaches. Modern question answering and summarization systems have motivated the need for extracting complex relations, wherein the number of related entities is more than two. Complex relation extraction (CRE) systems are highly domain specific and often rely on traditional binary relation extraction techniques employed in a pipeline fashion, and are thus susceptible to processing-induced error propagation. In this thesis, we investigate and develop approaches to extract complex relations directly from natural language text. In particular, we deviate from the traditional decomposition of complex relations into constituent binary relations and propose using the shortest dependency parse spanning the n related entities as an alternative to facilitate direct CRE. We investigate this proposed approach through a comprehensive study of supervised learning algorithms, with a special focus on training support vector machines, convolutional neural networks and deep learning ensemble algorithms. Research in the domain of CRE is stymied by a paucity of annotated data. To facilitate future exploration, we create two new datasets to evaluate our proposed CRE approaches on a pilot biographical fact extraction task. An evaluation of results on new and standard datasets concludes that using the shortest path dependency parse in a supervised setting enables direct CRE with improved accuracy, beating current state-of-the-art CRE systems. We further show the application of CRE to achieve state-of-the-art performance for directly extracting events without the need to decompose them into event trigger and event argument extraction processes.
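    One possible reading of "the shortest dependency parse spanning the n related entities" is the smallest subtree of a sentence's dependency tree that connects all the entity tokens. The sketch below illustrates that reading only; it is not the thesis's implementation. The example sentence, its hand-written dependency edges, and the names `path_in_tree` and `spanning_nodes` are assumptions made for illustration.

```python
from collections import deque

# Hypothetical, hand-written example: tokens and (head, dependent) edges.
# A real system would obtain the edges from a dependency parser.
TOKENS = ["Curie", "was", "born", "in", "Warsaw", "in", "1867"]
EDGES = [(2, 0), (2, 1), (2, 4), (4, 3), (2, 6), (6, 5)]


def path_in_tree(adj, src, dst):
    """Return the unique path between two nodes of an undirected tree (BFS)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbour in adj[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def spanning_nodes(edges, entities):
    """Return the token indices of the smallest subtree joining all entities.

    In a tree, the union of the paths from one entity to each of the others
    is exactly that subtree.
    """
    adj = {}
    for head, dep in edges:
        adj.setdefault(head, set()).add(dep)
        adj.setdefault(dep, set()).add(head)
    first = entities[0]
    keep = {first}
    for other in entities[1:]:
        keep.update(path_in_tree(adj, first, other))
    return keep


# Three related entities: a person, a place and a year.
kept = spanning_nodes(EDGES, [0, 4, 6])
print([TOKENS[i] for i in sorted(kept)])
```

    Here the spanning subtree keeps the three entities and the verb that links them, and drops the auxiliary and the prepositions. Features for a supervised classifier could then be drawn from this reduced structure rather than from the full sentence.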
  • Item
    Efficient orthogonal parametrisation of recurrent neural networks using householder reflections
    Mhammedi, Zakaria (2017)
    In machine learning, Recurrent Neural Networks (RNNs) have been successfully used in many applications. They are particularly well suited for problems involving time series. This is because RNNs process input sequences one element at a time, and thus the chronological order of these elements is taken into account. In many practical prediction tasks involving time series, long-term time dependencies in data are crucial. These are the features that RNNs need to learn. However, learning these long-term dependencies in sequences using RNNs is still a major challenge. This is mainly due to the exploding and vanishing gradient problems, which are more prominent the longer the input sequences are. Recent methods have been suggested to solve this problem by constraining the transition matrix to be unitary during training, which ensures that its norm is exactly equal to one. This sets an upper bound on the norm of the back-propagated gradients, preventing them from growing exponentially. However, existing methods either have limited expressiveness or scale poorly with the size of the network when compared with the simple RNN case, especially when using stochastic gradient descent. Our contributions are as follows. We first show that constraining the transition matrix to be unitary is a special case of an orthogonal constraint. Therefore, it may not be necessary to work with complex-valued matrices. Then we present a new parametrisation of the transition matrix which allows efficient training of an RNN while ensuring that the matrix is always orthogonal. Furthermore, a good trade-off between speed and expressiveness can be chosen by selecting the right number of reflection vectors (the vectors used in the new parametrisation). In particular, when the number of reflection vectors equals the size of the hidden layer, the transition matrix can span the full set of orthogonal matrices. Using our approach, one stochastic gradient step can, in the worst case, be performed in $\mathcal{O}(T n^2)$ time, where $T$ and $n$ are the length of the input sequence and the size of the hidden layer respectively. This time complexity is the same as that of the simple RNN. Finally, we test our new parametrisation on problems with long-term dependencies. Our results suggest that the orthogonal constraint on the transition matrix has similar benefits to the unitary constraint.
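    As a minimal sketch of why a product of Householder reflections gives an orthogonal transition matrix, the code below applies $W = H_{u_1} H_{u_2} \cdots H_{u_k}$ to a hidden state without ever forming $W$, using the standard reflection $H_u = I - 2uu^\top / (u^\top u)$. The thesis's exact construction may differ; the function names, the random reflection vectors and the sizes `n` and `k` are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))


def reflect(u, x):
    """Apply H_u = I - 2 u u^T / (u^T u) to x in O(n) time."""
    scale = 2.0 * dot(u, x) / dot(u, u)
    return [xi - scale * ui for ui, xi in zip(u, x)]


def apply_transition(reflection_vectors, h):
    """Compute W h for W = H_{u_1} ... H_{u_k} in O(n k) time."""
    # The rightmost reflection in the product acts on h first.
    for u in reversed(reflection_vectors):
        h = reflect(u, h)
    return h


# Illustrative sizes and random values (assumptions, not from the thesis).
random.seed(0)
n, k = 6, 3
us = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
h = [random.gauss(0.0, 1.0) for _ in range(n)]
g = [random.gauss(0.0, 1.0) for _ in range(n)]

wh = apply_transition(us, h)
wg = apply_transition(us, g)

# An orthogonal map preserves norms and inner products.
print(abs(norm(wh) - norm(h)) < 1e-9)
print(abs(dot(wh, wg) - dot(h, g)) < 1e-9)
```

    Each reflection costs $\mathcal{O}(n)$, so applying $k$ of them costs $\mathcal{O}(nk)$ per time step. With $k = n$ this is $\mathcal{O}(n^2)$ per step and $\mathcal{O}(T n^2)$ over a sequence of length $T$ for the forward pass, which is in line with the complexity quoted above; the backward pass is not shown here.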
  • Item
    Low-cost leaving home activity recognition using mobile sensing
    Li, Han (2017)
    Leaving home activity recognition (LHAR) is essential in context-aware applications. For example, on a rainy day, a smartphone can remind a user to bring an umbrella when the leaving home activity is recognized. However, research in this field is substantially lacking. Most existing studies require extra hardware, such as sensors installed at home, to help recognize such activities, which limits the applicability of these studies. With the ubiquity of mobile sensing techniques, it becomes feasible for a smartphone to sense the ambient environment and a user’s current context. In this thesis, we develop a low-cost system using mobile sensing for timely recognition of leaving home activities. To the best of our knowledge, we are the first to recognize leaving home activities using only sensors on smartphones. Overall, our system can recognize leaving home activities within 20 seconds after the home door is closed, with a precision of 93.1% and a recall of 100%. Recognizing leaving home activities while only leveraging sensors on smartphones is challenging in two respects: 1) the diversity of home environments results in inconsistent features, which significantly affects the recognition performance; and 2) mobile sensing is restricted by limited resources on smartphones, e.g., power and computation capability. To overcome such limitations, we first investigate sensors available on commodity smartphones and find that features extracted from WiFi, barometer, cell tower, magnetic field sensor and accelerometer readings are relatively robust in recognizing leaving home activities. Second, due to the variety of residential property environments, we propose a sensor selection algorithm to adaptively select the most suitable sensors for each home to personalize the training. Both classification performance and sensing cost are taken into consideration to give the user the option to trade power consumption for recognition accuracy, and vice versa. Inspired by the observation that the leaving home activity usually involves walking, we activate the power-hungry sensors to start the recognition only when a walking event is detected. Thus, we reduce the sensing cost by 76.65%.