# Computing and Information Systems - Theses

## Search Results

Now showing 1 - 5 of 5
• Item
Learning to generalise through features
Grebenyuk, Dmitry ( 2020)
A Markov decision process (MDP) cannot be used for learning end-to-end control policies in Reinforcement Learning when the dimension of the feature vectors changes from one trial to the next. For example, this difference is present in an environment where the number of blocks to manipulate can vary. Because we cannot learn a different policy for each number of blocks, we suggest framing the problem as a POMDP instead of the MDP. It allows us to construct a constant observation space for a dynamic state space. There are two ways we can achieve such construction. First, we can design a hand-crafted set of observations for a particular problem. However, that set cannot be readily transferred to another problem, and it often requires domain-dependent knowledge. On the other hand, a set of observations can be deduced from visual observations. This approach is universal, and it allows us to easily incorporate the geometry of the problem into the observations, which can be challenging to hard-code in the former method. In this Thesis, we examine both of these methods. Our goal is to learn policies that can be generalised to new tasks. First, we show that a more general observation space can improve the performance of policies tested in untrained tasks. Second, we show that meaningful feature vectors can be obtained from visual observations. If properly regularised, these vectors can reflect the spacial structure of the state space and used for planning. Using these vectors, we construct an auto-generated reward function, able to learn working policies.
• Item
Towards highly accurate publication information extraction from academic homepages
Zhang, Yiqing ( 2018)
More and more researchers list their research profiles in academic homepages online. Publications from a researcher's academic homepage contain rich information, such as the researcher's fields of expertise, research interests, and collaboration network. Extracting publication information from academic homepages is an essential step in automatic profile analysis, which enables many applications such as academic search, bibliometrics and citation analysis. The publications extracted from academic homepages can also be a supplementary source for bibliographic databases. We investigate two publication extraction problems in this thesis: (i) Given an academic homepage, how can we precisely extract all the individual publication strings from the homepage? Here, a publication string is a text string that describes a publication record. We call this problem publication string extraction. (ii) Given a publication string, how can we extract different fields, such as publication authors, publication title, and publication venue, from the publication string? We call this problem publication field extraction. There are two types of traditional approaches to these two problems, rule-based approaches and machine learning based approaches. Rule-based approaches cannot accommodate the large variety of styles in the homepages, and they require significant efforts in rule designing. Machine learning based approaches rely on a large amount of high-quality training data as well as suitable model structures. To tackle these challenges, we first collect two datasets and annotate them manually. We propose a training data enhancement method to generate large sets of semi-real data for training our models. For the publication string extraction problem, we propose a PubSE model that can model the structure of a publication list in both line-level and webpage-level. For the publication field extraction problem, we propose an Adaptive Bi-LSTM-CRF model that can utilize the generated and the manually labeled training data to the full extent. Extensive experiment results show that the proposed methods outperform the state-of-the-art methods in the publication extraction problems studied.
• Item
Supervised algorithms for complex relation extraction
Khirbat, Gitansh ( 2017)
Binary relation extraction is an essential component of information extraction systems, wherein the aim is to extract meaningful relations that might exist between a pair of entities within a sentence. Binary relation extraction systems have witnessed a significant improvement over past three decades, ranging from rule-based systems to statistical natural language techniques including supervised, semi-supervised and unsupervised machine learning approaches. Modern question answering and summarization systems have motivated the need for extracting complex relations wherein the number of related entities is more than two. Complex relation extraction (CRE) systems are highly domain specific and often rely on traditional binary relation extraction techniques employed in a pipeline fashion, thus susceptible to processing-induced error propagation. In this thesis, we investigate and develop approaches to extract complex relations directly from natural language text. In particular, we deviate from the traditional disintegration of complex relations into constituent binary relations and propose usage of shortest dependency parse spanning the n related entities as an alternative to facilitate direct CRE. We investigate this proposed approach by a comprehensive study of supervised learning algorithms with a special focus on training support vector machines, convolutional neural networks and deep learning ensemble algorithms. Research in the domain of CRE is stymied by paucity of annotated data. To facilitate future exploration, we create two new datasets to evaluate our proposed CRE approaches on a pilot biographical fact extraction task. An evaluation of results on new and standard datasets concludes that usage of shortest path dependency parse in a supervised setting enables direct CRE with an improved accuracy, beating current state-of-the-art CRE systems. We further show the application of CRE to achieve state-of-the-art performance for directly extracting events without the need of disintegrating them into event trigger and event argument extraction processes.
• Item
Efficient orthogonal parametrisation of recurrent neural networks using householder reflections
Mhammedi, Zakaria ( 2017)
In machine learning, Recurrent Neural Networks (RNNs) have been successfully used in many applications. They are particularly well suited for problems involving time-series. This is because RNNs process input sequences one element at a time, and thus, the chronological order of these elements is taken into account. In many practical prediction tasks involving time-series, long-term time dependencies in data are crucial. These are the features that RNNs need to learn. However, learning these long-term dependencies in sequences using RNNs is still a major challenge. This is mainly due to the exploding and vanishing gradient problems, which are more prominent the longer the input sequences are. Recent methods have been suggested to solve this problem by constraining the transition matrix to be unitary during training, which ensures that its norm is exactly equal to one. This sets an upper bound on the norm of the back-propagated gradients, preventing them from growing exponentially. However, existing methods either have limited expressiveness or scale poorly with the size of the network when compared with the simple RNN case, especially when using stochastic gradient descent. Our contributions are as follows. We first show that constraining the transition matrix to be unitary is a special case of an orthogonal constraint. Therefore, it may not be necessary to work with complex-valued matrices. Then we present a new parametrisation of the transition matrix which allows efficient training of an RNN while ensuring that the matrix is always orthogonal. Furthermore, a good trade-off between speed and expressiveness can be chosen by selecting the right number of reflection vectors - the vectors used in the new parametrisation. In particular, when the number of reflection vector is equal to the size of the hidden layer the transition matrix is allowed to span the full set of orthogonal matrices. Using our approach, one stochastic gradient step can, in the worst case, be performed in time complexity $\mathcal{O}(\bm{T} n^2)$, where $T$ and $n$ are the length of the input sequence and the size of the hidden layer respectively. This time complexity is the same as that of the simple RNN. Finally, we test our new parametrisation on problems with long-term dependencies. Our results suggest that the orthogonal constraint on the transition matrix has similar benefits to the unitary constraint.
• Item
A bit-vector solver based on word-level propagation
Wang, Wenxi ( 2016)
Reasoning with bit-vectors arises in a variety of applications in verification and cryptography. Michel and Van Hentenryck have proposed an interesting approach to bit-vector constraint propagation on the word level. Most of the operations are propagated in constant time, assuming the bit-vector fits in a machine word. In contrast, bit-vector SMT solvers usually solve bit-vector problems by bit-blasting, that is, mapping the resulting operations to conjunctive normal form clauses, and using SAT technology to solve them. This also means generating intermediate variables which can be an advantage, as these can be searched on and learnt about. Since each approach has advantages it is important to see whether we can benefit from these advantages by using a word-level propagation approach with learning. In this dissertation, we describe an approach to bit-vector solving using word-level propagation with learning. We provide alternative word-level propagators to Michel and Van Hentenryck, and we evaluate the variants of the approach empirically. We also experiment with different approaches to learning and back-jumping in the solver. Based on the insights gained, we propose a portfolio solver using machine learning which can enhance state-of-the-art solvers. We show that, with careful engineering, a word-level propagation based approach can compete with (or complement) bit-blasting.