Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 1 of 1
  • Item
    Thumbnail Image
    Improving the utility of topic models: an uncut gem does not sparkle
    LAU, JEY HAN ( 2013)
    This thesis concerns a type of statistical model known as topic model. Topic modelling learns abstract “topics” in a collection of documents, and by “topic” we mean an idea, theme or subject. For example we may have an article that discusses space exploration, or a book about crime. Space exploration and crime, these two subjects, are the “topics” that we are talking about. As one imagine, topic modelling has a direct application in digital libraries, as it automates the learning and categorisation of topics in books and articles. The merit of topic modelling, however, is that its machinery is not limited to processing just words but symbols in general. As such, topic modelling has seen applications in other areas outside text processing such as biomedical research for inferring protein families. Most applications, however, are small scale and experimental and much of the impact is still contained in academic research. The overarching theme of the thesis is thus to improve the utility of topic modelling. We achieve this in two ways: (1) by improving a few aspects of topic modelling to make it more accessible and usable by users; and (2) by proposing novel applications of topic modelling to real-world problems. In the first step, we look into improving the preprocessing methodology of documents that serves as the creation of input for topic models. We also experiment extensively to improve the visualisation of topics—one of the main output of topic models—to increase its usability for human users. In the second step, we apply topic modelling in a lexicography-oriented work to learn and detect new meanings that have emerged in words and in the social media space to identify popular social trends. Both were novel applications and delivered promising results, demonstrating the strength and wide applicability of topic models.