Computing and Information Systems - Theses

  • Item
    Analysing the interplay of location, language and links utilising geotagged Twitter content
    Afshin, Rahimi (2018)
    Language use and interactions on social media are geographically biased. In this work we utilise this bias in predictive models of user geolocation and lexical dialectology. User geolocation is an important component of applications such as personalised search and recommendation systems. We propose text-based and network-based geolocation models, and compare them on benchmark datasets, achieving state-of-the-art performance. We also propose hybrid and joint text and network geolocation models that improve upon text-only or network-only models, and show that the joint models are able to achieve reasonable performance in minimal-supervision scenarios, as is often the case in real-world datasets. Finally, we propose the use of continuous representations of location, which enables regression modelling of geolocation and lexical dialectology. We show that our proposed data-driven lexical dialectology model provides qualitative insights in studying geographical lexical variation.
  • Item
    Improving the utility of social media with Natural Language Processing
    HAN, BO (2014)
    Social media has been an attractive target for many natural language processing (NLP) tasks and applications in recent years. However, the unprecedented volume of data and the non-standard language register cause problems for off-the-shelf NLP tools. This thesis investigates the broad question of how NLP-based text processing can improve the utility (i.e., the effectiveness and efficiency) of social media data. In particular, text normalisation and geolocation prediction are closely examined in the context of Twitter text processing. Text normalisation is the task of restoring non-standard words to their standard forms. For instance, “earthquick” and “2morrw” should be transformed into “earthquake” and “tomorrow”, respectively. Non-standard words often cause problems for existing tools trained on edited text sources such as newswire text. By applying text normalisation to reduce unknown non-standard words, the accuracy of NLP tools and downstream applications is expected to increase. In this thesis, I explore and develop lexical normalisation methods for Twitter text. I shift the focus of text normalisation from a cascaded token-based approach to a type-based approach using a combined lexicon, based on the analysis of existing and developed text normalisation methods. The type-based method achieved state-of-the-art end-to-end normalisation accuracy at the time of publication, i.e., 0.847 precision and 0.630 recall on a benchmark dataset. Furthermore, it is simple, lightweight and easily integrable, which makes it particularly well suited to large-scale data processing. Additionally, the effectiveness of the proposed normalisation method is shown in non-English text normalisation and other NLP tasks and applications. Geolocation prediction estimates a user’s primary location based on the text of their posts. It enables location-based data partitioning, which is crucial to a range of tasks and applications such as local event detection.
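As a rough illustration of the type-based normalisation idea described above, the sketch below replaces each token with its standard form by direct lexicon lookup. It is a minimal sketch only: the function name `normalise` and the two-entry lexicon (taken from the abstract's own example) are hypothetical, and the thesis's combined lexicon and its construction are not reproduced here.

```python
# Hypothetical toy lexicon mapping non-standard word types to standard forms.
# The two entries are the abstract's own examples; a real lexicon would be
# far larger and built from data.
NORMALISATION_LEXICON = {
    "earthquick": "earthquake",
    "2morrw": "tomorrow",
}


def normalise(text, lexicon=NORMALISATION_LEXICON):
    """Replace each whitespace-delimited token that appears in the lexicon.

    Lookup is case-insensitive; tokens not in the lexicon are left unchanged.
    """
    tokens = text.split()
    return " ".join(lexicon.get(token.lower(), token) for token in tokens)


if __name__ == "__main__":
    print(normalise("big earthquick expected 2morrw"))
```

Because each lookup is a single dictionary access per token, an approach of this shape stays lightweight on large volumes of text, which is consistent with the scalability argument made in the abstract.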
The partitioned location data can improve both the efficiency and the effectiveness of NLP tools and applications. In this thesis, I identify and explore several factors that affect the accuracy of text-based geolocation prediction in a unified framework. In particular, an extensive range of feature selection methods is compared to determine the optimised feature set for the geolocation prediction model. The results suggest feature selection is an effective method for improving the prediction accuracy regardless of geolocation model and location partitioning. Additionally, I examine the influence of other factors including non-geotagged data, user metadata, tweeting language, temporal influence, user geolocatability, and geolocation prediction confidence. The proposed stacking-based prediction model achieved 40.6% city-level accuracy and 40 km median error distance for English Twitter users on a recent benchmark dataset. These investigations provide practical insights into the design of a text-based geolocation prediction system, as well as the basis for further research on this task. Overall, the exploration of these two text processing tasks enhances the utility of social media data for relevant NLP tasks and downstream applications. The developed method and experimental results have an immediate impact on future social media research.
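The two evaluation figures quoted above (city-level accuracy and median error distance) could be computed along the following lines. This is a sketch under stated assumptions: the abstract does not say how distance is measured, so great-circle (haversine) distance with a mean Earth radius of 6371 km is assumed here, and all function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(predicted, actual):
    """Median distance between predicted and actual (lat, lon) pairs."""
    return median(haversine_km(*p, *a) for p, a in zip(predicted, actual))


def city_accuracy(predicted_cities, actual_cities):
    """Fraction of users whose predicted city label matches the true label."""
    pairs = list(zip(predicted_cities, actual_cities))
    return sum(p == a for p, a in pairs) / len(pairs)
```

The median is used rather than the mean because a small number of predictions on the wrong continent would otherwise dominate the error figure.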
  • Item
    Mitigating the risk of organisational information leakage through online social networking
    Abdul Molok, Nurul Nuha (2013)
    The inadvertent leakage of sensitive organisational information through the proliferation of online social networking (OSN) is a significant challenge in a networked society. Although considerable research has studied information leakage, the advent of OSN amongst employees presents fundamentally new problems for organisations. As employees are bringing their own mobile devices to the workplace, which allow them to engage in OSN activities anytime and anywhere, reported cases involving leakage of organisational information through OSN are on the rise. Despite its opportunities, OSN tends to blur the boundaries between employees’ professional and personal use of social media, making it challenging for organisations to protect the confidentiality of their valuable information. The thesis investigates two phenomena. First, it explores the disclosure of sensitive organisational information by employees through the use of social media. Second, it looks into organisational security strategies employed to mitigate the associated security risks. During the first multiple-case study, employees across four organisations were interviewed to understand their OSN behaviour and the types of work-related information they disclosed online. In the second multiple-case study, the researcher went back to the same organisations and interviewed security managers to understand potential security impacts of employees’ OSN behaviour, and the various security strategies implemented in the organisations. The findings emerging from these interpretive multiple-case studies, based on rich insights from both employees and security managers, led to the development of a maturity framework. This framework can assist organisations to assess, develop or improve their security strategies to mitigate social media related risks. The framework was evaluated through focus groups with experts in security and social media management.
The research, which consists of two sets of multiple-case studies and focus groups, has resulted in three main contributions:
1. Understanding of contextual influences on the disclosure of sensitive organisational information, from multiple perspectives
2. Identification of the influence of managerial attitudes on the deployment of a particular information security strategy, especially in relation to social media use amongst employees
3. Development and evaluation of a Maturity Framework for Mitigating Leakage of Organisational Information through OSN
As suggested by the literature, security behaviour can be either intentional or unintentional in nature. However, this research found that information leakage through employees’ OSN was more unintended than intended, which indicated that, generally, employees did not mean to cause security problems for organisations. The research also provided evidence that information leakage through OSN was due to influences that could be categorised into personal, organisational and technological factors. Interestingly, employees and security managers had different understandings of why information leakage through OSN happens. Employees demonstrated that leakage was inadvertent, while security managers did not understand that employees had no intention of causing security problems. These findings suggested that information leakage via OSN could be effectively mitigated by organisations, depending on how management perceived the ways in which employees’ OSN behaviour could jeopardise the confidentiality of information. In accordance with the security literature, this research found different kinds of security strategies that organisations employed to mitigate security issues posed by OSN.
Interestingly, this research also found that across the organisations, these security strategies varied in their levels of sophistication, revealing certain managerial attitudes which influenced the organisational capability to manage the risk of leakage via employees’ OSN. Since a higher level of strategy sophistication results in more risk-averse employee OSN behaviour, this research identified relationships between employee OSN behaviour, OSN security strategies and managerial attitudes. For example, the organisation that received little management support for security initiatives tended to have poorly developed controls, which resulted in a low level of employee awareness of risky OSN behaviour. Finally, this research culminated in the development of a Maturity Framework for Mitigating Leakage of Organisational Information through OSN, which was evaluated by security experts through focus groups. This framework can be used by organisations to assess how well their current information security measures can be expected to protect them from this insider threat. It can also provide recommendations for organisations to improve their current OSN security strategies.