Computing and Information Systems - Theses

  • Item
    Word Associations as a Source of Commonsense Knowledge
    Liu, Chunhua (2023-12)
    Commonsense knowledge helps individuals naturally make sense of everyday situations and is important for AI systems to truly understand and interact with humans. However, acquiring such knowledge is difficult due to its implicit nature and sheer size, causing existing large-scale commonsense resources to suffer from a sparsity issue. This thesis addresses the challenge of acquiring commonsense knowledge by using word associations, a resource yet untapped for this purpose in natural language processing (NLP). Word associations are spontaneous connections between concepts that individuals make (e.g., smile and happy), reflecting the human mental lexicon. The aim of this thesis is to complement existing resources like commonsense knowledge graphs and pre-trained language models (PLMs), and enhance models’ ability to reason in a more intuitive and human-like manner. To achieve this aim, we explore three aspects of word associations: (1) understanding the relational knowledge they encode, (2) comparing the content and utility for NLP downstream tasks of large-scale word associations with widely-used commonsense knowledge resources, and (3) improving knowledge extraction from PLMs with word associations. We introduce a crowd-sourced large-scale dataset of word association explanations, which is crucial for disambiguating multiple reasons behind word associations. This resource fills a gap in the cognitive psychology community by providing a dataset to study the rationales and structures underlying associations. By automating the process of labelling word associations with relevant relations, we demonstrate that these explanations enhance the performance of relation extractors. We conduct a comprehensive comparison between large-scale word association networks and the ConceptNet commonsense knowledge graph, analysing their structures, knowledge content, and benefits for commonsense reasoning tasks. 
Even though we identify systematic differences between the two resources, we find that they both show improvements when incorporated into NLP models. Finally, we propose a diagnostic framework to understand the implicit knowledge encoded in PLMs and identify effective strategies for knowledge extraction. We show that word associations can enhance the quality of extracted knowledge from PLMs. The contributions of this thesis highlight the value of word associations in acquiring commonsense knowledge, offering insights into their utility in cognitive psychology and NLP research.
  • Item
    Multi-document Summarisation Supporting Clinical Evidence Review
    Otmakhova, Yulia (2023-12)
    Summarising the (often contradictory) results of multiple clinical trials into conclusions which can be safely implemented by medical professionals in their daily practice is a very important, but highly challenging, task. In this thesis, we tackle it from three directions: we present our domain-specific evaluation framework, construct a new dataset for biomedical multi-document summarisation, and conduct experiments to analyse and improve the performance of summarisation models. We first examine what constitutes a well-formed answer to a clinical question, and define its three components -- PICO elements (biomedical entities), direction of findings, and modality (certainty). Next, we present a framework for human evaluation of biomedical summaries, which is based on these aspects and allows non-expert annotators to assess the factual correctness of conclusions faster and more robustly. Then, we use this framework to highlight issues with summarisation models, and examine the possibility of automating summary evaluation using large generative language models. Following that, we present our multi-document summarisation dataset, which has several levels of input and target granularity (such as documents, sentences, and claims) as well as rich annotation for the clinical evidence aspects we defined, and use it in several scenarios to test the capabilities of existing models. We then turn to the question of synthesising the input studies into conclusions, in particular, reflecting the direction and certainty of findings in summaries. First, we attempt to improve aggregation of entities and their relations using a global attention mechanism in a pre-trained multi-document summarisation model. As this proves to be difficult, we examine whether the models are at least able to detect modality and direction correctly. For that, we propose a dataset of counterfactual summaries and a method to test the models’ sensitivity to direction and certainty. 
Finally, we outline our preliminary experiments with a large generative language model, which shows some potential for better aggregation of direction values and PICO elements. Overall, the analysis and proposals in this thesis contribute a deeper understanding of what is required of summarisation models to generate useful and reliable multi-document summaries of clinical literature, improve their evaluation in that respect, and take a step towards better modelling choices.
  • Item
    Trustworthy Machine Learning: From Images to Time Series
    Jiang, Yujing (2023-09)
    Deep neural networks (DNNs) have demonstrated remarkable performance in several areas, including computer vision, natural language processing, healthcare and medical imaging, speech recognition and synthesis, and many more. Recent research has highlighted the vulnerability of DNNs to adversarial attacks, which can compromise the security and reliability of machine learning models, leading to misclassifications, unauthorized access, or unintended behaviors, posing significant risks in various applications. Adversarial machine learning has emerged as a critical research area that refers to deliberate and malicious attempts to manipulate or deceive machine learning models by exploiting their vulnerabilities to obtain a desired outcome. Evasion attacks, also known as adversarial perturbations or adversarial examples, involve modifying input data to mislead a machine learning model's predictions. The attacker introduces carefully crafted perturbations, which can be imperceptible to humans, to manipulate the model's output. Another prominent and concerning threat in this context is the backdoor attack, where an adversary manipulates the training process of a machine learning model to introduce a hidden trigger, also known as a backdoor, that can be exploited during the model's deployment. This trigger may not be visually imperceptible and is designed to be activated under specific conditions, such as the presence of certain input features. Once the backdoor is implanted, the attacker can exploit it by providing inputs that activate the trigger, causing the model to produce incorrect or manipulated outputs. Attacks and defenses in adversarial machine learning are key components of research aimed at understanding and mitigating the vulnerabilities of machine learning models to adversarial manipulation. By studying attacks, researchers gain insights into the vulnerabilities of machine learning models and systems. 
This knowledge helps identify potential weaknesses and develop robust and secure solutions. On the other hand, detecting and mitigating adversarial and backdoor attacks are also important research areas to ensure the integrity and trustworthiness of machine learning systems. This knowledge can be used to develop effective countermeasures, improve model robustness, and enhance overall system security. In this thesis, we investigate the trustworthiness of machine learning models and explore their learning behaviors and characteristics. While we investigate these challenges for computer vision applications, we also transfer this knowledge to time series and conduct investigations on the corresponding challenges, including building novel approaches specifically for time series and constructing an end-to-end model for both images and time series. We also explore the possibility of controlling what information can be learned by machine learning models to protect data privacy and mitigate possible attacks. The first part of our work aims to explore a more efficient and effective way to improve adversarial robustness with adversarial training on images. We propose Dual Head Adversarial Training (DH-AT), an improved variant of AT that attaches a second head to one intermediate layer of the network. The two heads can be trained either simultaneously or independently with different training parameters to combine different levels of robustness in a single model. The main head can also be directly loaded from a pre-trained model without any modifications, in which case only one head requires training. In real-world scenarios, the second head and the lightweight CNN together form a strengthening mechanism to improve the adversarial robustness of any existing models. Additionally, the second head can also be switched off when robustness is no longer the primary concern. 
Adversarial machine learning has been extensively researched on computer vision applications in the context of images, while there are few works on non-DNN-based time series models. It is still unclear which strategies are more effective on time series. Moreover, time series are of diverse types, such as stock prices, temperature readings, weather data, and heart rate monitoring, to name a few. As such, inflexible attack patterns can hardly be effective on all types of time series. To fill this gap, in the second part of our work, we study the problem of backdoor attacks on time series and propose a novel generative approach for crafting stealthy sample-specific backdoor trigger patterns. We also reveal the unique challenge of time series backdoor attacks posed by the inherent properties of time series. By leveraging generative adversarial networks (GANs), our approach can generate backdoored time series that are as realistic as real time series, while achieving a high attack success rate. Furthermore, by training the trigger pattern generator on multiple types of time series, we can obtain a universal generator. We also empirically show that our proposed attack can generate stealthy and effective backdoor attacks against state-of-the-art DNN-based time series models and is resistant to potential backdoor defenses. The third part of our work involves training a robust deep learning model in the presence of backdoor samples. We extend the work of Anti-Backdoor Learning (ABL) and propose a novel End-to-End Anti-Backdoor Learning (E2ABL) method that can be used for both image and time series inputs. Unlike the original ABL defense, which is a complex two-stage training method, E2ABL achieves end-to-end robust training with the help of a second classification head attached to the shallow layers of a DNN. With the second head, E2ABL traps the potential backdoor samples at the shallow layers and purifies their labels dynamically during training. 
Through extensive experiments, we empirically show that E2ABL outperforms existing defenses by a considerable margin against 9 state-of-the-art image domain and 3 time series domain backdoor attacks. The fourth part of our work extends unlearnable examples, which use invisible noise to prevent data from being easily exploited by deep learning models, from images to time series. We propose a specific type of error-minimizing noise that aims to make time series data unlearnable to deep learning models. It can be applied at various scales, ranging from the entire time series input to small patches. Importantly, the noise is designed to be resistant to common data filtering methods, ensuring its persistence in obstructing model learning. In summary, this Ph.D. thesis aims to provide comprehensive insights into the domain of trustworthy machine learning, with a specific focus on backdoor attacks, their detection, and mitigation strategies. By investigating various attack models, detection techniques, and mitigation strategies, this research contributes to the development of more robust and secure machine learning systems. The findings presented in this thesis will serve as a valuable resource for researchers, practitioners, and policymakers working in the field of trustworthy machine learning and cybersecurity.
  • Item
    BRING-YOUR-OWN-DEVICE (BYOD) SECURITY MANAGEMENT IN HOSPITALS – A SOCIOTECHNICAL APPROACH
    Wani, Tafheem Ahmad (2023-09)
    Bring-Your-Own-Device or ‘BYOD’ refers to the use of personal devices such as laptops, smartphones, or tablets for work purposes. Among the top industries driving BYOD is healthcare, with a great demand for BYOD use in hospitals. The multifunctional and ubiquitous nature of modern mobile devices allows them to be used for a variety of purposes. These include clinical documentation, electronic medical record and diagnostic services, clinical photography, clinical communication, and collaboration, among other tasks. Overall, BYOD in hospitals can improve mobility and productivity among clinicians. However, BYOD use also leads to data security concerns, particularly due to the risk of leaking sensitive patient information. In a BYOD environment, device owners such as doctors, nurses and allied health professionals may hold significant control and custody of sensitive patient data they access through their personal devices. This extends the scope for risks such as staff misuse and human error, known to be the leading cause of healthcare data breaches, especially in the absence of hospital-installed security controls. Furthermore, the stringent healthcare data privacy laws with which healthcare organisations need to comply, coupled with the fact that the healthcare industry is the most affected by data breaches, make BYOD use a major challenge for hospitals. Previous research about BYOD security management generally has been limited, fragmented, and largely techno-centric. More contextualised, industry-based research into BYOD security is called for. Empirical studies exploring hospital BYOD security challenges are scarce and cover few aspects of the topic. Modern healthcare cybersecurity breaches also demand a systematic and holistic approach to understanding hospital BYOD security. This thesis therefore aimed to address these gaps by investigating hospital BYOD security management through a holistic socio-technical lens. 
The PPT (People-Policy-Technology) model was used to explore cultural, organisational, managerial and policy related factors and their impact on hospital BYOD security, in addition to technical factors. The research question “How can a socio-technical approach improve BYOD security management in hospitals?” was addressed using Mixed Method Action Research (MMAR), a form of action research where an iterative mechanism was used to synergistically integrate results from multiple studies to answer the research question. First, a literature review identified prominent hospital BYOD security risks and produced a preliminary hospital BYOD (hBYOD) security framework, consisting of guidelines for secure hospital BYOD use. Second, IT management stakeholders and BYOD clinical users were surveyed and interviewed to understand BYOD security management practices employed by Australian hospitals and the clinicians’ preferences and security behaviour with respect to BYOD use respectively. Third, all findings were synthesised and merged through the MMAR approach to refine the hBYOD framework in the light of evidence gathered. Finally, recommendatory guidelines provided by the framework were mapped to a newly formed hospital BYOD security maturity model to streamline their implementation and a pilot implementation study in a major hospital tested the utility of this model. This thesis makes a significant contribution by enabling improvements in hospital data security. It provides comprehensive guidance across the BYOD security lifecycle, allowing evaluation and improvements in hospital BYOD socio-technical security practices through the hospital BYOD security framework and maturity model. It can therefore benefit hospital policymakers, technologists, and clinical stakeholder representatives through informed decision-making and BYOD strategy development. 
Furthermore, the thesis elucidates how alignment between cultures of clinical productivity and data security may be achieved through the application of socio-technical theory. It also demonstrates the value of participatory and collaborative methods for guideline development in healthcare cybersecurity.
  • Item
    Reflected Reality: Augmented Reality Interaction with Mirror Reflections
    Zhou, Qiushi (2023-11)
    Mirror reflections enable a compelling visuomotor experience that allows people to simultaneously embody two spaces: through the physical body in front of the mirror and through the reflected body in the illusory space behind the mirror. This experience offers unique affordances for Augmented Reality (AR) interaction that leverages the natural human perception of the relationship between the two bodies. This thesis explores possibilities of AR interaction with mirror reflections through unpacking and investigating this relationship. Through a systematic literature review of Extended Reality interaction that is not from the first-person perspective (1PP), we identify opportunities for novel AR interaction techniques from the second-person perspective (2PP) using the reflected body in the mirror (Article I). Following this, we contribute Reflected Reality: a design space for AR interaction with mirror reflections that covers interaction from different perspectives (1PP/2PP), using different spatial frames of reference (egocentric/allocentric), and under different perceptions of the use of the space in the mirror (as reflection/extension of the physical space) (Article II). Previous work and the evaluation results of reflected reality interaction suggest that most of its novel interaction affordances revolve around the physical and the reflected bodies in the egocentric spaces. Following this observation, we conduct two empirical studies to investigate how users perceive virtual object locations around their physical bodies through a target acquisition task (Article III), and to understand how users can perform bodily interaction using their reflected bodies in the mirror through a movement acquisition task following a virtual instructor (Article IV). Together, results from these studies provide a fundamental knowledge base for designing reflected reality interaction in different task scenarios. 
After investigating the spatial affordance of mirror reflections for AR interaction, this thesis further explores the affordance for embodied perception through the mediation of the reflected user. Building on the results of Article IV, we conduct a systematic review of dance and choreography in HCI that reveals opportunities for using AR with mirror reflections to mediate the integration of the visual presentation and kinaesthetic sensation of body movement (Article V). We present the findings and discussions from a series of workshops on dance improvisation with a prototype AR mirror, which reveal the affordance of a multi-layered embodied presence across the mirror perceived by dancers (Article VI). We conclude this thesis with a discussion that summarises the knowledge gained from the empirical studies, elucidates the implications of the design space and novel interaction techniques, and illuminates future research directions inspired by its empirical and theoretical implications.
  • Item
    Technology for Social Programs in Residential Aged Care: Design and Implementation Considerations
    Thach, Kong Saoane (2023-08)
    The ageing population is increasing, leading to a greater need for aged care services. Residential aged care facilities (RACFs) offer 24-hour support for older adults who can no longer live independently. Social care, including recreational activities and emotional support, is essential for the wellbeing of aged care residents, especially those with complex conditions. To provide better care, some facilities have adopted technology-based activities to enhance residents’ experiences, including immersive virtual reality (VR) for enrichment and video calling for social connectedness. However, the introduction of technology comes with challenges, making it necessary to design and implement it carefully. To ensure technology benefits residents and does not overburden caregivers, it is vital to conduct a systematic investigation of the challenges, and to identify design and implementation considerations. This interdisciplinary research adopts a socio-technical perspective, using Greenhalgh et al.’s NASSS framework to investigate challenges and opportunities in adopting technology in RACF social programs. The thesis aims to address the research question: "How can technology in social programs be effectively designed and implemented in RACFs?". This research examined two types of technologies as case studies: immersive VR and video-calling. For each case study, I conducted systematic literature reviews, surveys with aged care staff, and interviews with staff to understand the complexities of adopting these technologies. For VR, an in-depth case study was conducted with an aged care assistant who had significant experience of using VR with people with dementia; for video-calling, a thematic analysis was conducted of multiple interviews with aged care staff members working in different roles. An ethnographic study was also conducted at a Victoria-based RACF to better understand real-world contextual issues impacting the design and implementation of technology in RACFs. 
This thesis has made significant contributions to the fields of human-computer interaction, information systems, and gerontology. It provides empirical evidence of the complexities arising from the adoption and utilization of technology in RACF social programs. Although prior empirical research indicates that older adults generally have positive emotional responses while experiencing technology-based social programs, maintaining such programs can be challenging due to complexities related to residents' conditions, the technology itself, staff facilitation, and organizational constraints. Cultural factors are also important, especially for those with past traumas. Staff play a crucial role in facilitating technology-based activities. Successful facilitation requires a deep understanding of residents, effective communication, and support from the organization. This thesis also offers practical insights for effectively designing and implementing technology in RACFs, introducing a theoretical framework, namely ITERAC (Implementation of Technology for Enrichment in Residential Aged Care), to guide such implementation. That is, to provide enrichment through technology-based social programs, a facilitation triad comprising staff, residents, and technology, with support from the RACF and relevant stakeholders, is essential. The framework can serve as a promising guide for RACFs to adopt technologies in social programs. The thesis also suggests avenues for future research in this field.
  • Item
    Workflow Scheduling in Cloud and Edge Computing Environments with Deep Reinforcement Learning
    Jayanetti, Jayanetti Arachchige Amanda Manomi (2023-08)
    Cloud computing has firmly established itself as a mandatory platform for delivering computing services over the internet in an efficient manner. More recently, novel computing paradigms such as edge computing have also emerged to complement the traditional cloud computing paradigm. Owing to the multitude of benefits offered by cloud and edge computing environments, these platforms are increasingly used for the execution of workflows. The problem of scheduling workflows in a distributed system is NP-Hard in the general case. Scheduling workflows across highly dynamic cloud and edge computing environments is even more complex due to inherent challenges associated with these environments, including the need to satisfy diverse, contradictory objectives, the need to coordinate executions across highly distributed infrastructures, and the dynamicity of the operating conditions. These requirements collectively give rise to the need for adaptive workflow scheduling algorithms that are capable of satisfying diverse optimization goals amid highly dynamic conditions. Deep Reinforcement Learning (DRL) has emerged as a promising paradigm for dealing with highly dynamic and complex problems due to the ability of DRL agents to learn to operate in stochastic environments. Despite the benefits of DRL, there are multiple challenges associated with the application of DRL techniques, including multi-objectivity, the curse of dimensionality, partial observability and multi-agent coordination. In this thesis, we propose novel DRL algorithms and architectures to efficiently overcome these challenges.
  • Item
    Data driven models for fine-grained framing analysis in political debates and the media
    Khanehzar, Shima (2023-06)
    News media and political speeches play an important role in shaping public opinion on social issues by presenting facts and events in a biased/colored way. Accordingly, framing -- the practice of highlighting, emphasizing, or obscuring some aspects of an issue -- is a central concept in communication and political studies. While automatically detecting and identifying frames has attracted much attention in NLP and spawned a variety of methods, it poses several challenges, such as vagueness and complexity in defining frames, the abundance of text data, the dynamic nature of language, and the variation in using frames across different contexts. The aim of this thesis is to automatically predict different types of framing across contexts: news articles and political speeches. It develops machine learning and NLP approaches to solve challenges in the computational modeling of framing. The research provides theory-driven models for various types of framing at different levels of abstraction and units of analysis (document-, span-, and lexical-level) and demonstrates the added value of analysis connecting these levels. It investigates the extent to which pretrained language models can be used to model these concepts of framing. Additionally, it shows the capacity of the model to leverage additional unlabeled data, and to increase the transparency of the predictions. The thesis also investigates the differences in narrative and textual structure between news media framing and framing in political speeches. Lastly, to gain a deeper understanding of framing, the research investigates the broader modeling of framing across growing political issues (immigration and same-sex marriage) in the US and Australia, which have different political structures. To facilitate this analysis, a novel dataset of Australian political speeches pertaining to these issues is introduced. 
The findings of this thesis demonstrate the potential of our models to uncover framing biases in news articles and political speeches. These models can be used for future applications, such as automatic, yet transparent, highlighting of reporting patterns across countries or news outlets, and frame-guided summarization, which can support both frame-balanced and frame-specific news summaries.
  • Item
    Effective, Efficient, and Generalizable Algorithms for Trajectory Similarity Queries
    Chang, Yanchuan (2023-08)
    Trajectory data is becoming increasingly accessible due to the prevalence of GPS-equipped devices, such as smartphones and vehicles. This type of data contains rich location and movement information about people, which enables a wide range of location-based applications, such as carpooling, contact tracing, urban planning, traffic analysis and location-based recommendation. How to retrieve trajectories from a large volume of trajectory data effectively and efficiently has become an important area of research, which has attracted extensive interest from both academia and industry. Existing trajectory query algorithms struggle to meet the requirements of emerging applications in both effectiveness and efficiency, particularly as the trajectories grow in length. These limitations underscore the pressing need for novel trajectory query algorithms. This thesis addresses the need with a focus on trajectory similarity queries. The thesis studies four problems related to trajectory similarity queries. The first problem, sub-trajectory similarity join, is a new type of trajectory similarity query. While most existing studies focus on querying trajectories that are similar to each other in their entirety, we propose a new trajectory similarity measure that focuses on the partial similarity of trajectories. We measure the length of time during which two trajectories are close (i.e., within a certain distance in space). We define the sub-trajectory similarity join query based on this measure, which returns pairs of trajectories satisfying a sub-trajectory similarity threshold. Such queries target contact tracing and carpooling applications. We present a client-server-based distributed index structure and a query algorithm with an efficient backtracking technique for the join query. Theoretical analysis and experiments on real data confirm the effectiveness and the efficiency of our proposed index structure and query algorithm. 
The second problem, road network representation learning, concerns trajectory queries on road networks. Our aim is to learn a task-agnostic road network representation that can be applied to different trajectory queries, thus avoiding storing multiple task-specific road network representations and improving the storage space efficiency of trajectory databases. Existing road network representation learning approaches cannot satisfy this goal, since most are based on supervised learning and therefore learn task-specific road network representations. Further, these approaches exploit generic graph neural network methods to learn graph representations based on topological features while ignoring the spatial features of road networks, which are important to trajectory applications. To address these issues, we propose a self-supervised contrastive learning method to learn generic and task-agnostic road network representations. We devise four novel modules to learn spatial features and spatial correlations of road networks. Once trained, the road network representation can be directly applied to different trajectory queries without any fine-tuning. Experimental results on different trajectory queries, such as trajectory similarity measurement and shortest-path-based trajectory route planning, show that the proposed model outperforms state-of-the-art self-supervised models consistently and even achieves comparable performance to the supervised models. The third problem, trajectory similarity learning, concerns trajectory representation for trajectory similarity measurement in Euclidean space. Our aim is to learn a trajectory representation that enables effective and efficient similarity evaluation between two given trajectories, which is a core operator in trajectory query processing. Motivated by the strong representation learning capability of contrastive learning, we again propose a contrastive learning-based method. 
We design a novel dual-feature self-attention-based trajectory backbone encoder and four dedicated trajectory augmentation methods for trajectory representation learning, to encode both the coarse-grained and the fine-grained spatial properties of trajectories into the learned representations. Once trained, the backbone encoder can be used on its own for trajectory representation computation and similarity estimation. It can also be fine-tuned to compute an approximation of traditional heuristic trajectory similarity measures such as the Fréchet measure. Experimental results show that our proposed approach produces trajectory representations that lead to consistently more accurate trajectory similarity measures than those of the state-of-the-art approaches. The fourth problem concerns an in-depth analysis of existing trajectory similarity measures. Our aim is to provide a comprehensive comparison of the heuristic trajectory similarity measures and the deep learning-based ones from an efficiency perspective, and to analyze their strengths, limitations, and applicable scenarios. Recent studies on learned trajectory similarity measures focus on how to accurately approximate heuristic trajectory similarity measures. They have largely omitted efficiency considerations. We implement deep learning-based and heuristic approaches on both CPU and GPU for a fair efficiency comparison. Experimental results show that heuristic approaches run faster than deep learning-based approaches when measuring the similarity between two trajectories on both CPU and GPU without any pre-computation. Once trajectory embeddings are given and can be reused, some deep learning-based approaches can achieve better computational efficiency than the heuristic ones. We also conduct experiments on kNN queries using dedicated index structures, where the deep learning-based approaches consistently outperform the heuristic ones on efficiency. 
This study clarifies which class of methods should be applied for which purpose under a given set of experimental circumstances.
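As an illustration of the kind of heuristic measure discussed in this abstract, the following is a minimal sketch of the discrete Fréchet distance between two point sequences, computed with the standard dynamic programme. The abstract does not state which variant or implementation the thesis uses, so the function name `discrete_frechet` and the choice of the discrete variant are assumptions made here for illustration only.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two non-empty point sequences.

    Illustrative sketch only (hypothetical helper, not the thesis code).
    Runs in O(len(p) * len(q)) time and space.
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("both trajectories must contain at least one point")

    # dp[i][j] = discrete Fréchet distance between p[:i+1] and q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance between the two points
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # Best of: advance on p, advance on q, or advance on both.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]


if __name__ == "__main__":
    a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
    print(discrete_frechet(a, b))
```

The quadratic cost per trajectory pair is one reason an efficiency comparison against embedding-based measures is of interest: once embeddings are precomputed, comparing two trajectories reduces to a single vector distance.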
  • Item
    Towards Accurate and Reliable Modelling for Semantic Textual Similarity
    Wang, Yuxia (2023-04)
    Semantic Textual Similarity (STS) is a natural language understanding task. It measures the degree of semantic equivalence between two snippets of text on a continuous scale, ranging from exact semantic equivalence to complete unrelatedness. This allows STS to capture intermediate shades of similarity. Because it rests on a fundamental understanding of sentences and produces intuitive outputs, STS has been extensively applied to various natural language processing tasks, such as information retrieval, machine translation (MT), summarisation, and question answering (QA). Recent publications increasingly argue that state-of-the-art STS models outperform the average human performance over existing general-language STS datasets. However, the accuracy of STS in knowledge-rich domains with a limited amount of training data, such as clinical and biomedical text, still lags behind. Moreover, similar to other classifiers based on deep neural networks (DNNs), STS models have also been empirically demonstrated to have poor calibration. That is, the predictive probability does not reflect the true correctness likelihood. DNNs are prone to being over-confident when they make wrong predictions (Guo et al., 2017), which can be catastrophic in safety-critical applications, such as autonomous driving or clinical decision support. To this end, we aim to enhance the generalisation performance and improve the reliability of STS, in terms of more accurate and reliable modelling, particularly in low-resource scenarios, such as for clinical texts. Accordingly, we address three main research questions in this thesis: (1) How to improve the generalisation accuracy of domain-specific STS, with an emphasis on clinical STS? (2) How to alleviate the detrimental effects of noisy labelled data on STS and other textual regression tasks? (3) How to recognise noisy labels and provide trustworthy predictions for textual regression tasks, to better support safety-critical applications? 
We address the first research question from two perspectives. To alleviate data sparsity, we perform data augmentation and optimise model configurations in the supervised scenario. In the zero-shot setting, silver labels and rule-based synthetic generation are applied, coupled with several sampling strategies in a self-learning framework. These strategies are shown to improve generalisation accuracy by mitigating exposure bias as the number of training examples increases. To overcome the difficulties in semantically understanding low-frequency clinical terms in STS sentence pairs, we link clinical synonyms by extracting and normalising clinical mentions to a unique identifier in the UMLS thesaurus, and then make a comparison based on the unified identifier, instead of latent representations. Next, for the second research question, we propose a simple noisy label detection method for continuous labels to prevent error propagation from the input layer. The approach is based on the observation that the projection mapping sentence pairs to noisy labels is learned through memorisation at advanced stages of learning, and that the Pearson correlation is sensitive to outliers. A spectrum of experiments on STS covering three major real-world noisy label sources, namely careless annotation aggregation, inherent label uncertainty due to domain complexity, and data augmentation, demonstrates that the proposed approach is more effective than standard regularisation strategies, such as early stopping or dropout. It is also shown to have similar benefits on two other textual regression tasks: sentiment analysis rating and machine translation quality estimation. Finally, in response to the third research question of how to provide trustworthy predictions, we examine the use of uncertainty models, such as Gaussian process regression and deep Bayesian networks. 
They not only predict the aggregated similarity score (averaged human opinion), but also estimate the predictive uncertainty for each instance. However, due to a lack of access to the full distribution of human judgements in gold standard datasets, we are only able to evaluate the outputs in terms of model calibration and reliability, using the metrics of expected calibration error and negative log-probability density. It remains unknown whether these uncertainty-aware models capture the aspects of semantic uncertainty that cause disagreement among humans in the similarity score assessment. To close this gap in analysing collective human opinions in STS, we collect the first uncertainty-aware STS corpus. It consists of ~15,000 Chinese sentence pairs with 150,000 annotations. The analysis shows that current models do not capture the distribution of human opinions. Rather, the estimated variance tends to reflect the predictive confidence over the whole dataset. In summary, this thesis presents a number of methods for improving the generalisation accuracy and the predictive reliability of semantic textual similarity modelling. The validity of the proposed models is demonstrated under different task and data scenarios, and the work serves as a stepping stone for future research.
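The abstract notes that the Pearson correlation is sensitive to outliers. The sketch below illustrates only that general property on a small made-up set of scores; it is not the noisy label detection method proposed in the thesis, and the helper name `pearson` and the toy data are assumptions introduced for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Illustrative helper (hypothetical name), written out in plain Python.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy model predictions and gold similarity scores (made-up values).
predictions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
clean_labels = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Corrupt a single label to mimic one noisy annotation.
noisy_labels = list(clean_labels)
noisy_labels[-1] = 0.0

r_clean = pearson(predictions, clean_labels)
r_noisy = pearson(predictions, noisy_labels)

if __name__ == "__main__":
    print(f"clean r = {r_clean:.3f}, one corrupted label r = {r_noisy:.3f}")
```

A single corrupted label out of six is enough to pull the correlation down sharply in this toy case, which is the kind of sensitivity the abstract refers to.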