Computing and Information Systems - Theses

Enhancing Deep Multimodal Representation: Online, Noise-robust and Unsupervised Learning
Silva, Dadallage Amila Ruwansiri (2022)
Information that is generated and shared today draws on data involving multiple modalities. These modalities are not limited to the well-known sensory media (e.g., text, image, video, and audio); they can be any abstract or inferred encoding of information (e.g., the propagation network of a news article or the sentiment of a text) that represents a different viewpoint of the same object. For machine learning models to be competitive with humans, they should be able to extract and combine information from these modalities. Thus, multimodal representation learning has emerged as a broad research domain that aims to understand complex multimodal environments while narrowing the heterogeneity gap among different modalities. Owing to their capacity to represent latent information in complex data structures, deep learning-based techniques have recently attracted much attention for multimodal representation learning. Nevertheless, most existing deep multimodal representation learning techniques lack the following: (1) the ability to continuously learn and update representations in a memory-efficient manner while being recency-aware and avoiding catastrophic forgetting of historical knowledge; (2) the ability to learn unsupervised representations for under-exploited multimodalities with complex data structures (e.g., temporally evolving networks) and high diversity (e.g., cross-domain multimodal data); and (3) the ability to serve directly as features for various real-world applications without fine-tuning on an application-specific labelled dataset. This thesis aims to bridge these research gaps in deep multimodal representation learning. In addition, it addresses real-world applications involving multimodal data, such as misinformation detection, spatiotemporal activity modeling, and online market basket analysis.
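To make the idea of combining modalities and narrowing the heterogeneity gap concrete, here is a minimal sketch of one common approach: projecting each modality into a shared embedding space and fusing the aligned vectors. This is illustrative only, not the thesis's method; the dimensions, the random projection matrices, and the averaging fusion are all assumptions chosen for simplicity (real systems learn the projections with deep encoders).

```python
import numpy as np

rng = np.random.default_rng(42)

SHARED_DIM = 64  # hypothetical shared embedding size

def project(features: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Map modality-specific features into the shared space and L2-normalise."""
    z = features @ proj
    return z / np.linalg.norm(z)

# Toy "modalities" with different native dimensionalities.
text_feat = rng.standard_normal(300)    # e.g. averaged word embeddings
image_feat = rng.standard_normal(2048)  # e.g. pooled CNN features

# Per-modality projection matrices (learned in a real system, random here).
W_text = rng.standard_normal((300, SHARED_DIM))
W_image = rng.standard_normal((2048, SHARED_DIM))

z_text = project(text_feat, W_text)
z_image = project(image_feat, W_image)

# Simple late fusion: average the aligned embeddings into one
# multimodal representation of the underlying object.
multimodal_rep = (z_text + z_image) / 2
print(multimodal_rep.shape)  # (64,)
```

Once both modalities live in the same space, downstream models can consume a single vector regardless of which modalities were observed.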
The main contributions of this thesis include: (1) proposing two novel online learning strategies for learning deep multimodal representations, and building on them two frameworks that address real-world applications -- user-guided spatiotemporal activity modeling (USTAR) and online market basket analysis (OMBA); (2) proposing METEOR, a memory- and time-efficient online representation learning algorithm that makes deep multimodal representations compact and scalable enough to cope with the varying data rates of real-world multimodal data streams; (3) developing an unsupervised framework that captures and preserves domain-specific and domain-shared knowledge in cross-domain data streams, and applying it to cross-domain fake news detection; (4) proposing an unsupervised model that learns representations for temporally evolving graphs by mimicking the future knowledge of an evolving graph at an early timestep, and developing, with the proposed objective functions, a new framework called Propagation2Vec for fake news early detection; and (5) developing a theoretically motivated, noise-robust unsupervised learning framework that can filter out noise in (i.e., fine-tune) multimodal representations learned with general pretraining objectives without requiring a labelled dataset, and applying these findings to unsupervised fake news detection.
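The online learning setting in contributions (1) and (2) — updating representations from a stream while staying recency-aware and memory-efficient — can be illustrated with the simplest such mechanism, an exponential moving average. This sketch is a generic baseline, not USTAR, OMBA, or METEOR; the class name, dimension, and decay value are illustrative assumptions.

```python
import numpy as np

class OnlineEmbedding:
    """Recency-aware running update of an embedding from a data stream.

    Illustrative baseline only: old knowledge decays geometrically, so the
    representation tracks recent observations with constant memory and no
    replay buffer (trading off retention of historical knowledge).
    """

    def __init__(self, dim: int, decay: float = 0.5):
        self.z = np.zeros(dim)
        self.decay = decay  # closer to 1.0 = retain more history

    def update(self, observation: np.ndarray) -> np.ndarray:
        # Exponential moving average of incoming observation vectors.
        self.z = self.decay * self.z + (1 - self.decay) * observation
        return self.z

# Feed a toy stream of observation vectors.
emb = OnlineEmbedding(dim=4, decay=0.5)
for obs in [np.ones(4), 2 * np.ones(4), 4 * np.ones(4)]:
    emb.update(obs)
print(emb.z)  # weighted toward the most recent observation: [2.625 2.625 2.625 2.625]
```

The `decay` parameter makes the recency trade-off explicit: the techniques in the thesis aim to keep this adaptivity while avoiding the catastrophic forgetting that a plain moving average incurs.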