Computing and Information Systems - Theses

Permanent URI for this collection

Search Results

Now showing 1 - 1 of 1
  • Item
    Thumbnail Image
    Linking news and tweets
    Lin, Xiaojie ( 2016)
    In recent years, the rise of social media such as Twitter has been changing the way people acquire information. Meanwhile, traditional information sources such as news articles are still irreplaceable. These have led to a new branch of study on understand- ing the relationship between news articles and social media posts and fusing information from these heterogeneous sources. In this study, we focus on techniques for linking news and relevant tweets. Specifically, given a set of news articles and a set of tweets, we aim to find a subset of relevant tweets for each article in the news set. We propose a super- vised linking algorithm that employs multiple features such as time, named entities and event phrases, which outperforms all the baseline methods in our experiments. Since humans can accurately determine whether a tweet is relevant to a news article, we study two types of crowdsourcing algorithms, namely validation algorithms and amendment algorithms. Validation algorithms put a news-tweet pair in the result set only if the pair is validated by the crowd in some way. We propose three validation algorithms, namely Highest, EP and EP+, all of which effectively collect relevant tweets while maintaining a high precision in the experiments. Amendment algorithms first use a machine-based approach to generate an initial set of results, and then improve the result set by asking the crowd informative questions. The amendment algorithms Half , HHalf and LHalf also demonstrate their effectiveness on improving the quality of the result sets in the experi- ments. We extend our study to social network subgraph matching algorithms, which can be used to i) improve the performance of our linking algorithms; ii) analyze and extract information from the news-tweet network or other social network graphs. We propose an efficient GPU algorithm, which significantly outperforms the state-of-the-art CPU al- gorithm in terms of running time in our experiments when the workload is heavy.