Search results personalization in microblogging environments
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2018 Dr Sameendra Madushan Smarawickrama
Social networking and microblogging platforms such as Twitter, Facebook, Tumblr and Instagram have become an integral part of people’s day-to-day lives for real-time information exchange. Twitter has become one of the most popular microblogging platforms in recent years. Today, it hosts more than 300 million monthly active users and generates more than 500 million tweets every day. Twitter users both publish messages and search for messages. Search results currently returned by Twitter are chronologically ordered, and users often have to manually scan through an overwhelming number of tweets to find the content of interest. This process can quickly become infeasible. Personalization techniques address this problem by learning the user’s interests and tailoring search results to match them. There has been a tremendous amount of work on web search results personalization. However, research on personalization in microblogging environments such as Twitter is very sparse. Microblogging environments differ from traditional web environments in several ways: microblogs are very short compared to web documents, they are noisy and use informal language, and they contain special entities such as hashtags and user mentions. Compared to the web domain, microblogging environments are also rich in social interactions. Therefore, in this thesis, we propose novel approaches to personalized search in microblogging environments using state-of-the-art techniques from text and data mining research. We use Twitter as a specific use case. Our first approach is based on the use of topic modelling algorithms for search results personalization. Coping with sparsity is one of the major challenges when applying topic modelling algorithms to short text documents such as tweets.
We conduct an in-depth investigation of how topic modelling algorithms can be applied in short text environments and propose a new tweet grouping scheme that addresses the sparsity problem and outperforms existing schemes. We then use the proposed grouping scheme to train topic models on the user’s past tweets, which are then used for search results personalization. Our second approach is based on neural word embeddings, which have gained much attention recently due to their performance in various NLP tasks. We use neural word embeddings to build user profiles by finding words semantically related to a given word. Search results are then personalized using this user profile. Our third approach uses word sense induction techniques to identify the different meanings associated with the user’s query in the initial search results and uses this information to best match search results with the user’s profile. We also introduce two novel offline evaluation techniques based on Twitter list combinations and hashtags. The list combinations approach rests on the assumption that a list is composed of like-minded users, so if a user who is a member of a particular list initiates a search, matching tweets by users in the same list are relevant. The hashtag-based approach rests on the assumption that when a user initiates a search with a particular hashtag, the user’s own tweets with that hashtag are relevant to the user. We evaluate our proposed personalization approaches using both of these offline evaluation techniques. Finally, we build PTSE, a web-based service incorporating our personalization approaches, where users can log in with their Twitter handle and submit search queries to obtain personalized results.
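The hashtag-based relevance assumption described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the (author, text) tuple layout, the sample tweets and the choice of precision at k as the metric are all assumptions made here for illustration only.

```python
import re

# Hypothetical helper pattern: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lowercased hashtags found in a tweet's text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def label_relevance(results, searcher, query_hashtag):
    """Label each ranked (author, text) result under the stated assumption:
    a tweet is relevant (1) if the searching user wrote it and it carries
    the query hashtag, otherwise it is treated as not relevant (0)."""
    tag = query_hashtag.lstrip("#").lower()
    return [
        int(author == searcher and tag in hashtags(text))
        for author, text in results
    ]


def precision_at_k(labels, k):
    """Fraction of the top-k results labelled relevant (0.0 if empty)."""
    top = labels[:k]
    return sum(top) / len(top) if top else 0.0


# Made-up ranked results for a search on "#nlp" issued by user "alice".
ranked = [
    ("alice", "Tuning LDA on short texts #nlp"),
    ("bob", "New embeddings paper #nlp"),
    ("alice", "Coffee time #monday"),
    ("alice", "Word sense induction notes #NLP"),
]
labels = label_relevance(ranked, "alice", "#nlp")
score = precision_at_k(labels, 2)
```

Under this labelling, a personalized ranking scores higher when it moves the searching user's own tweets carrying the query hashtag towards the top of the list. The list-based technique could be sketched the same way by treating tweets from fellow list members as relevant.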
Keywords: social media mining; personalization; user modelling; twitter; microblogs