Training accurate, diverse, and unbiased recommender systems by joint optimization
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2020 Xiaojie Wang
In today's era of information explosion, users face an overwhelming number of items to choose from. For example, Netflix offers a large catalog of movies to watch, and the Amazon Kindle Store offers millions of books to read. Typically, only a small fraction of these items satisfy a user's information needs, and the sheer number of items makes those few hard to find. Recommender systems are a primary means of helping users find what they want easily and quickly, by presenting the items they are likely to desire right in front of them. Unlike search systems, which require users to formulate their information needs explicitly as queries and then retrieve items that match those queries, recommender systems need no explicit query: they proactively present items that users may enjoy according to their preferences. For example, the Google search engine returns a ranked list of relevant documents only after a user issues a query, whereas the Netflix recommendation system suggests movies that a user is likely to enjoy watching without being asked. Another real-world application of recommender systems is the YouTube recommendation engine, which helps users select the videos they want to watch from a huge number of videos available online. Recommender systems have a wide range of real-world applications and are used extensively in daily life. Search and recommender systems are important in the era of information explosion for two reasons: they help users find results that are of interest, and they contribute billions of dollars in net revenue to industry every year.
In this thesis, we focus on improving recommender systems in three aspects, namely accuracy, diversity, and unbiasedness, each of which poses its own challenges. Diversity is also a concern in search systems, where most queries are ambiguous or too broad to specify user intents. Search result diversification addresses this issue by returning a rank list of diverse results that covers as many intents of a query as possible. To help choose appropriate diversification algorithms, we design widely applicable metrics that faithfully measure the diversity of search results. (1) First, we aim to improve the accuracy of recommender systems. We identify an important type of resource in many applications of recommender systems, e.g., recommending tags for users to label images uploaded to image-hosting websites. This type of resource, which we call privileged provisions, is available when training models on labeled data but unavailable when using the trained models to make predictions on unseen data. We propose two general classes of algorithms, collectively referred to as adversarial distillation, that use privileged provisions to produce accurate recommendation results. Moreover, we provide rigorous proofs that the proposed algorithms have non-trivial theoretical guarantees of achieving globally optimal recommendation results. (2) Then, we aim to improve the diversity of recommender systems. We observe that preferences for diversity vary across users: one user may be interested only in action movies, whereas another may enjoy a variety of movie categories. We propose a novel algorithm that explicitly considers each user's preferences over item categories when recommending a personalized rank list of items.
To measure how well the recommended items satisfy individual users' preferences, we design a metric that captures the personalized diversity of recommendation results. Extensive experiments show that the proposed algorithm outperforms state-of-the-art algorithms by up to 10.0% and 10.8% in terms of an existing metric and the proposed metric, respectively. (3) Finally, we aim to improve the unbiasedness of recommender systems. Datasets for training recommender systems are often biased, and a widely recognized challenge is to produce accurate recommendation results when training on biased data. To address this challenge, we propose a doubly robust estimator that unbiasedly measures the error of recommendation results, and we develop a joint learning approach to obtain accurate recommender systems. The proposed approach outperforms state-of-the-art approaches, reducing the recommendation error by 12%. Although recommendation datasets are often biased, in practice it is usually possible to gather a small unbiased dataset at a reasonable cost. We propose a bi-level learning approach that effectively leverages the additional information in this small unbiased dataset to improve the accuracy of recommendation results. The proposed approach reduces recommendation error by up to 7.9% compared with previous approaches.
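For readers unfamiliar with doubly robust estimation, the sketch below shows the estimator in the form commonly written in the recommendation literature; it is an illustration under stated assumptions, not the thesis's implementation. Over all user-item pairs D, the estimated error is (1/|D|) * sum over (u,i) of [ ê(u,i) + o(u,i) * (e(u,i) - ê(u,i)) / p̂(u,i) ], where ê is an imputed (predicted) error, e is the true error on observed pairs, o(u,i) is 1 if the pair was observed, and p̂ is the estimated probability of observing it. The function and argument names are hypothetical, and the sketch omits the joint learning approach that trains the models producing ê and the recommendations.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]  # hypothetical (user_id, item_id) key


def doubly_robust_error(
    imputed_errors: Dict[Pair, float],   # ê(u,i) for every pair in D
    observed_errors: Dict[Pair, float],  # e(u,i), only for observed pairs (o = 1)
    propensities: Dict[Pair, float],     # p̂(u,i), estimated observation probability
) -> float:
    """Sketch of the doubly robust estimate of the average error over all pairs in D."""
    if not imputed_errors:
        raise ValueError("D must contain at least one user-item pair")
    total = 0.0
    for pair, e_hat in imputed_errors.items():
        # Every pair contributes its imputed error.
        total += e_hat
        if pair in observed_errors:
            p = propensities[pair]
            if p <= 0.0:
                raise ValueError(f"propensity for {pair} must be positive")
            # Observed pairs add an inverse-propensity-weighted correction
            # for the gap between the true and the imputed error.
            total += (observed_errors[pair] - e_hat) / p
    return total / len(imputed_errors)
```

As a usage example, with a 2x2 user-item matrix whose imputed errors are all 0.5, and two observed pairs, (0, 0) with error 1.0 and propensity 0.5 and (1, 1) with error 0.0 and propensity 0.25, the estimate is (2.0 + 1.0 - 2.0) / 4 = 0.25. The estimator is called doubly robust because it stays unbiased if either the imputed errors or the propensities are accurate.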
Keywords: diversity; distillation; debiasing