Similarity based clustering in large-scale, decentralized overlay networks
AuthorBukhari, Irum Fahim
AffiliationComputing and Information Systems
Document TypePhD thesis
Access StatusOpen Access
© 2018 Dr. Irum Fahim Bukhari
Collaborative filtering, such as recommendation algorithms for example where Amazon may recommend books to a customer based on other customers’ feedback and profile similarity, and/or similarity between other books, has become a key approach for effectively providing people with the most relevant information to them. Some large-scale applications, like file sharing, instant news broadcasting, and personal data management, are decentralized in their operation, and the standard centralized approach for collaborative filtering, whereby a centralized system receives all of the data and computes the outcomes, is not applicable. For these applications to remain decentralized while making use of collaborative filtering, requires research into how to decentralize the collaborative filtering computations. Many large-scale decentralized systems use an interconnection network overlay over the Internet, where individual nodes in the system connect to some set of other nodes, and to achieve scalability in this case it is widely accepted that gossiping models are the most effective choice. For collaborative filtering in these systems, the nodes themselves are typically considered to be "users”, that are searching for information, and the similarity between users can be directly instantiated in the overlay network, where similar users are directly connected to each other. On this basis, recommendation becomes straight forward, because neighboring nodes will likely have relevant information. We call this Similarity Based Clustering (SBC) of an overlay network. In this thesis, we improve upon the state-of-the-art approach for gossip-based SBC overlays, by proposing a new protocol, called Optimum Benefit Protocol, that outperforms existing protocols in terms of key metrics including convergence time, bandwidth consumption and node churn. We have implemented our protocol on PeerNet, a simulation environment for large-scale decentralized systems, and obtained a range of results using real-world data sets drawn from Movielens, Yahoo WebScope and Epinions, that provide user profiles, user ratings, and item information from big data sources. Specifically our approach addresses the "long-tail convergence” problem observed where some overlay networks take a very long time to converge for some applications. Overall we show that our protocol is competitive for building large-scale decentralized systems that require collaborative filtering.
Keywordsgossip protocols; convergence; decentralized similarity overlays; bandwidth; node churn; similarity clustering; collaborative filtering; degree distribution
- Click on "Export Reference in RIS Format" and choose "open with... Endnote".
- Click on "Export Reference in RIS Format". Login to Refworks, go to References => Import References