Highly efficient distributed hypergraph analysis: real-time partitioning and quantized learning
Affiliation: Computing and Information Systems
Document Type: Masters Research thesis
Access Status: Open Access
© 2018 Wenkai Jiang
Hypergraphs have been shown to be highly effective for modeling a wide range of applications where high-order relationships are of interest, such as social network analysis and object classification via hypergraph embedding. Applying deep learning techniques to large-scale hypergraphs is challenging due to the size and complex structure of hypergraphs. This thesis addresses two problems in hypergraph analysis, real-time partitioning and quantized neural network training, in a distributed computing environment.
When a large-scale hypergraph is processed in real time and in a distributed fashion, the quality of hypergraph partitioning has a significant influence on communication overhead and workload balance among the machines participating in the distributed processing. The main challenge of real-time hypergraph partitioning is that the hypergraph arrives as a dynamic stream formed by a sequence of hyperedge insertions and deletions, so its structure is constantly changing. Existing methods that require complete information about a hypergraph are inapplicable in this case, as only a sub-graph is available to the algorithm at any one time. We solve this problem by proposing a streaming refinement partitioning (SRP) algorithm that partitions a real-time hypergraph flow in two phases. With extensive experiments on a scalable hypergraph framework named HyperX, we show that SRP can yield partitions of the same quality as those produced by offline partitioning algorithms in terms of communication overhead and workload balance.
For machine learning tasks over hypergraphs, studies have shown that using deep neural networks (DNNs) can improve the learning outcomes. This is because learning objectives in hypergraph analysis are becoming more complex, with features that are difficult to define and highly correlated. DNNs can be used as a powerful classifier that constructs features automatically.
However, DNNs require high computational power and network bandwidth as DNN models grow larger. Moreover, the widely adopted training algorithm, stochastic gradient descent (SGD), suffers from two main problems: vast communication overhead that comes from the broadcasts of parameters during the partial gradient aggregations, and the inherent variance between partial gradients, which lengthens training further because it impedes the convergence rate of SGD. We investigate these two problems in depth. Without sacrificing performance, we develop a quantization technique to reduce the communication overhead and a new training paradigm, named cooperated low-precision training (C-LPT), in which importance sampling is used to reduce variance, and the master and workers collaborate to compensate for the precision loss due to the quantization.
Incorporating deep learning techniques into distributed hypergraph analysis shows great potential for query processing and knowledge mining on high-dimensional data records whose relationships are highly correlated. On one hand, such a process takes advantage of the strong representational power of DNNs as an appearance-based classifier; on the other hand, it exploits hypergraph representations to benefit from their strong capability in capturing high-order relationships.
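To make the idea of gradient quantization concrete, the following is a minimal sketch of one generic approach, uniform quantization with stochastic rounding. It is not the quantization technique or the C-LPT paradigm developed in the thesis, whose details are not given in this abstract. The function names `quantize` and `dequantize` and the `levels` parameter are hypothetical, chosen only for illustration.

```python
import random


def quantize(grad, levels=4, rng=random):
    """Map each gradient entry to a small signed integer code.

    Illustrative assumption: entries are scaled by the largest magnitude
    and stochastically rounded to one of `levels` uniform steps, so a
    worker would send one float (`scale`) plus small integers.
    """
    scale = max(abs(g) for g in grad)
    if scale == 0.0:
        return 0.0, [0] * len(grad)
    codes = []
    for g in grad:
        x = abs(g) / scale * levels  # position in [0, levels]
        low = int(x)                 # floor, because x is non-negative
        # Round up with probability equal to the fractional part, which
        # makes the dequantized value equal to g in expectation.
        q = low + (1 if rng.random() < x - low else 0)
        codes.append(q if g >= 0 else -q)
    return scale, codes


def dequantize(scale, codes, levels=4):
    """Recover approximate gradient values from the integer codes."""
    return [scale * c / levels for c in codes]


if __name__ == "__main__":
    scale, codes = quantize([0.5, -1.0, 0.0, 0.26])
    print(scale, codes, dequantize(scale, codes))
```

Each entry here is reduced to an integer in the range from `-levels` to `levels`, which is where the communication saving would come from. Stochastic rounding keeps the estimate unbiased but adds variance, which illustrates why a scheme of this kind needs some way to compensate for the precision loss.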
Keywords: Hypergraph; graph framework; graph partitioning; deep neural network training; quantized learning