Contrast Data Mining of Multi-source Heterogeneous Trajectory Data
Affiliation: Computing and Information Systems
Document Type: PhD thesis
Access Status: Open Access
© 2020 Li Li
The rapid growth of location-acquisition and mobile computing techniques has led to an increasing availability of human trajectory data. This raises the challenge of detecting and understanding human mobility from these trajectory datasets to extract useful knowledge in a variety of domains, such as business management and urban computing. This thesis focuses on knowledge discovery from multi-source heterogeneous trajectory data. Specifically, we study five research questions across three scenarios, as follows.
The first research question is how to identify trajectory patterns and detect anomalies in pedestrian flows. We propose using contour maps to visualize the origin-destination (OD) flow matrix, which describes the distribution of pedestrian movements across entry/exit areas. By transforming the OD flow matrix into a dissimilarity matrix, we apply a visual clustering algorithm to group the most popular and closely related areas. We also propose a clustering-based algorithm to detect normal and abnormal time periods, i.e., periods with similar and anomalous pedestrian flow patterns, respectively. Results on one synthetic and one real-life dataset validate the effectiveness of the proposed algorithms.
The second research question is how to mine contrast patterns from multi-source datasets in retail environments. Given sales data and customers’ trajectory data, we aim to find patterns that change substantially in one dataset but only slightly in the other. To this end, we define a new kind of contrast pattern, the conditional contrast pattern: conditional contrast patterns are the subset of traditional contrast patterns in one dataset that satisfy a condition on a property of the same patterns in the other dataset. We also propose a tree-search-based algorithm for mining these patterns.
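To make the definition of conditional contrast patterns concrete, here is a minimal Python sketch. It is not the thesis’s tree-search algorithm: it exhaustively enumerates small itemsets and keeps those whose support changes a lot between two periods in the sales data but barely changes in the trajectory data. It assumes both datasets are already expressed as sets of items over a shared vocabulary (for example, product categories bought and the store sections customers visited), uses the absolute support difference as the change measure, and uses hypothetical thresholds `min_sales_change` and `max_traj_change`. All function names and data here are illustrative.

```python
from itertools import combinations

def support(pattern, transactions):
    """Fraction of transactions (sets of items) that contain every item in `pattern`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if pattern <= t) / len(transactions)

def conditional_contrast_patterns(sales_a, sales_b, traj_a, traj_b,
                                  min_sales_change=0.2, max_traj_change=0.05,
                                  max_size=2):
    """Itemsets whose sales support changes by at least `min_sales_change` between
    periods A and B while their trajectory support changes by at most `max_traj_change`.

    Hypothetical sketch: exhaustive enumeration for illustration (no tree search, no pruning).
    """
    items = sorted(set().union(*sales_a, *sales_b, *traj_a, *traj_b))
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            sales_change = support(pattern, sales_b) - support(pattern, sales_a)
            traj_change = support(pattern, traj_b) - support(pattern, traj_a)
            # Big change in one dataset (sales) but little change in the other (trajectories).
            if abs(sales_change) >= min_sales_change and abs(traj_change) <= max_traj_change:
                found.append((combo, round(sales_change, 3), round(traj_change, 3)))
    return found

# Toy data (hypothetical): baskets bought, and store sections visited, in two periods.
sales_a = [{"milk", "bread"}, {"milk"}, {"bread"}, {"eggs"}]
sales_b = [{"milk", "bread"}, {"milk", "bread"}, {"milk"}, {"eggs"}]
traj_a = [{"milk", "bread"}, {"milk"}, {"bread", "eggs"}, {"eggs"}]
traj_b = [{"milk", "bread"}, {"milk"}, {"milk", "eggs"}, {"eggs"}]

result = conditional_contrast_patterns(sales_a, sales_b, traj_a, traj_b)
print(result)
```

In this toy data, `milk` sales rise but so does traffic through the milk section, so `milk` is a traditional contrast pattern in the sales data but not a conditional one; `{bread, milk}` sales rise while the matching trajectory support is unchanged, so only that pattern is kept. The number of candidates grows combinatorially with `max_size`, which is the cost that a tree-based search with pruning would aim to avoid on real data.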
Experiments on a synthetic dataset and a real-life retail dataset show that the proposed patterns are more informative and actionable for decision makers than traditional contrast patterns, and that our tree-based algorithm is computationally efficient.
The third scenario, human behavior analysis in heterogeneous mobile networks, covers the remaining three research questions. First, we focus on identifying the underlying geographical corridors of trajectories generated in mobile networks. We propose a hierarchical multi-scale trajectory clustering algorithm for corridor identification that analyzes the non-uniform spatial distribution of cell towers and users’ movements. Results on a three-week real-life dataset from China Mobile show that our method achieves the best performance, improving clustering quality by more than 10% over other state-of-the-art methods.
Identifying static corridors plays an important role in long-term network design and management. However, there is also a great opportunity to reconfigure a network dynamically in response to changes in traffic flows. Therefore, in the fourth work, we propose a contrast data mining framework to identify corridors that differ significantly between time periods. We define contrast corridors and propose a distance measure, based on the Hausdorff distance and the Earth Mover’s Distance, to calculate the dissimilarity between identified corridors. Experimental results on synthetic and real-life datasets show that our method effectively and robustly detects contrast corridors in mobile-network trajectories generated during different time periods, improving the F1 score by 20% on average.
Finally, we focus on how to design caching strategies at the network edge. Edge caching in mobile networks can improve user experience, reduce latency and balance the network traffic load.
Because cells in different locations have different levels of predictability, owing to the heterogeneity of mobile users’ content preferences and mobility, we propose an adaptive edge caching algorithm that combines content popularity with individual-level prediction results to derive an optimal caching strategy, aiming to maximize the cache hit rate at an acceptable file replacement cost. Results on a real-life dataset and on simulated data show that our method is better suited to resource-limited, heterogeneous networks than other methods.
In summary, we have proposed several trajectory data mining approaches to extract useful knowledge from heterogeneous trajectory data or multi-source datasets in three different scenarios. We have shown that the proposed methods outperform existing state-of-the-art techniques on a variety of real-life datasets.
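The contrast-corridor framework in the fourth work measures corridor dissimilarity with a combination of the Hausdorff distance and the Earth Mover’s Distance (EMD). The abstract does not give the exact formula, so the Python sketch below is a hypothetical version under stated assumptions: each corridor is a list of 2-D points plus a traffic histogram over the same number of equally spaced segments along the corridor; the Hausdorff term compares corridor shape, a 1-D EMD compares how traffic is distributed, and the two are blended with a hypothetical weight `w`. A corridor from one period is flagged as a contrast corridor when its nearest corridor in the other period is farther away than a hypothetical `threshold`. The function names and data are illustrative only.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (corridor shapes)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms on the same unit-width bins.

    Each histogram is normalised to total mass 1; in 1-D the EMD equals the
    sum of absolute differences between the two cumulative distributions.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    sp, sq = sum(p), sum(q)
    cum_diff, total = 0.0, 0.0
    for x, y in zip(p, q):
        cum_diff += x / sp - y / sq
        total += abs(cum_diff)
    return total

def corridor_distance(c1, c2, w=0.5):
    """Hypothetical blend: w * shape difference + (1 - w) * traffic-distribution difference."""
    shape = hausdorff(c1["points"], c2["points"])
    traffic = emd_1d(c1["flow"], c2["flow"])
    return w * shape + (1 - w) * traffic

def contrast_corridors(period1, period2, threshold, w=0.5):
    """Corridors in period1 whose nearest corridor in period2 is farther than `threshold`."""
    return [
        c for c in period1
        if min((corridor_distance(c, d, w) for d in period2), default=math.inf) > threshold
    ]

# Toy corridors (hypothetical): B is A shifted by one unit; C has A's shape but different traffic.
A = {"points": [(0, 0), (1, 0), (2, 0)], "flow": [10, 20, 10]}
B = {"points": [(0, 1), (1, 1), (2, 1)], "flow": [10, 20, 10]}
C = {"points": [(0, 0), (1, 0), (2, 0)], "flow": [30, 5, 5]}

print(corridor_distance(A, B))  # shape term only
print(corridor_distance(A, C))  # traffic term only
print(contrast_corridors([A], [B], threshold=0.4))
```

Because the two terms have different units (coordinates versus normalised mass times bin index), a practical version would need to rescale them before weighting; the unscaled weighted sum here is only for illustration.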
Keywords: trajectory data mining; heterogeneous data; contrast data mining; trajectory clustering; edge caching; mobile network