Big data clustering for smart city applications
Affiliation: Electrical and Electronic Engineering
Document Type: PhD thesis
Access Status: Open Access
© 2016 Dr. Dheeraj Kumar
The Internet of Things (IoT) infrastructure for the creation of smart cities consists of internet-connected sensors, devices, and citizens. This IoT infrastructure generates an enormous amount of data in the form of city-scale physical measurements and public opinions, constituting big data. Smart cities aim to use this wealth of data efficiently to support better decision making and to manage and solve the problems faced by modern cities. However, interpreting the massive volume of data generated by smart cities to create actionable knowledge is a challenging task. Aggregation and summarization (data clustering) is a useful tool for creating knowledge from raw data drawn from different sources. However, traditional data clustering algorithms are not suitable for unlabelled smart city data, owing to its high volume and generation velocity and to limited prior knowledge of the generating phenomenon. This thesis presents clusiVAT, a novel framework for clustering tendency assessment in big data, which provides an aggregated view of the big data to create actionable knowledge. clusiVAT intelligently selects a small number of samples from the data such that the samples retain the approximate geometry of the big dataset. The reordered dissimilarity image of the samples, generated using a single linkage minimum spanning tree (MST), suggests the number of clusters in the data, which is required as an input by most popular clustering algorithms. The cluster labels are then extended to the non-sampled points using the nearest prototype rule. The clusiVAT framework was applied to two real-life smart city applications to understand the underlying patterns hidden in the huge volumes of data and to generate knowledge. The first application used clusiVAT for clustering and anomaly detection on the pedestrian and vehicle trajectories obtained from a video surveillance system. Experiments were performed on a real-life MIT trajectories dataset of vehicles and pedestrians from a parking lot scene.
The trajectory clusters and anomalies thus obtained were helpful in the high-level interpretation of a scene (crowd behavior modeling), as feedback for a low-level (individual) tracking and activity prediction system, and as an alarm for a human supervisor. For the second application, clusiVAT was used to cluster large-scale (on the order of millions) vehicular trajectories obtained from the GPS traces of taxis in the cities of Beijing and Singapore, using a novel Dijkstra-based dynamic time warping distance measure. The results facilitated the understanding of spatial and temporal patterns in trajectories and were of great significance for decision-makers seeking to understand road traffic conditions and to propose metro bus corridors and light rail systems for better public transport. Another prominent type of data generated by smart city IoT infrastructure is the high-velocity data stream. Automatic interpretation of this evolving big data is required for the timely detection of unusual events. This thesis presents a computationally efficient 'hot' update approach for incremental visualization of evolving cluster structure in streaming data. The new algorithms were demonstrated for two applications: online anomaly detection and sliding window based clustering of time series data. Numerical experiments on weather monitoring data from the Great Barrier Reef and the city of Melbourne provided visual clues to the onset of new structure in streaming data.
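The sample, reorder, and extend pipeline outlined in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not specify the sampling scheme, so farthest-first (maximin) selection is assumed here as a stand-in, the reordering follows the standard VAT procedure (a Prim-style MST traversal of the dissimilarity matrix), and all function names and the synthetic three-blob data are hypothetical.

```python
import numpy as np

def maximin_sample(X, m, rng):
    """Farthest-first selection of m row indices (assumed stand-in sampler)."""
    first = int(rng.integers(len(X)))
    chosen = [first]
    d = np.linalg.norm(X - X[first], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(d))          # point farthest from all chosen so far
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)

def vat_order(D):
    """VAT reordering: Prim-style traversal of the MST of dissimilarity matrix D.

    Returns the visiting order and the MST edge weight used to reach each
    newly visited point (edges[r] connects order[r + 1] to the tree).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    i, _ = np.unravel_index(np.argmax(D), D.shape)  # start at one end of the largest dissimilarity
    order = [int(i)]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[i] = True
    dist_to_tree = D[i].copy()
    dist_to_tree[in_tree] = np.inf
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(dist_to_tree))  # closest point not yet in the tree
        edges.append(dist_to_tree[j])
        order.append(j)
        in_tree[j] = True
        dist_to_tree = np.minimum(dist_to_tree, D[j])
        dist_to_tree[in_tree] = np.inf
    return np.array(order), np.array(edges)

def single_linkage_labels(order, edges, k):
    """Cut the k-1 largest MST edges; points between cuts share a label."""
    labels_in_order = np.zeros(len(order), dtype=int)
    if k > 1:
        for c in np.argsort(edges)[-(k - 1):]:
            labels_in_order[c + 1:] += 1
    labels = np.empty(len(order), dtype=int)
    labels[order] = labels_in_order       # map back to sample indexing
    return labels

def nearest_prototype_labels(X, sample_idx, sample_labels):
    """Give every point the label of its nearest sampled point."""
    d = np.linalg.norm(X[:, None, :] - X[sample_idx][None, :, :], axis=2)
    return sample_labels[np.argmin(d, axis=1)]

# Synthetic data: three well-separated 2-D blobs of 200 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((200, 2)) for c in centers])

idx = maximin_sample(X, 30, rng)
S = X[idx]
D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
order, edges = vat_order(D)
D_star = D[np.ix_(order, order)]          # reordered dissimilarity matrix
sample_labels = single_linkage_labels(order, edges, k=3)
labels = nearest_prototype_labels(X, idx, sample_labels)
```

Displaying `D_star` as a grayscale image would show dark blocks along the diagonal, one per apparent cluster. In this sketch `k=3` is supplied by hand, whereas in the workflow described above the number of clusters is read off that image.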
Keywords: big data clustering; cluster tendency assessment; Internet of Things; single linkage; trajectory clustering; anomaly detection; VAT; iVAT; clusiVAT; hierarchical clustering; MIT trajectory dataset; road graph network; scalable clustering; trajectory distance measure; visual assessment of clusters in streaming data; cluster heat maps; smart city streaming data analysis; online anomaly detection; sliding window based time series clustering