Electronic Science and Technology ›› 2021, Vol. 34 ›› Issue (5): 47-53.doi: 10.16180/j.cnki.issn1007-7820.2021.05.009

Previous Articles     Next Articles

Accelerated Spectral Clustering Based on Improved Landmark Selection

XU Hangfan,LIU Cong,TANG Jiangang,PENG Dunlu   

  1. School of Optical-Electrical and Computer Engineering,University of Shanghai for Science and Technology,Shanghai 200093,China
  • Received:2020-02-10 Online:2021-05-15 Published:2021-05-24
  • Supported by:
    National Nature Science of Foundation of China(61703278);National Nature Science of Foundation of China(61772342)

Abstract:

In order to solve the problems that the traditional landmark-based spectral clustering algorithm is susceptible to unevenly distributed landmark points and outlier landmark points, and its sampling methods such as K-means consume a large time and space when face large-scale data. This study proposes an accelerated spectral clustering based on improved landmark selection. The algorithm uses the standard deviation of the pairwise similarity matrix between landmark points to measure the uniformity of the distribution of landmark points. It selectes the landmark points set uniformly distributed from landmark points sets generated randomly, and then removes outlier landmark points with low local density. The sparse similarity matrix is constructed by the obtained landmark points set and the original data set. K-means clustering is performed on data points generated by the first k right singular feature vectors of the landmark points set to obtain the final clustering result. This study theoretically analyzes the time complexity and space complexity of the algorithm and performed experimental verification. Experimental results show that the algorithm is 3%~10% higher than that of the random sampling method, and the time-consuming is 50%~60% less than that of the K-means sampling method.

Key words: spectral clustering, large data sets, landmark sampling, outlier point, standard deviation, sparse similarity matrix, local density, singular value decomposition

CLC Number: 

  • TP301.6