Electronic Science and Technology ›› 2022, Vol. 35 ›› Issue (12): 78-83. doi: 10.16180/j.cnki.issn1007-7820.2022.12.011

K-Anonymity Data Publishing Algorithm Based on Hybrid Clustering

FANG Kai(1), SHI Zhicai(1,2), JIA Yuanyuan(1)

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  2. Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai 200240, China
  • Received:2021-05-19 Online:2022-12-15 Published:2022-12-13
  • Supported by:
    National Natural Science Foundation of China(61802252)

Abstract:

To reduce information loss in data publishing, a k-anonymity data publishing algorithm based on hybrid clustering is proposed, addressing the low data availability of existing clustering-based anonymization schemes. Unlike traditional single-clustering methods, the proposed algorithm combines partition clustering and distance clustering: it selects the initial cluster centers according to the density characteristics of the data set and then applies partition clustering iteratively to reach the optimal clustering. The method also removes part of the outlier noise in the data set to reduce its effect on the clustering results. For records with mixed attributes, a distance measure combining k-means and k-modes is adopted, and a bucket generalization algorithm is introduced to reduce the information loss caused by the generalization operation. Experimental results show that, compared with existing methods, the proposed algorithm effectively reduces the information loss of data anonymization and improves the quality of the published data.
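
The abstract does not give the exact form of the distance measure that combines k-means and k-modes. As an illustration only, the Python sketch below uses one common way to combine the two: squared Euclidean distance on the numerical attributes (as in k-means) plus a weighted count of mismatches on the categorical attributes (as in k-modes). The function name mixed_distance, the weight gamma, and the assumption that numerical attributes are already normalized are hypothetical and may differ from the paper's formulation.

```python
from typing import Sequence


def mixed_distance(
    a_num: Sequence[float],
    b_num: Sequence[float],
    a_cat: Sequence[str],
    b_cat: Sequence[str],
    gamma: float = 1.0,  # hypothetical weight for the categorical part
) -> float:
    """Distance between two records with numerical and categorical attributes.

    Illustrative assumption, not the paper's definition:
    numerical part  = squared Euclidean distance (k-means style),
    categorical part = number of mismatched values (k-modes style).
    """
    if len(a_num) != len(b_num) or len(a_cat) != len(b_cat):
        raise ValueError("records must have the same attribute layout")

    # k-means style: squared differences over numerical attributes
    numeric_part = sum((x - y) ** 2 for x, y in zip(a_num, b_num))

    # k-modes style: simple matching dissimilarity over categorical attributes
    categorical_part = sum(1 for x, y in zip(a_cat, b_cat) if x != y)

    return numeric_part + gamma * categorical_part


# Example: two records with (age, income) normalized to [0, 1]
# and two categorical attributes (sex, occupation).
d = mixed_distance([0.0, 1.0], [0.0, 0.0], ["F", "clerk"], ["F", "nurse"])
```

Here gamma controls how much a categorical mismatch counts relative to a numerical difference; a suitable value depends on how the numerical attributes are scaled.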

Key words: privacy preserving, data publishing, k-anonymity, clustering, bucket generalization algorithm, mixed attributes, network security, information loss

CLC Number: 

  • TP309