Journal of Xidian University ›› 2023, Vol. 50 ›› Issue (5): 188-198. doi: 10.19665/j.issn1001-2400.20230601

• Cyberspace Security •

Privacy preserving multi-classification LR scheme for data quality

CAO Laicheng, WU Wentao, FENG Tao, GUO Xian

  1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
  • Received: 2023-01-15 Online: 2023-10-20 Published: 2023-11-21

Abstract:

To protect the privacy of multi-classification logistic regression models in machine learning, ensure the quality of the training data, and reduce computing and communication costs, a privacy-preserving multi-classification logistic regression scheme for data quality is proposed. First, building on homomorphic encryption for arithmetic of approximate numbers, batch processing and the single-instruction multiple-data (SIMD) mechanism are used to pack multiple messages into one ciphertext, and an encrypted vector is securely shifted into the ciphertext of the corresponding shifted plaintext vector. Second, the binary logistic regression model is extended to multiple classes by training multiple classifiers with the "One vs Rest" decomposition strategy. Finally, the training data set is divided into several fixed-size matrices that still retain the complete data structure of the sample information. The fixed Hessian method is used to optimize the model parameters so that they can be used in any case and remain private during model training. The scheme reduces data sparsity and ensures data quality. The security analysis shows that neither the trained model nor user data is leaked at any point in the process. The experiments show that the training accuracy of the scheme is greatly improved compared with the existing scheme and is almost the same as that obtained by training on unencrypted data, and that the scheme has a lower computing cost.
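
The abstract names a "One vs Rest" decomposition and a fixed Hessian update but gives no formulas. As a rough plaintext illustration only (no encryption and no ciphertext packing), the Python sketch below trains one binary classifier per class and replaces the per-iteration Hessian with a constant bound. It assumes the simplified diagonal variant of the fixed Hessian, with entries taken as the row sums of (1/4)·XᵀX, which is valid when all features are nonnegative (here scaled to [0, 1]). Whether the paper uses this exact variant is not stated in the abstract. All function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_binary_fixed_hessian(X, y, iters=200):
    """Binary logistic regression with a constant diagonal Hessian bound.

    X: rows of nonnegative features, first entry 1.0 as the bias term.
    y: labels in {0, 1}.
    Assumption: h[k] = 1/4 * sum_j (X^T X)[k][j], computed once up front.
    """
    d = len(X[0])
    h = [0.0] * d
    for row in X:
        s = sum(row)
        for k in range(d):
            h[k] += 0.25 * row[k] * s
    beta = [0.0] * d
    for _ in range(iters):
        # Gradient of the log-likelihood: X^T (y - sigmoid(X beta)).
        grad = [0.0] * d
        for row, t in zip(X, y):
            r = t - sigmoid(dot(beta, row))
            for k in range(d):
                grad[k] += r * row[k]
        # Newton-like step with the fixed bound instead of the true Hessian.
        beta = [b + g / hk for b, g, hk in zip(beta, grad, h)]
    return beta

def fit_one_vs_rest(X, labels, iters=200):
    # One binary classifier per class: that class against all others.
    return {
        c: fit_binary_fixed_hessian(X, [1 if l == c else 0 for l in labels], iters)
        for c in sorted(set(labels))
    }

def predict(models, row):
    # Pick the class whose classifier reports the highest probability.
    return max(models, key=lambda c: sigmoid(dot(models[c], row)))

# Toy data (hypothetical): bias term followed by two features in [0, 1].
X_toy = [
    [1.0, 0.1, 0.1], [1.0, 0.2, 0.1],
    [1.0, 0.9, 0.1], [1.0, 0.8, 0.2],
    [1.0, 0.1, 0.9], [1.0, 0.2, 0.8],
]
labels_toy = ["a", "a", "b", "b", "c", "c"]
models_toy = fit_one_vs_rest(X_toy, labels_toy, iters=1000)
print([predict(models_toy, row) for row in X_toy])
```

Because the bound is computed once and never updated, each iteration needs only additions, multiplications, and a sigmoid evaluation. That is the usual motivation for a fixed Hessian in encrypted training, where matrix inversion on ciphertexts is expensive. An encrypted version would also have to replace the sigmoid with a polynomial approximation, which this sketch does not attempt.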

Key words: homomorphic encryption, cloud computing, logistic regression, privacy-preserving, data quality

CLC Number: 

  • TP309.2