Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (2): 132-138. doi: 10.19665/j.issn1001-2400.2019.02.022


Compression algorithm for weights quantized deep neural network models

CHEN Yun, CAI Xiaodong, LIANG Xiaoxi, WANG Meng

  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
  • Received: 2018-04-24  Online: 2019-04-20  Published: 2019-04-20
  • Contact: Xiaodong CAI  E-mail: caixiaodong@guet.edu.cn

Abstract:

Deep neural network models contain a large number of weight parameters. In order to reduce the storage space of such models, a compression algorithm based on weight quantization is proposed. In the forward propagation process, a four-value filter quantizes the full-precision weights into the four states 2, 1, -1, and -2, which allows the weights to be encoded efficiently. To obtain an accurate four-value weight model, the L2 distance between the full-precision weights and the scaled four-value weights is minimized. To further improve compression, every 16 four-value weights are encoded into a single 32-bit binary number. Experimental results on the MNIST, CIFAR-10, and CIFAR-100 datasets show that the model compression ratio of the algorithm is the same as that of the TWN (Ternary Weight Network), namely 6.74%, 6.88%, and 6.62%, respectively, while the accuracy is improved by 0.06%, 0.82%, and 1.51%, respectively. The results indicate that the algorithm provides efficient and accurate compression of deep neural network models.
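The abstract describes two mechanisms: scaled four-value quantization by L2 minimization, and packing 16 two-bit codes into one 32-bit word. The Python sketch below illustrates both. It is a minimal illustration only, not the paper's method: it assumes an alternating-minimization solver for the scale and quantization levels, and a hypothetical 2-bit code assignment (-2 -> 00, -1 -> 01, 1 -> 10, 2 -> 11), since the paper's exact threshold rule and bit mapping are not given in the abstract.

import numpy as np

# Four quantization states named in the abstract.
LEVELS = np.array([-2.0, -1.0, 1.0, 2.0])

def quantize_four_value(w, n_iter=20):
    """Approximate w by alpha * q with q_i in {2, 1, -1, -2}, reducing the
    L2 distance ||w - alpha * q||_2 (assumed alternating scheme)."""
    w = np.asarray(w, dtype=np.float64).ravel()
    alpha = max(float(np.mean(np.abs(w))), 1e-12)  # initial scale guess
    for _ in range(n_iter):
        # Assign each weight to the nearest scaled level.
        q = LEVELS[np.argmin(np.abs(w[:, None] - alpha * LEVELS[None, :]), axis=1)]
        # Closed-form optimal scale for this assignment.
        alpha = float(np.dot(w, q) / np.dot(q, q))
    return alpha, q

# Hypothetical 2-bit codes; the abstract only states that 16 four-value
# weights are compressed into one 32-bit binary number.
CODE = {-2.0: 0b00, -1.0: 0b01, 1.0: 0b10, 2.0: 0b11}

def pack16(q):
    """Pack 16 four-value weights into a single 32-bit word (2 bits each)."""
    assert len(q) == 16
    word = 0
    for i, v in enumerate(q):
        word |= CODE[float(v)] << (2 * i)
    return np.uint32(word)

# Example: quantize a 16-weight block, then pack it.
w = np.random.randn(16)
alpha, q = quantize_four_value(w)
packed = pack16(q)  # 16 weights in 4 bytes instead of 64 (float32)

Note that packing 16 weights into 4 bytes is a raw 6.25% of the float32 size, which is consistent in order of magnitude with the 6.62-6.88% model compression ratios reported once per-layer scales and other overhead are included.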

Key words: weight quantization, compression, four-value filter, storage space, full-precision

CLC Number: TP391