Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (2): 132-138. doi: 10.19665/j.issn1001-2400.2019.02.022


  • Biography: CHEN Yun (1991-), male, M.S. candidate at Guilin University of Electronic Technology, E-mail: 1655770801@qq.com
  • Supported by: the Guangxi Collaborative Innovation Center for the Promotion of Internet of Things Technology and Industrialization Project, 2014 (WLW200601); the Guangxi Science and Technology Program Key Research and Development Plan, 2016 (AB16380264); the Fund of the Key Laboratory of Cognitive Radio and Information Processing, Ministry of Education, 2016 (CRKL160102)

Compression algorithm for weight-quantized deep neural network models

CHEN Yun,CAI Xiaodong(),LIANG Xiaoxi,WANG Meng   

  1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
  • Received:2018-04-24 Online:2019-04-20 Published:2019-04-20
  • Contact: Xiaodong CAI


Abstract:

There are a large number of weight parameters in deep neural network models. To reduce the storage space that these models occupy, a compression algorithm based on weight quantization is proposed. During forward propagation, a four-value filter quantizes the full-precision weights into four states, 2, 1, -1, and -2, so that the weights can be encoded efficiently. To obtain an accurate four-value weight model, the L2 distance between the full-precision weights and the scaled four-value weights is minimized. To compress the model substantially, every 16 four-value weights are encoded into a single 32-bit binary number. Experimental results on the MNIST, CIFAR-10, and CIFAR-100 datasets show that the algorithm achieves model compression ratios of 6.74%, 6.88%, and 6.62%, respectively, the same as those of the TWN (Ternary Weight Network), while the accuracy is improved by 0.06%, 0.82%, and 1.51%, respectively. These results indicate that the algorithm provides efficient and accurate compression of deep neural network models.
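As a rough illustration only (this is not the authors' implementation, and the scale search and all helper names here are assumptions), the two core operations described in the abstract can be sketched in Python: quantizing full-precision weights to the four states {-2, -1, 1, 2} with a scaling factor alpha chosen to minimize the L2 distance ||w - alpha*q||, and packing 16 such two-bit codes into one 32-bit word.

```python
import numpy as np

def quantize_four_value(w):
    """Map full-precision weights w to states {-2, -1, 1, 2} and pick a
    scale alpha minimizing ||w - alpha * q||_2 (simple grid search)."""
    states = np.array([-2, -1, 1, 2])   # four states, no zero, per the abstract
    best = None
    for alpha in np.linspace(1e-3, np.abs(w).max(), 256):
        # Nearest scaled state for every weight.
        q = states[np.argmin(np.abs(w[..., None] - alpha * states), axis=-1)]
        err = float(np.sum((w - alpha * q) ** 2))
        if best is None or err < best[0]:
            best = (err, alpha, q)
    return best[1], best[2]            # (alpha, four-value weights)

def pack16(q):
    """Encode 16 four-value weights into one 32-bit integer,
    2 bits each (codes: -2 -> 0, -1 -> 1, 1 -> 2, 2 -> 3)."""
    code = {-2: 0, -1: 1, 1: 2, 2: 3}
    word = 0
    for i, v in enumerate(q):
        word |= code[int(v)] << (2 * i)
    return word

def unpack16(word):
    """Inverse of pack16: recover 16 four-value weights from one word."""
    decode = [-2, -1, 1, 2]
    return [decode[(word >> (2 * i)) & 0b11] for i in range(16)]
```

Packing 16 weights into 32 bits stores 2 bits per weight instead of 32, i.e. 1/16 ≈ 6.25% of the full-precision storage, which is consistent with the overall 6.62%-6.88% model compression ratios reported (the remainder presumably covering scales and any unquantized parameters).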

Key words: weights quantization, compression, four-value filter, storage space, full-precision

CLC Number: 

  • TP391