Deep neural network models contain a large number of weight parameters. To reduce the storage required by such models, a weight-quantization compression algorithm is proposed. During forward propagation, a four-value quantizer maps full-precision weights to the four states 2, 1, -1, and -2, allowing the weights to be encoded efficiently. To obtain an accurate four-value weight model, the L2 distance between the full-precision weights and the scaled four-value weights is minimized. To compress the model further, every 16 four-value weights are encoded into a single 32-bit binary number. Experimental results on the MNIST, CIFAR-10, and CIFAR-100 datasets show that the algorithm achieves the same model compression ratios as the TWN (Ternary Weight Network), namely 6.74%, 6.88%, and 6.62%, respectively, while accuracy is higher than TWN's by 0.06%, 0.82%, and 1.51%. These results indicate that the algorithm compresses deep neural network models efficiently and accurately.
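The two core operations described above (fitting a scale and four-value assignment that minimize the L2 distance, then packing 16 two-bit codes into a 32-bit word) can be illustrated with a short sketch. The paper states the objective but not an exact solver, so the alternating minimization below, the initial scale, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# The four quantization states from the paper: {2, 1, -1, -2}.
STATES = np.array([-2.0, -1.0, 1.0, 2.0])

def quantize_four_value(w, n_iters=10):
    """Approximate w by alpha * q, q in STATES, minimizing ||w - alpha*q||^2.

    Assumed solver: alternate between nearest-state assignment (optimal q
    for fixed alpha) and the closed-form scale alpha = <w, q> / <q, q>
    (optimal alpha for fixed q).
    """
    w = np.asarray(w, dtype=np.float64)
    alpha = np.abs(w).mean()  # assumed initialization of the scale
    for _ in range(n_iters):
        # Assign each weight to its nearest scaled state.
        idx = np.abs(w[:, None] - alpha * STATES[None, :]).argmin(axis=1)
        q = STATES[idx]
        # Least-squares optimal scale for the current assignment.
        alpha = float(w @ q) / float(q @ q)
    return alpha, q, idx

def pack_codes(idx):
    """Pack 16 two-bit state indices into each 32-bit unsigned integer."""
    assert idx.size % 16 == 0, "pad the weight vector to a multiple of 16"
    words = []
    for block in idx.reshape(-1, 16):
        word = 0
        for k, code in enumerate(block):
            word |= int(code) << (2 * k)  # 2 bits per weight
        words.append(word)
    return np.array(words, dtype=np.uint32)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=32)
    alpha, q, idx = quantize_four_value(w)
    packed = pack_codes(idx)            # 32 weights -> two 32-bit words
    print("scale:", alpha)
    print("packed:", packed)
```

Packing 16 weights per 32-bit word gives the 2-bit storage cost that underlies the reported compression ratios (2/32 = 6.25% for the quantized layers, before any per-layer overhead).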