Journal of Xidian University ›› 2020, Vol. 47 ›› Issue (4): 55-63. doi: 10.19665/j.issn1001-2400.2020.04.008


High performance multiply-accumulator for the convolutional neural networks accelerator

KONG Xin1,2, CHEN Gang1, GONG Guoliang1, LU Huaxiang1,2,3,4, MAO Wenyu1

  1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100089, China
  3. Center of Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
  4. Semiconductor Neural Network Intelligent Perception and Computing Technology Beijing Key Lab, Beijing 100083, China
  • Received: 2020-01-06 Online: 2020-08-20 Published: 2020-08-14

Abstract:

Multiply-accumulators (MACs) in existing convolutional neural network (CNN) accelerators generally suffer from large area, high power consumption, and a long critical path. To address these problems, this paper presents a high-performance MAC based on transmission gates for CNN accelerators. First, a new data accumulation and compression structure suited to the MAC is proposed, which reduces the hardware overhead. Second, a new parallel adder architecture is proposed; compared with the Brent-Kung adder, it reduces the number of gate delay stages and improves the calculation speed without increasing hardware resources. In addition, the advantages of the transmission gate are used to optimize each unit circuit of the MAC. A 16-by-8 fixed-point high-performance MAC based on these methods has a critical path delay of 1.173 ns, a layout area of 9049.41 μm², and an average power consumption of 4.153 mW at 800 MHz under the SMIC 130 nm tt corner. Compared with the traditional MAC under the same conditions, the speed is increased by 37.42%, the area is reduced by 47.84%, and the power consumption is reduced by 56.77%.
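
For readers unfamiliar with the baseline named above, the following is a minimal behavioral sketch in Python of a Brent-Kung parallel-prefix adder used as the accumulation stage of a 16-by-8 fixed-point MAC. It models only the reference adder the abstract compares against, not the adder or the compression structure proposed in the paper. The function names (`brent_kung_add`, `mac_step`, `to_signed`) and the 32-bit accumulator width are assumptions made for illustration; the abstract does not state them.

```python
def brent_kung_add(a, b, width=32):
    """Add two unsigned width-bit integers with a Brent-Kung prefix network.

    Returns (sum mod 2**width, carry_out). width must be a power of two.
    Behavioral model only; it does not model gate delays or area.
    """
    assert width > 0 and width & (width - 1) == 0, "width must be a power of two"
    # Bitwise propagate and generate signals.
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]  # group generate/propagate, updated in place
    levels = width.bit_length() - 1  # log2(width)

    def combine(i, j):
        # Prefix operator: merge the group ending at i with the lower group ending at j.
        G[i] |= P[i] & G[j]
        P[i] &= P[j]

    # Up-sweep: build group signals over power-of-two spans.
    for d in range(levels):
        step = 1 << (d + 1)
        for i in range(step - 1, width, step):
            combine(i, i - (1 << d))
    # Down-sweep: fill in the remaining prefixes.
    for d in range(levels - 2, -1, -1):
        step = 1 << (d + 1)
        for i in range(3 * (1 << d) - 1, width, step):
            combine(i, i - (1 << d))

    # Sum bit i is p[i] XOR carry into bit i; the carry into bit 0 is 0.
    total = p[0]
    for i in range(1, width):
        total |= (p[i] ^ G[i - 1]) << i
    return total, G[width - 1]


def mac_step(acc, x, w, acc_width=32):
    """One multiply-accumulate step: acc + x * w in two's complement.

    x is meant to be a signed 16-bit value and w a signed 8-bit value;
    acc_width = 32 is an assumed accumulator width.
    """
    mask = (1 << acc_width) - 1
    product = (x * w) & mask  # wrap the product to two's complement
    total, _carry = brent_kung_add(acc & mask, product, acc_width)
    return total


def to_signed(v, width=32):
    """Interpret an unsigned width-bit pattern as a signed integer."""
    return v - (1 << width) if v >> (width - 1) else v


acc = 0
for x, w in [(1000, -3), (-200, 7), (32767, 127)]:
    acc = mac_step(acc, x, w)
print(to_signed(acc))
```

The prefix network needs 2·log2(n) − 1 combine levels for an n-bit addition, which is the gate-delay depth the abstract says the proposed adder reduces.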

Key words: multiply accumulator, transmission gate, accumulation and compression, convolutional neural network, high performance

CLC Number: 

  • TN4