Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (3): 130-139. doi: 10.19665/j.issn1001-2400.2019.03.020

High performance reconfigurable accelerator for deep convolutional neural networks

QIAO Ruixiu1,2, CHEN Gang1, GONG Guoliang1, LU Huaxiang1,2,3,4

  1. Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
    3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
    4. Semiconductor Neural Network Intelligent Perception and Computing Technology Beijing Key Lab, Beijing 100083, China
  • Received: 2019-02-14  Online: 2019-06-20  Published: 2019-06-19
  • Biography: QIAO Ruixiu (1991- ), female, Ph.D. candidate at the University of Chinese Academy of Sciences, E-mail: qiaoruixiu@semi.ac.cn.
  • Supported by:
    the Strategic Priority Research Program (Category A) of the Chinese Academy of Sciences, Superconducting Computer Development (XDA18000000); the Beijing Municipal Science and Technology Program (Z181100001518006); the Young Scientists Fund of the National Natural Science Foundation of China (61701473); the Young Scientists Fund of the National Natural Science Foundation of China (61401423); the Science and Technology Service Network Initiative (STS) Program of the Chinese Academy of Sciences (KFJ-STS-ZDTP-070); the National Defense Science and Technology Innovation Fund of the Chinese Academy of Sciences (CXJJ-17-M152)

Abstract:

In deep convolutional neural networks, the diversity of channel sizes and convolution kernel sizes across convolutional layers makes it difficult for existing accelerators to compute efficiently. To address this, a deep convolutional neural network accelerator inspired by the mechanism of biological brain neurons is proposed. It provides multiple clustering and link-organization schemes for its brain-like neuron circuits to handle different channel sizes, as well as three convolution mapping methods to handle different kernel sizes. The accelerator also reuses data in local memory efficiently, which greatly reduces data movement and improves computing performance. Tested on an object classification network and an object detection network, the accelerator achieves a computational performance of 498.6 GOPS and 571.3 GOPS, respectively, with an energy efficiency of 582.0 GOPS/W and 651.7 GOPS/W, respectively.
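
A back-of-the-envelope check (our inference from the figures above, not a value reported by the authors): the quoted throughput and energy efficiency together imply a power draw of roughly

    P_classification ≈ 498.6 GOPS / 582.0 GOPS/W ≈ 0.86 W
    P_detection      ≈ 571.3 GOPS / 651.7 GOPS/W ≈ 0.88 W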

Key words: deep neural networks, accelerator, reconfigurable architecture, high performance, very large scale integrated circuit

CLC number: TN4