Journal of Xidian University

• Research Paper

Design of the programmable neural network processor based on the transport triggered architecture

ZHAO Boran; ZHANG Li; SHI Guangming; HUANG Rong; XU Xinran

  1. (School of Electronic Engineering, Xidian University, Xi'an, Shaanxi 710071, China)
  • Received: 2017-10-11 Published: 2018-09-25
  • Corresponding author: ZHANG Li (born 1968), male, senior engineer, E-mail: zhang1_li2@mail.xidian.edu.cn
  • About the author: ZHAO Boran (born 1991), male, master's student at Xidian University, E-mail: zhaoboran2016@gmail.com

Abstract:

Convolutional neural network algorithms come in many different structures and require large amounts of data exchange and computation. To address this, this paper presents a programmable convolutional neural network processor based on the transport triggered architecture. Its data transport network is built from multi-channel direct memory access channels, a multi-port memory, and a dedicated pooling data path, which solves the problem of inefficient data exchange. Experimental results show that, although its throughput is 11% lower than that of a parallel pipelined implementation, the processor is programmable, adapts to different neural networks, and uses 46.5% fewer hardware multipliers than that implementation. Compared with other non-pipelined implementations, its throughput is at least 40% higher. The design offers a high degree of parallelism, programmability, online configurability, and relatively high processing speed.
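As a rough illustration of the transport-triggered idea named in the abstract, the Python sketch below models a processor whose program is only a list of data moves, where writing to a function unit's trigger port starts its operation. The `FunctionUnit` class, the port names, the `reg.acc` register, and the move format are hypothetical choices made for this sketch; they are not the paper's instruction set or hardware, and the multiply-accumulate shown merely stands in for one convolution window.

```python
import operator


class FunctionUnit:
    """Hypothetical unit with an operand port, a trigger port and a result."""

    def __init__(self, op):
        self.op = op
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value  # plain latch: nothing is computed yet
        elif port == "trigger":
            # The transport into the trigger port is what starts the operation.
            self.result = self.op(self.operand, value)
        else:
            raise ValueError(f"unknown port: {port}")


def run(moves, units, registers):
    """Execute (source, destination) transports in program order."""

    def read(source):
        if not isinstance(source, str):
            return source  # immediate value
        name, port = source.split(".")
        if name == "reg":
            return registers[port]
        if port != "result":
            raise ValueError(f"cannot read port: {source}")
        return units[name].result

    for source, destination in moves:
        value = read(source)
        name, port = destination.split(".")
        if name == "reg":
            registers[port] = value
        else:
            units[name].write(port, value)
    return registers


def mac_moves(pixels, weights):
    """Build the move list for one multiply-accumulate pass over a window."""
    moves = [(0, "reg.acc")]
    for pixel, weight in zip(pixels, weights):
        moves += [
            (pixel, "mul.operand"),
            (weight, "mul.trigger"),       # fires the multiplier
            ("reg.acc", "add.operand"),
            ("mul.result", "add.trigger"),  # fires the adder
            ("add.result", "reg.acc"),
        ]
    return moves


def dot_product(pixels, weights):
    units = {
        "mul": FunctionUnit(operator.mul),
        "add": FunctionUnit(operator.add),
    }
    registers = run(mac_moves(pixels, weights), units, {})
    return registers["acc"]
```

Under these assumptions, `dot_product([1, 2, 3], [4, 5, 6])` returns 32. Changing the computation means changing only the move list, which is the sense in which such a design is programmable; a real transport network would also schedule several moves per cycle across parallel buses, which this sequential sketch does not model.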

Key words: deep learning, convolutional neural networks, parallel computing, field programmable gate array