Electronic Science and Technology (电子科技) ›› 2022, Vol. 35 ›› Issue (12): 35-42. doi: 10.16180/j.cnki.issn1007-7820.2022.12.005

Reconfigurable Convolutional Neural Network Accelerator Based on Winograd Algorithm

YUAN Ziang (袁子昂), NI Wei (倪伟), RAN Jingnan (冉敬楠)

  1. School of Microelectronics, Hefei University of Technology, Hefei 230601, China
  • Received: 2021-05-25 Published: 2022-12-15 Online: 2022-12-13
  • About the authors: YUAN Ziang (b. 1996), male, master's student. Research interest: neural network accelerators. | NI Wei (b. 1977), male, Ph.D., associate professor. Research interests: digital integrated circuit design, reconfigurable computing, artificial intelligence, and digital integrated circuit verification. | RAN Jingnan (b. 1996), male, master's student. Research interest: neural network accelerators.
  • Supported by:
    National Natural Science Foundation of China (61874156); Collaborative Innovation Funding Project for Universities in Anhui (GXXT-2019-030)

Abstract:

Neural networks are widely used in pattern recognition, predictive analysis, and data fitting, and are an important foundation of artificial intelligence. Convolution in neural networks is computationally heavy and the networks have many parameters, which leads to long computation times and high memory-access pressure. To address these problems, this work accelerates convolution with the Winograd algorithm and designs an optimized hardware computing structure that improves data reuse and computational parallelism. Compared with sliding-window convolution, the proposed accelerator improves computational efficiency by 4.352 times. For convolution-kernel gradient computation, the accelerator adopts an optimized data distribution scheme that reduces data movement and meets the data demands of multiple processing elements (PEs) computing in parallel; compared with a CPU, performance improves by 23 times. Experiments show that the accelerator reaches a convolution throughput of 192.55 GFLOPS on the VGG-9 network model, and that the trained network achieves a recognition rate of 76.54% on the CIFAR-10 dataset.
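
For readers unfamiliar with the Winograd algorithm, the following is a minimal sketch of its smallest one-dimensional case, F(2,3), which produces two outputs of a 3-tap filter with four multiplications where a sliding window needs six. The abstract does not state which tile size the accelerator uses, so F(2,3) is an assumption chosen for illustration, and the function names `direct_conv_1d` and `winograd_f23` are hypothetical and not taken from the paper.

```python
def direct_conv_1d(d, g):
    """Sliding-window baseline: 2 outputs from 4 inputs, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using only 4 multiplications."""
    # Filter transform; depends only on the weights, so it can be precomputed.
    u = [g[0],
         (g[0] + g[1] + g[2]) / 2,
         (g[0] - g[1] + g[2]) / 2,
         g[2]]
    # Input transform; additions and subtractions only.
    v = [d[0] - d[2],
         d[1] + d[2],
         d[2] - d[1],
         d[1] - d[3]]
    # Element-wise products: the only 4 multiplications on the data path.
    m = [u[i] * v[i] for i in range(4)]
    # Output transform; additions and subtractions only.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]   # one input tile of 4 samples
    g = [0.5, -1.0, 2.0]       # one 3-tap filter
    print(direct_conv_1d(d, g))
    print(winograd_f23(d, g))
```

Nesting the same transforms in two dimensions gives F(2×2, 3×3), which computes a 2×2 output tile of a 3×3 convolution with 16 multiplications instead of 36. The saving is paid for with extra additions in the transforms, which is generally a favorable trade in hardware where multipliers are the scarcer resource.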

Key words: CNN hardware accelerator, Winograd, FPGA, reconfigurable, convolution acceleration, multi-way parallelism, image recognition, VGG network

CLC Number:

  • TN47