[1] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based Learning Applied to Document Recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet Classification with Deep Convolutional Neural Networks[C]//Proceedings of the Advances in Neural Information Processing Systems. Lake Tahoe, USA: Neural Information Processing Systems Foundation, 2012: 1097-1105.
[3] SIMONYAN K, ZISSERMAN A. Very Deep Convolutional Networks for Large-scale Image Recognition[EB/OL]. arXiv:1409.1556, 2014.
[4] SUNG W, PARK J. Architecture Exploration of a Programmable Neural Network Processor for Embedded Systems[C]//Proceedings of the 2016 16th International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation. Piscataway: IEEE, 2017: 124-131.
[5] SUDA N, CHANDRA V, DASIKA G, et al. Throughput-optimized OpenCL-based FPGA Accelerator for Large-scale Convolutional Neural Networks[C]//Proceedings of the 2016 ACM/SIGDA International Symposium on Field-programmable Gate Arrays. New York: ACM, 2016: 16-25.
[6] QIU J, WANG J, YAO S, et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network[C]//Proceedings of the 2016 ACM/SIGDA International Symposium on Field-programmable Gate Arrays. New York: ACM, 2016: 26-35.
[7] LIU Z, DOU Y, JIANG J, et al. Automatic Code Generation of Convolutional Neural Networks in FPGA Implementation[C]//Proceedings of the 2016 International Conference on Field-programmable Technology. Piscataway: IEEE, 2017: 61-68.
[8] ZHANG C, LI P, SUN G, et al. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks[C]//Proceedings of the 2015 ACM/SIGDA International Symposium on Field-programmable Gate Arrays. New York: ACM, 2015: 161-170.
[9] GOKHALE V, JIN J, DUNDAR A, et al. A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks[C]//Proceedings of the 2014 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. Washington: IEEE Computer Society, 2014: 696-701.
[10] CHEN T, DU Z, SUN N, et al. DianNao: a Small-footprint High-throughput Accelerator for Ubiquitous Machine-learning[C]//Proceedings of the 2014 International Conference on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2014: 269-284.
[11] PEEMEN M, SETIO A A A, MESMAN B, et al. Memory-centric Accelerator Design for Convolutional Neural Networks[C]//Proceedings of the 2013 31st IEEE International Conference on Computer Design. Piscataway: IEEE, 2013: 13-19.
[12] CHAKRADHAR S, SANKARADAS M, JAKKULA V, et al. A Dynamically Configurable Coprocessor for Convolutional Neural Networks[C]//Proceedings of the 2010 International Symposium on Computer Architecture. Piscataway: IEEE, 2010: 247-257.
[13] CADAMBI S, MAJUMDAR A, BECCHI M, et al. A Programmable Parallel Accelerator for Learning and Classification[C]//Proceedings of the 2010 International Conference on Parallel Architectures and Compilation Techniques. Piscataway: IEEE, 2010: 273-284.
[14] FARABET C, POULET C, HAN J Y, et al. CNP: an FPGA-based Processor for Convolutional Networks[C]//Proceedings of the 2009 19th International Conference on Field Programmable Logic and Applications. Piscataway: IEEE, 2009: 32-37.
[15] SANKARADAS M, JAKKULA V, CADAMBI S, et al. A Massively Parallel Coprocessor for Convolutional Neural Networks[C]//Proceedings of the 2009 International Conference on Application-Specific Systems, Architectures and Processors. Piscataway: IEEE, 2009: 53-60.
[16] ZHANG C, WU D, SUN J, et al. Energy-efficient CNN Implementation on a Deeply Pipelined FPGA Cluster[C]//Proceedings of the 2016 International Symposium on Low Power Electronics and Design. New York: ACM, 2016: 326-331.
[17] KEUTZER K, MALIK S, NEWTON A R. From ASIC to ASIP: the Next Design Discontinuity[C]//Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors. Piscataway: IEEE, 2002: 84-90.
[18] CORPORAAL H. Microprocessor Architectures-from VLIW to TTA[M]. Hoboken: Wiley, 1997.
[19] MORENO J H, MOUDGILL M, EBCIOGLU K, et al. Architecture, Compiler and Simulation of a Tree-based VLIW Processor: IBM Research Report RC20495[R]. New York: IBM, 1996.
[20] LIU Zhentao, LI Tao, HUANG Hucai, et al. Novel Many-core Architecture Design for Real-time Image Processing[J]. Journal of Xidian University, 2015, 42(2): 95-101. (in Chinese)
[21] DENG Junyong, LI Tao, JIANG Lin, et al. Design and Implementation of the Graphics Accelerator Oriented to OpenGL[J]. Journal of Xidian University, 2015, 42(6): 124-130. (in Chinese)