[1] |
SCHMIDHUBER J. Deep Learning in Neural Networks: An Overview[J]. Neural Networks, 2015,61:85-117.
|
[2] |
IOFFE S, SZEGEDY C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift[C]// Proceedings of the 32nd International Conference on Machine Learning. Lille: IMLS, 2015: 448-456.
|
[3] |
GOOGLE INC. TPUv2[EB/OL]. [2019-01-07]. https://www.tomshardware.com/news/tpu-v2-google-machine-learning-35370.html.
|
[4] |
LI J, YAN G, LU W, et al. TNPU: an Efficient Accelerator Architecture for Training Convolutional Neural Networks[C]// Proceedings of the Asia and South Pacific Design Automation Conference. Piscataway: IEEE, 2019: 487-492.
|
[5] |
乔瑞秀, 陈刚, 龚国良, 等. 一种高性能可重构深度卷积神经网络加速器[J]. 西安电子科技大学学报, 2019,46(3):130-139.
|
QIAO Ruixiu, CHEN Gang, GONG Guoliang, et al. High Performance Reconfigurable Accelerator for Deep Convolutional Neural Networks[J]. Journal of Xidian University, 2019,46(3):130-139.
|
[6] |
HEGDE K, AGRAWAL R, YAO Y, et al. Morph: Flexible Acceleration for 3D CNN-based Video Understanding[C]// Proceedings of the Annual International Symposium on Microarchitecture. Washington: IEEE Computer Society, 2018: 933-946.
|
[7] |
LI J, YAN G, LU W, et al. SmartShuttle: Optimizing Off-chip Memory Accesses for Deep Learning Accelerators[C]// Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition. Piscataway: IEEE, 2018: 343-348.
|
[8] |
CHEN T, XU B, ZHANG C, et al. Training Deep Nets with Sublinear Memory Cost[EB/OL]. https://arxiv.org/abs/1604.06174.
|
[9] |
NARANG S, DIAMOS G, ELSEN E, et al. Mixed Precision Training[C]// Proceedings of the 6th International Conference on Learning Representations. San Diego: ICLR, 2018.
|
[10] |
JAIN A, PHANISHAYEE A, MARS J, et al. Gist: Efficient Data Encoding for Deep Neural Network Training[C]// Proceedings of the International Symposium on Computer Architecture. Piscataway: IEEE, 2018: 776-789.
|
[11] |
HE K, ZHANG X, REN S, et al. Deep Residual Learning for Image Recognition[C]// Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2016: 770-778.
|
[12] |
SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the Inception Architecture for Computer Vision[C]// Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington: IEEE Computer Society, 2016: 2818-2826.
|
[13] |
HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely Connected Convolutional Networks[C]// Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2261-2269.
|
[14] |
REDMON J, FARHADI A. YOLOv3: An Incremental Improvement[EB/OL]. [2018-12-25]. https://arxiv.org/pdf/1804.02767.pdf.
|
[15] |
NVIDIA. Tesla V100 GPU Architecture: The World's Most Advanced Data Center GPU[EB/OL]. [2018-10-10]. https://devblogs.nvidia.com/inside-volta/.
|
[16] |
YOU Y, ZHANG Z, DEMMEL J, et al. ImageNet Training in 24 Minutes[EB/OL]. [2018-10-10]. https://arxiv.org/pdf/1709.05011v1.pdf.
|
[17] |
JIA Y, SHELHAMER E, DONAHUE J, et al. Caffe: Convolutional Architecture for Fast Feature Embedding[C]// Proceedings of the 2014 ACM Conference on Multimedia. New York: ACM, 2014: 675-678.
|