Journal of Xidian University, 2020, Vol. 47, Issue (2): 98-107. doi: 10.19665/j.issn1001-2400.2020.02.014
WANG Jijun,HAO Ziyu,LI Hongliang
Received: 2019-11-14
Online: 2020-04-20
Published: 2020-04-26
WANG Jijun, HAO Ziyu, LI Hongliang. Optimization of memory access for the convolutional neural network training[J]. Journal of Xidian University, 2020, 47(2): 98-107.
Layer | Pass | Input feature map | Parameters | Output feature map | Memory reads | Memory writes |
---|---|---|---|---|---|---|
Convolution | Forward | [N,C₁,H₁,W₁] | K²C₁C₂ | [N,C₂,H₂,W₂] | NC₁H₁W₁+K²C₁C₂ | NC₁H₁W₁ |
Convolution | Parameter error | [C₁,N,H₁,W₁] | [C₂,N,H₂,W₂] | K²C₂C₁ | NC₁H₁W₁+K+C₂NH₂W₂ | K²C₂C₁ |
Convolution | Input error | [N,C₂,H₂,W₂] | K²C₂C₁ | [N,C₁,H₁,W₁] | NC₂H₂W₂+K²C₂C₁ | NC₁H₁W₁ |
Batch normalization | Forward | [C₂,N,H₂,W₂] | | [C₂,N,H₂,W₂] | 2NC₂H₂W₂ | |
Batch normalization | Input error | [C₂,N,H₂,W₂] | | [C₂,N,H₂,W₂] | 4NC₂H₂W₂ | NC₂H₂W₂ |
Activation | Forward | [N,C₂,H₂,W₂] | | [N,C₂,H₂,W₂] | NC₂H₂W₂ | |
Activation | Input error | [N,C₂,H₂,W₂] | | [N,C₂,H₂,W₂] | NC₂H₂W₂ | |
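
To make the table's formulas concrete, the following minimal sketch evaluates a few of the read and write volumes for one example layer. It assumes that "K2" in the table denotes K squared (a K×K kernel) and that all volumes are counted in tensor elements rather than bytes. The `LayerShape` class, the function names, and the example shape are hypothetical and do not come from the paper. Only rows whose read and write entries are unambiguous in the table are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerShape:
    """Hypothetical container for the symbols used in the table."""
    n: int   # batch size N
    c1: int  # input channels C1
    h1: int  # input height H1
    w1: int  # input width W1
    c2: int  # output channels C2
    h2: int  # output height H2
    w2: int  # output width W2
    k: int   # kernel side length K (assumed square, K x K)


def input_elems(s: LayerShape) -> int:
    # N * C1 * H1 * W1
    return s.n * s.c1 * s.h1 * s.w1


def output_elems(s: LayerShape) -> int:
    # N * C2 * H2 * W2
    return s.n * s.c2 * s.h2 * s.w2


def weight_elems(s: LayerShape) -> int:
    # K^2 * C1 * C2 (assumes "K2" in the table means K squared)
    return s.k ** 2 * s.c1 * s.c2


def conv_forward_reads(s: LayerShape) -> int:
    # Table row "convolution, forward": N*C1*H1*W1 + K^2*C1*C2
    return input_elems(s) + weight_elems(s)


def conv_input_error_reads(s: LayerShape) -> int:
    # Table row "convolution, input error": N*C2*H2*W2 + K^2*C2*C1
    return output_elems(s) + weight_elems(s)


def bn_input_error_reads(s: LayerShape) -> int:
    # Table row "batch normalization, input error": 4*N*C2*H2*W2
    return 4 * output_elems(s)


def bn_input_error_writes(s: LayerShape) -> int:
    # Table row "batch normalization, input error": N*C2*H2*W2
    return output_elems(s)


if __name__ == "__main__":
    # Example shape chosen for illustration only.
    shape = LayerShape(n=32, c1=64, h1=56, w1=56, c2=64, h2=56, w2=56, k=3)
    print("conv forward reads:     ", conv_forward_reads(shape))
    print("conv input-error reads: ", conv_input_error_reads(shape))
    print("BN input-error reads:   ", bn_input_error_reads(shape))
    print("BN input-error writes:  ", bn_input_error_writes(shape))
```

For this example shape the weight term (36,864 elements) is small next to the feature-map terms (6,422,528 elements each), so under the table's formulas the batch-normalization input-error pass reads roughly four times as many elements as the convolution forward pass.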