Journal of Xidian University ›› 2020, Vol. 47 ›› Issue (2): 98-107. doi: 10.19665/j.issn1001-2400.2020.02.014
WANG Jijun, HAO Ziyu, LI Hongliang
Received: 2019-11-14
Online: 2020-04-20
Published: 2020-04-26
WANG Jijun, HAO Ziyu, LI Hongliang. Optimization of memory access for the convolutional neural network training[J]. Journal of Xidian University, 2020, 47(2): 98-107.
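Layer | Process | Input feature map | Parameters | Output feature map | Read access volume | Write access volume |
---|---|---|---|---|---|---|
Convolution | Forward computation | [N,C1,H1,W1] | K²C1C2 | [N,C2,H2,W2] | NC1H1W1+K²C1C2 | NC1H1W1 |
Convolution | Parameter-error computation | [C1,N,H1,W1] | [C2,N,H2,W2] | K²C2C1 | NC1H1W1+K+C2NH2W2 | K²C2C1 |
Convolution | Input-error computation | [N,C2,H2,W2] | K²C2C1 | [N,C1,H1,W1] | NC2H2W2+K²C2C1 | NC1H1W1 |
Batch normalization | Forward computation | [C2,N,H2,W2] | | [C2,N,H2,W2] | 2NC2H2W2 | |
Batch normalization | Input-error computation | [C2,N,H2,W2] | | [C2,N,H2,W2] | 4NC2H2W2 | NC2H2W2 |
Activation | Forward computation | [N,C2,H2,W2] | | [N,C2,H2,W2] | NC2H2W2 | |
Activation | Input-error computation | [N,C2,H2,W2] | | [N,C2,H2,W2] | NC2H2W2 | |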
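To make the read-volume expressions in the table above concrete, the following minimal Python sketch evaluates a few of them for one layer shape. It rests on several assumptions: volumes are counted in elements rather than bytes, the "K2" in the table is read as the square of the kernel size K, and the class name, function names, and example shape are all hypothetical and do not come from the paper. Only the entries that are unambiguous in the table are covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerShape:
    """Hypothetical container for the symbols used in the table."""
    n: int   # batch size N
    c1: int  # input channels C1
    h1: int  # input height H1
    w1: int  # input width W1
    c2: int  # output channels C2
    h2: int  # output height H2
    w2: int  # output width W2
    k: int   # kernel size K (assumed square, K x K)

    @property
    def input_elems(self) -> int:
        # N * C1 * H1 * W1
        return self.n * self.c1 * self.h1 * self.w1

    @property
    def output_elems(self) -> int:
        # N * C2 * H2 * W2
        return self.n * self.c2 * self.h2 * self.w2

    @property
    def weight_elems(self) -> int:
        # K^2 * C1 * C2
        return self.k ** 2 * self.c1 * self.c2


def conv_forward_reads(s: LayerShape) -> int:
    """Convolution, forward: N*C1*H1*W1 + K^2*C1*C2."""
    return s.input_elems + s.weight_elems


def conv_input_error_reads(s: LayerShape) -> int:
    """Convolution, input-error computation: N*C2*H2*W2 + K^2*C2*C1."""
    return s.output_elems + s.weight_elems


def bn_forward_reads(s: LayerShape) -> int:
    """Batch normalization, forward: 2*N*C2*H2*W2."""
    return 2 * s.output_elems


def bn_input_error_reads(s: LayerShape) -> int:
    """Batch normalization, input-error computation: 4*N*C2*H2*W2."""
    return 4 * s.output_elems


# Example shape chosen purely for illustration (not taken from the paper).
shape = LayerShape(n=32, c1=64, h1=56, w1=56, c2=64, h2=56, w2=56, k=3)

for name, fn in [
    ("conv forward reads", conv_forward_reads),
    ("conv input-error reads", conv_input_error_reads),
    ("BN forward reads", bn_forward_reads),
    ("BN input-error reads", bn_input_error_reads),
]:
    print(f"{name}: {fn(shape)} elements")
```

For this illustrative shape, the batch-normalization input-error pass reads roughly four times as many elements as the convolution forward pass, even though it performs far less arithmetic per element.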