Journal of Xidian University ›› 2023, Vol. 50 ›› Issue (2): 92-100. doi: 10.19665/j.issn1001-2400.2023.02.010

• Information and Communications Engineering •

GPGPU cache bypassing system for 2D and 3D convolution

JIA Shiwei1, ZHANG Yuming1, QIN Xiang2, SUN Chenglu2, TIAN Ze3

  1. School of Microelectronics, Xidian University, Xi’an 710071, China
    2. Department of Integrated Circuit R&D, Xiangteng Microelectronics Corporation, Xi’an 710068, China
    3. Key Laboratory of Aviation and Technology on Integrated Circuit and Micro-System Design, China Institute of Aeronautical Computing Technology, Xi’an 710068, China
  • Received: 2022-05-23 Online: 2023-04-20 Published: 2023-05-12

Abstract:

The general-purpose graphics processing unit (GPGPU) is the core computing platform for convolutional neural networks, and its performance on two-dimensional (2D) and three-dimensional (3D) convolution determines how well such networks can be applied to real-time target recognition and detection. However, limited by its inherent cache system design, the current GPGPU architecture cannot accelerate 2D and 3D convolution efficiently. To address this problem, a dynamic L1D cache bypassing design is proposed. First, a new data structure is defined that dynamically reflects the cache access characteristics of an instruction, and a memory-access-feature record table is built on it to record the execution status of different memory accesses. Second, a warp scheduling strategy that prioritizes one thread block is adopted to speed up sampling of the memory access state. Next, the L1D cache bypassing decision for the memory accesses issued under each PC is derived from the sampling results. Finally, some low-locality data accesses are made to bypass the L1D cache. As a result, L1D cache space is reserved for data with high locality, the memory access stall cycles of 2D and 3D convolution are reduced, and their memory access efficiency is improved. Experimental results show that, compared with the original design, the L1D cache bypassing design improves performance by 2.16% for 2D convolution and by 19.79% for 3D convolution, which demonstrates the effectiveness and practicality of the design.
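
The sample-then-bypass idea summarized in the abstract can be illustrated with a small software model. The following Python sketch is not the hardware design of the paper: the class and function names, the 16-access sampling window, the 0.25 hit-rate threshold, the 4-line LRU cache and the synthetic trace are all assumptions chosen only for illustration, and the priority-thread-block warp scheduling is not modeled.

```python
from collections import OrderedDict


class PCRecordTable:
    """Hypothetical per-PC record table: counts L1D accesses and hits while sampling."""

    def __init__(self, sample_accesses=16, hit_rate_threshold=0.25):
        self.sample_accesses = sample_accesses        # assumed sampling window
        self.hit_rate_threshold = hit_rate_threshold  # assumed decision threshold
        self.entries = {}  # pc -> {"accesses", "hits", "bypass"}

    def should_bypass(self, pc):
        entry = self.entries.get(pc)
        return entry is not None and entry["bypass"] is True

    def record(self, pc, hit):
        entry = self.entries.setdefault(
            pc, {"accesses": 0, "hits": 0, "bypass": None})
        if entry["bypass"] is not None:
            return  # sampling finished, decision already made for this PC
        entry["accesses"] += 1
        entry["hits"] += int(hit)
        if entry["accesses"] >= self.sample_accesses:
            hit_rate = entry["hits"] / entry["accesses"]
            entry["bypass"] = hit_rate < self.hit_rate_threshold


class L1DCache:
    """Tiny fully associative LRU cache that stores line addresses only."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()

    def access(self, line_addr):
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # refresh LRU position
            return True
        self.lines[line_addr] = True
        if len(self.lines) > self.num_lines:
            self.lines.popitem(last=False)     # evict least recently used line
        return False


def simulate(trace, num_lines=4, use_bypass=True):
    """Replay (pc, line_addr) pairs; return the L1D hit count and the table."""
    cache = L1DCache(num_lines)
    table = PCRecordTable()
    hits = 0
    for pc, line_addr in trace:
        if use_bypass and table.should_bypass(pc):
            continue  # request skips L1D, so no L1D line is allocated
        hit = cache.access(line_addr)
        table.record(pc, hit)
        hits += int(hit)
    return hits, table


def make_trace(rounds=40):
    """Synthetic trace: PC 0x10 reuses three lines, PC 0x20 streams new lines."""
    trace = []
    stream_addr = 1000
    for i in range(rounds):
        trace.append((0x10, i % 3))          # high-locality access
        for _ in range(2):
            trace.append((0x20, stream_addr))  # low-locality access
            stream_addr += 1
    return trace


if __name__ == "__main__":
    trace = make_trace()
    hits_without, _ = simulate(trace, use_bypass=False)
    hits_with, table = simulate(trace, use_bypass=True)
    print("L1D hits without bypass:", hits_without)
    print("L1D hits with bypass:", hits_with)
    print("bypass PC 0x20:", table.should_bypass(0x20))
```

In this toy trace the streaming PC evicts the reused lines before they are touched again, so the reusing PC starts to hit only after the streaming PC has been marked for bypass. The sketch decides once per PC and never resamples, which is a simplification made here and not a claim about the proposed design.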

Key words: convolution, GPGPU, memory system, cache bypassing

CLC Number: 

  • TN4