Electronic Science and Technology, 2022, Vol. 35, Issue (11): 21-28. doi: 10.16180/j.cnki.issn1007-7820.2022.11.004


Design of Matrix Decomposer Based on Improved QR Algorithm

CHEN Wenjie, SONG Yukun, ZHANG Duoli

  1. School of Electronic Science & Applied Physics, Hefei University of Technology, Hefei 230009, Anhui, China
  • Received: 2021-04-26  Online: 2022-11-15  Published: 2022-11-11
  • About the authors: CHEN Wenjie (1996-), male, master's student; research interest: design of high-speed, large-dimension matrix inverters. | SONG Yukun (1975-), male, Ph.D., associate research fellow; research interests: multi-core system design, VLSI implementation of digital signal processing, network-on-chip optimization. | ZHANG Duoli (1972-), male, Ph.D., research fellow; research interests: multi-core system design, VLSI implementation of digital signal processing, network-on-chip optimization.
  • Supported by:
    National Natural Science Foundation of China (61874156); Collaborative Innovation Funding Project for Universities in Anhui (GXXT-2019-030)

Abstract:

Matrix decomposition is one of the key operations in matrix inversion and is widely used in neural networks, digital signal processing, wireless communications, and other fields. Because conventional decomposition algorithms are ill-suited to hardware implementation, this study builds on a column-vector-optimized QR decomposition algorithm to propose a one-dimensional linear matrix decomposition structure, and completes its ASIC implementation. The decomposer supports matrices of order 2 to 32 and runs at 700 MHz in a TSMC 28 nm process. Simulation and FPGA test results show that the relative error between the decomposer's results and MATLAB's is below 10⁻¹². For matrices of order 12 or higher, the decomposer achieves a 2.3× speedup in operation cycles over the conventional one-dimensional linear structure. For a 32nd-order matrix, it achieves a 22.8× speedup over an NVIDIA RTX 2070.

Key words: matrix decomposition, QR decomposition, Givens rotation, Column-wise Givens Rotation, FPGA implementation, hardware acceleration, one-dimensional linear structure, ASIC implementation
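For readers unfamiliar with Givens-rotation QR, the Python sketch below shows the conventional element-by-element scheme that hardware decomposers of this kind start from. It is not the authors' column-wise (CGR) algorithm, whose details are not given here. The function names `givens_qr` and `relative_error` are hypothetical, and the error metric (the reconstruction residual ‖A − QR‖_F / ‖A‖_F) is an assumption standing in for the paper's comparison against MATLAB output.

```python
import math


def givens_qr(a):
    """QR decomposition of a square matrix using classic Givens rotations.

    Conventional scheme only (not the paper's column-wise variant).
    Returns (q, r) as lists of lists with a ~= q @ r, q orthogonal,
    and r upper triangular.
    """
    n = len(a)
    r = [[float(v) for v in row] for row in a]                 # becomes R
    q = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulates Q
    for j in range(n - 1):                 # column whose subdiagonal is cleared
        for i in range(n - 1, j, -1):      # zero r[i][j], bottom row first
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue                   # already zero, no rotation needed
            h = math.hypot(x, y)
            c, s = x / h, y / h            # rotation G = [[c, s], [-s, c]]
            for k in range(n):             # R <- G R on rows i-1 and i
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            r[i][j] = 0.0                  # exact zero instead of rounding noise
            for k in range(n):             # Q <- Q G^T on columns i-1 and i
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r


def relative_error(a, q, r):
    """Frobenius-norm relative residual ||A - QR|| / ||A|| (assumed metric)."""
    n = len(a)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            qr_ij = sum(q[i][k] * r[k][j] for k in range(n))
            num += (a[i][j] - qr_ij) ** 2
            den += a[i][j] ** 2
    return math.sqrt(num / den)


if __name__ == "__main__":
    n = 32  # largest order the decomposer in the abstract supports
    # Hilbert matrix plus identity: deterministic and well conditioned.
    a = [[1.0 / (i + j + 1) + (i == j) for j in range(n)] for i in range(n)]
    q, r = givens_qr(a)
    print(f"relative error: {relative_error(a, q, r):.2e}")
```

Each rotation zeroes one subdiagonal entry, so an n×n matrix needs n(n−1)/2 rotations (496 for n = 32). This quadratic growth and the sequential dependence between rotations in the same column are why the rotation schedule matters for a hardware decomposer.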

CLC number: TN47