电子科技 (Electronic Science and Technology), 2023, Vol. 36, Issue (7): 1-7. doi: 10.16180/j.cnki.issn1007-7820.2023.07.001


Design and FPGA Implementation of Large Scale Matrix Inversion Accelerator Based on LDL Algorithm

YU Haoran, XIAO Hao

  1. School of Microelectronics, Hefei University of Technology, Hefei 230009, Anhui, China
  • Received: 2022-01-13 Online: 2023-07-15 Published: 2023-06-21
  • About the authors: YU Haoran (1997-), male, master's student. Research interests: integrated circuit design and testing. | XIAO Hao (1982-), male, professor and doctoral supervisor. Research interests: application-specific hardware accelerators and multiprocessor system-on-chip (MPSoC) design.
  • Supported by:
    National Natural Science Foundation of China (61974039); Aero Science Foundation of China (2018ZCP4003)

Abstract:

Matrix inversion is a fundamental problem in engineering computation. In applications such as large-scale MIMO systems, array signal processing, and image signal processing, the speed of large-scale matrix inversion is critical to system performance. However, traditional matrix inversion methods have high computational complexity, low parallelism, and large storage requirements, which makes them poorly suited to hardware acceleration. To address hardware acceleration of large-scale matrix inversion, this paper studies a matrix inversion algorithm based on LDL decomposition and proposes a large-scale matrix inversion accelerator architecture built on that algorithm. Because the triangular matrix produced by LDL decomposition has ones on its entire diagonal, the design processes the matrix block by block in an iterative scheme, which reduces the computation required for inversion and increases speed. The accelerator is designed and implemented on a Xilinx Virtex-7 FPGA. Experimental results show that, for matrices of order 128, the throughput reaches 105.2 Inv·s⁻¹ and the maximum clock frequency reaches 200 MHz. Compared with existing matrix inversion accelerators, the design uses fewer hardware resources and achieves higher performance.
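The LDL-based inversion described in the abstract can be illustrated with a minimal software sketch. It assumes a real symmetric positive-definite input, so that the factorization A = L·D·Lᵀ exists without pivoting, and it computes A⁻¹ = L⁻ᵀ·D⁻¹·L⁻¹. The function names (`ldl_decompose`, `invert_unit_lower`, `ldl_inverse`) are hypothetical, and the code is an unblocked scalar reference version: it does not reproduce the paper's block-iterative scheme or its hardware architecture. It only shows where the unit diagonal of L removes divisions from the triangular inversion.

```python
def ldl_decompose(a):
    """Factor a symmetric positive-definite matrix as A = L * D * L^T.

    Returns (l, d): l is unit lower triangular (list of rows) and d holds
    the diagonal of D. No pivoting is performed (assumption: A is SPD).
    """
    n = len(a)
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = a[j][j] - sum(l[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] * d[k] for k in range(j))
            l[i][j] = (a[i][j] - s) / d[j]
    return l, d


def invert_unit_lower(l):
    """Invert a unit lower-triangular matrix by forward substitution.

    Every diagonal entry of l is 1, so no division is needed here.
    """
    n = len(l)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n):
        for i in range(j + 1, n):
            m[i][j] = -sum(l[i][k] * m[k][j] for k in range(j, i))
    return m


def ldl_inverse(a):
    """Compute A^-1 = L^-T * D^-1 * L^-1 for a symmetric positive-definite A."""
    n = len(a)
    l, d = ldl_decompose(a)
    m = invert_unit_lower(l)  # m = L^-1, also unit lower triangular
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # (M^T D^-1 M)[i][j]; terms with k < j vanish because m[k][j] = 0.
            s = sum(m[k][i] * m[k][j] / d[k] for k in range(j, n))
            inv[i][j] = inv[j][i] = s  # the inverse is symmetric
    return inv


if __name__ == "__main__":
    a = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    for row in ldl_inverse(a):
        print(row)
```

In this sketch the only divisions are by the entries of D; the triangular inversion itself uses multiply-accumulate operations alone, and symmetry means only the upper half of the result has to be computed.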

Keywords: LDL decomposition, matrix inversion, Cholesky decomposition, matrix partitioning, triangular matrix transformation, matrix multiplication, hardware acceleration, field-programmable gate array

CLC Number:

  • TP309.7