西安电子科技大学学报 ›› 2024, Vol. 51 ›› Issue (3): 147-157.doi: 10.19665/j.issn1001-2400.20240301

• 计算机科学与技术 & 人工智能 • 上一篇    下一篇

求解一类非光滑凸优化问题的相对加速SGD算法

张文娟1(), 冯象初(), 肖锋3(), 黄姝娟3(), 李欢1()   

  1. 1.西安工业大学 基础学院,陕西 西安 710021
    2.西安电子科技大学 数学与统计学院,陕西 西安 710071
    3.西安工业大学 计算机科学与工程学院,陕西 西安 710021
  • 收稿日期:2023-01-03 出版日期:2024-03-26 发布日期:2024-03-26
  • 通讯作者: 肖 锋(1976—),男,教授,E-mail:544070146@mail.xidian.edu.cn
  • 作者简介:张文娟(1980—),女,副教授,E-mail:zhangwenjuan@xatu.edu.cn
    冯象初(1962—),男,教授,E-mail:xcfeng@xidian.edu.cn
    黄姝娟(1974—),女,副教授,E-mail:349242386@qq.com
    李 欢(1998—),女,西安工业大学硕士研究生,E-mail:1186858891@qq.com
  • 基金资助:
    陕西省自然科学基础研究计划(2021-JM440);国家自然科学基金(62171361);陕西省重点研发计划(2022GY-119)

Relatively accelerated stochastic gradient algorithm for a class of non-smooth convex optimization problem

ZHANG Wenjuan1(), FENG Xiangchu(), XIAO Feng3(), HUANG Shujuan3(), LI Huan1()   

  1. 1. School of Sciences,Xi’an Technological University,Xi’an 710021,China
    2. School of Mathematics and Statistics,Xidian University,Xi’an 710071,China
    3. School of Computer Science and Engineering,Xi’an Technological University,Xi’an 710021,China
  • Received:2023-01-03 Online:2024-03-26 Published:2024-03-26

摘要:

一阶优化算法由于其计算简单、代价小,被广泛应用于机器学习、大数据科学、计算机视觉等领域,然而,现有的一阶算法大多要求目标函数具有Lipschitz连续梯度,而实际中的很多应用问题不满足该要求。在经典的梯度下降算法基础上,引入随机和加速,提出一种相对加速随机梯度下降算法。该算法不要求目标函数具有Lipschitz连续梯度,而是通过将欧氏距离推广为Bregman距离,从而将Lipschitz连续梯度条件减弱为相对光滑性条件。相对加速随机梯度下降算法的收敛性与一致三角尺度指数有关,为避免调节最优一致三角尺度指数参数的工作量,给出一种自适应相对加速随机梯度下降算法。该算法可自适应地选取一致三角尺度指数参数。对算法收敛性的理论分析表明,算法迭代序列的目标函数值收敛于最优目标函数值。针对Possion反问题和目标函数的Hessian阵算子范数随变量范数多项式增长的极小化问题的数值实验表明,自适应相对加速随机梯度下降算法和相对加速随机梯度下降算法的收敛性能优于相对随机梯度下降算法。

关键词: 凸优化, 非光滑优化, 相对光滑, 随机规划, 梯度方法, 加速随机梯度下降

Abstract:

The first order method is widely used in the fields such as machine learning,big data science,computer vision,etc.A crucial and standard assumption for almost all first order methods is that the gradient of the objective function has to be globally Lipschitz continuous,which,however,can’t be satisfied by a lot of practical problems.By introducing stochasticity and acceleration to the vanilla GD (Gradient Descent) algorithm,a RASGD (Relatively Accelerated Stochastic Gradient Descent) algorithm is developed,and a wild relatively smooth condition rather than the gradient Lipschitz is needed to be satisfied by the objective function.The convergence of the RASGD is related to the UTSE (Uniformly Triangle Scaling Exponent).To avoid the cost of tuning this parameter,a ARASGD(Adaptively Relatively Accelerated Stochastic Gradient Descent)algorithm is further proposed.The theoretical convergence analysis shows that the objective function values of the iterates converge to the optimal value.Numerical experiments are conducted on the Poisson inverse problem and the minimization problem with the operator norm of Hessian of the objective function growing as a polynomial in variable norm,and the results show that the convergence performance of the ARASGD method and RASGD method is better than that of the RSGD method.

Key words: convex optimization, nonsmooth optimization, relatively smooth, stochastic programming, gradient method, accelerated stochastic gradient descent

中图分类号: 

  • O24