[1] 王勇, 王喜媛, 任泽洋. 毫米波MIMO的DNN混合预编码梯度优化方法[J]. 西安电子科技大学学报, 2022, 49(1):202-207.
WANG Yong, WANG Xiyuan, REN Zeyang. Algorithm for Gradient Optimization of Hybrid Precoding Based on DNN in the Millimeter Wave MIMO System[J]. Journal of Xidian University, 2022, 49(1):202-207.

[2] LI J, XIAO M, FENG C, et al. Training Neural Networks by Lifted Proximal Operator Machines[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6):3334-3348.

[3] POLYAK B T. Introduction to Optimization[M]. New York: Optimization Software, 1987.

[4] NESTEROV Y. Introductory Lectures on Convex Optimization: A Basic Course[M]. Boston: Kluwer Academic Publishers, 2004.

[5] BIRNBAUM B, DEVANUR N R, XIAO L. Distributed Algorithms via Gradient Descent for Fisher Markets[C]//Proceedings of the 12th ACM Conference on Electronic Commerce. New York: ACM, 2011:127-136.

[6] BAUSCHKE H H, BOLTE J, TEBOULLE M. A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications[J]. Mathematics of Operations Research, 2017, 42(2):330-348.

[7] LU H, FREUND R M, NESTEROV Y. Relatively Smooth Convex Optimization by First-Order Methods, and Applications[J]. SIAM Journal on Optimization, 2018, 28(1):333-354.

[8] HANZELY F, RICHTARIK P, XIAO L. Accelerated Bregman Proximal Gradient Methods for Relatively Smooth Convex Optimization (2020)[R/OL]. [2020-01-01]. https://doi.org/10.48550/arXiv.1808.03045.

[9] NESTEROV Y. Implementable Tensor Methods in Unconstrained Convex Optimization[J]. Mathematical Programming, 2021, 186:157-183.

[10] ZHOU Y, LIANG Y, SHEN L. A Simple Convergence Analysis of Bregman Proximal Gradient Algorithm[J]. Computational Optimization and Applications, 2019, 73(3):903-912.

[11] LI H, LIN Z, FANG Y. Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization[J]. Journal of Machine Learning Research, 2022, 23(1):10057-10097.

[12] ZHOU P, YUAN X, LIN Z, et al. A Hybrid Stochastic-Deterministic Minibatch Proximal Gradient Method for Efficient Optimization and Generalization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(10):5933-5946.

[13] SCHMIDT M, LE ROUX N, BACH F. Minimizing Finite Sums with the Stochastic Average Gradient[J]. Mathematical Programming, 2017, 162(1-2):83-112.

[14] JOHNSON R, ZHANG T. Accelerating Stochastic Gradient Descent Using Predictive Variance Reduction[C]//Advances in Neural Information Processing Systems. San Diego: NEURIPS, 2013:315-323.

[15] HANZELY F, RICHTARIK P. Fastest Rates for Stochastic Mirror Descent Methods (2018)[R/OL]. [2018-01-01]. https://doi.org/10.48550/arXiv.1803.07374.

[16] XIE X, ZHOU P, LI H, et al. Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models (2023)[R/OL]. [2023-01-01]. https://doi.org/10.48550/arXiv.2208.06677.

[17] ZHUANG Z, LIU M, CUTKOSKY A, et al. Understanding AdamW through Proximal Methods and Scale-Freeness (2022)[R/OL]. [2022-08-09]. https://doi.org/10.48550/arXiv.2202.00089.

[18] LI H, LIN Z. Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the O(ε^(-7/4)) Complexity (2022)[R/OL]. [2022-01-01]. https://doi.org/10.48550/arXiv.2201.11411.

[19] POLYAK B T. Some Methods of Speeding up the Convergence of Iteration Methods[J]. USSR Computational Mathematics and Mathematical Physics, 1964, 4(5):1-17.

[20] NESTEROV Y. On an Approach to the Construction of Optimal Methods of Minimization of Smooth Convex Functions[J]. Ekonomika i Matematicheskie Metody, 1988, 24(3):509-517.

[21] ALLEN-ZHU Z. Katyusha: The First Truly Accelerated Stochastic Gradient Method[C]//Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. New York: ACM, 2017:1200-1206.