广西科学  2016, Vol. 23 Issue (5): 422-427   PDF    
非凸两分块问题乘子交替方向法的收敛性分析
邓钊1, 晁绵涛1, 简金宝2     
1. 广西大学数学与信息科学学院, 广西 南宁 530004;
2. 玉林师范学院数学与统计学院, 广西 玉林 537000
摘要: 乘子交替方向法(ADMM)求解大规模问题十分有效.ADMM在凸情形下的收敛性已被清晰认识,但非凸问题ADMM的收敛性结果还很少.本文针对非凸两分块优化问题,在增广拉格朗日函数满足Kurdyka-Lojasiewicz不等式性质且罚参数大于某个常数的条件下,证明了ADMM的收敛性.
关键词: 乘子交替方向法     Kurdyka-Lojasiewicz 不等式     非凸优化     收敛性    
Convergence Analysis of Alternating Direction Method of Multipliers for Two Block Nonconvex Problems
DENG Zhao1 , CHAO Miantao1 , JIAN Jinbao2     
1. College of Mathematics and Information Science,Guangxi University,Nanning,Guangxi 530004,China;
2. School of Mathematics and Statistics,Yulin Normal University,Yulin,Guangxi,537000,China
Abstract: The Alternating Direction Method of Multipliers(ADMM) is an effective method for large scale optimization problems.While the convergence of ADMM has been clearly recognized in the case of convex,the convergence result of ADMM in the case of nonconvex is still an open problem.In this paper,under the assumption that the augmented Lagrangian function satisfies the Kurdyka-Lojasiewicz inequality and the penalty parameter is greater than a constant,we analyze the convergence of ADMM for a class of nonconvex optimization problems whose objective function is the sum of two block nonconvex functions.
Key words: Alternating Direction Method of Multipliers     Kurdyka-Lojasiewicz inequality     nonconvex optimization     convergence    
0 引言

研究意义】乘子交替方向法(ADMM)求解大规模分布式计算问题十分有效.ADMM既能分散地收集和存储这些数据集,又能在并行和分布式的环境下求解这些问题.ADMM适合求解分布式凸优化问题,尤其适用于出现在统计学、机器学习和相关领域中的大规模问题,其重要性日益凸显.【前人研究进展】ADMM的思想最早起源于20世纪50年代,算法在20世纪70年代中期由Glowinski和Marrocco[1],以及Gabay和Mercier[2]首次提出.20世纪80年代,乘子交替方向法的研究和应用非常广泛,直到20世纪90年代中期,乘子交替方向法求解凸优化问题的很多理论结果,已经得到很好的证明.传统ADMM是求解凸两分块问题十分有效的方法[1-2],其直接应用到问题(0.5)的迭代框架如下

$\left\{ {\begin{array}{*{20}{l}} {{x^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {x,{y^k},{\lambda ^k}} \right)} \right\},}\\ {{y^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {{x^{k + 1}},y,{\lambda ^k}} \right)} \right\},}\\ {{\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} + B{y^{k + 1}}} \right),} \end{array}} \right.$ (0.1)

其中Lβ(x,y,λ)为问题(0.5)的增广拉格朗日函数,定义如下

$\begin{array}{l} {L_\beta }\left( {x,y,\lambda } \right) = f\left( x \right) + g\left( y \right) - {\lambda ^{\rm{T}}}\left( {Ax + By} \right){\rm{ + }}\\ \frac{\beta }{2}{\left\| {Ax + By} \right\|^2}. \end{array}$ (0.2)

在凸情形下,ADMM的收敛性已被充分认识[3].非凸问题ADMM的收敛性分析仅有初步的结果,其研究是当前的热点问题[4-6].文献[7]考虑如下非凸问题

$\begin{array}{*{20}{c}} {\min }&{f\left( x \right) + g\left( y \right)}\\ {{\rm{s}}.{\rm{t}}.}&{Ax = y,} \end{array}$ (0.3)

并提出如下改进的ADMM

$\left\{ {\begin{array}{*{20}{l}} {{x^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {x,{y^k},{\lambda ^k}} \right)} \right\},}\\ {{y^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {{x^{k + 1}},y,{\lambda ^k}} \right)} \right\} + {\Delta _\phi }\left( {y,{y^k}} \right),}\\ {{\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} - {y^{k + 1}}} \right),} \end{array}} \right.$ (0.4)

其中${\Delta _\phi }$是关于强凸函数$\phi $的Bregman距离.当$\phi $=0时,算法(0.4)为传统ADMM.在罚参数充分大且目标函数二阶连续可微的条件下,文献[7]证明算法产生点列的聚点是问题(0.3)的稳定点.

文献[8]分析如下Bregman ADMM算法的收敛性

$\left\{ \begin{array}{l} {x^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {x,{y^k},{\lambda ^k}} \right)} \right\} + {\Delta _\phi }\left( {x,{x^k}} \right),\\ {y^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {{x^{k + 1}},y,{\lambda ^k}} \right)} \right\} + {\Delta _\phi }\left( {y,{y^k}} \right),\\ {\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} - {y^{k + 1}}} \right), \end{array} \right.$

本研究切入点】最近,文献[9]在矩阵A列满秩,增广拉格朗日函数满足KL性质(参见定义1.1)且罚参数大于某个常数的条件下,分析了传统ADMM算法(0.1)求解非凸问题(0.3)的全局收敛性.

我们考虑如下两分块极小化问题

$\begin{array}{*{20}{c}} {\min }&{f\left( x \right) + g\left( y \right)}\\ {{\rm{s}}.{\rm{t}}.}&{Ax + By = 0,} \end{array}$ (0.5)

其中函数$f:{^n} \to {^n} \cup \left\{ { + \infty } \right\}$是一个非凸正常下半连续函数,函数$g:{^m} \to $L-Lipschitz可微,即存在L>0使得‖▽ g(x)-▽g(y)‖≤Lx-y‖,$\forall x,y \in {^n}$),矩阵$A \in {^{m \times n}},B \in {^{m \times n}}$.这类问题广泛出现在实际应用中,如矩阵分解、图像处理、信号处理等[4-5].

拟解决的关键问题】本文在不要求矩阵A列满秩,B不一定是单位阵,在Lβ(w)满足KL性质且罚参数β大于某个常数的条件下分析了ADMM算法(0.1)求解问题(0.5)的收敛性.

1 预备知识

下面,给出本文理论分析所需的一些概念与性质.

λ++(BTB)表示矩阵BTB最小正特征值.$\partial f\left( x \right)$表示函数f在点x处的极限次微分[5],对于任意x∈${^n}$是函数f的极小值点的必要条件是:0∈$\partial f\left( x \right)$,满足这个条件的点称为关键点或稳定点,函数f关键点集记为critf.

定义1.1[10](Kurdyka-Lojasiewicz性质)设函数f$f:{^n} \to {^n} \cup \left\{ { + \infty } \right\}$是正常下半连续函数,对于任意实数η1,η2(η1η2),令[η1fη2]=x∈{${^n}$∶η1f(x)<η2,设x*∈${\rm{dom}}\partial f$, 若存在η∈(0,+∞],x*的邻域U,以及一个连续的凹函数φ:[0,η)→${_ + }$,满足如下条件

(i) φ(0)=0;

(ii) φ在0处连续,在区间(0,η)上一阶连续可微;

(iii) φ′(s)>0,s∈(0,η);

(iv) 对于任意的xU∩[f(x*)<ff(x*)+η],有如下Kurdyka-Lojasiewicz不等式成立

$\varphi '\left( {f\left( x \right) - f\left( {{x^*}} \right)} \right)d\left( {0,\partial f\left( x \right)} \right) \ge 1.$

则称函数f在点x*处满足Kurdyka-Lojasiewicz性质(简称KL性质).

满足上述性质(i)、(ii)、(iii)的函数全体记为Φη.

引理1.1[11](uniformized KL property)设Ω是一个紧集,函数$f:{^n} \to {^n} \cup \left\{ { + \infty } \right\}$是正常下半连续函数.设函数f在集合Ω上取常数,并在Ω中任一点处满足KL性质,则存在$ \in > 0,\eta > 0,\varphi {\rm{ }} \in {\Phi _\eta }$,对于任意的xΩx∈{x∈${^n}$:d(x,ω)<$ \in $}∩[f(x)<ff(x)+η],有

$\varphi {\rm{ }}\prime (f\left( x \right) - f(\bar x))d\left( {0,\partial {\rm{ }}f\left( x \right)} \right) \ge 1.$

引理1.2[12]若$h:{^n} \to $为L-Lipschitz可微函数,则对任意的x,y∈${^n}$,有

$\left| {h\left( y \right) - h\left( x \right) - < \nabla h\left( x \right),y - x > } \right| \le \frac{L}{2}y - x{^2}.$
2 收敛性分析

假设{wk=(xk,yk,λk)}是算法(0.1)产生的有界序列,算法(0.1)的收敛性分析框架如下:第一步,证明{Lβ(wk)}单调递减;第二步,证明$\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k}{^2} < + \infty } $;第三步,利用Lβ(·)的KL性质,得出序列{wk}的任一聚点都是问题(0.5)的一个稳定点.

由算法(0.1)中每个子问题的最优性条件,有

$\left\{ \begin{array}{l} 0 \in \partial f\left( {{x^{k + 1}}} \right) - {A^{\rm{T}}}{\lambda ^k} + \beta {A^{\rm{T}}}\left( {A{x^{k + 1}} + B{y^k}} \right),\\ 0 = \nabla g({y^{k + 1}}) - {B^{\rm{T}}}{\lambda ^k} + \beta {B^{\rm{T}}}(A{x^{k + 1}} + B{y^{k + 1}}),\\ {\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}}). \end{array} \right.$ (2.1)

进一步,可得

$\left\{ \begin{array}{l} {A^{\rm{T}}}{\lambda ^{k + 1}} + \beta {A^{\rm{T}}}B\left( {{y^{k + 1}} - {y^k}} \right) \in \partial f\left( {{x^{k + 1}}} \right),\\ \nabla g({y^{k + 1}}) = {B^{\rm{T}}}{\lambda ^{k + 1}},\\ {\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}}). \end{array} \right.$ (2.2)

本文的收敛性建立在如下假设条件下.

假设2.1 假设以下条件成立

(i) Im(A)$ \subseteq $Im(B);

(ii) 存在利普希茨系数分别为M>0,P>0的利普希茨连续函数H:Im(B)→${^n}$,F:Im(A)→${^n}$,使得H(u)=arg $\mathop {\min }\limits_y ${g(y):By=u},$F\left( u \right) = \arg \mathop {\min }\limits_x \left\{ {f\left( x \right):Ax = u} \right\}$;

(iii)$\delta \sim = \frac{\beta }{{2{M^2}}} - \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2} > 0$,其中$\mu = \lambda _{{\rm{ + + }}}^{ - \frac{1}{2}}({B^{\rm{T}}}B)$.

首先,证明{Lβ(wk)}kN是递减序列.

引理2.1 ${L_\beta }({w^{k + 1}}) \le {L_\beta }({w^k}) - \delta {y^{k + 1}} - {y^k}{^2}$,其中$\delta = \frac{\beta }{{2{M^2}}} - \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2} > 0$.

证明 由增广拉格朗日函数的定义,可得

$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}) = {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}) + < {\lambda ^k} - \\ {\lambda ^{k + 1}},A{x^{k + 1}} + B{y^{k + 1}} > = {L_\beta }({x^{k + 1}},{y^{k{\rm{ }} + 1}},{\lambda ^k}) + \frac{1}{\beta }{\lambda ^k} - \\ {\lambda ^{k + 1}}{^2}. \end{array}$ (2.3)

利用yk+1的最优性可得

$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}) - {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}){\rm{ }} = f({x^{k + 1}}) + \\ g({y^k}) - < {\lambda ^k},A{x^{k + 1}} + B{y^k} > {\rm{ }} + \frac{\beta }{2}A{x^{k + 1}} + B{y^k}{^2} - \\ \{ f({x^{k + 1}}) + g({y^{k + 1}}){\rm{ }} - < {\lambda ^k},A{x^{k + 1}} + B{y^{k + 1}} > + \\ \frac{\beta }{2}A{x^{k + 1}} + B{y^{k + 1}}{^2}\} = g({y^k}) - g({y^{k + 1}}) + < {\lambda ^k},\\ B({y^{k + 1}} - {y^k}) > {\rm{ }} + \frac{\beta }{2}A{x^{k + 1}} + B{y^k}{^2} - \frac{\beta }{2}A{x^{k + 1}} + \\ B{y^{k + 1}}{^2}. \end{array}$ (2.4)

由引理1.2和式(2.2)中第二式可得

$\begin{array}{l} g({y^k}) - g({y^{k + 1}}) \ge < {\lambda ^{k + 1}},B({y^k} - {y^{k + 1}}) > - \\ \frac{L}{2}{y^k} - {y^{k + 1}}{^2}. \end{array}$

把上式代入式(2.4)可得

$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}) - {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}) \ge < {\lambda ^{k + 1}} - \\ {\lambda ^k},B({y^k} - {y^{k + 1}}) > {\rm{ }} + \frac{\beta }{2}A{x^{k + 1}} + B{y^k}{^2} - \frac{\beta }{2}A\\ {x^{k + 1}} + B{y^{k + 1}}{^2} - \frac{L}{2}{y^k} - {y^{k + 1}}{^2}, \end{array}$ (2.5)

由${\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}})$可得

$A{x^{k + 1}} + B{y^k} = \frac{1}{\beta }({\lambda ^k} - {\lambda ^{k + 1}}) + B({y^k} - {y^{k + 1}}).$

$\begin{array}{l} < {\lambda ^{k + 1}} - {\lambda ^k},B\left( {{y^k} - {y^{k + 1}}} \right) > + \frac{{\bar \beta }}{2}A{x^{k + 1}} + \\ B{y^k}{^2} = < {\lambda ^{k + 1}} - {\lambda ^k},B({y^k} - {y^{k + 1}}) > {\rm{ }} + \frac{\beta }{2}\frac{1}{\beta }({\lambda ^k} - \\ {\lambda ^{k + 1}}) + B({y^k} - {y^{k + 1}}){^2} = \frac{\beta }{2}B({y^k} - {y^{k + 1}}){^2} + \frac{1}{{2\beta }}\\ {\lambda ^{k + 1}} - {\lambda ^k}{^2}. \end{array}$

把上式代入式(2.5)右端可得

$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}) - {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}){\rm{ }} \le \frac{L}{2}{y^k} - \\ {y^{k + 1}}{^2} - \frac{\beta }{2}B({y^k} - {y^{k + 1}}){^2}. \end{array}$ (2.6)

由${\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}})$可得

${\lambda ^{k + 1}} - {\lambda ^k} = - \beta (A{x^{k + 1}} + B{y^{k + 1}}) \in {\mathop{\rm Im}\nolimits} \left( B \right).$

$\mu = \lambda _{{\rm{ + + }}}^{ - \frac{1}{2}}({B^{\rm{T}}}B) > 0$,则

$\begin{array}{l} {\lambda ^{k + 1}} - {\lambda ^k} \le \mu B({\lambda ^{k + 1}} - {\lambda ^k}){\rm{ }} = \\ \mu \nabla g({y^{k + 1}}) - \nabla g({y^k}) \le \mu L{y^{k + 1}} - {y^k}. \end{array}$ (2.7)

H的性质可得yk=H(Byk),从而

$\begin{array}{l} {y^{k + 1}} - {y^k} = H(B{y^{k + 1}}) - H(B{y^k}){\rm{ }} \le \\ MB({y^{k + 1}} - {y^k}). \end{array}$ (2.8)

结合式(2.6)和引理2.1可得

$\begin{array}{l} {L_\beta }\left( {{x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}} \right) \le {L_\beta }\left( {{x^{k + 1}},{y^k},{\lambda ^k}} \right){\rm{ }} + \\ \frac{1}{\beta }{\lambda ^k} - {\lambda ^{k + 1}}{^2} + \frac{L}{2}{y^k} - {y^{k + 1}}{^2} - \frac{\beta }{2}B({y^k} - \\ {y^{k + 1}}){^2}. \end{array}$

结合式(2.7)、式(2.8)有

$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}) \le {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}){\rm{ }} - (\frac{\beta }{{2{M^2}}} - \\ \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2}){y^{k + 1}} - {y^k}{^2}. \end{array}$

利用xk+1的最优性可得

$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}) \le {L_\beta }({x^k},{y^k},{\lambda ^k}){\rm{ }} - (\frac{\beta }{{2{M^2}}} - \\ \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2}){y^{k + 1}} - {y^k}{^2}. \end{array}$

引理2.2 $\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k}{^2} < + \infty } $.

证明 由于序列{wk=(xk,yk,λk)}有界,则存在收敛子列{wkj},设$\mathop {\lim }\limits_{{k_j} = + \infty } {w^{{k_j}}} \to {w^*}$. 由f下半连续及g连续,可知Lβ(w)下半连续.从而

${L_\beta }({w^*}) \le \mathop {\lim \inf }\limits_{j \to + \infty } {L_\beta }({w^{{k_j}}}).$

故序列{Lβ(wkj)}有下界,又序列{Lβ(wk)}kN单调递减,所以Lβ(wkj)收敛并且Lβ(wk)≥Lβ(w*).由引理2.1可得

$\begin{array}{l} \sum\limits_{k = 0}^n \delta {y^{k + 1}} - {y^k}{^2} \le {L_\beta }({w^0}) - {L_\beta }({w^{n + 1}}){\rm{ }} \le \\ {L_\beta }({w^0}) - {L_\beta }({w^*}) < + \infty , \end{array}$

δ>0及n的任意性可得

$\sum\limits_{k = 0}^{ + \infty } {{y^{k + 1}} - {y^k}{^2} < + \infty .} $

上式结合式(2.7)可得

$\sum\limits_{k = 0}^{ + \infty } {{\lambda ^{k + 1}} - {\lambda ^k}{^2} < + \infty .} $

接下来只需证明:$\sum\limits_{k = 0}^{ + \infty } {{\lambda ^{k + 1}} - {x^k}{^2} < + \infty .} $由ADMM算法(0.1)第三式有

$\left\{ \begin{array}{l} {\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} + B{y^{k + 1}}} \right),\\ {\lambda ^k} = {\lambda ^{k - 1}} - \beta (A{x^k} + B{y^k}). \end{array} \right.$

两式相减,可得

$\begin{array}{l} {\lambda ^{k + 1}} - {\lambda ^k} = \left( {{\lambda ^k} - {\lambda ^{k - 1}}} \right){\rm{ }} + \beta \left( {A{x^k} - A{x^{k + 1}}} \right) + \\ \beta \left( {B{y^k} - B{y^{k + 1}}} \right). \end{array}$

利用不等式(a+b+c)2≤3(a2+b2+c2)可得

$\begin{array}{l} \beta (A{x^k} - A{x^{k + 1}}){^2} \le 3({\lambda ^{k + 1}} - {\lambda ^k}{^2} + \\ {\lambda ^k} - {\lambda ^{k - 1}}{^2} + {\beta ^{2}}B({y^{k + 1}} - {y^k}){^2}). \end{array}$ (2.9)

F的性质可得xk=F(Axk),进而

$\beta (A{x^k} - A{x^{k + 1}}){^2} \ge \frac{{{\beta ^2}}}{P}{x^k} - {x^{k + 1}}{^2}.$ (2.10)

由式(2.9)和式(2.10)可得

$\sum\limits_{k = 0}^{ + \infty } {{x^{k + 1}} - {x^k}{^2} < + \infty .} $

故$\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k}{^2} < + \infty } $.

引理2.3 存在ζ>0,对于任意k

$d(0,\partial {L_\beta }({w^{k + 1}})) \le \zeta {y^{k + 1}} - {y^k}.$

证明 由增广拉格朗日函数定义,可得

$\left\{ \begin{array}{l} \begin{array}{*{20}{l}} {{\partial _x}{L_\beta }\left( {{w^{k + 1}}} \right) = \partial f\left( {{x^{k + 1}}} \right) - {A^{\rm{T}}}{\lambda ^{k + 1}} + \beta {A^{\rm{T}}}(A{x^{k + 1}} + }\\ {B{y^{k + 1}}),} \end{array}\\ {\partial _y}{L_\beta }({w^{k + 1}}) = \nabla g({y^{k + 1}}) - {B^{\rm{T}}}{\lambda ^{k + 1}} + \beta {B^{\rm{T}}}(A{x^{k + 1}} + \\ B{y^{k + 1}})\\ {\partial _\lambda }{L_\beta }({w^{k + 1}}) = - (A{x^{k + 1}} + B{y^{k + 1}}). \end{array} \right.$ (2.11)

进一步结合式(2.2)可得

$\left\{ \begin{array}{l} \beta {A^{\rm{T}}}B\left( {{y^{k + 1}} - {y^k}} \right) + {A^{\rm{T}}}\left( {{\lambda ^k} - {\lambda ^{k + 1}}} \right){\rm{ }} \in {\partial _x}{L_\beta }\left( {{w^{k + 1}}} \right)\\ {B^{\rm{T}}}({\lambda ^k} - {\lambda ^{k + 1}}) \in {\partial _y}{L_\beta }({w^{k + 1}}),\\ \frac{1}{\beta }({\lambda ^{k + 1}} - {\lambda ^k}) \in {\partial _\lambda }{L_\beta }({w^{k + 1}}). \end{array} \right.$ (2.12)

令x${x^*}_{k + 1} = \beta {A^{\rm{T}}}B\left( {{y^{k + 1}} - {y^k}} \right) + {A^{\rm{T}}}\left( {{\lambda ^k} - {\lambda ^{k + 1}}} \right)$,${y^*}_{k + 1} = {B^{\rm{T}}}({\lambda ^k} - {\lambda ^{k + 1}}),{\lambda ^*}_{k + 1} = \frac{1}{\beta }({\lambda ^{k + 1}} - {\lambda ^k})$,则

$({x^*}_{k + 1},{y^*}_{k + 1},{\lambda ^*}_{k + 1}) \in \partial {L_\beta }({w^{k + 1}})$且存在ζ1,ζ2>0,使得

$\begin{array}{l} ({x^*}_{k + 1},{y^*}_{k + 1},{\lambda ^*}_{k + 1}) \le {\zeta _1}{y^{k + 1}} - {y^k} + \\ {\zeta _2}{\lambda ^{k + 1}} - {\lambda ^k}. \end{array}$

ζ:=ζ1+μLζ2,结合式(2.7)可得

$d(0,\partial {L_\beta }({w^{k + 1}})) \le \zeta {y^{k + 1}} - {y^k}.$

引理2.4 设序列{wk}的全体极限点记为S(w0),则

(i) S(w0)是一个非空紧集,并且

d(wk,S(w0))→0,k→+∞;

(ii) $S({w^0}) \subset {\rm{crit}}{L_\beta }$;

(iii) Lβ(·)在S(w0)上取有限值且为常数,且

$\mathop {\inf }\limits_{k \in N} {L_\beta }({w^k}) = \mathop {\lim }\limits_{k \to + \infty } {L_\beta }({w^k}).$

证明 (i) 式由S(w0)的定义直接可得.

(ii) 设(x*,y*,λ*)∈S(w0),则存在子列{(xkj,ykj,λkj)}使得(xkj,ykj,λkj)→(x*,y*,λ*),(kj→+∞).由xk+1的最优性有

${L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}) \le {L_\beta }({x^*},{y^k},{\lambda ^k}).$

由引理2.2可知‖wk+1-wk‖→0,从而(xkj+1,ykj+1,λkj+1)→(x*,y*,λ*).又Lβ(·)关于y,λ连续,则有

$\begin{array}{l} \mathop {\lim }\limits_{j \to + \infty } \sup {L_\beta }({x^{{k_j} + 1}},{y^{{k_j}}},{\lambda ^{{k_j}}}){\rm{ }} = \mathop {\lim }\limits_{j \to + \infty } \sup {L_\beta }({x^{{k_j} + 1}},\\ {y^{{k_j} + 1}},{\lambda ^{{k_j} + 1}}){\rm{ }} \le {L_\beta }({x^*},{y^*},{\lambda ^*}). \end{array}$

由函数Lβ(·)的下半连续性,有

$\mathop {\lim \inf }\limits_{j \to + \infty {\rm{ }}} {L_\beta }({x^{{k_j}}} + 1,{y^{{k_j}}} + 1,{\lambda ^{{k_j}}} + 1) \ge {L_\beta }({x^*},{y^*},{\lambda ^*}).$

由上式及式(2.13)可得$\mathop {\lim }\limits_{j \to + \infty } f({x^{{k_j} + 1}}) = f({x^*})$.结合▽g的连续性和$\partial f$的闭性,在式(2.2)中对序列$({x^{{k_j} + 1}},{y^{{k_j} + 1}},{\lambda ^{{k_j} + 1}})$取极限,可得

$\left\{ \begin{array}{l} {A^{\rm{T}}}{\lambda ^*} \in \partial f\left( {{x^*}} \right),\\ \nabla g({y^*}) = {B^{\rm{T}}}{\lambda ^*},\\ A{x^*} + B{y^*} = 0. \end{array} \right.$

即${w^*} \subset {\rm{crit}}{L_\beta }$.

(iii) 对于任意点(x*,y*,λ*)∈S(w0),存在子列(xkj,ykj,λkj)→(x*,y*,λ*).结合Lβ(xkj+1)收敛,以及{Lβ(wk)}kN单调递减,可得

$\mathop {\lim }\limits_{k \to + \infty } {L_\beta }({x^k},{y^k},{\lambda ^k}) = {L_\beta }({x^*},{y^*},{\lambda ^*}).$

因此,函数Lβ(·)在集合S(w0)上是常数,并且有$\mathop {\inf }\limits_{k \in N} {\rm{ }}{L_\beta }({w^k}) = \mathop {\lim }\limits_{k \to + \infty } {L_\beta }({w^k}).$.

最后,给出非凸问题(0.5)的ADMM算法(0.1)的收敛性分析.

定理2.1Lβ(w)满足KL性质,则

(i) $\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k} < + \infty ;} $;

(ii) 序列{wk}收敛到函数Lβ(·)的一个关键点.

证明 由引理2.4知$\mathop {\lim }\limits_{k \to + \infty } {L_\beta }\left( {{w^k}} \right) = {L_\beta }({w^*})(\forall {w^*} \in S({w^0}))$成立.考虑如下两种情形:

(i) 存在整数k0使得Lβ(wk0)=Lβ(w*)成立,由引理2.1可知,对于任意的kk0,有

$\begin{array}{l} {y^{k + 1}} - {y^k}{^2} \le {L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^{k + 1}}} \right){\rm{ }} \le \\ {L_\beta }({w^{{k_0}}}) - {L_\beta }({w^*}) = 0, \end{array}$

因此,对于任意的k>k0,有yk+1=yk.结合式(2.7)、式(2.9)、式(2.10)可知,对于任意的kk0+1,有wk+1=wk成立,结论成立.

(ii) 对任意的k均有Lβ(wk)>Lβ(w*)成立.由d(wk,S(w0))→0可知,对于任意给定的ε>0,存在k1>0,当kk1时,有d(wk,S(w0))<ε.又根据Lβ(wk)→Lβ(w*)可知,对于任意给定的η>0,存在k2>0,当kk2时,有Lβ(wk)<Lβ(w*)+η.从而当$k > \tilde k = \max \{ {k_1},{k_2}\} $时,有

$\begin{array}{l} d\left( {{w^k},S\left( {{w^0}} \right)} \right) < \varepsilon ,\\ {L_\beta }\left( {{w^*}} \right)\beta \left( {{w^k}} \right)\beta \left( {{w^*}} \right) + \eta . \end{array}$

由于S(w0)是非空紧集,函数Lβ(·)在集合S(w0)上是常数,由引理1.1得

$\begin{array}{l} \varphi \prime \left( {{L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^*}} \right)} \right)d\left( {0,\partial {L_\beta }\left( {{w^k}} \right)} \right) \ge \\ 1\left( {\forall k > \tilde k} \right). \end{array}$

由函数φ的凹性,可得

$\begin{array}{l} \varphi \left( {{L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^*}} \right)} \right) - \varphi ({L_\beta }\left( {{w^{k + 1}}} \right) - \\ {L_\beta }({w^*})){\rm{ }} \ge \varphi \prime ({L_\beta }({w^k}) - {L_\beta }({w^*}))({L_\beta }({w^k}) - \\ {L_\beta }({w^{k + 1}})). \end{array}$

由引理2.3及φ′(Lβ(wk)-Lβ(w*))>0,可得

$\begin{array}{l} {L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^{k + 1}}} \right) \le \\ \frac{{\varphi ({L_\beta }({w^k}) - {L_\beta }({w^*})) - {\rm{ }}\varphi ({L_\beta }({w^{k + 1}}) - {L_\beta }({w^*}))}}{{\varphi \prime ({L_\beta }({w^k}) - {L_\beta }({w^*}))}} \le \\ \zeta {y^k} - {y^{k - 1}}[\varphi ({L_\beta }({w^k}) - {L_\beta }({w^*})) - \\ \varphi ({L_\beta }({w^{k + 1}}) - {L_\beta }({w^*}))]. \end{array}$

令Δp,q=φ(Lβ(wp)-Lβ(w*))-φ(Lβ(wq)-Lβ(w*)).由引理2.1可得,对于任意的kk

$\delta {y^{k + 1}} - {y^k}{^2} \le \zeta {y^k} - {y^{k - 1}}{\Delta _{k,k + 1}},$

${y^{k + 1}} - {y^k} \le \sqrt {\frac{\zeta }{\delta }{\Delta _{k,k + 1}}} {y^k} - {y^{k - 1}}{^{\frac{1}{2}}}.$

利用不等式$2\sqrt {\alpha \beta } \le \left( {\alpha + \beta > 0} \right)$可得

$2{y^{k + 1}} - {y^k} \le {y^k} - {y^{k - 1}} + \frac{\zeta }{\delta }{\Delta _{k,k + 1}},$

由上式可得

$\begin{array}{l} 2\sum\limits_{k = + 1}^m {{y^{k + 1}} - {y^k}} \le \sum\limits_{k = + 1}^m {{\rm{ }}{y^k} - {y^{k - 1}}} + \\ \frac{\zeta }{\delta }{\Delta _{ + 1,m + 1}}. \end{array}$

注意到φ(Lβ(wm+1)-Lβ(w*))>0,移项并且令m→+∞,可得

$\begin{array}{l} \sum\limits_{k = + 1}^{ + \infty } {{y^{k + 1}} - {y^k}} \le {y^{\bar k + 1}} - {y^{\bar k}}{\rm{ }} + \\ \frac{\zeta }{\delta }\varphi ({L_\beta }({w^{\bar k + 1}}) - {L_\beta }({w^*})), \end{array}$

所以

$\sum\limits_{k = 0}^{ + \infty } {{y^{k + 1}} - {y^k} < + \infty .} $

由式(2.7)可得

$\sum\limits_{k = 0}^{ + \infty } {{\lambda ^{k + 1}} - {\lambda ^k} < + \infty .} $

另一方面,由式(2.9)、式(2.10)可得

$\begin{align} & \left\| {{x}^{k+1}}-{{x}^{k}} \right\|\le \sqrt{\frac{3P}{{{\beta }^{2}}}}({{\left\| {{\lambda }^{k+1}}-{{\lambda }^{k}} \right\|}^{2}}+{{\left\| {{\lambda }^{k}}-{{\lambda }^{k-1}} \right\|}^{2}}+ \\ & {{\beta }^{2}}{{\left\| B({{y}^{k+1}}-{{y}^{k}}) \right\|}^{2}}{{)}^{\frac{1}{2}}}\le \sqrt{\frac{3P}{{{\beta }^{2}}}} \\ & ({{\lambda }^{k+1}}->{{\lambda }^{k}}+{{\lambda }^{k}}-{{\lambda }^{k-1}}+\beta B({{y}^{k+1}}-{{y}^{k}})), \\ \end{align}$

故$\sum\limits_{k = 1}^{ + \infty } {{x^{k + 1}} - {x^k} < + \infty .} $.又因为

$\begin{array}{l} {w^{k + 1}} - {w^k} = \left( {{x^{k + 1}} - {x^k}{^2} + {y^{k + 1}} - } \right.\\ {\left. {{y^k}{^2} + {\lambda ^{k + 1}} - {\lambda ^k}{^2}} \right)^{\frac{1}{2}}} \le {x^{k + 1}} - {x^k} + \\ {y^{k + 1}} - {y^k} + {\lambda ^{k + 1}} - {\lambda ^k}, \end{array}$

所以

$\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k} < + \infty .} $

进一步可知序列wk是Cauchy序列(参见文献[11]),所以序列{wk}收敛,定理得证.

3 结论

本文针对非凸两分块优化问题,在不要求矩阵A列满秩,B不一定是单位阵,在Lβ(w)满足Kurdyka-Lojasiewicz性质且罚参数大于某个常数的条件下,证明了非凸问题ADMM的收敛性.

参考文献
[1]
GLOWINSKI R, MARROCO A. Sur l’approximation,par elements finis d’ordre un,et la resolution,par penalisation-dualité,d’une classe de problèms de Dirichlet nonlineares[J]. Revue Francaise d’Automatique,Informatique et Recherche Opérationelle,Series R, 1975, 31(5/6): 41-76.
[2]
GABAY D, MERCIER B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation[J]. Computers &Mathematics with Applications, 1976, 2(1): 17-40.
[3]
BOYD S, PARIKH N, CHU E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers[J]. Foundations and Trends in Machine Learning, 2011, 3(1): 1-122.
[4]
YANG L, PONG T K, CHEN X J. Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction[J]. Mathematics, 2016.
[5]
HONG M Y, LUO Z Q, RAZAVIYAYN M. Convergen-ce analysis of alternating direction method of multipliers for a family of nonconvex problems[J]. SIAM Journal on Optimization, 2014, 26(1): 337-364.
[6]
WANG Y,YIN W T,ZENG J S.Global Convergence of ADMM in Nonconvex Nonsmooth Optimization[R].UCLA CAM Report 15-62,2015.
[7]
LI G Y, PONG T K. Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems[J]. Mathematical Programming, 2016, 159(1/2): 374-401.
[8]
WANG F H, XU Z B, XU H K. Convergence of multi-block Bregman ADMM for nonconvex composite problems[J]. Mathematics, 2015, 1-25.
[9]
GUO K, HAND R, WUT T. Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints[J]. International Journal of Computer Mathematics, 2016, 1-17.
[10]
BOLTE J, DANIILIDIS A, LEY O, et al. Characterizations of Lojasiewicz inequalities and applications[J]. arXiv:0802.0826vl, 2008, 1-48.
[11]
ATTOUCH H, BOLTE J, REDONT P, et al. A Proximal alternating minimization and projection methods for nonconvex problems:An approach based on the Kurdyka-Lojasiewicz inequality[J]. Mathematics of Operations Research, 2010, 35(2): 438-457. DOI:10.1287/moor.1100.0449
[12]
BOLTE J, SABACH S, TEBOULLE M. Proximal alternating linearized minimization for nonconvex and nonsmooth problem[J]. Mathematical Programming, 2013, 146(1/2): 459-494.