2. 玉林师范学院数学与统计学院, 广西 玉林 537000
2. School of Mathematics and Statistics,Yulin Normal University,Yulin,Guangxi,537000,China
【研究意义】乘子交替方向法(ADMM)求解大规模分布式计算问题十分有效.ADMM既能分散地收集和存储这些数据集,又能在并行和分布式的环境下求解这些问题.ADMM适合求解分布式凸优化问题,尤其适用于出现在统计学、机器学习和相关领域中的大规模问题,其重要性日益凸显.【前人研究进展】ADMM的思想最早起源于20世纪50年代,算法在20世纪70年代中期由Glowinski和Marrocco[1],以及Gabay和Mercier[2]首次提出.20世纪80年代,乘子交替方向法的研究和应用非常广泛,直到20世纪90年代中期,乘子交替方向法求解凸优化问题的很多理论结果,已经得到很好的证明.传统ADMM是求解凸两分块问题十分有效的方法[1-2],其直接应用到问题(0.5)的迭代框架如下
$\left\{ {\begin{array}{*{20}{l}} {{x^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {x,{y^k},{\lambda ^k}} \right)} \right\},}\\ {{y^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {{x^{k + 1}},y,{\lambda ^k}} \right)} \right\},}\\ {{\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} + B{y^{k + 1}}} \right),} \end{array}} \right.$ | (0.1) |
其中Lβ(x,y,λ)为问题(0.5)的增广拉格朗日函数,定义如下
$\begin{array}{l} {L_\beta }\left( {x,y,\lambda } \right) = f\left( x \right) + g\left( y \right) - {\lambda ^{\rm{T}}}\left( {Ax + By} \right){\rm{ + }}\\ \frac{\beta }{2}{\left\| {Ax + By} \right\|^2}. \end{array}$ | (0.2) |
在凸情形下,ADMM的收敛性已被充分认识[3].非凸问题ADMM的收敛性分析仅有初步的结果,其研究是当前的热点问题[4-6].文献[7]考虑如下非凸问题
$\begin{array}{*{20}{c}} {\min }&{f\left( x \right) + g\left( y \right)}\\ {{\rm{s}}.{\rm{t}}.}&{Ax = y,} \end{array}$ | (0.3) |
并提出如下改进的ADMM
$\left\{ {\begin{array}{*{20}{l}} {{x^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {x,{y^k},{\lambda ^k}} \right)} \right\},}\\ {{y^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {{x^{k + 1}},y,{\lambda ^k}} \right)} \right\} + {\Delta _\phi }\left( {y,{y^k}} \right),}\\ {{\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} - {y^{k + 1}}} \right),} \end{array}} \right.$ | (0.4) |
其中${\Delta _\phi }$是关于强凸函数$\phi $的Bregman距离.当$\phi $=0时,算法(0.4)为传统ADMM.在罚参数充分大且目标函数二阶连续可微的条件下,文献[7]证明算法产生点列的聚点是问题(0.3)的稳定点.
文献[8]分析如下Bregman ADMM算法的收敛性
$\left\{ \begin{array}{l} {x^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {x,{y^k},{\lambda ^k}} \right)} \right\} + {\Delta _\phi }\left( {x,{x^k}} \right),\\ {y^{k + 1}} \in \arg \min \left\{ {{L_\beta }\left( {{x^{k + 1}},y,{\lambda ^k}} \right)} \right\} + {\Delta _\phi }\left( {y,{y^k}} \right),\\ {\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} - {y^{k + 1}}} \right), \end{array} \right.$ |
【本研究切入点】最近,文献[9]在矩阵A列满秩,增广拉格朗日函数满足KL性质(参见定义1.1)且罚参数大于某个常数的条件下,分析了传统ADMM算法(0.1)求解非凸问题(0.3)的全局收敛性.
我们考虑如下两分块极小化问题
$\begin{array}{*{20}{c}} {\min }&{f\left( x \right) + g\left( y \right)}\\ {{\rm{s}}.{\rm{t}}.}&{Ax + By = 0,} \end{array}$ | (0.5) |
其中函数$f:{^n} \to {^n} \cup \left\{ { + \infty } \right\}$是一个非凸正常下半连续函数,函数$g:{^m} \to $L-Lipschitz可微,即存在L>0使得‖▽ g(x)-▽g(y)‖≤L‖x-y‖,$\forall x,y \in {^n}$),矩阵$A \in {^{m \times n}},B \in {^{m \times n}}$.这类问题广泛出现在实际应用中,如矩阵分解、图像处理、信号处理等[4-5].
【拟解决的关键问题】本文在不要求矩阵A列满秩,B不一定是单位阵,在Lβ(w)满足KL性质且罚参数β大于某个常数的条件下分析了ADMM算法(0.1)求解问题(0.5)的收敛性.
1 预备知识下面,给出本文理论分析所需的一些概念与性质.
λ++(BTB)表示矩阵BTB最小正特征值.$\partial f\left( x \right)$表示函数f在点x处的极限次微分[5],对于任意x∈${^n}$是函数f的极小值点的必要条件是:0∈$\partial f\left( x \right)$,满足这个条件的点称为关键点或稳定点,函数f关键点集记为critf.
定义1.1[10](Kurdyka-Lojasiewicz性质)设函数f$f:{^n} \to {^n} \cup \left\{ { + \infty } \right\}$是正常下半连续函数,对于任意实数η1,η2(η1<η2),令[η1<f<η2]=x∈{${^n}$∶η1<f(x)<η2,设x*∈${\rm{dom}}\partial f$, 若存在η∈(0,+∞],x*的邻域U,以及一个连续的凹函数φ:[0,η)→${_ + }$,满足如下条件
(i) φ(0)=0;
(ii) φ在0处连续,在区间(0,η)上一阶连续可微;
(iii) φ′(s)>0,s∈(0,η);
(iv) 对于任意的x∈U∩[f(x*)<f<f(x*)+η],有如下Kurdyka-Lojasiewicz不等式成立
$\varphi '\left( {f\left( x \right) - f\left( {{x^*}} \right)} \right)d\left( {0,\partial f\left( x \right)} \right) \ge 1.$ |
则称函数f在点x*处满足Kurdyka-Lojasiewicz性质(简称KL性质).
满足上述性质(i)、(ii)、(iii)的函数全体记为Φη.
引理1.1[11](uniformized KL property)设Ω是一个紧集,函数$f:{^n} \to {^n} \cup \left\{ { + \infty } \right\}$是正常下半连续函数.设函数f在集合Ω上取常数,并在Ω中任一点处满足KL性质,则存在$ \in > 0,\eta > 0,\varphi {\rm{ }} \in {\Phi _\eta }$,对于任意的x∈Ω和x∈{x∈${^n}$:d(x,ω)<$ \in $}∩[f(x)<f<f(x)+η],有
$\varphi {\rm{ }}\prime (f\left( x \right) - f(\bar x))d\left( {0,\partial {\rm{ }}f\left( x \right)} \right) \ge 1.$ |
引理1.2[12]若$h:{^n} \to $为L-Lipschitz可微函数,则对任意的x,y∈${^n}$,有
$\left| {h\left( y \right) - h\left( x \right) - < \nabla h\left( x \right),y - x > } \right| \le \frac{L}{2}y - x{^2}.$ |
假设{wk=(xk,yk,λk)}是算法(0.1)产生的有界序列,算法(0.1)的收敛性分析框架如下:第一步,证明{Lβ(wk)}单调递减;第二步,证明$\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k}{^2} < + \infty } $;第三步,利用Lβ(·)的KL性质,得出序列{wk}的任一聚点都是问题(0.5)的一个稳定点.
由算法(0.1)中每个子问题的最优性条件,有
$\left\{ \begin{array}{l} 0 \in \partial f\left( {{x^{k + 1}}} \right) - {A^{\rm{T}}}{\lambda ^k} + \beta {A^{\rm{T}}}\left( {A{x^{k + 1}} + B{y^k}} \right),\\ 0 = \nabla g({y^{k + 1}}) - {B^{\rm{T}}}{\lambda ^k} + \beta {B^{\rm{T}}}(A{x^{k + 1}} + B{y^{k + 1}}),\\ {\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}}). \end{array} \right.$ | (2.1) |
进一步,可得
$\left\{ \begin{array}{l} {A^{\rm{T}}}{\lambda ^{k + 1}} + \beta {A^{\rm{T}}}B\left( {{y^{k + 1}} - {y^k}} \right) \in \partial f\left( {{x^{k + 1}}} \right),\\ \nabla g({y^{k + 1}}) = {B^{\rm{T}}}{\lambda ^{k + 1}},\\ {\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}}). \end{array} \right.$ | (2.2) |
本文的收敛性建立在如下假设条件下.
假设2.1 假设以下条件成立
(i) Im(A)$ \subseteq $Im(B);
(ii) 存在利普希茨系数分别为M>0,P>0的利普希茨连续函数H:Im(B)→${^n}$,F:Im(A)→${^n}$,使得H(u)=arg $\mathop {\min }\limits_y ${g(y):By=u},$F\left( u \right) = \arg \mathop {\min }\limits_x \left\{ {f\left( x \right):Ax = u} \right\}$;
(iii)$\delta \sim = \frac{\beta }{{2{M^2}}} - \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2} > 0$,其中$\mu = \lambda _{{\rm{ + + }}}^{ - \frac{1}{2}}({B^{\rm{T}}}B)$.
首先,证明{Lβ(wk)}k∈N是递减序列.
引理2.1 ${L_\beta }({w^{k + 1}}) \le {L_\beta }({w^k}) - \delta {y^{k + 1}} - {y^k}{^2}$,其中$\delta = \frac{\beta }{{2{M^2}}} - \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2} > 0$.
证明 由增广拉格朗日函数的定义,可得
$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}) = {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}) + < {\lambda ^k} - \\ {\lambda ^{k + 1}},A{x^{k + 1}} + B{y^{k + 1}} > = {L_\beta }({x^{k + 1}},{y^{k{\rm{ }} + 1}},{\lambda ^k}) + \frac{1}{\beta }{\lambda ^k} - \\ {\lambda ^{k + 1}}{^2}. \end{array}$ | (2.3) |
利用yk+1的最优性可得
$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}) - {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}){\rm{ }} = f({x^{k + 1}}) + \\ g({y^k}) - < {\lambda ^k},A{x^{k + 1}} + B{y^k} > {\rm{ }} + \frac{\beta }{2}A{x^{k + 1}} + B{y^k}{^2} - \\ \{ f({x^{k + 1}}) + g({y^{k + 1}}){\rm{ }} - < {\lambda ^k},A{x^{k + 1}} + B{y^{k + 1}} > + \\ \frac{\beta }{2}A{x^{k + 1}} + B{y^{k + 1}}{^2}\} = g({y^k}) - g({y^{k + 1}}) + < {\lambda ^k},\\ B({y^{k + 1}} - {y^k}) > {\rm{ }} + \frac{\beta }{2}A{x^{k + 1}} + B{y^k}{^2} - \frac{\beta }{2}A{x^{k + 1}} + \\ B{y^{k + 1}}{^2}. \end{array}$ | (2.4) |
由引理1.2和式(2.2)中第二式可得
$\begin{array}{l} g({y^k}) - g({y^{k + 1}}) \ge < {\lambda ^{k + 1}},B({y^k} - {y^{k + 1}}) > - \\ \frac{L}{2}{y^k} - {y^{k + 1}}{^2}. \end{array}$ |
把上式代入式(2.4)可得
$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}) - {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}) \ge < {\lambda ^{k + 1}} - \\ {\lambda ^k},B({y^k} - {y^{k + 1}}) > {\rm{ }} + \frac{\beta }{2}A{x^{k + 1}} + B{y^k}{^2} - \frac{\beta }{2}A\\ {x^{k + 1}} + B{y^{k + 1}}{^2} - \frac{L}{2}{y^k} - {y^{k + 1}}{^2}, \end{array}$ | (2.5) |
由${\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}})$可得
$A{x^{k + 1}} + B{y^k} = \frac{1}{\beta }({\lambda ^k} - {\lambda ^{k + 1}}) + B({y^k} - {y^{k + 1}}).$ |
故
$\begin{array}{l} < {\lambda ^{k + 1}} - {\lambda ^k},B\left( {{y^k} - {y^{k + 1}}} \right) > + \frac{{\bar \beta }}{2}A{x^{k + 1}} + \\ B{y^k}{^2} = < {\lambda ^{k + 1}} - {\lambda ^k},B({y^k} - {y^{k + 1}}) > {\rm{ }} + \frac{\beta }{2}\frac{1}{\beta }({\lambda ^k} - \\ {\lambda ^{k + 1}}) + B({y^k} - {y^{k + 1}}){^2} = \frac{\beta }{2}B({y^k} - {y^{k + 1}}){^2} + \frac{1}{{2\beta }}\\ {\lambda ^{k + 1}} - {\lambda ^k}{^2}. \end{array}$ |
把上式代入式(2.5)右端可得
$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^k}) - {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}){\rm{ }} \le \frac{L}{2}{y^k} - \\ {y^{k + 1}}{^2} - \frac{\beta }{2}B({y^k} - {y^{k + 1}}){^2}. \end{array}$ | (2.6) |
由${\lambda ^{k + 1}} = {\lambda ^k} - \beta (A{x^{k + 1}} + B{y^{k + 1}})$可得
${\lambda ^{k + 1}} - {\lambda ^k} = - \beta (A{x^{k + 1}} + B{y^{k + 1}}) \in {\mathop{\rm Im}\nolimits} \left( B \right).$ |
$\mu = \lambda _{{\rm{ + + }}}^{ - \frac{1}{2}}({B^{\rm{T}}}B) > 0$,则
$\begin{array}{l} {\lambda ^{k + 1}} - {\lambda ^k} \le \mu B({\lambda ^{k + 1}} - {\lambda ^k}){\rm{ }} = \\ \mu \nabla g({y^{k + 1}}) - \nabla g({y^k}) \le \mu L{y^{k + 1}} - {y^k}. \end{array}$ | (2.7) |
由H的性质可得yk=H(Byk),从而
$\begin{array}{l} {y^{k + 1}} - {y^k} = H(B{y^{k + 1}}) - H(B{y^k}){\rm{ }} \le \\ MB({y^{k + 1}} - {y^k}). \end{array}$ | (2.8) |
结合式(2.6)和引理2.1可得
$\begin{array}{l} {L_\beta }\left( {{x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}} \right) \le {L_\beta }\left( {{x^{k + 1}},{y^k},{\lambda ^k}} \right){\rm{ }} + \\ \frac{1}{\beta }{\lambda ^k} - {\lambda ^{k + 1}}{^2} + \frac{L}{2}{y^k} - {y^{k + 1}}{^2} - \frac{\beta }{2}B({y^k} - \\ {y^{k + 1}}){^2}. \end{array}$ |
结合式(2.7)、式(2.8)有
$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}) \le {L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}){\rm{ }} - (\frac{\beta }{{2{M^2}}} - \\ \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2}){y^{k + 1}} - {y^k}{^2}. \end{array}$ |
利用xk+1的最优性可得
$\begin{array}{l} {L_\beta }({x^{k + 1}},{y^{k + 1}},{\lambda ^{k + 1}}) \le {L_\beta }({x^k},{y^k},{\lambda ^k}){\rm{ }} - (\frac{\beta }{{2{M^2}}} - \\ \frac{{{\mu ^2}{L^2}}}{\beta } - \frac{L}{2}){y^{k + 1}} - {y^k}{^2}. \end{array}$ |
引理2.2 $\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k}{^2} < + \infty } $.
证明 由于序列{wk=(xk,yk,λk)}有界,则存在收敛子列{wkj},设$\mathop {\lim }\limits_{{k_j} = + \infty } {w^{{k_j}}} \to {w^*}$. 由f下半连续及g连续,可知Lβ(w)下半连续.从而
${L_\beta }({w^*}) \le \mathop {\lim \inf }\limits_{j \to + \infty } {L_\beta }({w^{{k_j}}}).$ |
故序列{Lβ(wkj)}有下界,又序列{Lβ(wk)}k∈N单调递减,所以Lβ(wkj)收敛并且Lβ(wk)≥Lβ(w*).由引理2.1可得
$\begin{array}{l} \sum\limits_{k = 0}^n \delta {y^{k + 1}} - {y^k}{^2} \le {L_\beta }({w^0}) - {L_\beta }({w^{n + 1}}){\rm{ }} \le \\ {L_\beta }({w^0}) - {L_\beta }({w^*}) < + \infty , \end{array}$ |
由δ>0及n的任意性可得
$\sum\limits_{k = 0}^{ + \infty } {{y^{k + 1}} - {y^k}{^2} < + \infty .} $ |
上式结合式(2.7)可得
$\sum\limits_{k = 0}^{ + \infty } {{\lambda ^{k + 1}} - {\lambda ^k}{^2} < + \infty .} $ |
接下来只需证明:$\sum\limits_{k = 0}^{ + \infty } {{\lambda ^{k + 1}} - {x^k}{^2} < + \infty .} $由ADMM算法(0.1)第三式有
$\left\{ \begin{array}{l} {\lambda ^{k + 1}} = {\lambda ^k} - \beta \left( {A{x^{k + 1}} + B{y^{k + 1}}} \right),\\ {\lambda ^k} = {\lambda ^{k - 1}} - \beta (A{x^k} + B{y^k}). \end{array} \right.$ |
两式相减,可得
$\begin{array}{l} {\lambda ^{k + 1}} - {\lambda ^k} = \left( {{\lambda ^k} - {\lambda ^{k - 1}}} \right){\rm{ }} + \beta \left( {A{x^k} - A{x^{k + 1}}} \right) + \\ \beta \left( {B{y^k} - B{y^{k + 1}}} \right). \end{array}$ |
利用不等式(a+b+c)2≤3(a2+b2+c2)可得
$\begin{array}{l} \beta (A{x^k} - A{x^{k + 1}}){^2} \le 3({\lambda ^{k + 1}} - {\lambda ^k}{^2} + \\ {\lambda ^k} - {\lambda ^{k - 1}}{^2} + {\beta ^{2}}B({y^{k + 1}} - {y^k}){^2}). \end{array}$ | (2.9) |
由F的性质可得xk=F(Axk),进而
$\beta (A{x^k} - A{x^{k + 1}}){^2} \ge \frac{{{\beta ^2}}}{P}{x^k} - {x^{k + 1}}{^2}.$ | (2.10) |
由式(2.9)和式(2.10)可得
$\sum\limits_{k = 0}^{ + \infty } {{x^{k + 1}} - {x^k}{^2} < + \infty .} $ |
故$\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k}{^2} < + \infty } $.
引理2.3 存在ζ>0,对于任意k有
$d(0,\partial {L_\beta }({w^{k + 1}})) \le \zeta {y^{k + 1}} - {y^k}.$ |
证明 由增广拉格朗日函数定义,可得
$\left\{ \begin{array}{l} \begin{array}{*{20}{l}} {{\partial _x}{L_\beta }\left( {{w^{k + 1}}} \right) = \partial f\left( {{x^{k + 1}}} \right) - {A^{\rm{T}}}{\lambda ^{k + 1}} + \beta {A^{\rm{T}}}(A{x^{k + 1}} + }\\ {B{y^{k + 1}}),} \end{array}\\ {\partial _y}{L_\beta }({w^{k + 1}}) = \nabla g({y^{k + 1}}) - {B^{\rm{T}}}{\lambda ^{k + 1}} + \beta {B^{\rm{T}}}(A{x^{k + 1}} + \\ B{y^{k + 1}})\\ {\partial _\lambda }{L_\beta }({w^{k + 1}}) = - (A{x^{k + 1}} + B{y^{k + 1}}). \end{array} \right.$ | (2.11) |
进一步结合式(2.2)可得
$\left\{ \begin{array}{l} \beta {A^{\rm{T}}}B\left( {{y^{k + 1}} - {y^k}} \right) + {A^{\rm{T}}}\left( {{\lambda ^k} - {\lambda ^{k + 1}}} \right){\rm{ }} \in {\partial _x}{L_\beta }\left( {{w^{k + 1}}} \right)\\ {B^{\rm{T}}}({\lambda ^k} - {\lambda ^{k + 1}}) \in {\partial _y}{L_\beta }({w^{k + 1}}),\\ \frac{1}{\beta }({\lambda ^{k + 1}} - {\lambda ^k}) \in {\partial _\lambda }{L_\beta }({w^{k + 1}}). \end{array} \right.$ | (2.12) |
令x${x^*}_{k + 1} = \beta {A^{\rm{T}}}B\left( {{y^{k + 1}} - {y^k}} \right) + {A^{\rm{T}}}\left( {{\lambda ^k} - {\lambda ^{k + 1}}} \right)$,${y^*}_{k + 1} = {B^{\rm{T}}}({\lambda ^k} - {\lambda ^{k + 1}}),{\lambda ^*}_{k + 1} = \frac{1}{\beta }({\lambda ^{k + 1}} - {\lambda ^k})$,则
$({x^*}_{k + 1},{y^*}_{k + 1},{\lambda ^*}_{k + 1}) \in \partial {L_\beta }({w^{k + 1}})$且存在ζ1,ζ2>0,使得
$\begin{array}{l} ({x^*}_{k + 1},{y^*}_{k + 1},{\lambda ^*}_{k + 1}) \le {\zeta _1}{y^{k + 1}} - {y^k} + \\ {\zeta _2}{\lambda ^{k + 1}} - {\lambda ^k}. \end{array}$ |
令ζ:=ζ1+μLζ2,结合式(2.7)可得
$d(0,\partial {L_\beta }({w^{k + 1}})) \le \zeta {y^{k + 1}} - {y^k}.$ |
引理2.4 设序列{wk}的全体极限点记为S(w0),则
(i) S(w0)是一个非空紧集,并且
d(wk,S(w0))→0,k→+∞;
(ii) $S({w^0}) \subset {\rm{crit}}{L_\beta }$;
(iii) Lβ(·)在S(w0)上取有限值且为常数,且
$\mathop {\inf }\limits_{k \in N} {L_\beta }({w^k}) = \mathop {\lim }\limits_{k \to + \infty } {L_\beta }({w^k}).$ |
证明 (i) 式由S(w0)的定义直接可得.
(ii) 设(x*,y*,λ*)∈S(w0),则存在子列{(xkj,ykj,λkj)}使得(xkj,ykj,λkj)→(x*,y*,λ*),(kj→+∞).由xk+1的最优性有
${L_\beta }({x^{k + 1}},{y^k},{\lambda ^k}) \le {L_\beta }({x^*},{y^k},{\lambda ^k}).$ |
由引理2.2可知‖wk+1-wk‖→0,从而(xkj+1,ykj+1,λkj+1)→(x*,y*,λ*).又Lβ(·)关于y,λ连续,则有
$\begin{array}{l} \mathop {\lim }\limits_{j \to + \infty } \sup {L_\beta }({x^{{k_j} + 1}},{y^{{k_j}}},{\lambda ^{{k_j}}}){\rm{ }} = \mathop {\lim }\limits_{j \to + \infty } \sup {L_\beta }({x^{{k_j} + 1}},\\ {y^{{k_j} + 1}},{\lambda ^{{k_j} + 1}}){\rm{ }} \le {L_\beta }({x^*},{y^*},{\lambda ^*}). \end{array}$ |
由函数Lβ(·)的下半连续性,有
$\mathop {\lim \inf }\limits_{j \to + \infty {\rm{ }}} {L_\beta }({x^{{k_j}}} + 1,{y^{{k_j}}} + 1,{\lambda ^{{k_j}}} + 1) \ge {L_\beta }({x^*},{y^*},{\lambda ^*}).$ |
由上式及式(2.13)可得$\mathop {\lim }\limits_{j \to + \infty } f({x^{{k_j} + 1}}) = f({x^*})$.结合▽g的连续性和$\partial f$的闭性,在式(2.2)中对序列$({x^{{k_j} + 1}},{y^{{k_j} + 1}},{\lambda ^{{k_j} + 1}})$取极限,可得
$\left\{ \begin{array}{l} {A^{\rm{T}}}{\lambda ^*} \in \partial f\left( {{x^*}} \right),\\ \nabla g({y^*}) = {B^{\rm{T}}}{\lambda ^*},\\ A{x^*} + B{y^*} = 0. \end{array} \right.$ |
即${w^*} \subset {\rm{crit}}{L_\beta }$.
(iii) 对于任意点(x*,y*,λ*)∈S(w0),存在子列(xkj,ykj,λkj)→(x*,y*,λ*).结合Lβ(xkj+1)收敛,以及{Lβ(wk)}k∈N单调递减,可得
$\mathop {\lim }\limits_{k \to + \infty } {L_\beta }({x^k},{y^k},{\lambda ^k}) = {L_\beta }({x^*},{y^*},{\lambda ^*}).$ |
因此,函数Lβ(·)在集合S(w0)上是常数,并且有$\mathop {\inf }\limits_{k \in N} {\rm{ }}{L_\beta }({w^k}) = \mathop {\lim }\limits_{k \to + \infty } {L_\beta }({w^k}).$.
最后,给出非凸问题(0.5)的ADMM算法(0.1)的收敛性分析.
定理2.1 若Lβ(w)满足KL性质,则
(i) $\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k} < + \infty ;} $;
(ii) 序列{wk}收敛到函数Lβ(·)的一个关键点.
证明 由引理2.4知$\mathop {\lim }\limits_{k \to + \infty } {L_\beta }\left( {{w^k}} \right) = {L_\beta }({w^*})(\forall {w^*} \in S({w^0}))$成立.考虑如下两种情形:
(i) 存在整数k0使得Lβ(wk0)=Lβ(w*)成立,由引理2.1可知,对于任意的k>k0,有
$\begin{array}{l} {y^{k + 1}} - {y^k}{^2} \le {L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^{k + 1}}} \right){\rm{ }} \le \\ {L_\beta }({w^{{k_0}}}) - {L_\beta }({w^*}) = 0, \end{array}$ |
因此,对于任意的k>k0,有yk+1=yk.结合式(2.7)、式(2.9)、式(2.10)可知,对于任意的k
(ii) 对任意的k均有Lβ(wk)>Lβ(w*)成立.由d(wk,S(w0))→0可知,对于任意给定的ε>0,存在k1>0,当k>k1时,有d(wk,S(w0))<ε.又根据Lβ(wk)→Lβ(w*)可知,对于任意给定的η>0,存在k2>0,当k>k2时,有Lβ(wk)<Lβ(w*)+η.从而当$k > \tilde k = \max \{ {k_1},{k_2}\} $时,有
$\begin{array}{l} d\left( {{w^k},S\left( {{w^0}} \right)} \right) < \varepsilon ,\\ {L_\beta }\left( {{w^*}} \right)\beta \left( {{w^k}} \right)\beta \left( {{w^*}} \right) + \eta . \end{array}$ |
由于S(w0)是非空紧集,函数Lβ(·)在集合S(w0)上是常数,由引理1.1得
$\begin{array}{l} \varphi \prime \left( {{L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^*}} \right)} \right)d\left( {0,\partial {L_\beta }\left( {{w^k}} \right)} \right) \ge \\ 1\left( {\forall k > \tilde k} \right). \end{array}$ |
由函数φ的凹性,可得
$\begin{array}{l} \varphi \left( {{L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^*}} \right)} \right) - \varphi ({L_\beta }\left( {{w^{k + 1}}} \right) - \\ {L_\beta }({w^*})){\rm{ }} \ge \varphi \prime ({L_\beta }({w^k}) - {L_\beta }({w^*}))({L_\beta }({w^k}) - \\ {L_\beta }({w^{k + 1}})). \end{array}$ |
由引理2.3及φ′(Lβ(wk)-Lβ(w*))>0,可得
$\begin{array}{l} {L_\beta }\left( {{w^k}} \right) - {L_\beta }\left( {{w^{k + 1}}} \right) \le \\ \frac{{\varphi ({L_\beta }({w^k}) - {L_\beta }({w^*})) - {\rm{ }}\varphi ({L_\beta }({w^{k + 1}}) - {L_\beta }({w^*}))}}{{\varphi \prime ({L_\beta }({w^k}) - {L_\beta }({w^*}))}} \le \\ \zeta {y^k} - {y^{k - 1}}[\varphi ({L_\beta }({w^k}) - {L_\beta }({w^*})) - \\ \varphi ({L_\beta }({w^{k + 1}}) - {L_\beta }({w^*}))]. \end{array}$ |
令Δp,q=φ(Lβ(wp)-Lβ(w*))-φ(Lβ(wq)-Lβ(w*)).由引理2.1可得,对于任意的k>k有
$\delta {y^{k + 1}} - {y^k}{^2} \le \zeta {y^k} - {y^{k - 1}}{\Delta _{k,k + 1}},$ |
即
${y^{k + 1}} - {y^k} \le \sqrt {\frac{\zeta }{\delta }{\Delta _{k,k + 1}}} {y^k} - {y^{k - 1}}{^{\frac{1}{2}}}.$ |
利用不等式$2\sqrt {\alpha \beta } \le \left( {\alpha + \beta > 0} \right)$可得
$2{y^{k + 1}} - {y^k} \le {y^k} - {y^{k - 1}} + \frac{\zeta }{\delta }{\Delta _{k,k + 1}},$ |
由上式可得
$\begin{array}{l} 2\sum\limits_{k = + 1}^m {{y^{k + 1}} - {y^k}} \le \sum\limits_{k = + 1}^m {{\rm{ }}{y^k} - {y^{k - 1}}} + \\ \frac{\zeta }{\delta }{\Delta _{ + 1,m + 1}}. \end{array}$ |
注意到φ(Lβ(wm+1)-Lβ(w*))>0,移项并且令m→+∞,可得
$\begin{array}{l} \sum\limits_{k = + 1}^{ + \infty } {{y^{k + 1}} - {y^k}} \le {y^{\bar k + 1}} - {y^{\bar k}}{\rm{ }} + \\ \frac{\zeta }{\delta }\varphi ({L_\beta }({w^{\bar k + 1}}) - {L_\beta }({w^*})), \end{array}$ |
所以
$\sum\limits_{k = 0}^{ + \infty } {{y^{k + 1}} - {y^k} < + \infty .} $ |
由式(2.7)可得
$\sum\limits_{k = 0}^{ + \infty } {{\lambda ^{k + 1}} - {\lambda ^k} < + \infty .} $ |
另一方面,由式(2.9)、式(2.10)可得
$\begin{align} & \left\| {{x}^{k+1}}-{{x}^{k}} \right\|\le \sqrt{\frac{3P}{{{\beta }^{2}}}}({{\left\| {{\lambda }^{k+1}}-{{\lambda }^{k}} \right\|}^{2}}+{{\left\| {{\lambda }^{k}}-{{\lambda }^{k-1}} \right\|}^{2}}+ \\ & {{\beta }^{2}}{{\left\| B({{y}^{k+1}}-{{y}^{k}}) \right\|}^{2}}{{)}^{\frac{1}{2}}}\le \sqrt{\frac{3P}{{{\beta }^{2}}}} \\ & ({{\lambda }^{k+1}}->{{\lambda }^{k}}+{{\lambda }^{k}}-{{\lambda }^{k-1}}+\beta B({{y}^{k+1}}-{{y}^{k}})), \\ \end{align}$ |
故$\sum\limits_{k = 1}^{ + \infty } {{x^{k + 1}} - {x^k} < + \infty .} $.又因为
$\begin{array}{l} {w^{k + 1}} - {w^k} = \left( {{x^{k + 1}} - {x^k}{^2} + {y^{k + 1}} - } \right.\\ {\left. {{y^k}{^2} + {\lambda ^{k + 1}} - {\lambda ^k}{^2}} \right)^{\frac{1}{2}}} \le {x^{k + 1}} - {x^k} + \\ {y^{k + 1}} - {y^k} + {\lambda ^{k + 1}} - {\lambda ^k}, \end{array}$ |
所以
$\sum\limits_{k = 0}^{ + \infty } {{w^{k + 1}} - {w^k} < + \infty .} $ |
进一步可知序列wk是Cauchy序列(参见文献[11]),所以序列{wk}收敛,定理得证.
3 结论本文针对非凸两分块优化问题,在不要求矩阵A列满秩,B不一定是单位阵,在Lβ(w)满足Kurdyka-Lojasiewicz性质且罚参数大于某个常数的条件下,证明了非凸问题ADMM的收敛性.
[1] |
GLOWINSKI R, MARROCO A. Sur l’approximation,par elements finis d’ordre un,et la resolution,par penalisation-dualité,d’une classe de problèms de Dirichlet nonlineares[J]. Revue Francaise d’Automatique,Informatique et Recherche Opérationelle,Series R, 1975, 31(5/6): 41-76. |
[2] |
GABAY D, MERCIER B. A dual algorithm for the solution of nonlinear variational problems via finite element approximation[J]. Computers &Mathematics with Applications, 1976, 2(1): 17-40. |
[3] |
BOYD S, PARIKH N, CHU E, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers[J]. Foundations and Trends in Machine Learning, 2011, 3(1): 1-122. |
[4] |
YANG L, PONG T K, CHEN X J. Alternating direction method of multipliers for a class of nonconvex and nonsmooth problems with applications to background/foreground extraction[J]. Mathematics, 2016. |
[5] |
HONG M Y, LUO Z Q, RAZAVIYAYN M. Convergen-ce analysis of alternating direction method of multipliers for a family of nonconvex problems[J]. SIAM Journal on Optimization, 2014, 26(1): 337-364. |
[6] |
WANG Y,YIN W T,ZENG J S.Global Convergence of ADMM in Nonconvex Nonsmooth Optimization[R].UCLA CAM Report 15-62,2015.
|
[7] |
LI G Y, PONG T K. Douglas-Rachford splitting for nonconvex optimization with application to nonconvex feasibility problems[J]. Mathematical Programming, 2016, 159(1/2): 374-401. |
[8] |
WANG F H, XU Z B, XU H K. Convergence of multi-block Bregman ADMM for nonconvex composite problems[J]. Mathematics, 2015, 1-25. |
[9] |
GUO K, HAND R, WUT T. Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints[J]. International Journal of Computer Mathematics, 2016, 1-17. |
[10] |
BOLTE J, DANIILIDIS A, LEY O, et al. Characterizations of Lojasiewicz inequalities and applications[J]. arXiv:0802.0826vl, 2008, 1-48. |
[11] |
ATTOUCH H, BOLTE J, REDONT P, et al. A Proximal alternating minimization and projection methods for nonconvex problems:An approach based on the Kurdyka-Lojasiewicz inequality[J]. Mathematics of Operations Research, 2010, 35(2): 438-457. DOI:10.1287/moor.1100.0449 |
[12] |
BOLTE J, SABACH S, TEBOULLE M. Proximal alternating linearized minimization for nonconvex and nonsmooth problem[J]. Mathematical Programming, 2013, 146(1/2): 459-494. |