RN - Nonlinear Optimization: Basics

诸葛煜
2023-12-01

1 Extremum Points of Multivariate Functions

For a twice differentiable univariate function $f(x)$, the conditions for the existence of an extremum point are:

  • Necessary Condition
    $f'(x) = 0$
  • Sufficient Condition
    $f''(x) > 0$ (local minimum), $f''(x) < 0$ (local maximum)

Similarly, for a twice differentiable multivariate function $f(\bm{x})$, the conditions for the existence of an extremum point are:

  • Necessary Condition
    $\nabla f(\bm{x}) = \bm{0}$
  • Sufficient Condition
    $\nabla^2 f(\bm{x}) \succ 0$ (local minimum), $\nabla^2 f(\bm{x}) \prec 0$ (local maximum)

In summary, for a twice differentiable function $f(\bm{x})$, the conditions for a local minimum (maximum) are that the gradient $\nabla f(\bm{x})$ equals $\bm{0}$ and the Hessian matrix $\nabla^2 f(\bm{x})$ is positive (negative) definite.
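As a quick numerical illustration (a sketch of my own, not part of the original note; the function and the candidate point are assumed toy examples), the snippet below checks both conditions for $f(x, y) = x^2 + 3y^2 - 2x$ at the point $(1, 0)$:

```python
import numpy as np

# Assumed toy example: f(x, y) = x^2 + 3*y^2 - 2*x
def grad(x):
    return np.array([2 * x[0] - 2, 6 * x[1]])

def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

x_star = np.array([1.0, 0.0])  # candidate stationary point

# Necessary condition: the gradient is (numerically) zero
print(np.allclose(grad(x_star), 0.0))                    # True

# Sufficient condition: the Hessian is positive definite,
# i.e. all eigenvalues are strictly positive -> local minimum
print(np.all(np.linalg.eigvalsh(hessian(x_star)) > 0))   # True
```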

2 Convex Functions

In addition to using the definition to determine whether a function is convex, we can also use the following first order and second order conditions:

  • First Order Condition: Assume $\mathbb{R}_c$ is an open convex set in Euclidean space $E_n$ and $f(\bm{x})$ is differentiable on $\mathbb{R}_c$. If for any two points $\bm{x}_1 \in \mathbb{R}_c$ and $\bm{x}_2 \in \mathbb{R}_c$,
    $$f(\bm{x}_2) \geq f(\bm{x}_1) + \nabla f(\bm{x}_1)^T(\bm{x}_2 - \bm{x}_1)$$
    then $f(\bm{x})$ is a convex function. And if, for $\bm{x}_1 \neq \bm{x}_2$,
    $$f(\bm{x}_2) > f(\bm{x}_1) + \nabla f(\bm{x}_1)^T(\bm{x}_2 - \bm{x}_1)$$
    then $f(\bm{x})$ is a strictly convex function.

  • Second Order Condition: Assume $\mathbb{R}_c$ is an open convex set in Euclidean space $E_n$ and $f(\bm{x})$ is twice differentiable on $\mathbb{R}_c$. If its Hessian matrix is positive semidefinite everywhere on $\mathbb{R}_c$,
    $$\nabla^2 f(\bm{x}) \succeq 0$$
    then $f(\bm{x})$ is a convex function. And if the Hessian is positive definite,
    $$\nabla^2 f(\bm{x}) \succ 0$$
    then $f(\bm{x})$ is a strictly convex function. (A small numerical check of this condition is sketched after this list.)
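As referenced above, here is a minimal numerical sketch of the second order condition (my own toy example, not from the note): sample the Hessian at a number of points and check that its eigenvalues are nonnegative. For a non-quadratic function this only gives evidence at the sampled points, not a proof of convexity.

```python
import numpy as np

# Assumed toy example: f(x) = log(1 + exp(x1)) + x2^2, which is convex on R^2
def hessian(x):
    s = 1.0 / (1.0 + np.exp(-x[0]))        # sigmoid(x1)
    return np.array([[s * (1.0 - s), 0.0],
                     [0.0,           2.0]])

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 2))

# A positive semidefinite Hessian at every sampled point is evidence of convexity
psd_everywhere = all(np.all(np.linalg.eigvalsh(hessian(x)) >= -1e-12) for x in points)
print(psd_everywhere)  # True
```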

3 Karush-Kuhn-Tucker Conditions

Consider the following nonlinear optimization problem:
$$\textit{(NLP)} \quad \begin{aligned} \min \quad & f(\bm{x}) \\ \text{s.t.} \quad & g_i(\bm{x}) \leq 0, \quad \forall i = 1, \dots, m \\ & h_j(\bm{x}) = 0, \quad \forall j = 1, \dots, p \end{aligned}$$
in which $f(\bm{x})$, $g(\bm{x})$ and $h(\bm{x})$ are differentiable. The constraint functions $g(\bm{x})$ and $h(\bm{x})$ are assumed to satisfy certain regularity conditions, also known as constraint qualifications (e.g., the constraints are affine, or Slater's condition holds).

The Lagrange function is
$$\mathcal{L}(\bm{x}, \bm{\lambda}, \bm{\mu}) = f(\bm{x}) + \bm{\lambda}^T g(\bm{x}) + \bm{\mu}^T h(\bm{x})$$

The Lagrangian dual function is
$$d(\bm{\lambda}, \bm{\mu}) = \inf_{\bm{x}} \mathcal{L}(\bm{x}, \bm{\lambda}, \bm{\mu})$$

The Lagrangian dual problem is
$$\max_{\bm{\lambda} \geq \bm{0},\, \bm{\mu}} d(\bm{\lambda}, \bm{\mu})$$
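As a small worked example (my own illustration, not from the note), take the one-dimensional problem $\min x^2$ subject to $1 - x \leq 0$. The Lagrangian is $\mathcal{L}(x, \lambda) = x^2 + \lambda(1 - x)$; minimizing over $x$ gives $x = \lambda/2$, so the dual function is $d(\lambda) = \lambda - \lambda^2/4$. Maximizing over $\lambda \geq 0$ gives $\lambda^* = 2$ and $d(\lambda^*) = 1$, which equals the primal optimum $f(x^*) = 1$ at $x^* = 1$, so strong duality holds for this problem.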

The Karush-Kuhn-Tucker conditions can be described as:

  • Stationarity
    $$\nabla f(\bm{x}) + \sum_{i=1}^{m} \lambda_i \nabla g_i(\bm{x}) + \sum_{j=1}^{p} \mu_j \nabla h_j(\bm{x}) = \bm{0}$$
  • Primal Feasibility
    $$g(\bm{x}) \leq \bm{0}, \quad h(\bm{x}) = \bm{0}$$
  • Dual Feasibility
    $$\bm{\lambda} \geq \bm{0}$$
  • Complementary Slackness
    $$\bm{\lambda}^T g(\bm{x}) = 0$$

The KKT conditions are necessary conditions for optimality. If $f(\bm{x})$ and $g(\bm{x})$ are convex, $h(\bm{x})$ is affine, and the problem satisfies Slater's condition, then strong duality holds and the KKT conditions are both necessary and sufficient for optimality.
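To make the conditions concrete, here is a small numerical check on an assumed toy problem of my own (inequality constraint only): minimize $(x_1 - 1)^2 + (x_2 - 2)^2$ subject to $x_1 + x_2 - 2 \leq 0$. Its KKT point is $\bm{x}^* = (0.5, 1.5)$ with multiplier $\lambda^* = 1$, which the sketch below verifies.

```python
import numpy as np

# Assumed toy problem: min (x1-1)^2 + (x2-2)^2  s.t.  g(x) = x1 + x2 - 2 <= 0
def grad_f(x):
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2)])

def g(x):
    return x[0] + x[1] - 2

grad_g = np.array([1.0, 1.0])

x_star, lam = np.array([0.5, 1.5]), 1.0                 # candidate KKT pair

print(np.allclose(grad_f(x_star) + lam * grad_g, 0))    # stationarity
print(g(x_star) <= 1e-12)                               # primal feasibility
print(lam >= 0)                                         # dual feasibility
print(np.isclose(lam * g(x_star), 0))                   # complementary slackness
```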

4 Conclusion

For an unconstrained convex minimization problem, in a few special cases we can find a solution by analytically solving the optimality equation $\nabla f(\bm{x}) = \bm{0}$, but usually the problem must be solved by an iterative algorithm, such as gradient descent, steepest descent, or Newton's method.
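As a minimal sketch (on an assumed quadratic objective of my own, not a production solver), both gradient descent and Newton's method fit in a few lines; for a quadratic, Newton's method reaches the solution of $\nabla f(\bm{x}) = \bm{0}$ in a single step.

```python
import numpy as np

# Assumed toy objective: f(x) = 0.5 * x^T Q x - b^T x, with Q positive definite
Q = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

grad = lambda x: Q @ x - b
hess = lambda x: Q

x = np.zeros(2)
for _ in range(500):                      # gradient descent with a fixed step size
    x = x - 0.1 * grad(x)

x_newton = np.zeros(2)
for _ in range(5):                        # Newton's method: solve H d = grad, step x - d
    x_newton = x_newton - np.linalg.solve(hess(x_newton), grad(x_newton))

print(x, x_newton, np.linalg.solve(Q, b))  # all three should (approximately) agree
```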

For constrained convex minimization problems, a well-known class of algorithms is interior-point methods, of which the barrier method is a classic example. Interior-point methods solve the KKT conditions by applying Newton's method to a sequence of equality constrained problems, or to a sequence of modified versions of the KKT conditions.
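As a rough illustration of the barrier method (a minimal sketch under assumed data, not the book's implementation), consider the one-dimensional problem $\min (x-2)^2$ subject to $x \leq 1$, whose solution is $x^* = 1$. The barrier method replaces the inequality with a logarithmic barrier and solves a sequence of unconstrained centering problems $\min_x\, t f(x) - \log(1 - x)$ with increasing $t$, here via damped Newton steps:

```python
import numpy as np

# Assumed toy problem: min (x - 2)^2  s.t.  x - 1 <= 0   (optimum at x* = 1)
f_grad = lambda x: 2 * (x - 2)
f_hess = lambda x: 2.0

x, t, mu = 0.0, 1.0, 10.0                 # strictly feasible start, barrier parameters
for _ in range(8):                        # outer loop: increase t
    for _ in range(50):                   # inner loop: Newton on t*f(x) - log(1 - x)
        g = t * f_grad(x) + 1.0 / (1.0 - x)
        h = t * f_hess(x) + 1.0 / (1.0 - x) ** 2
        step = g / h
        # crude damping to stay strictly inside the feasible region x < 1
        while x - step >= 1.0:
            step *= 0.5
        x = x - step
    t *= mu

print(x)  # close to 1.0; the suboptimality at each center is roughly 1/t
```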

The following passage is from Prof. Boyd's book, Convex Optimization [1]:

We can view interior-point methods as another level in the hierarchy of convex optimization algorithms.

  • Linear equality constrained quadratic problems are the simplest. For these problems the KKT conditions are a set of linear equations, which can be solved analytically.
  • Newton’s method is the next level in the hierarchy. We can think of Newton’s method as a technique for solving a linear equality constrained optimization problem, with twice differentiable objective, by reducing it to a sequence of linear equality constrained quadratic problems.
  • Interior-point methods form the next level in the hierarchy: They solve an optimization problem with linear equality and inequality constraints by reducing it to a sequence of linear equality constrained problems.

References

[1] Boyd, Stephen, and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
