[Lecture Note 4] Unconstrained Optimization

Form

\[\min_{x \in \mathbb{R}^n} f(x), \;\; \textit{for}\;\; f: \mathbb{R}^n \rightarrow \mathbb{R}\]

where $f$ is continuously differentiable.

Definition : Level set

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$. For a given $\alpha \in \mathbb{R}$, the set $L_{\alpha} \subset \mathbb{R}^n$ defined by

\[L_{\alpha} = \{ x \in \mathbb{R}^n | f(x) \leq \alpha \}\]

is called “a level set of $f$”.
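
As a quick numerical illustration (an assumed example, not from the lecture), take the hypothetical choice $f(x) = \left\| x \right\|^2$, whose level set $L_{\alpha}$ is a closed ball of radius $\sqrt{\alpha}$; membership is a direct translation of the definition:

```python
import numpy as np

# Hypothetical example: f(x) = ||x||^2, so L_alpha = { x : ||x||^2 <= alpha }
# is a closed ball of radius sqrt(alpha) centered at the origin.
def f(x):
    return float(np.dot(x, x))

def in_level_set(x, alpha):
    """Membership test for L_alpha = { x : f(x) <= alpha }."""
    return f(x) <= alpha

x = np.array([1.0, 1.0])
print(in_level_set(x, 3.0))   # True:  f(x) = 2 <= 3
print(in_level_set(x, 1.0))   # False: f(x) = 2 >  1
```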

Properties

  1. If $\alpha_1 > \alpha_2$, then $L_{\alpha_1} \supseteq L_{\alpha_2}$.
  2. If $\hat{\alpha} = \min f(x)$, then $L_{\hat{\alpha}}$ is the solution set.
  3. An algorithm for solving $\min f(x)$ is called a descent method if the sequence $\{ x_i \}_{i=1}^{\infty}$ it generates satisfies

\[f(x_{i+1}) < f(x_i)\;\; \forall i \in \mathbb{N}\]

as in the sketch below.
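
A minimal sketch of such a descent method, assuming the same quadratic test function $f(x) = \left\| x \right\|^2$ (gradient descent with a fixed step size; any algorithm with this monotone-decrease property qualifies):

```python
import numpy as np

# Gradient descent on the assumed test function f(x) = ||x||^2; for a small
# enough fixed step the iterates satisfy f(x_{i+1}) < f(x_i) for every i.
def f(x):
    return float(np.dot(x, x))

def grad_f(x):
    return 2.0 * x

x = np.array([3.0, -2.0])
step = 0.1
values = [f(x)]
for _ in range(10):
    x = x - step * grad_f(x)          # step along -grad f
    values.append(f(x))

# The defining descent property: f strictly decreases at every iteration.
assert all(b < a for a, b in zip(values, values[1:]))
print(values)
```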

Definition : Tangent Set

The set of tangents to the boundary of $L_{\alpha}$, i.e. $\partial L_{\alpha}$, at a point $\hat{x} \in \partial L_{\alpha}$ is defined by

\[T(\hat{x}, \partial L_{\alpha}) \triangleq \left\{ y \in \mathbb{R}^n \;\middle|\; \beta y = \lim_{i \rightarrow \infty} \frac{x_i - \hat{x}}{\left\| x_i - \hat{x} \right\|} \;\;\textit{for some}\;\; \beta > 0, \;\; x_i \in \partial L_{\alpha} \;\; \forall i, \;\; x_i \rightarrow \hat{x} \right\}\]

In a non-smooth (corner) case like the one in the figure below, $T(\hat{x}, \partial L_{\alpha})$ has two elements, one pointing upward and one downward.

Fig18
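
To make the limit in the definition concrete, here is a small numerical sketch under the assumption $f(x) = \left\| x \right\|^2$ and $\alpha = 1$, so that $\partial L_{\alpha}$ is the unit circle: normalized chords from boundary points $x_i \rightarrow \hat{x}$ converge to a tangent direction.

```python
import numpy as np

# Assumed example: the boundary of L_1 for f(x) = ||x||^2 is the unit circle.
# Approach x_hat = (1, 0) along boundary points x_i = (cos t, sin t), t -> 0,
# and watch the normalized chords converge to the tangent direction (0, 1).
x_hat = np.array([1.0, 0.0])

for t in [0.5, 0.1, 0.01, 0.001]:
    x_i = np.array([np.cos(t), np.sin(t)])       # stays on the boundary
    chord = x_i - x_hat
    print(t, chord / np.linalg.norm(chord))
# Output approaches (0, 1), which is orthogonal to grad f(x_hat) = (2, 0).
```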

Definition : Normal Set

The set of normals to the boundary of $L_{\alpha}$ at a point $\hat{x} \in \partial L_{\alpha}$ is defined by

\[N(\hat{x}, \partial L_{\alpha}) \triangleq \{ \nu \in \mathbb{R}^n \mid \langle \nu, y \rangle = 0 \;\; \forall y \in T(\hat{x}, \partial L_{\alpha})\}\]

We say that $\nu$ is an outward normal at $\hat{x} \in \partial L_{\alpha}$ if $f(\hat{x} + \lambda \nu) > f(\hat{x})$ for all sufficiently small $\lambda > 0$.

Fig19
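
Continuing the assumed example $f(x) = \left\| x \right\|^2$ on $\partial L_1$: the candidate $\nu = \nabla f(\hat{x})$ is orthogonal to the tangent direction found above and satisfies the outward condition $f(\hat{x} + \lambda \nu) > f(\hat{x})$ for small $\lambda > 0$.

```python
import numpy as np

# Assumed example: at x_hat = (1, 0) on the unit circle (boundary of L_1 for
# f(x) = ||x||^2), check that nu = grad f(x_hat) is an outward normal.
def f(x):
    return float(np.dot(x, x))

x_hat = np.array([1.0, 0.0])
nu = 2.0 * x_hat                       # grad f(x_hat) = (2, 0)
tangent = np.array([0.0, 1.0])         # tangent direction at x_hat

print(np.dot(nu, tangent))             # 0.0, so nu is in the normal set
for lam in [1e-1, 1e-2, 1e-3]:
    print(lam, f(x_hat + lam * nu) > f(x_hat))   # True: f increases along nu
```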

Theorem U-1 : Mean Value Theorem

For $f \in \mathcal{C}^1, \;\; f:\mathbb{R}^n \rightarrow \mathbb{R}$

\[\forall x, y \in \mathbb{R}^n, \;\; \exists s \in [0,1]\;\;\textit{such that}\;\; f(y)-f(x) = \langle \nabla f(x + s(y-x)), y-x \rangle\]

(The case $f(x)=f(y)$ is illustrated in the figure below.) Fig20
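
Here is a sketch locating the point promised by the theorem, under the assumed test function $f(z) = \left\| z \right\|^2$ (for which the MVT point happens to be $s = 1/2$); bisection finds the root of $g(s) = \langle \nabla f(x + s(y-x)), y-x \rangle - (f(y) - f(x))$:

```python
import numpy as np

# Assumed example: f(z) = ||z||^2 with grad f(z) = 2z; find s in [0, 1] with
# f(y) - f(x) = <grad f(x + s(y-x)), y - x> by bisection.
def f(z):
    return float(np.dot(z, z))

def grad_f(z):
    return 2.0 * z

x = np.array([0.0, 1.0])
y = np.array([2.0, 3.0])
target = f(y) - f(x)

def g(s):
    """<grad f(x + s(y-x)), y-x> minus the target; a root is the MVT's s."""
    return float(np.dot(grad_f(x + s * (y - x)), y - x)) - target

lo, hi = 0.0, 1.0          # g(0) < 0 < g(1) here, so a root lies inside
for _ in range(50):
    mid = 0.5 * (lo + hi)
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(0.5 * (lo + hi))     # ~0.5: for a quadratic, the MVT point is s = 1/2
```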

Property U-1

Suppose that $x \in \mathbb{R}^n$ is such that $f(x) = \alpha$. If $\nabla f(x) \neq 0$, then $\nabla f(x)$ is an outward normal to the boundary $\partial L_{\alpha}$ at $x$.

What the proof requires:

  1. $\nabla f(x) \in N(x, \partial L_{\alpha})$
  2. $f(x + \lambda \nabla f(x)) > f(x)$ for all sufficiently small $\lambda > 0$

Proof of Property U-1

Proof of 1

Pick any sequence $\{ \Delta x_i \}_{i=1}^{\infty}$ such that $x + \Delta x_i \in \partial L_{\alpha} \;\; \forall i \in \mathbb{N}$ and $\Delta x_i \rightarrow 0$ (i.e. $f(x+\Delta x_i) = \alpha$ for every $i$, which is exactly what this condition means).

For each $i \in \mathbb{N}$ by Mean-Value Theorem,

\[\exists s_i \in [0,1] \;\; \textit{such that}\;\; f(x+\Delta x_i) = f(x) + \langle \nabla f(x+s_i \Delta x_i), \Delta x_i \rangle\]

Since $f(x+\Delta x_i) = \alpha = f(x)$, this implies that

\[\langle \nabla f(x+s_i \Delta x_i), \Delta x_i \rangle = 0, \;\; i \in \mathbb{N} \Rightarrow \langle \nabla f(x+s_i \Delta x_i), \frac{\Delta x_i}{\left\| \Delta x_i \right\|} \rangle = 0\]

Let $y_i = \frac{\Delta x_i}{\left\| \Delta x_i \right\|}$. Since $\left\| y_i \right\| = 1$ for every $i$, the sequence lies on the unit sphere, which is closed and bounded, hence compact; so there exist a $y$ and a subsequence along which $y_i \rightarrow y$ (relabel so that $y_i \rightarrow y$).

\[\begin{align} 0 &= \lim_{i \rightarrow \infty} \left\langle \nabla f(x + s_i \Delta x_i), \frac{\Delta x_i}{\left\| \Delta x_i \right\|} \right\rangle \\ &= \left\langle \lim_{i \rightarrow \infty} \nabla f(x + s_i \Delta x_i), \lim_{i \rightarrow \infty} y_i \right\rangle \\ &= \langle \nabla f(x), y \rangle \end{align}\]

where the second equality uses the continuity of $\nabla f$ together with $s_i \Delta x_i \rightarrow 0$.

By the definition of $T(x, \partial L_{\alpha})$

\[y \in T(x, \partial L_{\alpha})\]

Therefore, $\nabla f(x) \in N(x, \partial L_{\alpha})$ (its inner product with every $y$ in the tangent set is 0).

Proof of 2

By Mean Value Theorem

\[\forall \lambda > 0, \;\; \exists s_{\lambda} \in [0,1] \;\;\textit{such that} \;\; f(x+\lambda \nabla f(x)) = f(x) + \langle \nabla f(x + s_{\lambda} \lambda \nabla f(x)), \lambda \nabla f(x) \rangle\]

(Here $y - x = (x + \lambda \nabla f(x)) - x = \lambda \nabla f(x)$, which is why the second argument of $\langle \cdot, \cdot \rangle$ takes this form.)

Goal: show that $f(x + \lambda \nabla f(x)) - f(x) > 0$ for all sufficiently small $\lambda > 0$; by the identity above, this is equivalent to showing $\lambda \langle \nabla f(x + s_{\lambda} \lambda \nabla f(x)), \nabla f(x) \rangle > 0$ for all sufficiently small $\lambda > 0$.

By continuity of $\nabla f$ at $x$ (i.e., for every $\varepsilon > 0$ there exists $\delta > 0$ such that $\left\| \nabla f(z) - \nabla f(x) \right\| < \varepsilon$ whenever $\left\| z - x \right\| < \delta$),

\[\exists \bar{\lambda} > 0 \;\;\textit{such that}\;\; \forall \lambda \in (0, \bar{\lambda}], \;\; \langle \nabla f(x+s_{\lambda} \lambda \nabla f(x)), \nabla f(x) \rangle \geq \frac{1}{2} \left\| \nabla f(x) \right\|^2 > 0\]

Therefore, $\nabla f(x)$ is an outward normal.

Q.E.D. of Property U-1

Corollary U-1 : descent direction

If $x \in \mathbb{R}^n$ is such that $\nabla f(x) \neq 0$, then any vector $h \in \mathbb{R}^n$ such that $\langle \nabla f(x), h \rangle < 0$ is a descent direction for $f(\cdot)$ at $x$. i.e.

\[\exists \hat{\lambda} > 0 \;\;\textit{such that}\;\; f(x+\hat{\lambda} h) - f(x) < 0\]
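
A direct numerical check of the corollary, again on the assumed test function $f(x) = \left\| x \right\|^2$: pick any $h$ with $\langle \nabla f(x), h \rangle < 0$ and observe $f(x + \lambda h) < f(x)$ for small $\lambda > 0$.

```python
import numpy as np

# Assumed example: f(x) = ||x||^2, grad f(x) = 2x. Any h with <grad, h> < 0
# should decrease f for a small enough step.
def f(x):
    return float(np.dot(x, x))

x = np.array([2.0, 1.0])
grad = 2.0 * x                         # (4, 2)
h = np.array([-1.0, 0.5])              # <grad, h> = -4 + 1 = -3 < 0
assert np.dot(grad, h) < 0

for lam in [0.5, 0.1, 0.01]:
    print(lam, f(x + lam * h) - f(x))  # negative: h is a descent direction
```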

Note

By the definition of **differentiable convexity**,

\[f(y) - f(x) \geq \langle \nabla f(x), y-x \rangle.\]

Take $y = x + \hat{\lambda} h$. If $\langle \nabla f(x), h \rangle > 0$, then $f(x+\hat{\lambda} h) - f(x) > 0$; this was already proven in Property U-1 (with $h$ in the role of $\nabla f(x)$). For the difference to be negative instead, we need $\langle \nabla f(x), h \rangle < 0$; repeating the Mean Value Theorem argument of Property U-1 with $h$ in place of $\nabla f(x)$, the sign flips and, for sufficiently small $\hat{\lambda} > 0$,

\[f(x+\hat{\lambda} h) - f(x) = \hat{\lambda} \langle \nabla f(x+s_{\lambda} \hat{\lambda} h), h \rangle \leq \frac{\hat{\lambda}}{2} \langle \nabla f(x), h \rangle < 0.\]

The full proof is omitted.
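
The first-order convexity inequality itself is easy to check numerically; here is a sketch for the assumed convex choice $f(x) = \left\| x \right\|^2$, where the gap $f(y) - f(x) - \langle \nabla f(x), y-x \rangle$ equals $\left\| y-x \right\|^2 \geq 0$:

```python
import numpy as np

# Assumed example: verify f(y) - f(x) >= <grad f(x), y - x> for the convex
# function f(x) = ||x||^2 at random points; the gap is exactly ||y - x||^2.
rng = np.random.default_rng(0)

def f(x):
    return float(np.dot(x, x))

def grad_f(x):
    return 2.0 * x

for _ in range(5):
    x = rng.standard_normal(3)
    y = rng.standard_normal(3)
    gap = f(y) - f(x) - float(np.dot(grad_f(x), y - x))
    print(gap, np.allclose(gap, np.dot(y - x, y - x)))   # gap >= 0, True
```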

Directional Derivative

The directional derivative of $f$ at $x$ along the vector $h$ is defined by

\[df(x,h) = \lim_{\lambda \downarrow 0} \frac{f(x+\lambda h) - f(x)}{\lambda}\]

where $\lambda \downarrow 0$ means that $\lambda$ decreases to $0$ monotonically through positive values, and $\lambda \uparrow 0$ means that $\lambda$ increases to $0$ monotonically through negative values.

  • If $f$ is continuously differentiable, then
\[f(x+\lambda h) - f(x) = \int_0^1 \langle \nabla f(x + s \lambda h), \lambda h \rangle \, ds\]

implies that

\[\frac{f(x+\lambda h) - f(x)}{\lambda} = \int_0^1 \langle \nabla f(x + s \lambda h), h \rangle ds\]

Letting $\lambda \downarrow 0$,

\[\begin{align} df(x, h) &= \lim_{\lambda \downarrow 0} \frac{f(x+\lambda h) - f(x)}{\lambda} \\ &= \lim_{\lambda \downarrow 0} \int_0^1 \langle \nabla f(x + s \lambda h), h \rangle ds \\ &= \int_0^1 \langle \lim_{\lambda \downarrow 0} \nabla f(x + s \lambda h), h \rangle ds \\ &= \langle \nabla f(x ), h \rangle \int_0^1 ds \\ &= \langle \nabla f(x ), h \rangle \end{align}\]

\[df(x,h) = \langle \nabla f(x ), h \rangle\]
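
This identity is easy to sanity-check with a one-sided difference quotient; here is a sketch on the assumed test function $f(x) = \left\| x \right\|^2$:

```python
import numpy as np

# Assumed example: the difference quotient (f(x + lam h) - f(x)) / lam should
# converge to df(x, h) = <grad f(x), h> as lam decreases to 0.
def f(x):
    return float(np.dot(x, x))

x = np.array([1.0, -2.0, 0.5])
h = np.array([0.3, 1.0, -0.7])
exact = float(np.dot(2.0 * x, h))      # <grad f(x), h>

for lam in [1e-1, 1e-3, 1e-6]:
    quotient = (f(x + lam * h) - f(x)) / lam
    print(lam, quotient, exact)        # quotient -> exact
```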

Note

In many cases $x$ is omitted and one simply writes $df(h)$. Here $x$ is kept explicit because it is needed when deriving optimization algorithms. The notation $h[f] = h(f)$ is also used with the same meaning (in differential geometry, Riemannian geometry, the theory of differentiable manifolds, etc.).

Fig21
