2.8. Newton’s Method

Newton’s method is an efficient way to calculate roots of a function.

Definition 2.8.1 (Newton’s method). Let $\vec f: U \subset \R^n \to \R^n$ be a differentiable map. Newton’s method consists of starting with some guess $a_0$ for a solution of $\vec f(x) = \vec 0$. Then linearize the equation at $a_0$: replace the increment to the function, $\vec f(x) - \vec f(a_0)$, by a linear function of the increment, $[D\vec f(a_0)](x - a_0)$. Now solve the corresponding linear equation:

\[\vec f(a_0) + [D\vec f(a_0)](x-a_0)=\vec 0.\]

This is a system of $n$ linear equations in $n$ unknowns. We can write it as

\[[D\vec f(a_0)](x-a_0)=-\vec f(a_0).\]

If $[D\vec f(a_0)]$ is invertible, which will usually be the case, then

\[x = a_0 - [D\vec f(a_0)]^{-1}\vec f(a_0).\]

Call this solution $a_1$, use it as a new guess, and iterate. In general,

\[a_{n+1} = a_n - [D\vec f(a_n)]^{-1}\vec f(a_n).\]
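This iteration is easy to carry out numerically. Here is a minimal sketch in Python/NumPy (the helper names `newton`, `f`, `Df` are ours, not from the text); in practice one solves the linear system rather than inverting the Jacobian:

```python
import numpy as np

def newton(f, Df, a0, tol=1e-12, max_iter=50):
    """Newton's method for solving f(x) = 0 in R^n.

    f  : maps a length-n array to a length-n array
    Df : maps a length-n array to its n-by-n Jacobian matrix
    a0 : the initial guess
    """
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        # Solve [Df(a_n)] h_n = -f(a_n), i.e. h_n = -[Df(a_n)]^{-1} f(a_n).
        h = np.linalg.solve(Df(a), -f(a))
        a = a + h
        if np.linalg.norm(h) < tol:
            break
    return a
```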

Newton’s method doesn’t work for every function. In the single-variable case, it is easy to see when it fails.

Lipschitz conditions

To find at which points Newton’s method converges, we need an explicit bound on how well $[D\vec f(x_0)]\vec h$ approximates $\vec f(x_0 + \vec h) - \vec f(x_0)$.

Definition 2.8.4 (Lipschitz condition for a derivative). Let $U \subset\R^n$ be open and let $f: U \to \R^n$ be a differentiable mapping. The derivative $[Df(x)]$ satisfies a Lipschitz condition on a subset $V \subset U$ with Lipschitz ratio $M$ if for all $x,y\in V$,

\[\underbrace{\Big|[Df(x)] - [Df(y)]\Big|}_{\text{distance between derivatives}} \leq M \underbrace{|x-y|}_{\text{distance between points}}.\]
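In practice one can get a feel for the Lipschitz ratio by sampling. The sketch below (our own helper, not from the text) only produces a lower bound on $M$ from finitely many pairs, so it cannot certify the condition, but it is a useful sanity check against the rigorous bound of Proposition 2.8.9 below:

```python
import numpy as np

def estimate_lipschitz_ratio(Df, points):
    """Lower-bound the Lipschitz ratio of Df on a set V by sampling pairs:
    max |Df(x) - Df(y)| / |x - y|, using the length (Frobenius norm)
    of the matrix difference, as in the definition above."""
    pts = [np.asarray(p, dtype=float) for p in points]
    ratio = 0.0
    for i, x in enumerate(pts):
        for y in pts[i + 1:]:
            dist = np.linalg.norm(x - y)
            if dist > 0:
                ratio = max(ratio, np.linalg.norm(Df(x) - Df(y)) / dist)
    return ratio
```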

Computing Lipschitz ratios using second partial derivatives

Definition 2.8.7 (Second partial derivative). Let $U \subset \R^n$ be open, and let $f: U \to \R$ be a differentiable function. If the function $D_if$ is itself differentiable, then its partial derivative with respect to the $j$th variable, $D_j(D_if)$, is called a second partial derivative of $f$.

Proposition 2.8.9 (Derivative of a $C^2$ mapping is Lipschitz). Let $U \subset \R^n$ be an open ball, and $f: U \to \R^n$ a $C^2$ mapping. If

\[|D_kD_jf_i(x)| \leq c_{i,j,k}\]

for all $x\in U$ and all triples of indices $1\leq i,j,k \leq n$, then for all $u,v \in U$,

\[|[Df(u)]-[Df(v)]| \leq \Big(\sum_{1\leq i,j,k\leq n}(c_{i,j,k})^2\Big)^{1/2}|u-v|.\]
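For instance, for the mapping $f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x^2+y^2\\xy\end{pmatrix}$ used in the last example below, every second partial is constant, and the proposition gives $M = \sqrt{10}$. A quick check of that arithmetic (a sketch; the array layout is our own convention):

```python
import numpy as np

def lipschitz_bound(c):
    """Lipschitz ratio from Proposition 2.8.9: (sum over i,j,k of c_ijk^2)^(1/2)."""
    return float(np.sqrt(np.sum(np.asarray(c, dtype=float) ** 2)))

# c[i][j][k] bounds |D_k D_j f_i|; for f(x, y) = (x^2 + y^2, xy):
c = [[[2, 0], [0, 2]],     # second partials of f_1 = x^2 + y^2
     [[0, 1], [1, 0]]]     # second partials of f_2 = xy
print(lipschitz_bound(c))  # sqrt(10) = 3.1622...
```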

Kantorovich’s Theorem

Theorem 2.8.13 (Kantorovich’s theorem). Let $a_0$ be a point in $\R^n$, $U$ an open neighborhood of $a_0$ in $\R^n$, and $\vec f: U \to \R^n$ a differentiable mapping, with its derivative $[D\vec f(a_0)]$ invertible. Define

\[\vec h_0 = -[D\vec f(a_0)]^{-1}\vec f(a_0), \quad a_1 = a_0 + \vec h_0, \quad U_1 = B_{|\vec h_0|}(a_1).\]

If $\overline U_1 \subset U$ and the derivative $[D\vec f(x)]$ satisfies the Lipschitz condition

\[\big|[D\vec f(u_1)] - [D\vec f(u_2)]\big| \leq M\,|u_1 - u_2| \quad \text{for all points } u_1, u_2 \in \overline U_1,\]

and if the inequality
\[|\vec f(a_0)||[D\vec f(a_0)]^{-1}|^2M \leq \frac{1}{2}\]

is satisfied, then the equation $\vec f(x) = \vec 0$ has a unique solution in the closed ball $\overline U_1$, and Newton’s method with initial guess $a_0$ converges to it.
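The hypotheses of the theorem are all checkable numerically. A sketch of such a check in Python/NumPy (the function `kantorovich_check` is ours; `np.linalg.norm` of a matrix is the Frobenius norm, which is the length of a matrix used above):

```python
import numpy as np

def kantorovich_check(f, Df, a0, M):
    """Evaluate Kantorovich's inequality at a0, given a Lipschitz ratio M
    for Df on the closed ball U_1-bar.

    Returns (k, a1, radius): the product |f(a0)| |[Df(a0)]^{-1}|^2 M,
    the next iterate a1 = a0 + h0, and |h0|, the radius of U_1.
    Convergence is guaranteed when k <= 1/2 (and U_1-bar lies in U).
    """
    a0 = np.asarray(a0, dtype=float)
    Dinv = np.linalg.inv(Df(a0))
    h0 = -Dinv @ f(a0)
    k = np.linalg.norm(f(a0)) * np.linalg.norm(Dinv) ** 2 * M
    return k, a0 + h0, np.linalg.norm(h0)
```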

Example (Let’s complicate life). When does Newton’s method converge for $x^3-2x-5=0$?

Suppose we start at $x_0 = 2$. The displacement from $x_0$ to $x_1$, written $h_0$, is

\[h_0 = -[Df(x_0)]^{-1}f(x_0) = -\frac{f(x_0)}{f'(x_0)} = -\frac{-1}{10} = \frac{1}{10}.\]

Now we compute $x_1 = x_0 + h_0 = 2 + 1/10 = 2.1$. For Newton’s method to converge, we need the supremum of $|f''(x)|$ on the closed ball $\overline{B_{|h_0|}(x_1)}=[2,2.2]$ as a Lipschitz ratio for $f'$. Because $f''(x) = 6x$ is increasing and positive there, the sup is $f''(2.2) = 13.2$. Using this value as $M$ in Kantorovich’s theorem,

\[\frac{|f(x_0)|M}{|f'(x_0)|^2} = \frac{(1)(13.2)}{10^2} < \frac{1}{2}.\]

So Newton’s method starting at $x_0 = 2$ does indeed converge.
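The arithmetic above, and the convergence itself, are easy to confirm numerically (a sketch confirming $h_0$, $M$, and the Kantorovich product, then iterating to the root):

```python
f  = lambda x: x**3 - 2*x - 5
df = lambda x: 3*x**2 - 2

x0 = 2.0
h0 = -f(x0) / df(x0)              # 0.1
x1 = x0 + h0                      # 2.1
M  = 6 * 2.2                      # sup of f''(x) = 6x on [2.0, 2.2] is 13.2
k  = abs(f(x0)) * M / df(x0)**2
print(h0, x1, M, k)               # 0.1  2.1  13.2  0.132, and 0.132 <= 1/2

# Iterate Newton's method from x0 = 2:
x = x0
for _ in range(5):
    x -= f(x) / df(x)
print(x)                          # 2.0945514815..., the root
```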

Example. Consider the system of equations

\[x^2+y^2 = 25 + a,\\ xy = 12 + b.\]

Find $r$ such that if $a^2 + b^2 < r^2$, Newton’s method starting at $\begin{pmatrix}3\\4\end{pmatrix}$ will converge to a solution. We look for the roots of the function $f: U \to \R^2$ given by

\[f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x^2+y^2-25-a\\xy-12-b\end{pmatrix}.\]
By Kantorovich’s theorem, Newton’s method converges when $|\vec f(a_0)|\,\big|[D\vec f(a_0)]^{-1}\big|^2 M \leq \frac{1}{2}$. We first find the Lipschitz ratio $M$ of $[Df]$ from its second partial derivatives.
\[[Df] = \begin{bmatrix}2x&2y\\y&x\end{bmatrix}.\]

\[D_1D_1f_1 = 2, \quad D_2D_1f_1 = 0, \quad D_1D_2f_1 = 0, \quad D_2D_2f_1 = 2,\\ D_1D_1f_2 = 0, \quad D_2D_1f_2 = 1, \quad D_1D_2f_2 = 1, \quad D_2D_2f_2 = 0.\]

By Proposition 2.8.9, the Lipschitz ratio is $M = \big(2^2+2^2+1^2+1^2\big)^{1/2} = \sqrt{10}$. Next, we calculate the inverse of the Jacobian. At $x=3$ and $y=4$,

\[[Df(a_0)]^{-1} = \begin{bmatrix}6&8\\4&3\end{bmatrix}^{-1} = \frac{1}{14}\begin{bmatrix}-3&8\\4&-6\end{bmatrix}.\]

The square of its length is $\big|[Df(a_0)]^{-1}\big|^2 = \frac{9+64+16+36}{14^2} = \frac{125}{196}$.

Now let’s find the ball $U_1$ from the theorem: the ball of radius $|\vec h_0|$ centered at $a_1 = a_0 + \vec h_0$.

Since $\vec f(a_0) = \begin{pmatrix}-a\\-b\end{pmatrix}$,

\[\vec h_0 = -[Df(a_0)]^{-1}\vec f(a_0) = \frac{1}{14}\begin{bmatrix}-3&8\\4&-6\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix} = \frac{1}{14}\begin{bmatrix}-3a+8b\\4a-6b\end{bmatrix}.\]
The equation $f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$ has a unique solution in $\overline{B_{|\vec h_0|}(a_1)}$ provided that

\[|\vec f(a_0)|\,\big|[Df(a_0)]^{-1}\big|^2 M = \sqrt{a^2+b^2}\cdot\frac{125}{196}\cdot\sqrt{10} \leq \frac{1}{2}.\]

Since $\sqrt{a^2+b^2} < r$, this holds whenever $r\leq \frac{1}{2}\cdot\frac{196}{125}\cdot\frac{1}{\sqrt{10}}$.
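To close, a numerical sanity check of this example (a sketch; we pick one admissible perturbation $(a,b)$ and verify the quantities computed above):

```python
import numpy as np

a, b = 0.1, 0.1                  # any perturbation with a^2 + b^2 < r^2

f  = lambda v: np.array([v[0]**2 + v[1]**2 - 25 - a, v[0]*v[1] - 12 - b])
Df = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])

a0   = np.array([3.0, 4.0])
Dinv = np.linalg.inv(Df(a0))     # (1/14) [[-3, 8], [4, -6]]
print(np.linalg.norm(Dinv)**2)   # 125/196 = 0.6377...

M = np.sqrt(10)
r = 0.5 * (196 / 125) / M
print(r)                         # 0.2479...

# |f(a0)| = sqrt(a^2 + b^2) < r, so Kantorovich's inequality holds:
print(np.linalg.norm(f(a0)) * np.linalg.norm(Dinv)**2 * M)  # 0.285... <= 0.5
```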