# 2.8. Newton’s Method

Newton’s method is an efficient way to calculate roots of a function.

Definition 2.8.1 (Newton’s method). Let $\vec f:U \to \R^n$ be a differentiable map. Newton’s method consists of starting with some guess $a_0$ for a solution of $\vec f(x) = \vec 0$l Then linearlize the equation at $a_0$: replace the increment to the function, $\vec f(x) - \vec f(a_0)$, by a linear function of the increment, $D\vec f(a_0)$. Now solve the corresponding linear equation:

$\vec f(a_0) + [D\vec f(a_0)](x-a_0)=\vec 0.$

This is a system of $n$ linear equations in $n$ unknowns. We can write it as

$[D\vec f(a_0)](x-a_0)=-\vec f(a_0).$

If $[D\vec f(a_0)]$ is invertible, which will usually be the case, then

$x = a_n - [D\vec f(a_n)]^{-1}f(a_n).$

Newton’s method doesn’t work for every function. In the single variable case, it is clear when it does not work.

## Lipschitz conditions

To find at which points Newton’s method converges, we create an explicit bound on how good an approximation $[D\vec f(x_0)]\vec h$ is to $\vec f(x_0 + \vec h) - \vec f(x_0)$.

Definition 2.8.4 (Lipschitz condition for a derivative). Let $U \subset\R^n$ be open and let $f: U \to \R^n$ be a differentiable mapping. The derivative $[Df(x)]$ satisfies a Lipschitz condition on a subset $V \subset U$ with Lipschitz ratio $M$ if for all $x,y\in V$,

$\underbrace{\Big|[Df(x)] - [Df(y)]\Big|}_{\text{distance between derivatives}} \leq M \underbrace{|x-y|}_{\text{distance between points}}.$

### Computing Lipschitz ratios using second partial derivatives

Definition 2.8.7 (Second partial derivative). Let $U \subset \R^n$ be open, and let $f: U \to \R$ be a differentiable function. If the function $D_if$ is itself differentiable, then its partial derivative with respect to the $j$th variable, $D_j(D_if)$ is called the second partial derivative of $f$.

Proposition 2.8.9 (Derivative of a $C^2$ mapping is Lipschitz). Let $U \subset \R^n$ be an open ball, and $f: U \to \R^n$ a $C^2$ mapping. If

$|D_kD_jf_i(x)| \leq c_{i,j,k}$

for any $x\in U$ and for all triples of indices $1\leq i,j,k \leq n$, then for $u,v \in U$,

$|[Df(u)]-[Df(v)]| \leq \Big(\sum_{1\leq i,j,k\leq n}(c_{i,j,k})^2\Big)^{1/2}|u-v|.$

## Kantorovich’s Theorem

Theorem 2.8.13 (Kantorovich’s theorem). Let $a_0$ be a point in $\R^n$, $U$ an open neighborhood of $a_0$ in $\R^n$, and $\vec f: U \to \R^n$ a differentiable mapping, with its derivative $[D\vec f(a_0)]$ invertible. Define

$\vec h_0 = -[D\vec f(a_0)]^{-1}\vec f(a_0), a_1 = a_0 + \vec h_0, U_1 = B_{|\vec h_0|}(a_1).$

If $\overline U_1 \subset U$ and the derivative $[D\vec f(x)]$ satisfies the Lipschitz condition

 $[D\vec f(u_1)] - [D\vec f(u_2)] \leq M u_1 - u_2$ for all points $u_1, u_2 \in \overline U_1$, and if the inequality
$|\vec f(a_0)||[D\vec f(a_0)]^{-1}|^2M \leq \frac{1}{2}$

is satisfied, then the equation $\vec f(x) = \vec 0$ has a unique solution in the closed ball $\overline U_1$, and Newton’s method with initial guess $a_0$ converges to it.

Example. (Let’s complicate life).When does Newton’s method converge for $x^3-2x-5=0$?

Suppose we start at $x_0 = 2$. The distance between $x_0$ and $x_1$, written as $h_0$, is equal to

$h_0 = [Df(x)]^{-1}f(x_0)\\ = \frac{f(x_0)}{f'(x_0)} = \frac{-1}{10}.$

Now we compute $x_1 = x_0 -h_0 = 2 + 1/10 = 2.1$. For Newton’s method to converge, we need to find the supremum of $f’‘(x)$ on a ball $B_{h_0}(x_1)=(2,2.2)$. Because $f’’$ is an always increasing function, the sup is $f’‘(2.2) = 13.2$. Using this value on Kantorovich’s theorem,

$\frac{|f(x_0)|M}{|f'(x_0)|^2} = \frac{(1)(13.2)}{10^2} < \frac{1}{2}.$

So it does indeed converge at $x_0 = 2$.

Example.

$x^2+y^2 = 25 + a\\ xy = 12 + b$

Find $r$ such that if $a^2 + b^2 < r^2$, Newton’s method starting at $\begin{pmatrix}3\4\end{pmatrix}$ will converge to a solution. We find the roots of the function from $U \to \R^2$

$f\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x^2+y^2-25-a\\xy-12-b\end{pmatrix}.$
 By Kantorovich’s theorem, we solve when $\vec f(a_0) [D\vec f(a_0)]^{-1} ^2M \leq \frac{1}{2}$. We first calculate $M$ by computing $f$’s Lipschitz condition.
$[Df] = \begin{bmatrix}2x&2y\\y&x\end{bmatrix}.$ $D_1D_1f_1 = 2. \space D_2D_1f_1 = 0.\\ D_1D_2f_1 = 0. \space D_2D_2f_1 = 2.\\ D_1D_1f_2 = 0. \space D_2D_1f_2 = 1. \\D_1D_2f_2 = 1. \space D_2D_2f_2 = 0.$

The partial derivatives are bounded by $M = \sqrt{10}$. Next, we calculate the inverse of the Jacobian. At $x=3$ and $y=4$,

$[Df]^{-1} = \frac{1}{14}\begin{bmatrix}3&-8\\-4&8\end{bmatrix}$

The length of the square of this is $\frac{125}{196}$.

Now, let’s find the ball $U_0$ where $f$ converges. It is the ball of radius $h_0$ centered around $x_1$.

$h_0 = [Df(a_0)]^{-1}f(a_0)\\ = \frac{1}{14}\begin{bmatrix}3&-8\\-4&8\end{bmatrix}\begin{bmatrix}-a\\-b\end{bmatrix}\\ = \frac{1}{14}\begin{bmatrix}-3a+8b\\4a-6b\end{bmatrix}.$
 The equation $f\begin{pmatrix}x\y\end{pmatrix} = \begin{pmatrix}0\0\end{pmatrix}$ has a unique solution in $B_{ h_0 }a_1$ if $r\leq \frac{1}{2}\frac{196}{125}\frac{1}{\sqrt{10}}$.