# 5. Sensitivity and Conditioning

February 16, 2021

## Norms and the Neumann Series

See 1.5. Limits and Continuity.

Suppose $F$ is a square matrix with $\|F\| < 1$ in some operator norm, and consider the power series

\[\sum_{j=0}^nF^j.\]By the submultiplicative property, $\|F^j\| \leq \|F\|^j$, so by the triangle inequality the partial sums form a Cauchy sequence and the series converges. The partial sums also satisfy the telescoping identity

\[(I-F)\sum_{j=0}^nF^j= I - F^{n+1}.\]Therefore,

\[\left\|(I-F)\sum_{j=0}^nF^j-I\right\| = \|F^{n+1}\| \leq \|F\|^{n+1} \to 0 \quad \text{as } n \to \infty.\]Therefore, $I-F$ is invertible and $(I-F)^{-1} = \sum_{j=0}^\infty F^j$.
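As a quick numerical sanity check, here is a minimal NumPy sketch (the matrix size, scaling factor, and truncation length are arbitrary choices) comparing the truncated Neumann series against the direct inverse of $I - F$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random F with operator 2-norm strictly less than 1.
F = rng.standard_normal((4, 4))
F *= 0.9 / np.linalg.norm(F, 2)

# Partial sum of the Neumann series: I + F + F^2 + ... + F^n.
S = np.zeros_like(F)
term = np.eye(4)
for _ in range(200):
    S += term
    term = term @ F

# Compare against the direct inverse of I - F.
direct = np.linalg.inv(np.eye(4) - F)
print(np.linalg.norm(S - direct, 2))  # small; dominated by the truncated tail
```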

## Notions of Error

The art of numerics is finding an approximation that works within a given error bound. Some sources of error are

- measurement error
- roundoff error
- modeling error
- approximation error

First, we must formally explain what error is.

**Definition (Absolute error).** Suppose $\hat x$ is an approximation of $x$. The *absolute error* is

\[e_{abs} = |\hat x - x|.\]Absolute error has the same dimensions as $x$.

**Definition (Relative error).** The *relative error* is a dimensionless measure with a more natural sense of scale:

\[e_{rel} = \frac{e_{abs}}{|x|}.\]Since the true value $x$ is usually unknown, we can estimate the relative error by replacing $x$ with $\hat x$ in the denominator:

\[\hat e_{rel} = \frac{e_{abs}}{|\hat x|}.\]To see how good an estimate $\hat e_{rel}$ is, we know that if $\hat e_{rel} <1$, then

\[\frac{\hat e_{rel}}{1+\hat e_{rel}} \leq e_{rel} \leq \frac{\hat e_{rel}}{1-\hat e_{rel}}.\]If $\hat e_{rel}$ is much less than 1, it is a good estimate for $e_{rel}$.
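Here is a small sketch (the values of $x$ and $\hat x$ are arbitrary) checking that the true relative error lands inside these bounds:

```python
x, x_hat = 3.0, 3.2             # arbitrary true value and approximation

e_abs = abs(x_hat - x)
e_rel = e_abs / abs(x)          # needs the unknown true value x
e_rel_hat = e_abs / abs(x_hat)  # computable from the approximation alone

lower = e_rel_hat / (1 + e_rel_hat)
upper = e_rel_hat / (1 - e_rel_hat)
print(lower, e_rel, upper)      # e_rel lies between lower and upper
```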

**Definition (Mixed error).** When $x=0$, relative error is undefined. One alternative is *mixed error*,

\[e_{mixed} = \frac{|\hat x - x|}{|x| + \tau},\]where $\tau$ is some natural scale factor associated with $x$.

### Error for vectors

Instead of using absolute value, we can replace everything with vector norms. We can also consider the component-wise errors.

\[e_{abs, i} = |\hat x_i - x_i|\quad e_{rel, i} = \frac{e_{abs, i}}{|x_i|}.\]**Theorem.** The maximum component-wise relative error is a norm-wise error in a norm defined in terms of the solution vector: $\max_i e_{rel,i} = \|D^{-1}(\hat x - x)\|_\infty$, where $D = \operatorname{diag}(x_1, \dots, x_n)$.
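To illustrate the theorem, a short sketch (with an arbitrary example vector) showing that the maximum component-wise relative error equals the $\infty$-norm of the error weighted by $\operatorname{diag}(x)^{-1}$:

```python
import numpy as np

x = np.array([1.0, -2.0, 4.0])      # arbitrary exact vector
x_hat = np.array([1.1, -1.9, 4.2])  # arbitrary approximation

# Component-wise relative errors.
e_rel = np.abs(x_hat - x) / np.abs(x)

# The same quantity as a weighted norm: || diag(x)^{-1} (x_hat - x) ||_inf.
weighted = np.linalg.norm((x_hat - x) / x, np.inf)

print(e_rel.max(), weighted)        # identical values
```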

## Forwards and backwards error and conditioning

Often, we will want to approximate a function $f$ by another function $\hat f$.

**Definition (forward error).** The forward error of this function approximation is

\[|\hat f(x) - f(x)|.\]This measures the error in the function's output.

**Definition (backward error, backward stable).** Suppose the computed output corresponds exactly to a slightly wrong input, i.e.

\[\hat f(x) = f(\hat x)\]for some $\hat x$. Then $|\hat x - x|$ is the backward error. An algorithm that always has a small backward error is *backward stable*.
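As a minimal sketch (assuming $f(x) = \sqrt{x}$, chosen because the exact preimage of the computed output is easy to write down), we can measure the backward error of floating-point square root:

```python
import numpy as np

x = 2.0
y_hat = np.sqrt(x)     # computed (rounded) output

# Backward error: find the input that the exact function maps to y_hat.
# For f(x) = sqrt(x), that input is x_hat = y_hat**2 (up to rounding in the square).
x_hat = y_hat ** 2
print(abs(x_hat - x))  # on the order of machine epsilon times |x|
```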

**Definition (condition number).** The condition number is the relative output error divided by the relative input perturbation: it is the tightest constant relating the two. For example, suppose we are evaluating the function $f$ with a perturbed input $\hat x = x+h$. Note that the relative perturbation is $|h|/|x|$.

Let $\kappa[f(x)]$ be the smallest constant such that

\[\frac{|f(x+h) - f(x)|}{|f(x)|}\leq \kappa [f(x)] \frac{|h|}{|x|} + o(|h|).\]Suppose $f$ is differentiable. The relative output error is

\[\frac{|f(x+h) - f(x)|}{|f(x)|}.\]Using a Taylor expansion, $f(x + h) \approx f(x) + f'(x)h$. Therefore, the condition number is

\[\kappa [f(x)] = \lim_{h \to 0}\frac{|f(x+h)-f(x)|/|f(x)|}{|(x+h)-x|/|x|} = \frac{|f'(x)||x|}{|f(x)|}.\]If $f$ is Lipschitz in a neighborhood of $x$, then

\[\kappa[f(x)] = \frac{M_f|x|}{|f(x)|},\]where $M_f$ is the smallest constant such that $|f(x + h) - f(x)| \leq M_f|h| + o(|h|)$.
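For differentiable $f$, the formula $\kappa[f(x)] = |f'(x)||x|/|f(x)|$ is easy to evaluate directly; a small sketch (with $\sqrt{x}$ and $e^x$ as arbitrary examples):

```python
import numpy as np

# kappa[f(x)] = |f'(x)| |x| / |f(x)| for differentiable f.
def kappa(f, fprime, x):
    return abs(fprime(x)) * abs(x) / abs(f(x))

# Square root is well-conditioned everywhere: kappa = 1/2.
print(kappa(np.sqrt, lambda x: 0.5 / np.sqrt(x), 100.0))  # 0.5

# exp has kappa = |x|, so it is ill-conditioned for large |x|.
print(kappa(np.exp, np.exp, 50.0))                        # 50.0
```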

If there is no linear bound on a function, it has an *infinite condition number*.

A problem is *well-conditioned* if it has a small condition number. A problem is *ill-conditioned* if it has a large condition number.

💡 A backwards stable algorithm applied to a well-conditioned problem has a small forward error.
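A classic ill-conditioned problem is subtracting nearly equal numbers; a short sketch (the offset $10^{-8}$ is an arbitrary choice):

```python
# f(x) = x - 1 has f'(x) = 1, so kappa[f(x)] = |x| / |x - 1|,
# which blows up as x approaches 1.
x = 1.0 + 1e-8
kappa = abs(x) / abs(x - 1.0)
print(kappa)  # ~1e8: tiny relative input errors are amplified enormously
```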

## Perturbing matrix problems

**Theorem (Condition number for matrix multiplication).** Suppose that we want to compute $y = Ax$ but instead compute $\hat y = (A+E)x$ for some error matrix $E$. The absolute error is

\[\|\hat y - y\| = \|(A+E)x - Ax\| = \|Ex\|.\]The relative error is

\[\frac{\|\hat y - y\|}{\|y\|} = \frac{\|Ex\|}{\|y\|}.\]Assuming $A$ is invertible,

\[\|Ex\| = \|EA^{-1}y\| \leq \|E\|\|A^{-1}\| \|y\|.\]Therefore,

\[\frac{\|\hat y - y\|}{\|y\|} \leq \|E\|\|A^{-1}\| = \|A\|\|A^{-1}\|\frac{\|E\|}{\|A\|} = \kappa(A) \frac{\|E\|}{\|A\|}.\]Here we see that $\kappa (A) = \|A\|\|A^{-1}\|$; though this is only an upper bound, it is called the condition number of $A$. In the operator 2-norm, it equals $\sigma_1(A) \cdot \sigma_1(A^{-1})$.

**Claim.** $\sigma_1(A^{-1}) = 1/ \sigma_n(A)$, so in the 2-norm $\kappa(A) = \sigma_1(A)/\sigma_n(A)$.

**Proof.** Write $A = U \Sigma V^T$, so $A^{-1} = V\Sigma^{-1}U^T$. Since $V$ and $U$ are orthogonal, this is an SVD of $A^{-1}$ up to reordering: the singular values of $A^{-1}$ are the reciprocals $1/\sigma_i(A)$, the largest of which is $1/\sigma_n(A)$.
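A short sketch (with a random matrix) verifying both identities numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

sigma = np.linalg.svd(A, compute_uv=False)         # singular values, descending
sigma_inv = np.linalg.svd(np.linalg.inv(A), compute_uv=False)

print(sigma_inv[0], 1.0 / sigma[-1])               # sigma_1(A^{-1}) = 1 / sigma_n(A)
print(sigma[0] / sigma[-1], np.linalg.cond(A, 2))  # kappa_2(A) = sigma_1 / sigma_n
```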

When we make perturbations to $x$ instead, the condition number is $\|A\| \|x\| / \|Ax\|$. This follows from the computation below.

\[\hat y = A(x + h), \qquad e_{abs} = \|A(x+h) - Ax\| = \|Ah\|, \qquad e_{rel} = \frac{\|Ah\|}{\|Ax\|}.\]Dividing by the relative perturbation $\|h\|/\|x\|$,

\[\frac{\|Ah\| / \|Ax\|}{\|h\|/\|x\|} =\frac{\|Ah\|\|x\|}{\|Ax\|\|h\|} \leq \frac{\|A\|\|x\|}{\|Ax\|},\]with equality attained for a worst-case $h$. Notice that $A$ is the Jacobian of the map $x \mapsto Ax$, so this agrees with the derivative-based formula for the condition number.
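A sketch (random $A$, $x$, and perturbation $h$) confirming that the observed amplification never exceeds $\|A\|\|x\|/\|Ax\|$:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

# Condition number for perturbing x in y = Ax, in the 2-norm.
kappa_x = np.linalg.norm(A, 2) * np.linalg.norm(x) / np.linalg.norm(A @ x)

# Amplification actually achieved by one random small perturbation h.
h = 1e-8 * rng.standard_normal(4)
amp = (np.linalg.norm(A @ h) / np.linalg.norm(A @ x)) / (np.linalg.norm(h) / np.linalg.norm(x))
print(amp, "<=", kappa_x)
```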

## Dimensions and scaling

It is common that the first step of analyzing an application is *nondimensionalization*: combining physical constants to obtain a small number of dimensionless constants. One example is the gravitational constant $G$.