5. Sensitivity and Conditioning
February 16, 2021
Norms and the Neumann Series
See 1.5. Limits and Continuity.
Suppose $F$ is a square matrix with $\|F\| < 1$ in some operator norm, and consider the power series
\[\sum_{j=0}^nF^j.\]By the submultiplicative property, $\|F^j\| \leq \|F\|^j$, so by the triangle inequality the partial sums form a Cauchy sequence and the series converges. Moreover, the partial sums satisfy
\[(I-F)\sum_{j=0}^nF^j= I - F^{n+1}.\]Therefore we have that
\[\|(I-F)\sum_{j=0}^nF^j-I\| = \|F^{n+1}\| \leq \|F\|^{n+1} \to 0 \quad \text{as } n \to \infty.\]Therefore, $I-F$ is invertible and $(I-F)^{-1} = \sum_{j=0}^\infty F^j$.
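As a quick numerical sanity check, here is a minimal NumPy sketch (the matrix $F$ is an arbitrary random example, scaled so that $\|F\|_2 < 1$) comparing the partial sums against $(I-F)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4))
F *= 0.5 / np.linalg.norm(F, 2)      # scale so that ||F||_2 = 0.5 < 1

exact = np.linalg.inv(np.eye(4) - F)

partial = np.zeros((4, 4))
term = np.eye(4)                     # F^0
for n in range(41):
    partial += term                  # partial = sum_{j=0}^{n} F^j
    term = term @ F
    if n % 10 == 0:
        err = np.linalg.norm(partial - exact, 2)
        print(f"n = {n:2d}, error = {err:.2e}")
```

The error shrinks roughly like $\|F\|_2^{n+1}$, as the bound above suggests.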
Notions of Error
The art of numerics is finding an approximation that is accurate to within a certain error bound. Some sources of error are
- measurement error
- roundoff error
- modeling error
- approximation error
First, we must formally explain what error is.
Definition (Absolute error). Suppose $\hat x$ is an approximation of $x$. The absolute error is
\[e_{abs} = |\hat x - x|.\]Absolute error has the same dimensions as $x$.
Definition (Relative error). The relative error is a measure with a more natural sense of scale. It is a proportion.
\[e_{rel} = \frac{e_{abs}}{|x|}.\]We can also approximate the relative error by replacing $x$ with $\hat x$ in the denominator, since the true value $x$ is usually unknown.
\[\hat e_{rel} = \frac{e_{abs}}{|\hat x|}.\]To see how good an estimate $\hat e_{rel}$ is, note that if $\hat e_{rel} <1$, then
\[\frac{\hat e_{rel}}{1+\hat e_{rel}} \leq e_{rel} \leq \frac{\hat e_{rel}}{1-\hat e_{rel}}.\]If $\hat e_{rel}$ is much less than 1, it is a good estimate for $e_{rel}$.
Definition (Mixed error). When $x=0$, the relative error is undefined. A useful alternative is the mixed error,
\[e_{mixed} = \frac{e_{abs}}{|x| + \tau}\]where $\tau$ is some natural scale factor associated with $x$.
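A minimal sketch of these error measures in plain Python (the values of $x$, $\hat x$, and the scale $\tau$ are arbitrary illustrative choices):

```python
x, x_hat = 2.0, 2.002      # true value and an approximation (arbitrary)
tau = 1.0                  # a hypothetical natural scale for x

e_abs = abs(x_hat - x)
e_rel = e_abs / abs(x)
e_rel_hat = e_abs / abs(x_hat)     # computable without knowing x exactly
e_mixed = e_abs / (abs(x) + tau)

# Bracketing of e_rel by its estimate (valid when e_rel_hat < 1).
lower = e_rel_hat / (1 + e_rel_hat)
upper = e_rel_hat / (1 - e_rel_hat)
assert lower <= e_rel <= upper
print(e_abs, e_rel, e_rel_hat, e_mixed)
```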
Error for vectors
Instead of using absolute value, we can replace everything with vector norms. We can also consider the component-wise errors.
\[e_{abs, i} = |\hat x_i - x_i|\quad e_{rel, i} = \frac{e_{abs, i}}{|x_i|}.\]Theorem. The maximum component-wise relative error can be computed as a norm-wise error in a norm defined in terms of the solution vector.
\[\max_i e_{rel, i} = \|\text{diag}(x)^{-1}(\hat x - x)\|_\infty.\]
Forwards and backwards error and conditioning
Often, we will want to approximate a function $f$ by another function $\hat f$.
Definition (forward error). The forward error of this function approximation is
\[|\hat f(x) - f(x)|.\]This is an error in the output of the function.
Definition (backward error, backward stable). If the computed result can be written as the exact function evaluated at a slightly perturbed input,
\[\hat f(x) = f(\hat x),\]then $| \hat x - x|$ is the backwards error. An algorithm that always has a small backwards error is backward stable.
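A small Python illustration: a deliberately rounded square root stands in for a generic inexact algorithm, and the backward error is recovered by solving $f(\hat x) = \hat f(x)$ for $\hat x$ (the choice of $f$ and $x$ here is arbitrary):

```python
import math

def f(x):
    return math.sqrt(x)

def f_hat(x):
    # A deliberately inexact algorithm: round the square root to 4 digits.
    return round(math.sqrt(x), 4)

x = 2.0
forward_error = abs(f_hat(x) - f(x))

# Backward error: f(x_hat) = f_hat(x) means x_hat = f_hat(x)**2.
x_hat = f_hat(x) ** 2
backward_error = abs(x_hat - x)

print(forward_error, backward_error)
```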
Definition (condition number). The condition number is the relative output error divided by the relative input perturbation. It is a tight constant relating relative output error to relative input error. For example, suppose we are evaluating the function $f$ with a perturbed input $\hat x = x+h$. Note that the relative perturbation of the input is $|h|/|x|$.
Let $\kappa[f(x)]$ be the smallest constant such that
\[\frac{|f(x+h) - f(x)|}{|f(x)|}\leq \kappa [f(x)] \frac{|h|}{|x|} + o(|h|).\]Suppose $f$ is differentiable. The relative error is
\[\frac{|f(x+h) - f(x)|}{|f(x)|}.\]Using a Taylor expansion, $f(x + h) \approx f(x) + f'(x)h$. Therefore, the condition number is
\[\kappa [f(x)] = \lim_{h \to 0}\frac{|f(x+h)-f(x)|/|f(x)|}{|(x+h)-x|/|x|} = \frac{|f'(x)||x|}{|f(x)|}.\]If $f$ is Lipschitz in a neighborhood of $x$, then
\[\kappa[f(x)] = \frac{M_f(x)\,|x|}{|f(x)|},\]where $M_f(x)$ is the smallest constant such that $|f(x + h) - f(x)| \leq M_f(x)|h| + o(|h|)$.
If there is no linear bound on a function, it has an infinite condition number.
A problem is well-conditioned if it has a small condition number. A problem is ill-conditioned if it has a large condition number.
💡 A backwards stable algorithm applied to a well-conditioned problem has a small forward error.
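As a sketch of the derivative formula, we can compare $\kappa[f(x)] = |f'(x)|\,|x|/|f(x)|$ against the observed amplification of a small relative perturbation (the function $f(x) = e^x$ and the perturbation size are arbitrary choices):

```python
import math

def f(x):
    return math.exp(x)       # f'(x) = exp(x), so kappa[f(x)] = |x|

x = 10.0
kappa = abs(math.exp(x)) * abs(x) / abs(math.exp(x))   # = |x| = 10 here

h = 1e-8 * x                 # a small perturbation of the input
rel_output_err = abs(f(x + h) - f(x)) / abs(f(x))
rel_input_err = abs(h) / abs(x)

print(kappa, rel_output_err / rel_input_err)   # both are close to 10
```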
Perturbing matrix problems
Theorem (Condition number for matrix multiplication). Suppose that we want to compute some matrix operation $y = Ax$ but there is some error $\hat y = (A+E)x$. The absolute error is
\[e_{abs} = \|\hat y - y\| = \|Ex\|.\]The relative error is
\[\frac{\|\hat y - y\|}{\|y\|} = \frac{\|Ex\|}{\|y\|}.\]Assuming $A$ is invertible,
\[\|Ex\| = \|EA^{-1}y\| \leq \|E\|\|A^{-1}\| \|y\|.\]Therefore,
\[\frac{\|\hat y - y\|}{\|y\|} \leq \|E\|\,\|A^{-1}\| = \|A\|\,\|A^{-1}\|\frac{\|E\|}{\|A\|} = \kappa(A) \frac{\|E\|}{\|A\|}.\]Here, we see that $\kappa (A) = \|A\|\,\|A^{-1}\|$. Strictly speaking, condition numbers belong to problems rather than to matrices, but this quantity is called the condition number of $A$. In the 2-norm it is also equal to $\sigma_1(A) \cdot \sigma_1(A^{-1})$.
Claim. $\sigma_1(A^{-1}) = 1/ \sigma_n(A)$, where $\sigma_n(A)$ is the smallest singular value of $A$.
Proof. $A = U \Sigma V^T$, so $A^{-1} = V\Sigma^{-1}U^T$. Since $V$ and $U$ are orthogonal, the singular values of $A^{-1}$ are the reciprocals $1/\sigma_i(A)$, and the largest of them is $1/\sigma_n(A)$. Hence $\kappa_2(A) = \sigma_1(A)/\sigma_n(A)$.
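A quick NumPy check of these identities on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))    # an arbitrary (almost surely invertible) matrix

sigma = np.linalg.svd(A, compute_uv=False)      # singular values, descending

kappa_norms = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
kappa_svd = sigma[0] / sigma[-1]                # sigma_1(A) / sigma_n(A)
kappa_builtin = np.linalg.cond(A, 2)            # NumPy's 2-norm condition number

print(kappa_norms, kappa_svd, kappa_builtin)    # all agree up to roundoff
```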
When we make perturbations to $x$, the condition number is $\|A\|\,\|x\| / \|Ax\|$. This is derived from the following computation.
\[\hat y = A(x + h), \qquad e_{abs} = \|Ax + Ah - Ax\| = \|Ah\|, \qquad e_{rel} = \frac{\|Ah\|}{\|Ax\|}.\]Dividing by the relative perturbation,
\[\frac{\|Ah\| / \|Ax\|}{\|x+h -x\|/\|x\|} = \frac{\|Ah\|\,\|x\|}{\|Ax\|\,\|h\|} \leq \frac{\|A\|\,\|x\|}{\|Ax\|},\]with equality in the worst case over directions of $h$. Notice that the Jacobian of $x \mapsto Ax$ is $J = A$, so this agrees with the derivative-based formula $\kappa[f(x)] = \|J\|\,\|x\|/\|f(x)\|$.
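A minimal NumPy sketch of this bound: for random perturbation directions $h$, the observed amplification of relative error never exceeds $\|A\|\,\|x\|/\|Ax\|$ (the matrix and vectors are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))    # arbitrary example matrix
x = rng.standard_normal(4)         # arbitrary input vector

kappa_x = np.linalg.norm(A, 2) * np.linalg.norm(x) / np.linalg.norm(A @ x)

for _ in range(5):
    h = 1e-8 * rng.standard_normal(4)
    rel_out = np.linalg.norm(A @ h) / np.linalg.norm(A @ x)   # A(x+h) - Ax = Ah
    rel_in = np.linalg.norm(h) / np.linalg.norm(x)
    print(rel_out / rel_in, "<=", kappa_x)
```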
Dimensions and scaling
It is common that the first step of analyzing an application is nondimensionalization. This is the idea of combining physical constants to obtain a small number of dimensionless constants. One example is the gravitational constant $G$.