Today we will talk about:
The idea is to make a matrix have a simpler structure so that each step of QR algorithm becomes cheaper.
In case of a general matrix we can use the Hessenberg form.
The matrix $A$ is said to be in the Hessenberg form, if
$$a_{ij} = 0, \quad \mbox{if } i \geq j+2.$$$$H = \begin{bmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & *\\ 0 & 0 & * & * & *\\ 0 & 0 & 0 & * & * \\ \end{bmatrix}.$$The only difference with Schur decomposition is that we have to map the first column to the vector with two non-zeros, and the first element is not changed.
The computational cost of such reduction is $\mathcal{O}(n^3)$ operations.
In a Hessenberg form, computation of one iteration of the QR algorithm costs $\mathcal{O}(n^2)$ operations (e.g. using Givens rotations, how?), and the Hessenberg form is preserved by the QR iteration (check why).
In the symmetric case, we have $A = A^*$, then $H = H^*$ and the upper Hessenberg form becomes tridiagonal matrix.
From now on we will talk about the case of symmetric tridiagonal form.
Any symmetric (Hermitian) matrix can be reduced to the tridiagonal form by Householder reflections.
Key point is that tridiagonal form is preserved by the QR algorithm, and the cost of one step can be reduced to $\mathcal{O}(n)$!
Let us see..
%matplotlib inline
import jax.numpy as jnp
import jax
import matplotlib.pyplot as plt
from jax.config import config
config.update("jax_enable_x64", True)
#Generate a random tridiagonal matrix
n = 20
d = jax.random.normal(jax.random.PRNGKey(0), (n, ))
sub_diag = jax.random.normal(jax.random.PRNGKey(1), (n-1,))
mat = jnp.diag(d) + jnp.diag(sub_diag, -1) + jnp.diag(sub_diag, 1)
mat1 = jnp.abs(mat)
mat1 = mat1/jnp.max(mat1.flatten())
plt.spy(mat)
q, r = jnp.linalg.qr(mat)
plt.figure()
b = r.dot(q)
# b[abs(b) <= 1e-12] = 0
b = b.at[abs(b) <= 1e-12].set(0.0)
#b = jax.ops.index_update(b, jax.ops.index[abs(b) <= 1e-12], 0)
plt.spy(b)
#plt.figure()
#plt.imshow(np.abs(r.dot(q)))
b[0, :]
DeviceArray([-1.47168069, -0.75157939, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ], dtype=float64)
in the case when $A_k = A^*_k$ and is also tridiagonal.
Such matrix is defined by $\mathcal{O}(n)$ parameters; computation of the QR is more complicated, but it is possible to compute $A_{k+1}$ directly without computing $Q_k$.
This is called implicit QR-step.
All the implicit QR algorithms are based on the following theorem:
Let
$$Q^* A Q = H$$be an irreducible upper Hessenberg matrix. Then, the first column of the matrix $Q$ defines all of its other columns. It can be found from the equation
$$A Q = Q H. $$Summary. If we have a decomposition of the form
$$A = X \Lambda X^{-1}, \quad A = \begin{bmatrix}A_{11} & A_{12} \\ A_{21} & A_{22}\end{bmatrix}$$and
$$ \Lambda = \begin{bmatrix} \Lambda_1 & 0 \\ 0 & \Lambda_2 \end{bmatrix}, \quad \lambda(\Lambda_1)=\{\lambda_1,\dots,\lambda_m\}, \ \lambda(\Lambda_2)=\{\lambda_{m+1},\dots,\lambda_r\}, $$and there is a gap between the eigenvalues of $\Lambda_1$ and $\Lambda_2$ ($|\lambda_1|\geq \dots \geq |\lambda_m| > |\lambda_{m+1}| \geq\dots \geq |\lambda_r| >0$), then the $A^{(k)}_{21}$ block of $A_k$ in the QR-iteration goes to zero with
$$\Vert A^{(k)}_{21} \Vert \leq C q^k, \quad q = \left| \frac{\lambda_{m+1}}{\lambda_{m}} \right |,$$where $m$ is the size of $\Lambda_1$.
So we need to increase the gap! It can be done by the QR algorithm with shifts.
The convergence rate for a shifted version is then
$$\left| \frac{\lambda_{m+1} - s_k}{\lambda_{m} - s_k} \right |,$$where $\lambda_m$ is the $m$-th largest eigenvalue of the matrix in modulus.
It converges to the eigenvector corresponding to the largest eigenvalue in modulus.
The convergence can be very slow.
Let us try to use shifting strategy. If we shift the matrix as
and the corresponding eigenvalue becomes small (but we need large).
where $\lambda$ is the shift which is approximation to the eigenvalue that we want.
As it was for the power method, the convergence is linear.
To accelerate convergence one can use the Rayleigh quotient iteration (inverse iteration with adaptive shifts) which is given by the selection of the adaptive shift:
Now let us talk about singular values and eigenvalues.
SVD
exists for any matrix.
Implicit QR algorithm (with shifts) gives the way of computing the eigenvalues (and Schur form).
But we cannot apply QR algorithm directly to the bidiagonal matrix, as it is not diagonalizable in general case.
However, the problem of the computation of the SVD can be reduced to the symmetric eigenvalue problem in two ways:
The case 1 is OK if you do not form T directly!
Thus, the problem of computing singular values can be reduced to the problem of the computation of the eigenvalues of symmetric tridiagonal matrix.
Done:
Next slides:
where $vv^*$ is rank 1 matrix, $v = (0,\dots,0,1,1,0,\dots,0)^T$.
It is tricky to compute the eigenvalues of the matrix
$$D + \rho u u^* $$The characteristic polynomial has the form
$$\det(D + \rho uu^* - \lambda I) = \det(D - \lambda I)\det(I + \rho (D - \lambda I)^{-1} uu^*) = 0.$$Then (prove!!)
$$\det(I + \rho (D - \lambda I)^{-1} uu^*) = 1 + \rho \sum_{i=1}^n \frac{|u_i|^2}{d_i - \lambda} = 0$$Hint: find $\det(I + w u^*)$ using the fact that $\text{det}(C) = \prod_{i=1}^n\lambda_i(C)$ and $\text{trace}(C) = \sum_{i=1}^n \lambda_i$.
How to find the roots?
import jax.numpy as jnp
lm = jnp.array([1, 2, 3, 4])
M = len(lm)
D = jnp.array(lm)
a = jnp.min(lm)
b = jnp.max(lm)
t = jnp.linspace(-1, 6, 1000)
u = 0.5 * jnp.ones(M)
rho = 1
def fun(lam):
return 1 + rho * jnp.sum(u**2/(D - lam))
res = [fun(lam) for lam in t]
plt.figure(figsize=(10,8))
plt.plot(t, res, 'k')
#plt.plot(jnp.zeros_like(t))
plt.ylim([-6, 6])
plt.tight_layout()
plt.yticks(fontsize=24)
plt.xticks(fontsize=24)
plt.grid(True)
The function has only one root at $[d_i, d_{i+1}]$
We have proved, by the way, the Cauchy interlacing theorem (what happens to the eigenvalues under rank-$1$ perturbation)
A Newton method will fail (draw a picture with a tangent line).
Note that Newton method is just approximation of a function $f(\lambda)$ by a linear function.
Much better approximation is the hyperbola:
To fit the coefficients, we have to evaluate $f(\lambda)$ and $f'(\lambda)$ in the particular point.
After that, the approximation can be recovered from solving quadratic equation
First, stability: this method was abandoned for a long time due to instability of the computation of the eigenvectors.
In the recursion, we need to compute the eigenvectors of the $D + \rho uu^*$ matrix.
The exact expression for the eigenvectors is just (let us check!)
where $\alpha_i$ is the computed root.
If $\alpha_i$ and $d_i$ satisfy the interlacing theorem
$$d_n < \alpha_n < \ldots < d_{i+1} < \alpha_{i+1} \ldots$$Then there exists a vector $\widehat{u}$ such that $\alpha_i$ are exact eigenvalues of the matrix
$$\widehat{D} = D + \widehat{u} \widehat{u}^*.$$In the computations of divide and conquer we have to evaluate the sums of the form
$$f(\lambda) = 1 + \rho \sum_{i=1}^n \frac{|u_i|^2}{(d_i - \lambda)},$$and have to do it at least for $n$ points.
The complexity is then $\mathcal{O}(n^2)$, as for the QR algorithm.
Can we make it $\mathcal{O}(n \log n)$?
The answer is yes, but we have to replace the computations by the approximate ones by the help of Fast Multipole Method.
Lets explain a little bit...
Absolutely different approach is based on the bisection.
Given a matrix $A$ its inertia is defined as a triple
where $\nu$ is the number of negative, $\zeta$ - zero and $\pi$ - positive eigenvalues.
and inertia of the diagonal matrix is trivial to compute.
Thus, if we want to find all the eigenvalues in the interval $a$, $b$
Using inertia, we can easily count the number of eigenvalues in an interval.
Recall what a Jacobi (Givens rotations) are
In a plane they correspong to a $2 \times 2$ orthogonal matrix of the form
and in the $n$-dimensional case we select two variables $i$ and $j$ and rotate.
by applying succesive Jacobi rotations $U$ to zero off-diagonal elements.
When the "pivot" is chosen, it is easy to eliminate it.
The main question is then what is the order of sweeps we have to make (i.e. in which order to eliminate).
If we always eliminate the largest off-diagonal elements the method has quadratic convergence.
In practice, a cyclic order (i.e., $(1, 2), (1, 3), \ldots, (2, 3), \ldots$) is used.
where $B = U^*AU$.
$ \Gamma^2(A) = \text{off}^2(B) = \|B\|^2_F - \sum\limits_{i=1}^n b^2_{ii} = \| A \|^2_F - \sum\limits_{i \neq p, q} b^2_{ii} - (b^2_{pp} + b^2_{qq}) = \| A \|^2_F - \sum\limits_{i \neq p, q} a^2_{ii} - (a^2_{pp} + 2a^2_{pq} + a^2_{qq}) = \| A \|^2_F - \sum\limits_{i =1}^n a^2_{ii} - 2a^2_{pq} = \text{off}^2(A) - 2a^2_{pq} < \text{off}^2(A)$
We show that the ''size'' of off-diagonal elements decreases after Jacobi rotation.
If we always select the largest off-diagonal element $a_{pq} = \gamma$ to eliminate (pivot), then we have
thus
$$ \text{off}(A)^2 \leq 2 N \gamma^2, $$where $2N = n(n-1)$ is the number of off-diagonal elements.
Now we use relations $\Gamma^2(A) = \text{off}^2(A) - 2\gamma^2 \leq \text{off}^2(A) - \dfrac{\text{off}^2(A)}{N}$ and get
$$ \Gamma(A) \leq \sqrt{\left(1 - \frac{1}{N}\right)} \text{off}(A). $$i.e. linear convergence. However, the convergence is locally quadratic (given without proof here).
Jacobi method was the first numerical method for the eigenvalues, proposed in 1846.