Lecture 8: Matrix decompositions review. How to compute QR decomposition and Schur decomposition

Recap of the previous lecture

Today's lecture

General concept of matrix factorization

What is a simpler structure

Other classes of structured matrices

Plan

The plan for today's lecture is to discuss the decompositions one by one and point out how each of them is computed and where it is used.

Decompositions we want to discuss today

PLU decomposition

$$ A = P L U, $$

where $P$ is a permutation matrix, $L$ is a lower triangular matrix, and $U$ is an upper triangular matrix. Once the factorization is computed, solving $Ax = f$ is cheap:

$$ A^{-1} f = (P L U)^{-1} f = U^{-1} L^{-1} P^\top f, $$

and this reduces to a permutation of the right-hand side followed by the solution of two linear systems

$$ L y = P^\top f, \quad U x = y $$

with lower and upper triangular matrices, respectively.

Q: what is the complexity of solving these linear systems?
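As a hint, here is a minimal sketch of forward and backward substitution (the helper names are hypothetical, dense real matrices assumed); each solve touches every nonzero entry of the triangular matrix once, i.e. $\mathcal{O}(n^2)$ operations.

```python
import numpy as np

def forward_substitution(L, f):
    """Solve L y = f with a lower triangular L: O(n^2) operations."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (f[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_substitution(U, y):
    """Solve U x = y with an upper triangular U: O(n^2) operations."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x
```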

Positive definite matrices and Cholesky decomposition, reminder

If the matrix is Hermitian positive definite, i.e.

$$ A = A^*, \quad (Ax, x) > 0, \quad x \ne 0, $$

then it can be factored as

$$ A = RR^*, $$

where $R$ is lower triangular (equivalently, $A = R^* R$ with an upper triangular $R$; both conventions are used below).

We will need this for the QR decomposition.
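For reference, NumPy computes exactly this lower triangular factor; a quick sanity check on a randomly generated positive definite matrix:

```python
import numpy as np

# Build a random symmetric positive definite matrix.
B = np.random.randn(5, 5)
A = B @ B.T + 5 * np.eye(5)

R = np.linalg.cholesky(A)       # lower triangular factor
print(np.allclose(R @ R.T, A))  # verifies A = R R^*
```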

QR decomposition

$$ A = Q R, $$

where $Q$ is a matrix with orthonormal columns (unitary when square) and $R$ is upper triangular.

QR decomposition: applications

This decomposition plays a crucial role in many problems, the most classical one being the linear least squares problem.

QR decomposition and least squares, reminder

$$ \Vert A x - f \Vert_2 \rightarrow \min_x, $$

where $A$ is $n \times m$, $n \geq m$.

Compute the reduced QR decomposition

$$ A = Q R, $$

and use the formula for the pseudo-inverse of a matrix with full column rank:

$$ x = A^{\dagger}f = (A^*A)^{-1}A^*f = ((QR)^*(QR))^{-1}(QR)^*f = (R^*Q^*QR)^{-1}R^*Q^*f = R^{-1}Q^*f. $$

Thus, $x$ can be recovered from the triangular system

$$R x = Q^*f.$$
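A minimal sketch of this recipe in NumPy (random data, reduced QR), checked against the built-in least-squares solver:

```python
import numpy as np

n, m = 100, 5
A = np.random.randn(n, m)
f = np.random.randn(n)

Q, R = np.linalg.qr(A)           # reduced QR: Q is n x m, R is m x m
x = np.linalg.solve(R, Q.T @ f)  # solve the small triangular system R x = Q^* f

x_ref = np.linalg.lstsq(A, f, rcond=None)[0]
print(np.allclose(x, x_ref))
```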

Existence of QR decomposition

Theorem. Every rectangular $n \times m$ matrix has a QR decomposition.

There are several ways to prove it and compute it:

Proof using Cholesky decomposition

If we have the representation of the form

$$A = QR,$$

then $A^* A = ( Q R)^* (QR) = R^* (Q^* Q) R = R^* R$. The matrix $A^* A$ is called the Gram matrix; its elements are the scalar products of the columns of $A$.

Proof using Cholesky decomposition (full column rank)

If $A$ has full column rank, then $A^* A$ is Hermitian positive definite:

$$ (A^* A y, y) = (Ay, Ay) = \Vert Ay \Vert^2 > 0, \quad y\not = 0, $$

so the Cholesky factorization $A^* A = R^* R$ exists with a nonsingular upper triangular $R$. The matrix $Q = A R^{-1}$ then has orthonormal columns,

$$ (A R^{-1})^* (AR^{-1})= R^{-*} A^* A R^{-1} = R^{-*} R^* R R^{-1} = I, $$

and $A = QR$ is the desired decomposition.

Proof using Cholesky decomposition (rank-deficient case)

Stability of QR decomposition via Cholesky decomposition

This construction computes $R$ from the Cholesky factorization

$$A^* A = R^* R,$$

and then recovers

$$Q = A R^{-1}.$$

However, forming the Gram matrix squares the condition number, $\mathrm{cond}(A^* A) = (\mathrm{cond}(A))^2$, so for ill-conditioned $A$ the computed $Q$ may be far from orthogonal: this way of computing the QR decomposition is numerically unstable.
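A small experiment illustrating the loss of orthogonality (a sketch; `cholesky_qr` is a hypothetical helper, and the Vandermonde matrix is just a convenient ill-conditioned example):

```python
import numpy as np

def cholesky_qr(A):
    """QR via the Gram matrix A^* A = R^* R; unstable for ill-conditioned A."""
    R = np.linalg.cholesky(A.T @ A).T   # upper triangular Cholesky factor
    return A @ np.linalg.inv(R), R      # Q = A R^{-1}

A = np.vander(np.linspace(0, 1, 30), 8)        # ill-conditioned example
Q, _ = cholesky_qr(A)
print(np.linalg.norm(Q.T @ Q - np.eye(8)))     # noticeable loss of orthogonality

Q_h, _ = np.linalg.qr(A)                       # Householder-based QR
print(np.linalg.norm(Q_h.T @ Q_h - np.eye(8))) # orthogonal to machine precision
```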

Second way: Gram-Schmidt orthogonalization

Gram-Schmidt:

  1. $q_1 := a_1/\Vert a_1 \Vert$
  2. $q_2 := a_2 - (a_2, q_1) q_1, \quad q_2 := q_2/\Vert q_2 \Vert$
  3. $q_3 := a_3 - (a_3, q_1) q_1 - (a_3, q_2) q_2, \quad q_3 := q_3/\Vert q_3 \Vert$
  4. And so on

Note that the transformation from $Q$ to $A$ has triangular structure, since from the $k$-th vector we subtract only the previous ones: each $a_k$ is a linear combination of $q_1, \ldots, q_k$, so $A = QR$ with an upper triangular $R$. (The process is a composition of triangular steps, and the product of triangular matrices is again triangular.)
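A direct transcription of this process into NumPy (a teaching sketch; real matrices and full column rank assumed):

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt QR of a full-column-rank matrix."""
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for k in range(m):
        R[:k, k] = Q[:, :k].T @ A[:, k]    # coefficients (a_k, q_j)
        q = A[:, k] - Q[:, :k] @ R[:k, k]  # subtract all projections at once
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R
```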

Modified Gram-Schmidt

Instead of subtracting all the projections at once,

$$q_k := a_k - (a_k, q_1) q_1 - \ldots - (a_k, q_{k-1}) q_{k-1},$$

we do it step by step: first we set $q_k := a_k$ and then orthogonalize sequentially,

$$ q_k := q_k - (q_k, q_1)q_1, \quad q_k := q_{k} - (q_k,q_2)q_2, \quad \ldots $$

In exact arithmetic the result is the same as for classical Gram-Schmidt, but in floating-point arithmetic the modified version is much more stable.
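The same sketch with the modified update order; only the inner loop differs from the classical version above (again assuming full column rank):

```python
import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt QR: orthogonalize against each q_j in turn."""
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for k in range(m):
        q = A[:, k].astype(float)
        for j in range(k):
            R[j, k] = Q[:, j] @ q    # inner product with the *current* q
            q = q - R[j, k] * Q[:, j]
        R[k, k] = np.linalg.norm(q)
        Q[:, k] = q / R[k, k]
    return Q, R
```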

QR decomposition: the (almost) practical way

$$ R = Q^* A, $$

and we need to find a unitary matrix $Q$ that brings the matrix into upper triangular form. For an $n \times m$ matrix with $n \geq m$ (take $n = 4$, $m = 3$ for illustration), we want

$$ Q^* A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \end{bmatrix}, $$

i.e. an $m \times m$ upper triangular block on top of a zero block $0_{(n-m) \times m}$. We build $Q^*$ as a product of Householder reflections. The first reflection $H_1$ zeroes all elements of the first column below the diagonal:

$$ H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & * & * \\ 0 & * & * \end{bmatrix} $$

Then,

$$ H_2 H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix}, $$

where

$$ H_2 = \begin{bmatrix} 1 & 0 \\ 0 & H'_2 \end{bmatrix}, $$

and $H'_2$ is a $3 \times 3$ Householder matrix.

Finally,

$$ H_3 H_2 H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \end{bmatrix} = R, $$

so $Q^* = H_3 H_2 H_1$ and $Q = H_1 H_2 H_3$ (each $H_k$ is both unitary and Hermitian). You can try to implement it yourself; it is simple, and a sketch is given below.
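A minimal sketch of Householder QR (real arithmetic, no blocking, no zero columns; `householder_qr` is a hypothetical name):

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflections: Q^* A = R, Q = H_1 H_2 ... H_m."""
    A = A.astype(float)
    n, m = A.shape
    Q = np.eye(n)
    for k in range(m):
        x = A[k:, k]
        v = x.copy()
        # Choose the sign to avoid cancellation: v = x + sign(x_1) ||x|| e_1.
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)
        v /= np.linalg.norm(v)
        A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])  # apply H_k from the left
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)    # accumulate Q = Q H_k
    return Q, np.triu(A)

M = np.random.randn(4, 3)
Q, R = householder_qr(M)
print(np.allclose(Q @ R, M), np.allclose(Q.T @ Q, np.eye(4)))
```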

QR decomposition: real life

In practice, the reflections are applied in a blocked fashion: several Householder reflections are aggregated into a block reflector

$$H = I - 2UU^*, $$

where $U^* U = I$, i.e. $U$ is an $n \times k$ matrix with orthonormal columns. Libraries such as LAPACK use closely related blocked representations, so that most of the work is done by matrix-matrix (BLAS-3) operations.
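In everyday use one simply calls a library routine (NumPy wraps LAPACK's blocked Householder code); a short usage sketch:

```python
import numpy as np

A = np.random.randn(6, 3)
Q_thin, R_thin = np.linalg.qr(A)                    # 'reduced' mode: Q is 6x3
Q_full, R_full = np.linalg.qr(A, mode='complete')   # Q is 6x6, R is 6x3
```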

Rank-revealing QR-decomposition

$$A P = Q R,$$

where $P$ is a permutation matrix (it permutes the columns of $A$), and $R$ has the block form

$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22}\end{bmatrix}.$$

The columns are permuted so that $\Vert R_{22} \Vert$ is small; the size of $R_{11}$ then estimates the numerical rank of $A$.
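Column-pivoted QR is available in SciPy; a small sketch on a matrix of exact rank 2 (the near-zero trailing diagonal of $R$ reveals the rank):

```python
import numpy as np
from scipy.linalg import qr

A = np.random.randn(6, 2) @ np.random.randn(2, 4)   # rank-2 matrix

Q, R, piv = qr(A, pivoting=True)    # A[:, piv] = Q R
print(np.abs(np.diag(R)))           # two large entries, then ~ machine zero
```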

Summary

Schur form

Every square matrix can be written in the Schur form

$$A = Q T Q^*,$$

with an upper triangular $T$ and a unitary $Q$. This decomposition gives the eigenvalues of the matrix: they are on the diagonal of $T$.
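In SciPy the Schur form is one call away (a sketch; `output='complex'` requests a truly triangular $T$, since the default real Schur form is only block triangular):

```python
import numpy as np
from scipy.linalg import schur

A = np.random.randn(5, 5)
T, Q = schur(A, output='complex')   # A = Q T Q^*, T upper triangular
print(np.allclose(Q @ T @ Q.conj().T, A))
print(np.diag(T))                   # the eigenvalues of A
```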

QR algorithm

Way to QR algorithm

We start from the Schur form

$$A = Q T Q^*,$$

and rewrite it in the form

$$ Q T = A Q. $$

Derivation of the QR algorithm as fixed-point iteration

We can write down the iterative process

$$ Q_{k+1} R_{k+1} = A Q_k, \quad \text{and hence} \quad Q_{k+1}^* A = R_{k+1} Q^*_k. $$

Introduce

$$A_k = Q^* _k A Q_k = Q^*_k Q_{k+1} R_{k+1} = \widehat{Q}_k R_{k+1}, \qquad \widehat{Q}_k = Q^*_k Q_{k+1},$$

and the new approximation reads

$$A_{k+1} = Q^*_{k+1} A Q_{k+1} = R_{k+1} Q^*_k Q_{k+1} = R_{k+1} \widehat{Q}_k,$$

where we used $Q_{k+1}^* A = R_{k+1} Q^*_k$.

So we arrive at the standard form of the QR algorithm.

The final formulas are then written in the classical QRRQ-form:

  1. Start from $A_0 = A$.
  2. Compute the QR factorization $A_k = Q_k R_k$.
  3. Set $A_{k+1} = R_k Q_k$.

Iterate until $A_k$ is triangular enough (e.g., until the norm of its subdiagonal part is small enough).
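A naive unshifted implementation (a sketch; a symmetric test matrix is used so that the eigenvalues are real and, generically, of distinct absolute value, which is what the unshifted iteration needs to converge):

```python
import numpy as np

def qr_algorithm(A, num_iter=500):
    """Unshifted QR algorithm: repeat A_{k+1} = R_k Q_k."""
    Ak = A.astype(float)
    for _ in range(num_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                  # unitary similarity: A_{k+1} = Q^* A_k Q
    return Ak

A = np.random.randn(4, 4)
A = A + A.T                         # symmetric test matrix
print(np.sort(np.diag(qr_algorithm(A))))
print(np.sort(np.linalg.eigvalsh(A)))   # reference eigenvalues
```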

What about the convergence and complexity

Statement

The matrices $A_k$ are unitarily similar to $A$:

$$A_k = Q^*_{k-1} A_{k-1} Q_{k-1} = (Q_0 Q_1 \ldots Q_{k-1})^* A \, (Q_0 Q_1 \ldots Q_{k-1}),$$

since the product of unitary matrices is a unitary matrix. In particular, every $A_k$ has the same eigenvalues as $A$.

Convergence and complexity of the QR algorithm

Each iteration requires a QR factorization of a dense matrix, which costs $\mathcal{O}(n^3)$ operations, and $\mathcal{O}(n)$ iterations may be needed.

Q: does it mean $\mathcal{O}(n^4)$ complexity in total?

A: fortunately, not. The matrix is first reduced to upper Hessenberg form by unitary similarity transformations; this form is preserved by the iteration, and one QR step for a Hessenberg matrix costs only $\mathcal{O}(n^2)$ operations.

Convergence of QR algorithm

Does the QR algorithm always converge?

Unfortunately, not; let us provide a counterexample.

Counterexample

For the matrix $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

we have $A_k = A$ for every $k$: since $A$ is orthogonal, its QR decomposition is $Q = A$, $R = I$, so $A_{k+1} = R Q = A$. The iteration stagnates because the eigenvalues $1$ and $-1$ have equal absolute value.

Connection to orthogonal iteration

In the previous lecture, we considered the power iteration, in which $A^k v$ gives an approximation to the leading eigenvector.

The QR algorithm computes (implicitly) QR-factorization of the matrix $A^k$:

$$A^k = (Q_1 R_1)^k = Q_1 (R_1 Q_1)^{k-1} R_1 = Q_1 (Q_2 R_2)^{k-1} R_1 = Q_1 Q_2 (R_2 Q_2)^{k-2} R_2 R_1 = \ldots = (Q_1 Q_2 \ldots Q_k) (R_k \ldots R_1).$$
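This identity is easy to verify numerically (a sketch accumulating both products along the iteration):

```python
import numpy as np

A = np.random.randn(4, 4)
k = 5
Q_prod, R_prod = np.eye(4), np.eye(4)
Ak = A.copy()
for _ in range(k):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q
    Q_prod = Q_prod @ Q     # Q_1 Q_2 ... Q_k
    R_prod = R @ R_prod     # R_k ... R_1
print(np.allclose(Q_prod @ R_prod, np.linalg.matrix_power(A, k)))
```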

A few words about the SVD

The SVD

$$A = U \Sigma V^*$$

can in principle be computed from the eigendecomposition of the Gram matrix,

$$A^* A = V \Sigma^2 V^*,$$

and/or

$$AA^* = U \Sigma^2 U^*,$$

with the QR algorithm, but it is a bad idea: forming the Gram matrix squares the condition number (cf. the Cholesky-based QR discussion above).
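A quick numerical illustration: the eigenvalues of the Gram matrix are the squared singular values, so any singular value below roughly $\sqrt{\varepsilon_{\mathrm{mach}}}\,\sigma_{\max}$ is destroyed by the squaring.

```python
import numpy as np

A = np.random.randn(6, 4)
sigma = np.linalg.svd(A, compute_uv=False)   # singular values of A
lam = np.linalg.eigvalsh(A.T @ A)[::-1]      # eigenvalues of the Gram matrix
print(np.allclose(sigma**2, lam))            # sigma_i^2 = lambda_i
```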

Bonus: CS (Cosine-Sine) decomposition

Let $Q$ be a unitary matrix partitioned into a $2 \times 2$ block form:

$$ Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix}. $$

Then there exist unitary matrices $U_1, U_2$ and $V_1, V_2$ such that

$$ \begin{bmatrix} U_1^* & 0 \\ 0 & U_2^* \end{bmatrix} \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} V_1 & 0 \\ 0 & V_2 \end{bmatrix} = \begin{bmatrix} C & S \\ -S & C \end{bmatrix}, $$

where $C = \mathrm{diag}(c)$ and $S = \mathrm{diag}(s)$ are diagonal matrices such that $c_i \geq 0$, $s_i \geq 0$ and $c_i^2 + s_i^2 = 1$.

An analogous "thin" version holds for a matrix with orthonormal columns split into two row blocks:

$$ \begin{bmatrix} U_1^* & 0 \\ 0 & U_2^* \end{bmatrix} \begin{bmatrix} Q_{1} \\ Q_{2} \end{bmatrix} V = \begin{bmatrix} C \\ S \end{bmatrix}. $$
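Recent SciPy versions provide this decomposition as `scipy.linalg.cossin` (a sketch; note that SciPy's sign convention for the sine blocks may differ from the formulas above):

```python
import numpy as np
from scipy.linalg import cossin

Q, _ = np.linalg.qr(np.random.randn(4, 4))   # random 4x4 orthogonal matrix
U, CS, Vdh = cossin(Q, p=2, q=2)             # block sizes: Q11 is 2x2
print(np.allclose(U @ CS @ Vdh, Q))          # reconstructs Q
```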

Summary

Next steps

Questions?