Today we will talk about matrix factorizations as a general tool.
Basic matrix factorizations in numerical linear algebra:
We already introduced the QR decomposition some time ago, but now we are going to discuss it in more detail.
In numerical linear algebra we need to solve different tasks, for example:
To do this, we represent the matrix as a sum and/or product of matrices with simpler structure, so that these tasks can be solved faster or in a more stable way.
What is a simpler structure?
We already encountered several classes of matrices with structure.
For dense matrices the most important classes are
The plan for today's lecture is to discuss the decompositions one-by-one and point out:
where $P$ is a permutation matrix, $L$ is a lower triangular matrix, and $U$ is an upper triangular matrix.
and this reduces to the solution of two linear systems
$$ L y = f, \quad U x = y $$
with lower and upper triangular matrices, respectively.
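The two triangular solves can be sketched in a few lines. This is a minimal illustration, assuming the permutation is the identity and that $L$ and $U$ are already given; the helper names `forward_substitution` and `back_substitution` and the small test matrices are chosen here for illustration only.

```python
import numpy as np

def forward_substitution(l, f):
    """Solve L y = f for a lower triangular L, row by row from the top."""
    n = l.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (f[i] - l[i, :i] @ y[:i]) / l[i, i]
    return y

def back_substitution(u, y):
    """Solve U x = y for an upper triangular U, row by row from the bottom."""
    n = u.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - u[i, i + 1:] @ x[i + 1:]) / u[i, i]
    return x

# Hypothetical factors; A = L U is formed only to check the result.
L = np.array([[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, 5.0, 6.0]])
U = np.array([[1.0, 2.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 6.0]])
A = L @ U
f = np.array([1.0, 2.0, 3.0])

y = forward_substitution(L, f)   # L y = f
x = back_substitution(U, y)      # U x = y
print("residual:", np.linalg.norm(A @ x - f))
```

Each solve costs $\mathcal{O}(n^2)$ operations, which is why the factorization pays off once it is available.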
If the matrix is Hermitian positive definite, i.e.
$$ A = A^*, \quad (Ax, x) > 0, \quad x \ne 0, $$
then it can be factored as
$$ A = RR^*, $$
where $R$ is lower triangular.
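A quick numerical check of this factorization, as a sketch: the test matrix below is an arbitrary symmetric positive definite matrix built for illustration, and `np.linalg.cholesky` returns the lower triangular factor.

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.standard_normal((5, 5))
A_spd = b @ b.T + 5.0 * np.eye(5)   # symmetric positive definite by construction

R_chol = np.linalg.cholesky(A_spd)  # lower triangular factor, A = R R^*

print("lower triangular:", np.allclose(R_chol, np.tril(R_chol)))
print("reconstruction error:", np.linalg.norm(R_chol @ R_chol.T - A_spd))
```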
We will need this for the QR decomposition.
where $Q$ is a column-orthogonal (unitary) matrix and $R$ is upper triangular.
The matrix sizes: $Q$ is $n \times m$, $R$ is $m \times m$ if $n\geq m$. See our poster for a visualization of the QR decomposition.
QR decomposition is defined for any rectangular matrix.
This decomposition plays a crucial role in many problems:
where $A$ is $n \times m$, $n \geq m$.
and use the equation for the pseudo-inverse matrix in the case of a full-rank matrix $A$:
$$ x = A^{\dagger}b = (A^*A)^{-1}A^*b = ((QR)^*(QR))^{-1}(QR)^*b = (R^*Q^*QR)^{-1}R^*Q^*b = R^{-1}Q^*b, $$
thus $x$ can be recovered from
$$R x = Q^*b.$$
One of the efficient ways to solve a strongly overdetermined ($n\gg m$) system of linear equations is to use the Kaczmarz method.
Instead of solving all equations, pick one randomly, which reads
and given an approximation $x_k$ try to find $x_{k+1}$ as
$$x_{k+1} = \arg \min_x \Vert x - x_k \Vert, \quad \mbox{s.t.} \quad a^{\top}_i x = f_i.$$
Theorem. Every rectangular $n \times m$ matrix has a QR decomposition.
There are several ways to prove it and compute it:
If we have the representation of the form
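$$A = QR,$$
then $A^* A = ( Q R)^* (QR) = R^* (Q^* Q) R = R^* R$. The matrix $A^* A$ is called the Gram matrix, and its elements are scalar products of the columns of $A$.
Therefore, if $A$ has full column rank, the Gram matrix is Hermitian positive definite and its Cholesky factorization $A^* A = R^* R$ always exists.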
Then the matrix $Q = A R^{-1}$ is unitary:
When an $n \times m$ matrix does not have full column rank, it is said to be rank-deficient.
The QR decomposition, however, also exists.
For any rank-deficient matrix there is a sequence of full-column rank matrices $A_k$ such that $A_k \rightarrow A$ (why?).
Each $A_k$ can be decomposed as $A_k = Q_k R_k$.
The set of all unitary matrices is compact, thus there exists a converging subsequence $Q_{n_k} \rightarrow Q$ (why?), and $Q^*_{n_k} A_{n_k} \rightarrow Q^* A = R$, which is triangular.
and
$$Q = A R^{-1}.$$
import jax
import jax.numpy as jnp
jax.config.update("jax_enable_x64", True)  # enable double precision
n = 40
r = 9
a = [[1.0 / (i + j + 0.5) for i in range(r)] for j in range(n)]
a = jnp.array(a)
q, Rmat = jnp.linalg.qr(a)
e = jnp.eye(r)
print('Built-in QR orth', jnp.linalg.norm(jnp.dot(q.T, q) - e))
gram_matrix = a.T.dot(a)
Rmat1 = jnp.linalg.cholesky(gram_matrix)
q1 = jnp.dot(a, jnp.linalg.inv(Rmat1.T))
#q1 = jnp.linalg.solve(Rmat1, a.T).T
print('Via Cholesky:', jnp.linalg.norm(jnp.dot(q1.T, q1) - e))
q1[:, 0]@q1[:, 5]
Built-in QR orth 1.595746790565955e-15
Via Cholesky: 0.8819197747599913
DeviceArray(-2.1925951e-11, dtype=float64)
Gram-Schmidt:
Note that the transformation from $Q$ to $A$ has triangular structure, since from the $k$-th vector we subtract only the previous ones. It follows from the fact that the product of triangular matrices is a triangular matrix.
Gram-Schmidt can be very unstable (i.e., the produced vectors will not be orthogonal, especially if $q_k$ has small norm).
This is called loss of orthogonality.
There is a remedy, called modified Gram-Schmidt method. Instead of doing
we do it step-by-step. First we set $q_k := a_k$ and orthogonalize sequentially:
$$ q_k := q_k - (q_k, q_1)q_1, \quad q_k := q_{k} - (q_k,q_2)q_2, \ldots $$
In exact arithmetic, it is the same. In floating point it is absolutely different!
Note that the complexity is $\mathcal{O}(nm^2)$ operations.
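The difference between the two variants is easy to observe numerically. Below is a minimal sketch, assuming real matrices with full column rank; it reuses the ill-conditioned $40 \times 9$ test matrix from the Cholesky experiment above, and the function names are chosen here for illustration.

```python
import numpy as np

def classical_gram_schmidt(a):
    """Classical GS: all coefficients are computed from the original column a_k."""
    n, m = a.shape
    q = np.zeros((n, m))
    for k in range(m):
        coeffs = q[:, :k].T @ a[:, k]          # (a_k, q_s) for all s < k
        v = a[:, k] - q[:, :k] @ coeffs
        q[:, k] = v / np.linalg.norm(v)
    return q

def modified_gram_schmidt(a):
    """Modified GS: each coefficient uses the partially orthogonalized vector."""
    n, m = a.shape
    q = np.zeros((n, m))
    for k in range(m):
        v = a[:, k].copy()
        for s in range(k):
            v -= (v @ q[:, s]) * q[:, s]       # update v before the next projection
        q[:, k] = v / np.linalg.norm(v)
    return q

def orthogonality_error(q):
    return np.linalg.norm(q.T @ q - np.eye(q.shape[1]))

n, r = 40, 9
a = np.array([[1.0 / (i + j + 0.5) for i in range(r)] for j in range(n)])
err_cgs = orthogonality_error(classical_gram_schmidt(a))
err_mgs = orthogonality_error(modified_gram_schmidt(a))
print("Classical GS:", err_cgs)
print("Modified GS: ", err_mgs)
```

On this matrix the modified variant should give a noticeably smaller orthogonality error, although it is still worse than the built-in Householder-based QR.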
and we need to find a certain orthogonal matrix $Q$ that brings a matrix into upper triangular form.
Then,
$$ H_2 H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & * \end{bmatrix}, $$
where
$$ H_2 = \begin{bmatrix} 1 & 0 \\ 0 & H'_2 \end{bmatrix}, $$
and $H'_2$ is a $3 \times 3$ Householder matrix.
Finally,
$$ H_3 H_2 H_1 A = \begin{bmatrix} * & * & * \\ 0 & * & * \\ 0 & 0 & * \\ 0 & 0 & 0 \end{bmatrix}. $$
You can try to implement it yourself, it is simple.
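As a reference point for such an exercise, here is a minimal unblocked sketch for real matrices. It assumes reflections of the form $H = I - 2vv^{\top}$ with $\Vert v \Vert_2 = 1$ and returns the full $n \times n$ factor $Q$; the function name `householder_qr` is chosen here for illustration.

```python
import numpy as np

def householder_qr(a):
    """Return Q (n x n, orthogonal) and R (n x m, upper triangular) with A = Q R."""
    r = np.array(a, dtype=float)
    n, m = r.shape
    q = np.eye(n)
    for k in range(min(n - 1, m)):
        x = r[k:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                            # column is already zero below the diagonal
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])       # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # Apply H = I - 2 v v^T to the trailing rows of R.
        r[k:, :] -= 2.0 * np.outer(v, v @ r[k:, :])
        # Accumulate Q = H_1 H_2 ... by applying H from the right.
        q[:, k:] -= 2.0 * np.outer(q[:, k:] @ v, v)
    return q, r

rng = np.random.default_rng(0)
a_demo = rng.standard_normal((4, 3))
q_demo, r_demo = householder_qr(a_demo)
print("||QR - A||     =", np.linalg.norm(q_demo @ r_demo - a_demo))
print("||Q^T Q - I||  =", np.linalg.norm(q_demo.T @ q_demo - np.eye(4)))
```

For a $4 \times 3$ matrix the loop performs three reflections, matching $H_3 H_2 H_1 A$ above.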
In reality, since this is a dense matrix factorization, you should implement the algorithm in terms of blocks (why?).
Instead of using Householder transformation, we use block Householder transformation of the form
where $U^* U = I$.
The QR decomposition can also be used to compute the (numerical) rank of the matrix, see Rank-Revealing QR Factorizations and the Singular Value Decomposition by Y. P. Hong and C.-T. Pan.
It is done via so-called rank-revealing factorization.
It is based on the representation
where $P$ is the permutation matrix (it permutes columns), and $R$ has the block form
$$R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22}\end{bmatrix}.$$
The goal is to find $P$ such that the norm of $R_{22}$ is small, so you can find the numerical rank by looking at it.
An estimate is $\sigma_{r+1} \leq \Vert R_{22} \Vert_2$ (check why).
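To see the idea at work, here is a small sketch that uses greedy column pivoting (at each step the remaining column with the largest norm is moved to the front). This simple heuristic is an assumption of the example and is not the algorithm from the cited paper; the function name `pivoted_qr` and the tolerance `1e-6` are also chosen here for illustration.

```python
import numpy as np

def pivoted_qr(a):
    """Column-pivoted QR via modified Gram-Schmidt: A[:, perm] = Q R."""
    work = np.array(a, dtype=float)
    n, m = work.shape
    k_max = min(n, m)
    q = np.zeros((n, k_max))
    r = np.zeros((k_max, m))
    perm = np.arange(m)
    for k in range(k_max):
        # Pick the remaining column with the largest residual norm.
        j = k + int(np.argmax(np.linalg.norm(work[:, k:], axis=0)))
        work[:, [k, j]] = work[:, [j, k]]
        r[:, [k, j]] = r[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        r[k, k] = np.linalg.norm(work[:, k])
        if r[k, k] == 0.0:
            break                               # all remaining columns are zero
        q[:, k] = work[:, k] / r[k, k]
        r[k, k + 1:] = q[:, k] @ work[:, k + 1:]
        work[:, k + 1:] -= np.outer(q[:, k], r[k, k + 1:])
    return q, r, perm

# Test matrix: rank 3 plus a small perturbation of size about 1e-8.
rng = np.random.default_rng(0)
a_rr = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
a_rr += 1e-8 * rng.standard_normal((8, 6))

q_rr, r_rr, perm = pivoted_qr(a_rr)
diag = np.abs(np.diag(r_rr))
rank_est = int(np.sum(diag > 1e-6 * diag[0]))
sigma = np.linalg.svd(a_rr, compute_uv=False)
r22_norm = np.linalg.norm(r_rr[rank_est:, rank_est:], 2)
print("estimated rank:", rank_est)
print("sigma_{r+1} =", sigma[rank_est], " ||R_22||_2 =", r22_norm)
```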
LU and QR decompositions can be computed using direct methods in a finite number of operations.
What about Schur form and SVD?
They cannot be computed by direct methods (why?); they can only be computed by iterative methods.
Nevertheless, iterative methods still have the same $\mathcal{O}(n^3)$ complexity in floating point arithmetic thanks to fast convergence.
with upper triangular $T$ and unitary $Q$ and this decomposition gives eigenvalues of the matrix (they are on the diagonal of $T$).
The QR algorithm was independently proposed in 1961 by Kublanovskaya and Francis.
Do not **confuse** the QR algorithm with the QR decomposition!
The QR decomposition is a representation of a matrix, whereas the QR algorithm uses the QR decomposition to compute the eigenvalues!
and rewrite it in the form
$$ Q T = A Q. $$
On the left we can see the QR factorization of the matrix $AQ$.
We can use this to derive a fixed-point iteration for the Schur form, also known as the QR algorithm.
We can write down the iterative process
$$ Q_{k+1} R_{k+1} = A Q_k, \quad Q_{k+1}^* A = R_{k+1} Q^*_k. $$
Introduce
$$A_k = Q^* _k A Q_k = Q^*_k Q_{k+1} R_{k+1} = \widehat{Q}_k R_{k+1}, \quad \widehat{Q}_k = Q^*_k Q_{k+1},$$
and, using $Q_{k+1}^* A = R_{k+1} Q^*_k$, the new approximation reads
$$A_{k+1} = Q^*_{k+1} A Q_{k+1} = R_{k+1} Q^*_k Q_{k+1} = R_{k+1} \widehat{Q}_k.$$
So we arrive at the standard form of the QR algorithm.
The final formulas are then written in the classical QRRQ-form:
$$A_k = Q_k R_k, \quad A_{k+1} = R_k Q_k.$$
Iterate until $A_k$ is triangular enough (e.g. norm of subdiagonal part is small enough).
Statement
Matrices $A_k$ are unitarily similar to $A$:
$$A_k = Q^*_{k-1} A_{k-1} Q_{k-1} = (Q_1 \ldots Q_{k-1})^* A (Q_1 \ldots Q_{k-1}),$$
and the product of unitary matrices is a unitary matrix.
Complexity of each step is $\mathcal{O}(n^3)$, if a general QR decomposition is done.
Our hope is that $A_k$ will be very close to a triangular matrix for sufficiently large $k$.
import jax.numpy as jnp
n = 5
a = [[1.0/(i + j + 0.5) for i in range(n)] for j in range(n)]
a = jnp.array(a)  # convert the nested list to an array before calling qr
niters = 10
for k in range(niters):
    q, rmat = jnp.linalg.qr(a)  # factor A_k = Q_k R_k
    a = rmat.dot(q)             # form A_{k+1} = R_k Q_k
print('Leading 4x4 block of a:')
print(a[:4, :4])
Leading 4x4 block of a:
[[ 2.45414517e+00  1.95145959e-08  6.60861102e-18  5.76974836e-17]
 [ 1.95145961e-08  4.14159204e-01  2.66526168e-13  7.62304085e-18]
 [ 9.70054052e-21  2.66524048e-13  2.46972338e-02 -3.25466594e-18]
 [ 3.24776145e-36  9.72713876e-29  1.32450609e-17  7.05827401e-04]]
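The stopping rule "iterate until $A_k$ is triangular enough" can be added to the loop directly. The sketch below uses NumPy instead of `jax.numpy` for brevity; the function name `qr_algorithm` and the parameters `tol` and `max_iter` are assumptions of this example. It runs the unshifted iteration, which is not guaranteed to converge for every matrix (for example, a real matrix with complex eigenvalues).

```python
import numpy as np

def qr_algorithm(a, tol=1e-10, max_iter=500):
    """Unshifted QR iteration; stops when the strictly lower part is small."""
    a_k = np.array(a, dtype=float)
    for it in range(max_iter):
        if np.linalg.norm(np.tril(a_k, -1)) < tol:
            return a_k, it
        q, r = np.linalg.qr(a_k)   # A_k = Q_k R_k
        a_k = r @ q                # A_{k+1} = R_k Q_k
    return a_k, max_iter

n = 5
a0 = np.array([[1.0 / (i + j + 0.5) for i in range(n)] for j in range(n)])
t_mat, n_iter = qr_algorithm(a0)
eigs_qr = np.sort(np.diag(t_mat))
eigs_ref = np.linalg.eigvalsh(a0)  # reference values for this symmetric matrix
print("iterations:", n_iter)
print("max eigenvalue error:", np.max(np.abs(eigs_qr - eigs_ref)))
```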
The convergence of the QR algorithm is from the largest eigenvalues to the smallest.
At least 2-3 iterations are needed for an eigenvalue.
Each step is one QR factorization and one matrix-by-matrix product, which results in $\mathcal{O}(n^3)$ complexity.
Q: does it mean $\mathcal{O}(n^4)$ complexity in total?
A: fortunately, not.
We can speed up the QR algorithm by using shifts, since $A_k - \lambda I$ has the same Schur vectors.
We will discuss these details later.
In the previous lecture, we considered power iteration, which uses $A^k v$ as an approximation of the eigenvector.
The QR algorithm computes (implicitly) QR-factorization of the matrix $A^k$:
$$A^k = A \cdot \ldots \cdot A = Q_1 R_1 Q_1 R_1 \ldots = Q_1 Q_2 R_2 Q_2 R_2 \ldots (R_2 R_1) = \ldots = (Q_1 Q_2 \ldots Q_k) (R_k \ldots R_1).$$
and/or
$$AA^* = U^* \Sigma^2 U$$
with the QR algorithm, but it is a bad idea (cf. Gram matrices).
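Why this is a bad idea can be seen on a small example: forming the Gram matrix squares the condition number, so small singular values are lost. The sketch below is only an illustration; the test matrix with prescribed singular values is constructed here, and `np.linalg.eigvalsh` stands in for a symmetric eigenvalue solver.

```python
import numpy as np

rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # orthonormal columns
v, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # orthogonal matrix
sigma_true = np.array([1.0, 1e-5, 1e-10])
a_svd = (u * sigma_true) @ v.T                     # A = U diag(sigma) V^T

# Singular values computed directly.
sigma_svd = np.linalg.svd(a_svd, compute_uv=False)

# Singular values from the eigenvalues of the Gram matrix A^T A.
eigs = np.linalg.eigvalsh(a_svd.T @ a_svd)[::-1]
sigma_gram = np.sqrt(np.clip(eigs, 0.0, None))     # clip tiny negative values

rel_err_svd = abs(sigma_svd[-1] - sigma_true[-1]) / sigma_true[-1]
rel_err_gram = abs(sigma_gram[-1] - sigma_true[-1]) / sigma_true[-1]
print("relative error of smallest singular value, SVD: ", rel_err_svd)
print("relative error of smallest singular value, Gram:", rel_err_gram)
```

The smallest eigenvalue of the Gram matrix is about $10^{-20}$, far below the rounding level relative to the largest one, so its computed value carries no correct digits.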
where $C = \mathrm{diag}(c)$ and $S = \mathrm{diag}(s)$ such that $c_i \geq 0, s_i \geq 0$ and $c_i^2 + s_i^2 = 1$.
Q: how many SVDs do we have inside the CS decomposition?
The case of rectangular matrix with orthonormal columns
The algorithm for computing this decomposition is presented here
This decomposition naturally arises in the problem of finding distances and angles between subspaces.
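As a small illustration of angles between subspaces, the cosines of the principal angles between two subspaces are the singular values of $Q_1^* Q_2$, where $Q_1$ and $Q_2$ are orthonormal bases of the subspaces. The sketch below does not compute the CS decomposition itself; the function name `principal_angles` and the two planes in $\mathbb{R}^3$ are chosen here for illustration.

```python
import numpy as np

def principal_angles(x, y):
    """Principal angles between the column spaces of x and y (real case)."""
    qx, _ = np.linalg.qr(x)                      # orthonormal basis of span(x)
    qy, _ = np.linalg.qr(y)                      # orthonormal basis of span(y)
    cosines = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

t = 0.3
plane_1 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
plane_2 = np.array([[1.0, 0.0], [0.0, np.cos(t)], [0.0, np.sin(t)]])
angles = principal_angles(plane_1, plane_2)
print("principal angles:", angles)
```

The two planes share the first coordinate axis, so one angle is zero and the other equals the tilt `t`. Very small angles are computed inaccurately from cosines alone, which is one reason to work with both $C$ and $S$.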