Total recall of the first part of NLA

The main topics that were covered

  • Floating point arithmetic and related issues
  • Main operations with matrices and vectors, their complexity and special cases
  • Matrix decompositions: SVD, skeleton decompositions, LU/Cholesky decomposition, QR, Schur decomposition
  • SVD applications
  • Computing eigenvalues and eigenvectors: QR algorithm and power method
  • How we compute SVD

Singular value decomposition (SVD)

  • Representation of a matrix in the form
$$ A = U\Sigma V^*,$$
  • $U$ and $V$ are unitary
  • $\Sigma$ is diagonal

  • It exists for any matrix

  • It is used to get the best low rank approximation of the given matrix in Frobenius and spectral norms (Eckart–Young theorem)
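
As a quick reminder, a minimal numpy sketch (a random matrix is used purely for illustration) of the thin SVD and the best rank-$k$ approximation it gives:

```python
import numpy as np

A = np.random.randn(100, 60)
U, s, Vh = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U @ diag(s) @ Vh

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]        # best rank-k approximation

# By the Eckart-Young theorem the spectral-norm error equals sigma_{k+1}
print(np.linalg.norm(A - A_k, 2), s[k])
```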

Problem 1: solving linear systems

$$ Ax = b $$
  • LU/Cholesky decomposition
  • Least-squares problem

Case 1: square matrix

A solution to the linear system of equations with a square matrix $A$

$$ Ax = b $$

exists and is unique for any right-hand side $b$, iff

  • $\det A \neq 0$

or

  • matrix $A$ has full rank.

How to solve this problem?

  • Gaussian elimination or LU decomposition

Definition: the LU decomposition of the matrix $A$ is a representation of the form

$$A = LU,$$

where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix.

  • Forward and backward steps in Gaussian elimination

    • Forward step

      $$ L y = b.$$

    • Backward step

      $$ U x = y.$$
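
A minimal sketch of the factorize-then-solve pattern, assuming a generic nonsingular random matrix; scipy's `lu_factor` performs the (pivoted) factorization once, and `lu_solve` carries out the forward and backward triangular solves:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

n = 5
A = np.random.randn(n, n)
b = np.random.randn(n)

lu, piv = lu_factor(A)        # O(n^3): factorization (with partial pivoting)
x = lu_solve((lu, piv), b)    # O(n^2): forward step L y = b, backward step U x = y

print(np.allclose(A @ x, b))
```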

Does LU decomposition always exist?

  • Complexity is $\mathcal{O}(n^3)$
  • Q: can we reduce it?
  • Problems with stability
  • Strictly regular matrix: all leading principal minors are non-zero; for such matrices the LU decomposition exists without pivoting

Pivoting

  • Any matrix can be factorized with PLU decomposition
$$ A = PLU, $$

where $P$ is a permutation matrix.

  • What about complexity?
  • How to solve linear system with this decomposition?
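
A sketch of how the PLU factors are used to solve $Ax = b$, again with a random matrix as an illustration; the two triangular solves answer the last question above:

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.random.randn(4, 4)
b = np.random.randn(4)

P, L, U = lu(A)                                # A = P @ L @ U
y = solve_triangular(L, P.T @ b, lower=True)   # forward step:  L y = P^T b
x = solve_triangular(U, y)                     # backward step: U x = y

print(np.allclose(A @ x, b))
```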

If matrix is Hermitian and positive definite

  • A Hermitian positive definite matrix $A$ is strictly regular (check Sylvester's criterion) and has a Cholesky decomposition
$$A = RR^*,$$

where $R$ is a lower triangular matrix.

  • Sometimes the matrix $R$ is called a "square root" of the matrix $A$.
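
A sketch for a Hermitian positive definite matrix (constructed here as $BB^* + I$ purely for illustration):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

B = np.random.randn(6, 6)
A = B @ B.T + np.eye(6)        # symmetric positive definite by construction
b = np.random.randn(6)

R = np.linalg.cholesky(A)      # lower triangular Cholesky factor, A = R @ R.T
c, low = cho_factor(A)         # factorization used by the solver below
x = cho_solve((c, low), b)     # solve A x = b via the Cholesky factorization

print(np.allclose(R @ R.T, A), np.allclose(A @ x, b))
```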

Stability of solving linear system

  • Algorithms in exact and floating-point arithmetic behave differently!
  • The concept of the condition number is introduced to estimate this difference when solving linear systems
$$\mathrm{cond}(A) = \Vert A \Vert \Vert A^{-1} \Vert.$$
  • It leads to the following estimate
$$ \frac{\Vert \widehat{x} - x \Vert}{\Vert x \Vert} \leq \mathrm{cond}(A) \Big(\frac{\Vert\Delta A\Vert}{\Vert A \Vert} + \frac{\Vert \Delta b \Vert}{ \Vert b \Vert}\Big),$$

where $Ax = b$ and $(A + \Delta A) \widehat{x} = b + \Delta b.$
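
A small illustration with the notoriously ill-conditioned Hilbert matrix: even though the algorithm is exact in exact arithmetic, the floating-point solution loses roughly as many digits as $\log_{10}\mathrm{cond}(A)$ predicts:

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)                  # classic ill-conditioned matrix
x_true = np.ones(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(np.linalg.cond(A))        # huge condition number
print(rel_err)                  # large relative error despite an "exact" algorithm
```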

Case 2: rectangular matrix

  • the number of rows is larger than the number of columns – overdetermined linear system
    • a solution may not exist
  • the number of rows is smaller than the number of columns – underdetermined linear system
    • a solution is not unique

Least-squares problem

$$\Vert A x - b \Vert^2_2 \rightarrow \min_x$$
  • Geometric interpretation
  • Standard problem: you just need to call the proper function from a package and understand its output

Gram matrix and normal equation

  • From the first-order optimality condition we get the normal equation
$$ A^* A x = A^* b $$

The matrix $A^* A$ is called the Gram matrix.

Property of the normal equation:

  • The condition number of the matrix $A^* A$ is the square of the condition number of the matrix $A$ (check this fact!).
  • Therefore, solving the normal equation in this form is not a good idea!
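
A quick numerical check of the squaring of the condition number, using a random tall matrix for illustration:

```python
import numpy as np

A = np.random.randn(100, 20)

# cond(A^* A) = cond(A)^2 in the spectral norm
print(np.linalg.cond(A) ** 2)
print(np.linalg.cond(A.T @ A))
```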

Pseudoinverse matrix

  • If the matrix $A$ is rank-deficient, then the Gram matrix is singular
  • Therefore, we introduce the concept of the pseudoinverse matrix $A^{\dagger}$
$$x = A^{\dagger} b.$$
  • Matrix
$$A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^*$$

is called the pseudoinverse of the matrix $A$.

Pseudoinverse matrix property

  • If the matrix $A$ has full column rank, then $A^* A$ is invertible, and we get
$$A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^* = (A^* A)^{-1} A^*.$$
  • If the matrix $A$ is square and non-singular, then we get
$$ A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^* = (A^* A)^{-1} A^* = A^{-1} A^{-*} A^* = A^{-1}$$

We have obtained the standard inverse of the matrix $A$.

  • If $A$ has linearly dependent columns, then $A^\dagger b$ gives the solution with the minimum Euclidean norm (recall the geometry of the problem!)

Computing pseudoinverse through SVD

Let $A = U \Sigma V^*$ be the SVD of the matrix $A$. Then,

$$A^{\dagger} = V \Sigma^{\dagger} U^*,$$

where $\Sigma^{\dagger}$ is a diagonal matrix whose non-zero diagonal entries are the inverses of the non-zero singular values of the matrix $A$. Indeed,

\begin{align*} A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^* = \lim_{\alpha \rightarrow 0}( \alpha VV^* + V \Sigma^2 V^*)^{-1} V \Sigma U^* = \lim_{\alpha \rightarrow 0}( V(\alpha I + \Sigma^2) V^*)^{-1} V \Sigma U^* = V \lim_{\alpha \rightarrow 0}(\alpha I + \Sigma^2)^{-1} \Sigma U^* = V \Sigma^{\dagger} U^*, \end{align*}
  • You can check that $\Sigma^{\dagger}$ is constructed by inverting the non-zero singular values
  • If some singular values are small, you can skip (truncate) them. This gives a more stable solution with respect to noise in the right-hand side.
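
A sketch of assembling the pseudoinverse solution from the SVD with truncation of small singular values; numpy's `pinv` does essentially the same thing (its threshold is controlled by `rcond`):

```python
import numpy as np

A = np.random.randn(50, 30)
b = np.random.randn(50)

U, s, Vh = np.linalg.svd(A, full_matrices=False)
tol = 1e-10                                  # skip singular values below this threshold
s_inv = np.where(s > tol, 1.0 / s, 0.0)      # diagonal of Sigma^dagger
x = Vh.T @ (s_inv * (U.T @ b))               # x = V Sigma^dagger U^* b

print(np.allclose(x, np.linalg.pinv(A) @ b))
```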

Q: what about condition number?

To solve the problem we use QR decomposition

Any matrix can be represented in the form

$$ A = Q R, $$

where $Q$ is a unitary matrix, and $R$ is an upper triangular matrix.

  • Gram-Schmidt orthogonalization is unstable (cf. Problem 2 in PS2)
  • Modified version is more stable
  • Computing via Householder reflections. This transformation zeroes a subvector:
$$ H \begin{bmatrix} \times \\ \times \\ \times \\ \times \end{bmatrix} = \begin{bmatrix} \times \\ 0 \\ 0 \\ 0 \end{bmatrix}. $$
  • Householder matrix
$$ H = I - 2 vv^*. $$

The complexity of a Householder-matrix-by-vector product is $\mathcal{O}(n)$!
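
A sketch of building the Householder vector for a real vector $x$ (the sign choice avoids cancellation); note that $H$ is never formed explicitly, and applying it costs one dot product and one vector update:

```python
import numpy as np

def householder_vector(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e_1 (real case)."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
    return v / np.linalg.norm(v)

x = np.random.randn(4)
v = householder_vector(x)
Hx = x - 2 * v * (v @ x)     # O(n) application of H = I - 2 v v^T
print(Hx)                    # first entry is +-||x||, the rest are ~0
```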

  • The complexity of computing the QR decomposition is $\mathcal{O}(n^3)$
  • What is complexity for the matrix $m \times n,$ where $m > n$?
  • Computing with Givens matrix (rotations)
  • Givens matrix in 2D has the form
$$ G = \begin{bmatrix} \cos \alpha & -\sin \alpha \\ \sin \alpha & \cos \alpha \end{bmatrix}, $$
  • It zeroes elements sequentially
  • It is appropriate for sparse matrices and parallel computing
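
A sketch of a single Givens rotation that zeroes one entry of a 2-vector; in a full QR factorization such rotations are applied to pairs of rows, touching only $\mathcal{O}(n)$ entries each:

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with c^2 + s^2 = 1 such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return a / r, b / r

a, b = 3.0, 4.0
c, s = givens(a, b)
G = np.array([[c, s], [-s, c]])   # plane rotation zeroing the second entry
print(G @ np.array([a, b]))       # [5, 0]
```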

Come back to the least-squares problem

  • If the matrix $A$ is full-rank and $A = QR$ is its thin QR decomposition, then
$$ x = A^{\dagger}b = (A^*A)^{-1}A^*b = ((QR)^*(QR))^{-1}(QR)^*b = (R^*Q^*QR)^{-1}R^*Q^*b = (R^*R)^{-1}R^*Q^*b = R^{-1}Q^*b. $$

  • Thus, we have to solve the square linear system
$$ Rx = Q^* b. $$
  • $R$ is upper-triangular
  • The complexity of solving is $\mathcal{O}(n^2)$
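
A sketch of the QR-based least-squares solve on a random overdetermined system, checked against numpy's built-in `lstsq`:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.random.randn(200, 5)
b = np.random.randn(200)

Q, R = np.linalg.qr(A)             # thin QR: Q is 200 x 5, R is 5 x 5
x = solve_triangular(R, Q.T @ b)   # back substitution for R x = Q^* b, O(n^2)

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```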

Problem 2: eigenvalue problem

$$ Ax = \lambda x, \qquad A = S \Lambda S^{-1} $$
  • Power method
  • QR algorithm

Theoretical approach

  • Characteristic equation
$$ \det(A - \lambda I) = 0 $$
  • Solve it with polynomial root finding procedure
  • This is a very unstable operation!
  • Why?
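
A hedged illustration of the answer: the map from polynomial coefficients to roots can be terribly ill-conditioned even when the eigenvalues themselves are perfectly conditioned (a random symmetric matrix is used; note that `np.poly` itself obtains the coefficients from the eigenvalues, so this only probes the coefficients-to-roots step):

```python
import numpy as np

np.random.seed(0)
B = np.random.randn(25, 25)
A = (B + B.T) / 2                                # symmetric: well-conditioned eigenvalues

lam_eig = np.sort(np.linalg.eigvalsh(A))         # reliable eigenvalue solver
lam_poly = np.sort(np.roots(np.poly(A)).real)    # roots of the characteristic polynomial

print(np.max(np.abs(lam_eig - lam_poly)))        # noticeably larger than machine epsilon
```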

Practical approach: partial eigenvalue problem

  • Power method
  • One iteration reads as
$$ x_{k+1} = A x_k, \quad x_{k+1} := \frac{x_{k+1}}{\Vert x_{k+1} \Vert_2}.$$
  • It converges (not always!) to the eigenvector corresponding to the largest-in-modulus eigenvalue
  • To perform iterations you need only procedure for computing matrix by vector product (cf. Problem 4 in PS2)
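
A minimal sketch of the power method on a symmetric matrix with a prescribed spectrum (eigenvalues $1,\dots,100$, built only so that the dominant eigenvalue is known); each iteration needs one matrix-by-vector product:

```python
import numpy as np

np.random.seed(0)
Q0, _ = np.linalg.qr(np.random.randn(100, 100))
A = Q0 @ np.diag(np.arange(1.0, 101.0)) @ Q0.T   # symmetric, dominant eigenvalue = 100

x = np.random.randn(100)
x /= np.linalg.norm(x)
for _ in range(1000):
    x = A @ x                  # one matvec per iteration
    x /= np.linalg.norm(x)

lam = x @ (A @ x)              # Rayleigh quotient estimate
print(lam)                     # close to 100
```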

Q: what about complexity?

  • Convergence is linear with factor $q = \left|\frac{\lambda_{2}}{\lambda_{1}}\right| < 1$, where $|\lambda_1| > |\lambda_2| \geq \dots \geq |\lambda_n|$.

Krylov subspace

  • The eigenvector approximation computed by the power method lies in the Krylov subspace $\mathrm{span}\{x_0, Ax_0,\dots,A^{k}x_0\}$ and has the form $\mu A^k x_0$, where $\mu$ is a normalization constant.
  • Later on, Krylov subspaces will be a key tool for solving large-scale problems!
  • More details on why the Krylov subspace is so important will come later in the course

Practical approach: full eigenvalue problem

  • We want to find all eigenvalues!
  • Schur decomposition
$$A = UTU^*,$$

where $U$ is unitary, and $T$ is upper triangular.

  • The spectra of $T$ and $A$ coincide, since $A = UTU^*$ is a similarity transformation with a unitary $U$
  • How to compute Schur form?
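
In practice one calls a library routine; scipy wraps the LAPACK driver, which internally uses the QR algorithm discussed next. A sketch (the complex output is requested so that $T$ is truly upper triangular even for a real matrix):

```python
import numpy as np
from scipy.linalg import schur

A = np.random.randn(6, 6)
T, U = schur(A, output='complex')    # A = U T U^*, T upper triangular

print(np.allclose(U @ T @ U.conj().T, A))
print(np.sort_complex(np.diag(T)))   # eigenvalues of A sit on the diagonal of T
print(np.sort_complex(np.linalg.eigvals(A)))
```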

QR algorithm

  1. Initialize $A_0 = A$.
  2. Compute QR decomposition of the matrix $A_k$: $A_k = Q_k R_k$.
  3. Update approximation $A_{k+1} = R_k Q_k$.
  • How to check convergence?
  • What about complexity?
  • How to find matrices $U$ and $T$?
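
A naive sketch of the iteration (no Hessenberg reduction, no shifts) on a symmetric test matrix with known, well-separated eigenvalues; accumulating the products $Q_k$ gives the orthogonal factor $U$, and the iterates $A_k$ approach the (here diagonal) factor $T$:

```python
import numpy as np

np.random.seed(0)
Q0, _ = np.linalg.qr(np.random.randn(6, 6))
A = Q0 @ np.diag([6.0, 5.0, 4.0, 3.0, 2.0, 1.0]) @ Q0.T   # known spectrum {1,...,6}

Ak = A.copy()
U = np.eye(6)
for _ in range(200):
    Q, R = np.linalg.qr(Ak)   # O(n^3) per iteration in this naive version
    Ak = R @ Q                # similarity transform: spectrum is preserved
    U = U @ Q                 # accumulated orthogonal factor

print(np.allclose(U @ Ak @ U.T, A))       # A = U A_k U^T at every step
print(np.round(np.sort(np.diag(Ak)), 6))  # approaches [1, 2, 3, 4, 5, 6]
```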

Convergence and acceleration

  • The QR algorithm can be accelerated to $\mathcal{O}(n^3)$ total complexity
  • Use the properties of matrices in upper Hessenberg form
  • A matrix $A$ is in upper Hessenberg form if
$$a_{ij} = 0, \quad \mbox{if } i \geq j+2.$$

or

$$A = \begin{bmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & *\\ 0 & 0 & * & * & *\\ 0 & 0 & 0 & * & * \\ \end{bmatrix}.$$
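
A sketch of the reduction itself via scipy; it is done once with Householder reflections in $\mathcal{O}(n^3)$, and the Hessenberg structure is preserved by the QR iteration, making each subsequent iteration $\mathcal{O}(n^2)$:

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.random.randn(5, 5)
H, Q = hessenberg(A, calc_q=True)   # A = Q H Q^*, H upper Hessenberg

print(np.allclose(Q @ H @ Q.T, A))
print(np.round(H, 2))               # zeros below the first subdiagonal
```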

Where are we moving further?

  • Problems are the same, but the scale is much larger!
  • Exploiting the structure of matrices
    • sparsity
    • low rank
    • Toeplitz and circulant matrices
  • Iterative methods for
    • solving linear systems
    • computing eigenvalues and eigenvectors
    • computing matrix functions
  • Tensor decompositions and their applications

Important dates

  • November, 21 –– midterm
  • November, 24 –– deadline for PS2
  • November, 24 –– deadline for submitting project proposals

Plans for next classes

  • Tomorrow Q&A session
  • Midterm is on Thursday
  • Next regular lecture is on Monday