Total recall of the first part of NLA

The main topics that were covered

  • Floating point arithmetic and related issues
  • Main operations with matrices and vectors, their complexity and special cases
  • Matrix decompositions: SVD, skeleton decompositions, LU/Cholesky decomposition, QR, Schur decomposition
  • SVD applications
  • Computing eigenvalues and eigenvectors: QR algorithm and power method
  • How we compute SVD

Singular value decomposition (SVD)

  • Representation of a matrix in the form
$$ A = U\Sigma V^*,$$
  • $U$ and $V$ are unitary
  • $\Sigma$ is diagonal

  • It exists for any matrix

  • It is used to get the best low rank approximation of the given matrix in Frobenius and spectral norms (Eckart–Young theorem)
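
As a quick reminder, a minimal numpy sketch (a random matrix is used purely for illustration) of the thin SVD and the best rank-$k$ approximation it gives:

```python
import numpy as np

A = np.random.randn(100, 60)
U, s, Vh = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U @ diag(s) @ Vh

k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]        # best rank-k approximation

# By the Eckart-Young theorem the spectral-norm error equals sigma_{k+1}
print(np.linalg.norm(A - A_k, 2), s[k])
```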

Problem 1: solving linear systems

$$ Ax = b $$
  • LU/Cholesky decomposition
  • Least-squares problem

Case 1: square matrix

A solution to the linear system of equations with a square matrix $A$

$$ Ax = b $$

exists and is unique for any right-hand side $b$, iff

  • $\det A \neq 0$

or

  • matrix $A$ has full rank.

How to solve this problem?

  • Gaussian elimination or LU decomposition

Definition: the LU decomposition of the matrix $A$ is a representation of the form

$$A = LU,$$

where $L$ is a lower triangular matrix and $U$ is an upper triangular matrix.

  • Forward and backward steps in Gaussian elimination

    • Forward step

      $$ L y = b.$$

    • Backward step

      $$ U x = y.$$
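
A minimal sketch of the factorize-then-solve pattern, assuming a generic nonsingular random matrix; scipy's `lu_factor` performs the (pivoted) factorization once, and `lu_solve` carries out the forward and backward triangular solves:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

n = 5
A = np.random.randn(n, n)
b = np.random.randn(n)

lu, piv = lu_factor(A)        # O(n^3): factorization (with partial pivoting)
x = lu_solve((lu, piv), b)    # O(n^2): forward step L y = b, backward step U x = y

print(np.allclose(A @ x, b))
```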

Does LU decomposition always exist?

  • Complexity is $\mathcal{O}(n^3)$
  • Q: can we reduce it?
  • Problems with stability
  • Strictly regular matrix: all leading principal minors are non-zero; for such matrices the LU decomposition exists without pivoting

Pivoting

  • Any matrix can be factorized with PLU decomposition
$$ A = PLU, $$

where $P$ is a permutation matrix.

  • What about complexity?
  • How to solve linear system with this decomposition?
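
A sketch of how the PLU factors are used to solve $Ax = b$, again with a random matrix as an illustration; the two triangular solves answer the last question above:

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.random.randn(4, 4)
b = np.random.randn(4)

P, L, U = lu(A)                                # A = P @ L @ U
y = solve_triangular(L, P.T @ b, lower=True)   # forward step:  L y = P^T b
x = solve_triangular(U, y)                     # backward step: U x = y

print(np.allclose(A @ x, b))
```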

If matrix is Hermitian and positive definite

  • A Hermitian positive definite matrix $A$ is strictly regular (check Sylvester's criterion) and has a Cholesky decomposition
$$A = RR^*,$$

where $R$ is a lower triangular matrix.

  • Sometimes the matrix $R$ is called a "square root" of the matrix $A$.
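
A sketch for a Hermitian positive definite matrix (constructed here as $BB^* + I$ purely for illustration):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

B = np.random.randn(6, 6)
A = B @ B.T + np.eye(6)        # symmetric positive definite by construction
b = np.random.randn(6)

R = np.linalg.cholesky(A)      # lower triangular Cholesky factor, A = R @ R.T
c, low = cho_factor(A)         # factorization used by the solver below
x = cho_solve((c, low), b)     # solve A x = b via the Cholesky factorization

print(np.allclose(R @ R.T, A), np.allclose(A @ x, b))
```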

Stability of solving linear system

  • Algorithms in exact and floating-point arithmetic behave differently!
  • The concept of the condition number is introduced to estimate this difference when solving linear systems
$$\mathrm{cond}(A) = \Vert A \Vert \Vert A^{-1} \Vert.$$
  • It leads to the following estimate
$$ \frac{\Vert \widehat{x} - x \Vert}{\Vert x \Vert} \leq \mathrm{cond}(A) \Big(\frac{\Vert\Delta A\Vert}{\Vert A \Vert} + \frac{\Vert \Delta b \Vert}{ \Vert b \Vert}\Big),$$

where $Ax = b$ and $(A + \Delta A) \widehat{x} = b + \Delta b.$
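
A small illustration with the notoriously ill-conditioned Hilbert matrix: even though the algorithm is exact in exact arithmetic, the floating-point solution loses roughly as many digits as $\log_{10}\mathrm{cond}(A)$ predicts:

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)                  # classic ill-conditioned matrix
x_true = np.ones(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

print(np.linalg.cond(A))        # huge condition number
print(rel_err)                  # large relative error despite an "exact" algorithm
```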

Case 2: rectangular matrix

  • the number of rows is larger than the number of columns – overdetermined linear system
    • a solution may not exist
  • the number of rows is smaller than the number of columns – underdetermined linear system
    • a solution is not unique

Least-squares problem

$$\Vert A x - b \Vert^2_2 \rightarrow \min_x$$
  • Geometric interpretation
  • Standard problem: you just need to call the proper function from a package and understand its output

Gram matrix and normal equation

  • From the first-order optimality condition we get the normal equation
$$ A^* A x = A^* b $$

The matrix $A^* A$ is called the Gram matrix.

Property of the normal equation:

  • The condition number of the matrix $A^* A$ is the square of the condition number of the matrix $A$ (check this fact!).
  • Therefore, solving the normal equation in this form is not a good idea!
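
A quick numerical check of the squaring of the condition number, using a random tall matrix for illustration:

```python
import numpy as np

A = np.random.randn(100, 20)

# cond(A^* A) = cond(A)^2 in the spectral norm
print(np.linalg.cond(A) ** 2)
print(np.linalg.cond(A.T @ A))
```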

Pseudoinverse matrix

  • If the matrix $A$ is rank-deficient, then the Gram matrix is singular
  • Therefore, we introduce the concept of the pseudoinverse matrix $A^{\dagger}$
$$x = A^{\dagger} b.$$
  • Matrix
$$A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^*$$

is called the pseudoinverse of the matrix $A$.

Pseudoinverse matrix property

  • If the matrix $A$ has full column rank, then $A^* A$ is invertible, and we get
$$A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^* = (A^* A)^{-1} A^*.$$
  • If the matrix $A$ is square and non-singular, then we get
$$ A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^* = (A^* A)^{-1} A^* = A^{-1} A^{-*} A^* = A^{-1}$$

We have obtained the standard inverse of the matrix $A$.

  • If $A$ has linearly dependent columns, then $A^\dagger b$ gives the solution with the minimum Euclidean norm (recall the geometry of the problem!)

Computing pseudoinverse through SVD

Let $A = U \Sigma V^*$ be the SVD of the matrix $A$. Then,

$$A^{\dagger} = V \Sigma^{\dagger} U^*,$$

where $\Sigma^{\dagger}$ is a diagonal matrix whose non-zero diagonal entries are the inverses of the non-zero singular values of the matrix $A$. Indeed,

\begin{align*} A^{\dagger} = \lim_{\alpha \rightarrow 0}(\alpha I + A^* A)^{-1} A^* = \lim_{\alpha \rightarrow 0}( \alpha VV^* + V \Sigma^2 V^*)^{-1} V \Sigma U^* = \lim_{\alpha \rightarrow 0}( V(\alpha I + \Sigma^2) V^*)^{-1} V \Sigma U^* = V \lim_{\alpha \rightarrow 0}(\alpha I + \Sigma^2)^{-1} \Sigma U^* = V \Sigma^{\dagger} U^*, \end{align*}
  • You can check that $\Sigma^{\dagger}$ is constructed by inverting the non-zero singular values
  • If some singular values are small, you can skip (truncate) them. This gives a more stable solution with respect to noise in the right-hand side.
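
A sketch of assembling the pseudoinverse solution from the SVD with truncation of small singular values; numpy's `pinv` does essentially the same thing (its threshold is controlled by `rcond`):

```python
import numpy as np

A = np.random.randn(50, 30)
b = np.random.randn(50)

U, s, Vh = np.linalg.svd(A, full_matrices=False)
tol = 1e-10                                  # skip singular values below this threshold
s_inv = np.where(s > tol, 1.0 / s, 0.0)      # diagonal of Sigma^dagger
x = Vh.T @ (s_inv * (U.T @ b))               # x = V Sigma^dagger U^* b

print(np.allclose(x, np.linalg.pinv(A) @ b))
```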

Q: what about condition number?

To solve the problem we use QR decomposition

Any matrix can be represented in the form

$$ A = Q R, $$

where $Q$ is a unitary matrix, and $R$ is an upper triangular matrix.

  • Gram-Schmidt orthogonalization is unstable (cf. Problem 2 in PS2)
  • Modified version is more stable
  • Computing via Householder reflections. This transformation zeroes a subvector:
$$ H \begin{bmatrix} \times \\ \times \\ \times \\ \times \end{bmatrix} = \begin{bmatrix} \times \\ 0 \\ 0 \\ 0 \end{bmatrix}. $$
  • Householder matrix
$$ H = I - 2 vv^*. $$

The complexity of a Householder-matrix-by-vector product is $\mathcal{O}(n)$!
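
A sketch of building the Householder vector for a real vector $x$ (the sign choice avoids cancellation); note that $H$ is never formed explicitly, and applying it costs one dot product and one vector update:

```python
import numpy as np

def householder_vector(x):
    """Unit vector v such that (I - 2 v v^T) x is a multiple of e_1 (real case)."""
    v = x.astype(float).copy()
    v[0] += np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
    return v / np.linalg.norm(v)

x = np.random.randn(4)
v = householder_vector(x)
Hx = x - 2 * v * (v @ x)     # O(n) application of H = I - 2 v v^T
print(Hx)                    # first entry is +-||x||, the rest are ~0
```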

  • The complexity of computing the QR decomposition is $\mathcal{O}(n^3)$
  • What is complexity for the matrix $m \times n,$ where $m > n$?
  • Computing with Givens matrix (rotations)
  • Givens matrix in 2D has the form
$$ G = \begin{bmatrix} \cos \alpha & -\sin \alpha \\ \sin \alpha & \cos \alpha \end{bmatrix}, $$
  • It zeroes elements sequentially
  • It is appropriate for sparse matrices and parallel computing
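
A sketch of a single Givens rotation that zeroes one entry of a 2-vector; in a full QR factorization such rotations are applied to pairs of rows, touching only $\mathcal{O}(n)$ entries each:

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with c^2 + s^2 = 1 such that [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    return a / r, b / r

a, b = 3.0, 4.0
c, s = givens(a, b)
G = np.array([[c, s], [-s, c]])   # plane rotation zeroing the second entry
print(G @ np.array([a, b]))       # [5, 0]
```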

Come back to the least-squares problem

  • If the matrix $A$ is full-rank and $A = QR$ is its thin QR decomposition, then
$$ x = A^{\dagger}b = (A^*A)^{-1}A^*b = ((QR)^*(QR))^{-1}(QR)^*b = (R^*Q^*QR)^{-1}R^*Q^*b = (R^*R)^{-1}R^*Q^*b = R^{-1}Q^*b. $$

  • Thus, we have to solve the square linear system
$$ Rx = Q^* b. $$
  • $R$ is upper-triangular
  • The complexity of solving is $\mathcal{O}(n^2)$
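
A sketch of the QR-based least-squares solve on a random overdetermined system, checked against numpy's built-in `lstsq`:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.random.randn(200, 5)
b = np.random.randn(200)

Q, R = np.linalg.qr(A)             # thin QR: Q is 200 x 5, R is 5 x 5
x = solve_triangular(R, Q.T @ b)   # back substitution for R x = Q^* b, O(n^2)

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```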

Problem 2: eigenvalue problem

$$ Ax = \lambda x, \qquad A = S \Lambda S^{-1} $$
  • Power method
  • QR algorithm

Theoretical approach

  • Characteristic equation
$$ \det(A - \lambda I) = 0 $$
  • Solve it with polynomial root finding procedure
  • This is a very unstable operation!
  • Why?
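
A hedged illustration of the answer: the map from polynomial coefficients to roots can be terribly ill-conditioned even when the eigenvalues themselves are perfectly conditioned (a random symmetric matrix is used; note that `np.poly` itself obtains the coefficients from the eigenvalues, so this only probes the coefficients-to-roots step):

```python
import numpy as np

np.random.seed(0)
B = np.random.randn(25, 25)
A = (B + B.T) / 2                                # symmetric: well-conditioned eigenvalues

lam_eig = np.sort(np.linalg.eigvalsh(A))         # reliable eigenvalue solver
lam_poly = np.sort(np.roots(np.poly(A)).real)    # roots of the characteristic polynomial

print(np.max(np.abs(lam_eig - lam_poly)))        # noticeably larger than machine epsilon
```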

Practical approach: partial eigenvalue problem

  • Power method
  • One iteration reads as
$$ x_{k+1} = A x_k, \quad x_{k+1} := \frac{x_{k+1}}{\Vert x_{k+1} \Vert_2}.$$
  • It converges (not always!) to the eigenvector corresponding to the largest-in-modulus eigenvalue
  • To perform iterations you need only procedure for computing matrix by vector product (cf. Problem 4 in PS2)
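
A minimal sketch of the power method on a symmetric matrix with a prescribed spectrum (eigenvalues $1,\dots,100$, built only so that the dominant eigenvalue is known); each iteration needs one matrix-by-vector product:

```python
import numpy as np

np.random.seed(0)
Q0, _ = np.linalg.qr(np.random.randn(100, 100))
A = Q0 @ np.diag(np.arange(1.0, 101.0)) @ Q0.T   # symmetric, dominant eigenvalue = 100

x = np.random.randn(100)
x /= np.linalg.norm(x)
for _ in range(1000):
    x = A @ x                  # one matvec per iteration
    x /= np.linalg.norm(x)

lam = x @ (A @ x)              # Rayleigh quotient estimate
print(lam)                     # close to 100
```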

Q: what about complexity?

  • Convergence is linear with factor $q = \left|\frac{\lambda_{2}}{\lambda_{1}}\right| < 1$, where $|\lambda_1| > |\lambda_2| \geq \dots \geq |\lambda_n|$.

Krylov subspace

  • The eigenvector approximation computed by the power method lies in the Krylov subspace $\mathrm{span}\{x_0, Ax_0,\dots,A^{k}x_0\}$ and has the form $\mu A^k x_0$, where $\mu$ is a normalization constant.
  • Later on, Krylov subspaces will be a key tool for solving large-scale problems!
  • More details on why the Krylov subspace is so important will come later in the course

Practical approach: full eigenvalue problem

  • We want to find all eigenvalues!
  • Schur decomposition
$$A = UTU^*,$$

where $U$ is unitary, and $T$ is upper triangular.

  • The spectra of $T$ and $A$ coincide, since $A = UTU^*$ is a similarity transformation with a unitary $U$
  • How to compute Schur form?
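
In practice one calls a library routine; scipy wraps the LAPACK driver, which internally uses the QR algorithm discussed next. A sketch (the complex output is requested so that $T$ is truly upper triangular even for a real matrix):

```python
import numpy as np
from scipy.linalg import schur

A = np.random.randn(6, 6)
T, U = schur(A, output='complex')    # A = U T U^*, T upper triangular

print(np.allclose(U @ T @ U.conj().T, A))
print(np.sort_complex(np.diag(T)))   # eigenvalues of A sit on the diagonal of T
print(np.sort_complex(np.linalg.eigvals(A)))
```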

QR algorithm

  1. Initialize $A_0 = A$.
  2. Compute QR decomposition of the matrix $A_k$: $A_k = Q_k R_k$.
  3. Update approximation $A_{k+1} = R_k Q_k$.
  • How to check convergence?
  • What about complexity?
  • How to find matrices $U$ and $T$?
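
A naive sketch of the iteration (no Hessenberg reduction, no shifts) on a symmetric test matrix with known, well-separated eigenvalues; accumulating the products $Q_k$ gives the orthogonal factor $U$, and the iterates $A_k$ approach the (here diagonal) factor $T$:

```python
import numpy as np

np.random.seed(0)
Q0, _ = np.linalg.qr(np.random.randn(6, 6))
A = Q0 @ np.diag([6.0, 5.0, 4.0, 3.0, 2.0, 1.0]) @ Q0.T   # known spectrum {1,...,6}

Ak = A.copy()
U = np.eye(6)
for _ in range(200):
    Q, R = np.linalg.qr(Ak)   # O(n^3) per iteration in this naive version
    Ak = R @ Q                # similarity transform: spectrum is preserved
    U = U @ Q                 # accumulated orthogonal factor

print(np.allclose(U @ Ak @ U.T, A))       # A = U A_k U^T at every step
print(np.round(np.sort(np.diag(Ak)), 6))  # approaches [1, 2, 3, 4, 5, 6]
```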

Convergence and acceleration

  • The QR algorithm can be accelerated to $\mathcal{O}(n^3)$ total complexity
  • Use the properties of matrices in upper Hessenberg form
  • A matrix $A$ is in upper Hessenberg form if
$$a_{ij} = 0, \quad \mbox{if } i \geq j+2.$$

or

$$A = \begin{bmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & *\\ 0 & 0 & * & * & *\\ 0 & 0 & 0 & * & * \\ \end{bmatrix}.$$
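
A sketch of the reduction itself via scipy; it is done once with Householder reflections in $\mathcal{O}(n^3)$, and the Hessenberg structure is preserved by the QR iteration, making each subsequent iteration $\mathcal{O}(n^2)$:

```python
import numpy as np
from scipy.linalg import hessenberg

A = np.random.randn(5, 5)
H, Q = hessenberg(A, calc_q=True)   # A = Q H Q^*, H upper Hessenberg

print(np.allclose(Q @ H @ Q.T, A))
print(np.round(H, 2))               # zeros below the first subdiagonal
```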

Where are we moving further?

  • Problems are the same, but the scale is much larger!
  • Exploiting the structure of matrices
    • sparsity
    • low rank
    • Toeplitz and circulant matrices
  • Iterative methods for
    • solving linear systems
    • computing eigenvalues and eigenvectors
    • computing matrix functions
  • Tensor decompositions and their applications

Important dates

  • November, 21 –– midterm
  • November, 24 –– deadline for PS2
  • November, 24 –– deadline for submitting project proposals

Plans for next classes

  • Tomorrow Q&A session
  • Midterm is on Thursday
  • Next regular lecture is on Monday