Lecture 15: Matrix functions and matrix equations

Previous part

Today's lecture

Outline of this part

Book to read: Functions of matrices by Nick Higham

The simplest matrix function: matrix polynomial

It is very easy to define a matrix polynomial as

$$ P(A) = \sum_{k=0}^n c_k A^k. $$

Side-note: the Cayley-Hamilton theorem states that $F(A) = 0$, where $F(\lambda) = \det(A - \lambda I)$ is the characteristic polynomial; thus any matrix polynomial in an $n \times n$ matrix $A$ can be reduced to one of degree $\leq n-1$.

Matrix polynomials as building blocks

We can define a function of the matrix by Taylor series:

$$ f(A) = \sum_{k=0}^{\infty} c_k A^k. $$

The convergence is understood as the convergence in some matrix norm.

An example of such a series is the Neumann series

$$ (I - F)^{-1} = \sum_{k=0}^{\infty} F^k, $$

which is well defined for $\rho(F) < 1$.
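As an illustration, here is a minimal NumPy check (the matrix size and the rescaling to $\rho(F) = 0.5$ are arbitrary choices) that the partial sums of the Neumann series approach $(I - F)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 4))
F *= 0.5 / max(abs(np.linalg.eigvals(F)))   # rescale so that rho(F) = 0.5 < 1

S = np.zeros_like(F)
Fk = np.eye(4)
for _ in range(200):                        # partial sums of the Neumann series
    S += Fk
    Fk = Fk @ F

print(np.linalg.norm(S - np.linalg.inv(np.eye(4) - F)))  # ~ machine precision
```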

Matrix exponential series

The most well-known matrix function is matrix exponential. In the scalar case,

$$ e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \ldots = \sum_{k=0}^{\infty} \frac{x^k}{k!}, $$

and it directly translates to the matrix case:

$$ e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}, $$

and this series converges for any square matrix $A$, because it is majorized by the convergent scalar series

$$\sum_{k=0}^{\infty} \frac{\Vert A \Vert^k}{k!} = e^{\Vert A \Vert}.$$

Why the matrix exponential is important

A lot of practical problems are reduced to a system of linear ODEs of the form

$$ \frac{dy}{dt} = Ay, \quad y(0) = y_0. $$

ODE and matrix exponentials

For the initial value problem

$$\frac{dy}{dt} = Ay, \quad y(0) = y_0,$$

the solution is $y(t) = e^{At} y_0$, since

$$\frac{d}{dt} e^{At} = \frac{d}{dt} \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = \sum_{k=1}^{\infty} \frac{t^{k-1} A^{k}}{(k-1)!} = A e^{At}.$$

Sidenote: matrix exponential and time stepping

Using the matrix exponential can be much better than time stepping with, say, the explicit Euler scheme:

$$\frac{dy}{dt} \approx \frac{y_{k+1} - y_k}{\tau} = A y_k, \quad y_{k+1} = y_k + \tau A y_k,$$

provided we know how to compute the product of the matrix exponential with a vector using only matrix-by-vector products.

For dense matrices, the matrix exponential also gives the exact answer to the ODE for any $t$, whereas time-stepping schemes only approximate it.
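To make the comparison concrete, here is a minimal SciPy sketch (the toy matrix, time horizon, and step count are arbitrary choices) contrasting the matrix-exponential solution with explicit Euler:

```python
import numpy as np
from scipy.linalg import expm

# Toy stable system dy/dt = A y (the shift by -3 I just makes it stable)
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) - 3 * np.eye(n)
y0 = rng.standard_normal(n)
t = 1.0

y_exact = expm(t * A) @ y0          # "exact" solution via the matrix exponential

# Explicit Euler with a fixed number of steps
steps = 1000
tau = t / steps
y = y0.copy()
for _ in range(steps):
    y = y + tau * (A @ y)

print(np.linalg.norm(y - y_exact) / np.linalg.norm(y_exact))  # O(tau) error
```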

How to compute matrix functions, including the exponential?

If the matrix $A$ is diagonalizable,

$$ A = S \Lambda S^{-1}, $$

where the columns of $S$ are eigenvectors of the matrix $A$, then

$$ F(A) = S F(\Lambda) S^{-1}. $$

Problem: diagonalization can be unstable! (and not every matrix is diagonalizable)

Let us look at how accurately matrices are diagonalized in practice.

Now we can compute a function of a perturbed Jordan block via its eigendecomposition, as in the sketch below.
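A minimal NumPy sketch (the size, eigenvalue, and perturbation are arbitrary choices) that computes $e^J$ for a perturbed Jordan block through the eigendecomposition and compares it with a backward-stable routine:

```python
import numpy as np
from scipy.linalg import expm

# Jordan block of size n with eigenvalue lam, perturbed in the bottom-left corner
n, lam, eps = 10, 1.0, 1e-10
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)
J[-1, 0] = eps

# F(A) = S F(Lambda) S^{-1} via the (nearly defective) eigendecomposition
w, S = np.linalg.eig(J)
F_eig = S @ np.diag(np.exp(w)) @ np.linalg.inv(S)

F_ref = expm(J)                                               # reference value
print(np.linalg.norm(F_eig - F_ref) / np.linalg.norm(F_ref))  # large relative error
```

The eigenvector matrix $S$ is extremely ill-conditioned here, so the formula $F(A) = S F(\Lambda) S^{-1}$ loses many digits even though $J$ is only a tiny perturbation of a Jordan block.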

How the funm function works

Schur-Parlett algorithm

First compute the Schur decomposition

$$ A = U T U^*, $$

where $T$ is upper triangular; then $F(A) = U F(T) U^*$, so it remains to compute a function of a triangular matrix.

Computing functions of triangular matrices

We know the values on the diagonal:

$$ F_{ii} = F(T_{ii}), $$

and also we know that

$$ F T = T F $$

i.e., the matrix function commutes with the matrix itself. The function of an upper triangular matrix is upper triangular as well. Using the known values on the diagonal and the commutativity property, we compute the superdiagonals of $F$ one by one (the Parlett recurrence):

$$f_{ij} = t_{ij} \frac{f_{ii} - f_{jj}}{t_{ii} - t_{jj}} + \sum_{k=i+1}^{j-1} \frac{f_{ik} t_{kj} - t_{ik}f_{kj}}{t_{ii} - t_{jj}}.$$
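A minimal sketch of this Schur-Parlett idea (assuming the eigenvalues of $A$ are distinct, since the recurrence divides by $t_{ii} - t_{jj}$; production codes such as funm use blocking to handle close eigenvalues):

```python
import numpy as np
from scipy.linalg import schur

def funm_parlett(A, f):
    """Compute f(A) via the complex Schur form and the Parlett recurrence."""
    T, U = schur(A, output='complex')          # A = U T U^*
    n = T.shape[0]
    F = np.zeros((n, n), dtype=complex)
    for i in range(n):                         # diagonal: f of the eigenvalues
        F[i, i] = f(T[i, i])
    for p in range(1, n):                      # superdiagonal by superdiagonal
        for i in range(n - p):
            j = i + p
            s = T[i, j] * (F[i, i] - F[j, j])
            for k in range(i + 1, j):
                s += F[i, k] * T[k, j] - T[i, k] * F[k, j]
            F[i, j] = s / (T[i, i] - T[j, j])  # assumes distinct eigenvalues
    return U @ F @ U.conj().T
```

For example, `funm_parlett(A, np.exp)` can be compared against `scipy.linalg.expm(A)` on a small matrix with well-separated eigenvalues.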

Matrix functions: definition

$$ f(A) = \frac{1}{2\pi i}\int_{\Gamma} f(z) (zI - A)^{-1} dz, $$

where $f(z)$ is analytic on and inside a closed contour $\Gamma$ that encloses the spectrum of $A$.

Important matrix functions

Matrix exponential

$$e^A = I + A + \frac{1}{2} A^2 + \frac{1}{3!} A^3 + \ldots$$

Series convergence

Method 1: Krylov method

$$ f(A)v \approx Q f(H) Q^* v, \quad H = Q^* A Q, $$

where the columns of $Q$ form an orthonormal basis of the Krylov subspace $\mathrm{Span}(v, Av, \ldots, A^{k-1}v)$ and $H$ is a small upper Hessenberg matrix, for which we can use, for example, the Schur-Parlett algorithm. A sketch is given below.
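A minimal sketch of this Krylov (Arnoldi) approximation for $e^A v$, assuming $A$ is only available through matrix-by-vector products (the subspace size `k` is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import expm

def krylov_expm_action(A, v, k=30):
    """Approximate exp(A) @ v by projection onto the Krylov subspace K_k(A, v)."""
    n = v.shape[0]
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    beta = np.linalg.norm(v)
    Q[:, 0] = v / beta
    for j in range(k):                 # Arnoldi process
        w = A @ Q[:, j]
        for i in range(j + 1):         # orthogonalize against previous vectors
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:        # happy breakdown: invariant subspace found
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    # f(A) v ~= beta * Q_k f(H_k) e_1, with f evaluated on the small matrix H_k
    return beta * Q[:, :k] @ expm(H[:k, :k])[:, 0]
```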

Pade approximations

$$ \exp(x) \approx \frac{p(x)}{q(x)}, $$

where $p(x)$ and $q(x)$ are polynomials, and the computation of a rational function of a matrix reduces to matrix-matrix products and matrix inversions.
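For example, the lowest-order diagonal ($[1/1]$) Padé approximant $\exp(x) \approx \frac{1 + x/2}{1 - x/2}$ gives, for a matrix argument,

$$\exp(A) \approx \left(I - \frac{A}{2}\right)^{-1}\left(I + \frac{A}{2}\right),$$

which requires only one matrix inversion (or a single linear solve when applied to a vector). Higher-order Padé approximants are used in practice.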

Scaling & squaring algorithm

The "canonical algorithm" for the computation of the matrix exponential also relies on scaling of the matrix $A:$

$$\exp(A) = \left(\exp(A/2^k)\right)^{2^k}.$$

The scaled matrix $A/2^k$ has a small norm, so its exponential can be approximated accurately (e.g., by a Padé approximant), and the result is then squared $k$ times.
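A minimal sketch of scaling and squaring (for brevity, a truncated Taylor series stands in for the Padé approximant used in practice; the number of terms is an arbitrary choice):

```python
import numpy as np

def expm_scaling_squaring(A, taylor_terms=12):
    """Approximate exp(A): scale so the norm is <= 1, expand, then square k times."""
    nrm = np.linalg.norm(A, 1)
    k = max(0, int(np.ceil(np.log2(nrm)))) if nrm > 0 else 0
    B = A / 2**k
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, taylor_terms + 1):   # truncated Taylor series for exp(B)
        term = term @ B / i
        E = E + term
    for _ in range(k):                     # undo the scaling by repeated squaring
        E = E @ E
    return E
```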

Large-scale matrix exponentials

Rational Krylov subspaces

The simplest (yet efficient) approach is based on the so-called extended Krylov subspaces:

$$KE(A, b) = \mathrm{Span}(\ldots, A^{-2} b, A^{-1} b, b, A b, A^2 b, \ldots)$$

At each step we add vectors of the form $A w$ and $A^{-1} w$ to the subspace and orthogonalize the result (rational Arnoldi method).

That is, each step requires only one linear solve with $A$, and since the matrix $A$ is fixed, we can factorize it once and reuse the factorization.

Rational Krylov methods

Rational Krylov methods are among the most efficient approaches for the computation of matrix functions of large matrices:

$$KE(A, b) = \mathrm{Span}(\ldots, A^{-2} b, A^{-1} b, b, A b, A^2 b, \ldots),$$

$$f(A)b \approx Q f(H) Q^*b,$$

where $H = Q^* A Q.$

Each step requires one linear solve and one matrix-by-vector product; a sketch is given below.
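A deliberately simplified sketch of the extended Krylov projection for $e^A b$ (assuming $A$ is invertible; the raw basis is orthogonalized all at once with a QR factorization, whereas practical rational Arnoldi codes orthogonalize incrementally):

```python
import numpy as np
from scipy.linalg import expm, lu_factor, lu_solve, qr

def extended_krylov_expm(A, b, k=8):
    """Approximate exp(A) @ b using the extended Krylov subspace KE(A, b)."""
    lu = lu_factor(A)                       # factorize A once, reuse for all solves
    vs = [b / np.linalg.norm(b)]
    v_plus, v_minus = vs[0], vs[0]
    for _ in range(k):
        v_minus = lu_solve(lu, v_minus)     # next "negative power" direction
        v_minus = v_minus / np.linalg.norm(v_minus)   # normalization keeps the span
        v_plus = A @ v_plus                 # next "positive power" direction
        v_plus = v_plus / np.linalg.norm(v_plus)
        vs += [v_minus, v_plus]
    Q, _ = qr(np.stack(vs, axis=1), mode='economic')  # orthonormal basis of KE(A, b)
    H = Q.T @ A @ Q                                   # projected matrix
    return Q @ (expm(H) @ (Q.T @ b))
```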

Application to compute distance between manifolds

The heat kernel trace

$$\mathrm{hkt}_{\mathcal{M}}(t) = \mathrm{trace}(\exp(-t L_{\mathcal{M}}))$$

contains all the information about the spectrum of the Laplacian $L_{\mathcal{M}}$ and yields a lower bound on the Gromov-Wasserstein distance:

$$d_{GW}(\mathcal{M}, \mathcal{N}) \geq \sup_{t > 0} \exp(-2(t + t^{-1}))\,|\mathrm{hkt}_{\mathcal{M}}(t) - \mathrm{hkt}_{\mathcal{N}}(t)|.$$

Stochastic trace estimator

$$ \mathrm{trace}(A) = \mathbb{E}_{p(x)}(x^{\top}Ax), $$

where $p(x)$ is a distribution with zero mean and identity covariance, e.g., the Rademacher or the standard normal distribution; a sketch is given below.
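A minimal sketch of this Hutchinson-type estimator with Rademacher probes (the function name and the number of samples are arbitrary choices); only matrix-by-vector products with $A$ are needed, so it combines naturally with the Krylov methods above:

```python
import numpy as np

def hutchinson_trace(matvec, n, num_samples=100, seed=0):
    """Estimate trace(A) = E[x^T A x] using Rademacher probe vectors."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.choice([-1.0, 1.0], size=n)   # zero mean, identity covariance
        total += x @ matvec(x)
    return total / num_samples

# Example: estimate the trace of a dense symmetric matrix and compare with the exact value
rng = np.random.default_rng(1)
M = rng.standard_normal((100, 100))
A = M @ M.T / 100
print(hutchinson_trace(lambda x: A @ x, 100), np.trace(A))
```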

Distances between languages (original paper)

Other matrix functions

Now, let us briefly talk about other matrix functions:

Sign function

For a scalar argument,

$$\mathrm{sign}(x) = \begin{cases} 1, & x > 0, \\ -1, & x < 0, \end{cases}$$

and for a matrix it is defined through the eigenvalues (each eigenvalue is replaced by the sign of its real part). The matrix

$$P = \frac{I + \mathrm{sign}(A)}{2}$$

is a projector onto the invariant subspace corresponding to the eigenvalues with positive real part.

How to compute sign function?

Two classical iterations converge to $\mathrm{sign}(A)$ (provided $A$ has no purely imaginary eigenvalues): the Newton iteration

$$X_{k+1} = \frac{1}{2} (X_k + X^{-1}_k), \quad X_0 = \alpha A,$$

and the inverse-free (locally convergent) Newton-Schulz iteration

$$X_{k+1} = \frac{1}{2} X_k (3 I - X_k^2), \quad X_0 = \alpha A.$$
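A minimal sketch of the Newton iteration (the tolerance, the iteration cap, and the choice $\alpha = 1$ are arbitrary; it assumes a real $A$ with no eigenvalues on the imaginary axis):

```python
import numpy as np

def matrix_sign_newton(A, tol=1e-12, max_iter=100):
    """Newton iteration X_{k+1} = (X_k + X_k^{-1}) / 2 for the matrix sign function."""
    X = np.array(A, dtype=float)
    for _ in range(max_iter):
        X_new = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_new - X, 1) <= tol * np.linalg.norm(X_new, 1):
            return X_new
        X = X_new
    return X

# Sanity check: for a diagonal matrix, sign(A) has entries +1 and -1
A = np.diag([3.0, 1.0, -2.0])
print(np.round(matrix_sign_newton(A)))
```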

Matrix sign function: applications

The matrix sign function can be used to solve the algebraic Riccati equation

$$A^* X + X A - X R X + G = 0,$$

which arises in optimal control and stochastic control. The solution $X$ can be recovered from the matrix sign function of the block matrix

$$ C = \begin{bmatrix} A^* & G \\ R & -A \end{bmatrix}. $$

Inverse square root of the matrix

The inverse square root $A^{-\frac{1}{2}}$ appears, for example, when working with Gaussian densities of the form

$$e^{-\frac{1}{2}(A^{-1} y,\, y)}:$$

the change of variables $x = A^{-\frac{1}{2}} y$ turns such a density into a standard normal one.

Matrix equations

An equation of the form

$$F(X) = G, \quad X \in \mathbb{R}^{n \times m},$$

where the unknown is the matrix $X$, is called a matrix equation.

Two important matrix equations

We will discuss two matrix equations:

$$ A X + X B = C,$$

where $A$ and $B$ are given (the Sylvester equation), and its special case, the continuous Lyapunov equation,

$$ A X + XA^{\top} = C,$$

and the discrete Lyapunov (Stein) equation

$$A X A^* - X = C. $$

Application of the Lyapunov equation

The Lyapunov equation is used to study the stability of linear dynamical systems: for

$$\frac{dy}{dt} = Ay, \quad y(0) = y_0,$$

we have $y(t) \rightarrow 0$ as $t \rightarrow \infty$ (asymptotic stability) if and only if there exists a symmetric positive definite matrix $P$ satisfying

$$A P + P A^* = Q$$

for some negative definite $Q$.
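A minimal SciPy check of this criterion on a toy stable matrix (the matrix and the choice $Q = -I$ are arbitrary):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 2.0],
              [ 0.0, -3.0]])              # eigenvalues -1 and -3: stable
Q = -np.eye(2)                            # negative definite right-hand side
P = solve_continuous_lyapunov(A, Q)       # solves A P + P A^* = Q
print(np.linalg.eigvalsh(P))              # all positive, confirming stability
```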

Application to model order reduction

Model order reduction of linear time-invariant systems:

$$\frac{dx}{dt} = Ax + Bu, \quad y = C x,$$

where $x$ is the state, $u$ is the control, and $y$ is the observed output. We want to approximate it by a smaller-dimensional linear system

$$ \frac{d\widehat{x}}{dt} = \widehat{A} \widehat{x} + \widehat{B} u, \quad y = \widehat{C} \widehat{x}, $$

in such a way that the output of the reduced system is close to the output of the original (big one).

The optimal $\widehat{A}, \widehat{B}, \widehat{C}$ can be recovered from the solutions of auxiliary Lyapunov equations.

Solution of the Sylvester equation

$$ A X + X B = C $$

can be reduced to an ordinary linear system with the help of the Kronecker product.

Kronecker product

The Kronecker product of two matrices $A \in \mathbb{R}^{n_1 \times m_1}$ and $B \in \mathbb{R}^{n_2 \times m_2}$ is the $(n_1 n_2) \times (m_1 m_2)$ matrix with the block form

$$A \otimes B = [a_{ij} B].$$

Main property of the Kronecker product and vec

We have

$$\mathrm{vec}(A X B^{\top}) = (B \otimes A) \mathrm{vec}(X),$$

where $\mathrm{vec}(X)$ stacks the columns of $X$ into a single long vector.
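A quick NumPy verification of this identity (the matrix sizes are arbitrary; note that column-stacking corresponds to Fortran-order reshaping):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((6, 5))

vec = lambda M: M.reshape(-1, order="F")   # stack the columns

lhs = vec(A @ X @ B.T)
rhs = np.kron(B, A) @ vec(X)
print(np.allclose(lhs, rhs))               # True
```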

Rewriting the Sylvester equation

Using the identity

$$\mathrm{vec}(A X B^{\top}) = (B \otimes A) \mathrm{vec}(X),$$

we can rewrite the Sylvester equation

$$ A X + X B = C $$

in the form

$$\mathrm{vec}(AX + X B) = (I \otimes A + B^{\top} \otimes I) \mathrm{vec}(X) = \mathrm{vec}(C).$$

The matrix $(I \otimes A + B^{\top} \otimes I)$ has size $nm \times nm$, so solving this system by a general-purpose method is too expensive; the Bartels-Stewart method exploits its structure.

Solving Sylvester equation: Bartels-Stewart method

We need to solve the linear system

$$(I \otimes A + B^{\top} \otimes I) x = c, \quad x = \mathrm{vec}(X), \quad c = \mathrm{vec}(C).$$

Let us compute Schur decomposition of $A$ and $B$:

$$A = Q_A T_A Q^*_A, \quad B^{\top} = Q_B T_B Q^*_B.$$

Then, we have

$$I \otimes A + B^{\top} \otimes I = I \otimes ( Q_A T_A Q^*_A ) + (Q_B T_B Q^*_B) \otimes I = (Q_B \otimes Q_A) ( I \otimes T_A + T_B \otimes I) (Q^*_B \otimes Q^*_A). $$

We have

$$(Q_B \otimes Q_A)^{-1} = Q^*_B \otimes Q^*_A,$$

thus we only need to solve an auxiliary linear system with the matrix

$$I \otimes T_A + T_B \otimes I.$$

Note that if $A$ and $B$ are Hermitian, then $T_A$ and $T_B$ are diagonal, and this matrix is diagonal!

Solving a final system

We have the system

$$(I \otimes T_A + T_B \otimes I) z = g,$$

in the matrix form:

$$T_A Z + Z T^{\top}_B = G.$$

Then we write the equation elementwise and see that the entries of $Z$ can be computed successively, one at a time, starting from a corner of $Z$; a sketch of the whole procedure is given below.
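A minimal sketch of the Bartels-Stewart idea following the derivation above (complex Schur forms; the triangular equation is solved column by column, which is one of several possible orderings), checked against `scipy.linalg.solve_sylvester`:

```python
import numpy as np
from scipy.linalg import schur, solve_sylvester

def bartels_stewart(A, B, C):
    """Solve A X + X B = C via Schur decompositions of A and B^T."""
    TA, QA = schur(A, output='complex')            # A   = QA TA QA^*
    TB, QB = schur(B.T, output='complex')          # B^T = QB TB QB^*
    G = QA.conj().T @ C @ QB.conj()                # transformed right-hand side
    n, m = C.shape
    Z = np.zeros((n, m), dtype=complex)
    S = TB.T                                       # lower triangular
    for j in range(m - 1, -1, -1):                 # T_A Z + Z S = G, last column first
        rhs = G[:, j] - Z[:, j + 1:] @ S[j + 1:, j]
        Z[:, j] = np.linalg.solve(TA + S[j, j] * np.eye(n), rhs)
    X = QA @ Z @ QB.T                              # transform back to X
    real_data = np.isrealobj(A) and np.isrealobj(B) and np.isrealobj(C)
    return X.real if real_data else X

rng = np.random.default_rng(0)
A, B, C = rng.standard_normal((4, 4)), rng.standard_normal((3, 3)), rng.standard_normal((4, 3))
X = bartels_stewart(A, B, C)
print(np.linalg.norm(A @ X + X @ B - C))             # small residual
print(np.linalg.norm(X - solve_sylvester(A, B, C)))  # agrees with SciPy
```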

Take home message

Plan for the next class