Lecture 6: Eigenvalues and eigenvectors

Recap of the previous lecture

Today's lecture

Today we will talk about:

1. eigenvalues and eigenvectors and where they appear;
2. how to compute them: the characteristic polynomial (and why it is a bad idea) and the power method;
3. Gershgorin circles;
4. the Schur decomposition and normal matrices;
5. spectrum and pseudospectrum.

What is an eigenvector?

A nonzero vector $x$ is called an eigenvector of a square matrix $A$ if there exists a number $\lambda$ (the corresponding eigenvalue) such that

$$ Ax = \lambda x. $$

The eigenvalues are the roots of the characteristic equation

$$ \det (A - \lambda I) = 0.$$

Eigendecomposition

If an $n\times n$ matrix $A$ has $n$ linearly independent eigenvectors $s_i$, $i=1,\dots,n$:

$$ As_i = \lambda_i s_i, $$

then this can be written as

$$ A S = S \Lambda, \quad\text{where}\quad S=(s_1,\dots,s_n), \quad \Lambda = \text{diag}(\lambda_1, \dots, \lambda_n), $$

or equivalently

$$ A = S\Lambda S^{-1}. $$
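A quick numerical check of the eigendecomposition (a minimal numpy sketch; the random test matrix is arbitrary):

```python
import numpy as np

np.random.seed(0)
n = 4
# A random matrix has distinct eigenvalues with probability 1,
# hence a full set of linearly independent eigenvectors.
A = np.random.randn(n, n)

# np.linalg.eig returns the eigenvalues and the matrix S whose columns are eigenvectors
lam, S = np.linalg.eig(A)
Lambda = np.diag(lam)

# Verify A = S Lambda S^{-1} (up to roundoff)
print(np.linalg.norm(A - S @ Lambda @ np.linalg.inv(S)))
```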

Existence

The eigendecomposition does not always exist: a matrix may have fewer than $n$ linearly independent eigenvectors (see the example below). An important sufficient condition is normality: a matrix $A$ is called normal if

$$AA^* = A^* A,$$

and, as we will show later in this lecture, every normal matrix is diagonalizable by a unitary matrix.

Example

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

has one eigenvalue $1$ of multiplicity $2$ (since its characteristic polynomial is $p(\lambda)=(1-\lambda)^2$), but its only eigenvectors are the vectors $\begin{pmatrix} c \\ 0 \end{pmatrix}$, $c \ne 0$, so there is no basis of eigenvectors and the matrix is not diagonalizable.
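We can verify this numerically. A minimal numpy sketch (the exact output of the eigensolver for a defective matrix may vary slightly):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lam, S = np.linalg.eig(A)
print(lam)                # [1. 1.]: eigenvalue 1 with multiplicity 2
print(S)                  # both columns are (numerically) multiples of (1, 0)^T
print(np.linalg.cond(S))  # enormous condition number: S is not invertible,
                          # so A = S Lambda S^{-1} is not available
```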

Why are eigenvalues and eigenvectors important?

Can you give some examples?

Applications of eigenvalues/eigenvectors

Eigenvalues are vibrational frequencies

A typical computation of eigenvalues / eigenvectors arises in the study of vibrations: the eigenvalues of the system matrix determine the vibrational frequencies of a structure, and the eigenvectors give the corresponding vibration modes.

Google PageRank

$$ p_i = \sum_{j \in N(i)} \frac{p_j}{L(j)}, $$

where $L(j)$ is the number of outgoing links on the $j$-th page and $N(i)$ is the set of pages that link to the $i$-th page. This can be rewritten as

$$ p = G p, \quad G_{ij} = \begin{cases} \frac{1}{L(j)}, & \text{if page } j \text{ links to page } i, \\ 0, & \text{otherwise}, \end{cases} $$

or as an eigenvalue problem

$$ Gp = 1 p, $$

i.e. the eigenvalue $1$ is already known, and we only need the corresponding eigenvector. Note that $G$ is left stochastic, i.e. its columns sum up to $1$. Check that any left stochastic matrix has maximum (in modulus) eigenvalue equal to $1$.

Demo

conda install networkx
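A minimal sketch of the PageRank eigenvalue problem on a toy graph (the graph and the number of iterations are chosen arbitrarily for illustration):

```python
import numpy as np
import networkx as nx

# A tiny "web": an edge j -> i means that page j links to page i
web = nx.DiGraph([(0, 1), (0, 2), (1, 2), (2, 0)])

n = web.number_of_nodes()
G = np.zeros((n, n))
for j in web.nodes():
    out = list(web.successors(j))
    for i in out:
        G[i, j] = 1.0 / len(out)      # column j sums to 1: G is left stochastic

# Power iteration for the eigenvalue 1 (the PageRank vector)
p = np.ones(n) / n
for _ in range(100):
    p = G @ p
    p = p / p.sum()

print(p)             # ranking of the pages
print(G @ p - p)     # ~0: p is an eigenvector with eigenvalue 1
print(nx.pagerank(web, alpha=1.0))   # networkx agrees (alpha=1 disables damping)
```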

Computations of eigenvalues

There are two types of eigenproblems:

1. partial eigenvalue problem: compute a few eigenvalues and/or the corresponding eigenvectors (for example, the largest or the smallest in modulus);
2. full eigenvalue problem: compute all $n$ eigenvalues (and, if needed, the eigenvectors).

Computation of the eigenvalues via characteristic equations

The eigenvalue problem has the form

$$ Ax = \lambda x, $$

or

$$ (A - \lambda I) x = 0, $$

therefore the matrix $A - \lambda I$ has a non-trivial kernel, i.e. it is singular.

That means, that the determinant

$$ p(\lambda) = \det(A - \lambda I) = 0. $$

Recall the determinant

The determinant of a square matrix $A$ is defined as

$$\det A = \sum_{\sigma \in S_n} \mathrm{sgn}({\sigma})\prod^n_{i=1} a_{i, \sigma_i},$$

where $S_n$ is the set of all permutations of the indices $\{1, \dots, n\}$ and $\mathrm{sgn}(\sigma)$ is the sign of the permutation $\sigma$ ($+1$ for even permutations, $-1$ for odd ones).

Properties of determinant

The determinant has many nice properties:

1. $\det(AB) = \det(A) \det(B)$

2. The determinant is linear in each row: if one row is a sum of two vectors, the determinant is the sum of the two corresponding determinants.

3. "Minor expansion" (Laplace expansion): the determinant can be expanded along any selected row or column.

Eigenvalues and characteristic equation

The straightforward approach is based on the characteristic polynomial

$$p(\lambda) = \det(A - \lambda I)$$

and consists of two steps:

  1. Compute the coefficients of the polynomial.
  2. Compute its roots.

Is this a good idea?

Give your feedback

We can do a short demo of this
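A minimal sketch of such a demo, using the Hilbert matrix discussed on the next slide (assumes numpy and scipy):

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)                        # symmetric, with rapidly decaying eigenvalues

# Characteristic-polynomial approach: coefficients, then roots
coeffs = np.poly(A)                   # coefficients of det(lambda*I - A)
eig_poly = np.roots(coeffs)

# Reliable reference: a symmetric eigensolver
eig_ref = np.linalg.eigvalsh(A)

print(np.sort(eig_ref)[:3])           # tiny, but real and positive
print(np.sort_complex(eig_poly)[:3])  # the small eigenvalues are lost to roundoff
                                      # (they may even come out negative or complex)
```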

Moral

Computing eigenvalues as roots of the characteristic polynomial is numerically unstable: the roots can be extremely sensitive to small perturbations of the polynomial coefficients. The matrix used in the demo,

$$h_{ij} = \int_0^1 x^i x^j\, dx = \frac{1}{i+j+1},$$

is the Hilbert matrix, which has exponential decay of singular values, so its smallest eigenvalues are completely lost by this approach.

Gershgorin circles

Theorem: all eigenvalues of a matrix $A$ lie in the union of the Gershgorin discs

$$ \{z \in \mathbb{C}: |z - a_{ii}| \leq r_i\}, \quad r_i = \sum_{j \ne i} |a_{ij}|. $$

Proof

First, we need to show that if the matrix $A$ is strictly diagonally dominant, i.e.

$$ |a_{ii}| > \sum_{j \ne i} |a_{ij}|, $$

then such a matrix is non-singular.

We separate the diagonal part and off-diagonal part, and get

$$ A = D + S = D( I + D^{-1}S), $$

and $\Vert D^{-1} S\Vert_\infty < 1$, since each row sum of $|D^{-1}S|$ equals $\sum_{j \ne i} |a_{ij}| / |a_{ii}| < 1$ by strict diagonal dominance. Therefore, by the Neumann series, the matrix $I + D^{-1}S$ is invertible and hence $A$ is invertible.

Now the proof follows by contradiction: if $\lambda$ is an eigenvalue lying outside all Gershgorin discs, then $|a_{ii} - \lambda| > r_i$ for all $i$, so the matrix $A - \lambda I$ is strictly diagonally dominant and hence non-singular, which contradicts $(A - \lambda I)x = 0$ having a non-trivial solution.

A short demo
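A minimal plotting sketch (assumes matplotlib; the random test matrix is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 6
A = np.random.randn(n, n)

eigs = np.linalg.eigvals(A)

fig, ax = plt.subplots()
for i in range(n):
    center = (A[i, i], 0.0)                              # diagonal entry (real here)
    radius = np.sum(np.abs(A[i, :])) - np.abs(A[i, i])   # r_i = sum of off-diagonal moduli
    ax.add_patch(plt.Circle(center, radius, fill=False))
ax.plot(eigs.real, eigs.imag, 'rx')                      # eigenvalues lie inside the union
ax.set_xlabel('Re'); ax.set_ylabel('Im')
ax.set_aspect('equal')
ax.autoscale_view()
plt.show()
```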

Note: There are more complicated figures, like Cassini ovals, that include the spectrum

$$ C_{ij} = \{z\in\mathbb{C}: |a_{ii} - z|\cdot |a_{jj} - z|\leq r_i r_j\}, \quad r_i = \sum_{l\not= i} |a_{il}|, \quad i \ne j. $$

Power method

The eigenvalue problem

$$Ax = \lambda x, \quad \Vert x \Vert_2 = 1 \ \text{(normalization for stability)},$$

can be rewritten as a fixed-point iteration.

The power method has the form

$$ x_{k+1} = A x_k, \quad x_{k+1} := \frac{x_{k+1}}{\Vert x_{k+1} \Vert_2}$$

and

$$ x_{k+1}\to v_1,$$

where $Av_1 = \lambda_1 v_1$, $\lambda_1$ is the largest eigenvalue in absolute value and $v_1$ is the corresponding eigenvector (convergence holds up to a sign/phase factor and requires $|\lambda_1| > |\lambda_2|$, see the analysis below).

The approximation to the eigenvalue is recovered from the Rayleigh quotient (recall that $\Vert x_{k+1} \Vert_2 = 1$):

$$ \lambda^{(k+1)} = (Ax_{k+1}, x_{k+1}). $$
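A minimal numpy sketch of the power method (the test matrix and the fixed number of iterations are chosen arbitrarily):

```python
import numpy as np

def power_method(A, num_iters=500):
    # minimal version: no convergence check, fixed number of iterations
    x = np.random.randn(A.shape[0])
    x = x / np.linalg.norm(x)
    for _ in range(num_iters):
        x = A @ x
        x = x / np.linalg.norm(x)
    lam = x @ (A @ x)              # Rayleigh quotient (x has unit norm)
    return lam, x

np.random.seed(0)
B = np.random.randn(100, 100)
A = B.T @ B                        # symmetric positive semidefinite test matrix
lam, v = power_method(A)
print(lam, np.max(np.linalg.eigvalsh(A)))   # the two values should agree
```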

Convergence analysis for $A=A^*$

Let us have a closer look at the power method when $A$ is Hermitian. In two slides you will learn that every Hermitian matrix is diagonalizable. Therefore, there exists an orthonormal basis of eigenvectors $v_1,\dots,v_n$ such that $Av_i = \lambda_i v_i$. Let us decompose $x_0$ into a sum of the $v_i$ with coefficients $c_i$:

$$ x_0 = c_1 v_1 + \dots + c_n v_n. $$

Since $v_i$ are eigenvectors, we have

$$ \begin{split} x_1 &= \frac{Ax_0}{\|Ax_0\|} = \frac{c_1 \lambda_1 v_1 + \dots + c_n \lambda_n v_n}{\|c_1 \lambda_1 v_1 + \dots + c_n \lambda_n v_n \|} \\ &\vdots\\ x_k &= \frac{Ax_{k-1}}{\|Ax_{k-1}\|} = \frac{c_1 \lambda_1^k v_1 + \dots + c_n \lambda_n^k v_n}{\|c_1 \lambda_1^k v_1 + \dots + c_n \lambda_n^k v_n \|} \end{split} $$

Now you can see that

$$ x_k = \frac{c_1}{|c_1|}\left(\frac{\lambda_1}{|\lambda_1|}\right)^k\frac{ v_1 + \frac{c_2}{c_1}\frac{\lambda_2^k}{\lambda_1^k}v_2 + \dots + \frac{c_n}{c_1}\frac{\lambda_n^k}{\lambda_1^k}v_n}{\left\|v_1 + \frac{c_2}{c_1}\frac{\lambda_2^k}{\lambda_1^k}v_2 + \dots + \frac{c_n}{c_1}\frac{\lambda_n^k}{\lambda_1^k}v_n\right\|}, $$

which converges to $v_1$ since $\left| \frac{c_1}{|c_1|}\left(\frac{\lambda_1}{|\lambda_1|}\right)^k\right| = 1$ and $\left(\frac{\lambda_2}{\lambda_1}\right)^k \to 0$ if $|\lambda_2|<|\lambda_1|$.

Things to remember about the power method

1. It computes only the largest (in modulus) eigenvalue and the corresponding eigenvector.
2. Convergence is linear with rate $\left|\frac{\lambda_2}{\lambda_1}\right|$, so it is slow when $|\lambda_2| \approx |\lambda_1|$ (and fails when $|\lambda_2| = |\lambda_1|$ or $c_1 = 0$).
3. Each iteration requires only one matrix-by-vector product, which makes the method attractive for large sparse matrices.

Matrix decomposition: the Schur form

There is one class of matrices for which the eigenvalues can be read off immediately: triangular matrices

$$ A = \begin{pmatrix} \lambda_1 & * & * \\ 0 & \lambda_2 & * \\ 0 & 0 & \lambda_3 \\ \end{pmatrix}. $$

The eigenvalues of $A$ are $\lambda_1, \lambda_2, \lambda_3$. Why?

Because the determinant is

$$ \det(A - \lambda I) = (\lambda_1 - \lambda) (\lambda_2 - \lambda) (\lambda_3 - \lambda). $$

Moreover, the eigenvalues do not change under a unitary similarity transformation $A \mapsto U^* A U$:

$$ \det(A - \lambda I) = \det(U (U^* A U - \lambda I) U^*) = \det(UU^*) \det(U^* A U - \lambda I) = \det(U^* A U - \lambda I), $$

where we have used the multiplicativity property of the determinant, $\det(AB) = \det(A) \det(B)$.

Therefore, it is natural to look for a unitary similarity transformation that brings $A$ to upper triangular form $T$:

$$ A = U T U^*. $$

This is the celebrated Schur decomposition.

Schur theorem

Theorem: Every $A \in \mathbb{C}^{n \times n}$ matrix can be represented in the Schur form $A = UTU^*$, where $U$ is unitary and $T$ is upper triangular.

Sketch of the proof.

  1. Every matrix has at least one eigenvector: take a root $\lambda_1$ of the characteristic polynomial; then $A-\lambda_1 I$ is singular and has a non-trivial nullspace. Let
$$Av_1 = \lambda_1 v_1, \quad \Vert v_1 \Vert_2 = 1$$
  2. Let $U_1 = [v_1,v_2,\dots,v_n]$, where $v_2,\dots, v_n$ are arbitrary orthonormal vectors orthogonal to $v_1$ (so that $U_1$ is unitary). Then
$$ U^*_1 A U_1 = \begin{pmatrix} \lambda_1 & * \\ 0 & A_2 \end{pmatrix}, $$

where $A_2$ is an $(n-1) \times (n-1)$ matrix. This is called block triangular form. We can now work with $A_2$ only and so on.

Note: Since we need eigenvectors in this proof, this proof is not a practical algorithm.
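In practice the Schur form is computed by library routines. A minimal sketch using scipy.linalg.schur (the random test matrix is arbitrary):

```python
import numpy as np
from scipy.linalg import schur

np.random.seed(0)
A = np.random.randn(5, 5)

# output='complex' gives a genuinely upper triangular T; the real Schur form
# is only quasi-triangular, with 2x2 blocks for complex conjugate eigenvalue pairs
T, U = schur(A, output='complex')

print(np.allclose(A, U @ T @ U.conj().T))                  # A = U T U^*
print(np.allclose(np.sort_complex(np.diag(T)),
                  np.sort_complex(np.linalg.eigvals(A))))  # diag(T) holds the eigenvalues
```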

Application of the Schur theorem

An important application of the Schur theorem concerns normal matrices, i.e. matrices satisfying

$$ AA^* = A^* A. $$

Q: Examples of normal matrices?

Examples: Hermitian matrices, unitary matrices.

Normal matrices

Theorem: $A$ is a normal matrix, iff $A = U \Lambda U^*$, where $U$ is unitary and $\Lambda$ is diagonal.

Sketch of the proof:

1. If $A = U \Lambda U^*$, then $AA^* = U \Lambda \overline{\Lambda} U^* = U \overline{\Lambda} \Lambda U^* = A^* A$, so $A$ is normal.
2. Conversely, take the Schur form $A = U T U^*$. Normality of $A$ implies $T T^* = T^* T$, and comparing the diagonal entries of $T T^*$ and $T^* T$ row by row shows that an upper triangular matrix with this property must be diagonal.

Important consequence

Therefore, every normal matrix is unitarily diagonalizable, i.e. it can be diagonalized by a unitary matrix $U$.

In other words, every normal matrix has an orthonormal basis of eigenvectors.
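A quick numerical check for the Hermitian case (a minimal sketch; the random test matrix is arbitrary):

```python
import numpy as np

np.random.seed(0)
B = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
A = B + B.conj().T                      # Hermitian, hence normal

lam, U = np.linalg.eigh(A)              # eigh returns an orthonormal eigenbasis
print(np.allclose(U.conj().T @ U, np.eye(4)))          # U is unitary
print(np.allclose(A, U @ np.diag(lam) @ U.conj().T))   # A = U Lambda U^*
```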

How do we compute the Schur decomposition?

Variational principle for eigenvalues

For a Hermitian matrix $A = A^*$, the Rayleigh quotient is defined as

$$R_A(x) = \frac{(Ax, x)}{(x, x)},$$

and the maximal eigenvalue of $A$ is the maximum of $R_A(x)$ over all $x \ne 0$, while the minimal eigenvalue is the minimum of $R_A(x)$.
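A minimal numerical illustration for a symmetric matrix (the test matrix and the number of random samples are arbitrary):

```python
import numpy as np

np.random.seed(0)
B = np.random.randn(50, 50)
A = (B + B.T) / 2                        # symmetric test matrix

def rayleigh(A, x):
    return (x @ (A @ x)) / (x @ x)

lam = np.linalg.eigvalsh(A)              # eigenvalues sorted in ascending order

# The Rayleigh quotient of any nonzero vector lies between the extreme eigenvalues
r = np.array([rayleigh(A, x) for x in np.random.randn(1000, 50)])
print(lam[0] <= r.min(), r.max() <= lam[-1])            # True True

# ... and it attains them at the corresponding eigenvectors
_, V = np.linalg.eigh(A)
print(np.isclose(rayleigh(A, V[:, 0]), lam[0]),
      np.isclose(rayleigh(A, V[:, -1]), lam[-1]))       # True True
```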

Now, a more "advanced" concept.

Spectrum and pseudospectrum

Pseudospectrum

We consider the union of all possible eigenvalues of all $\epsilon$-small perturbations of the matrix $A$:

$$\Lambda_{\epsilon}(A) = \{ \lambda \in \mathbb{C}: \ \exists E, \, x \ne 0 \ \text{such that} \ (A + E) x = \lambda x, \ \Vert E \Vert_2 \leq \epsilon \}.$$
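For the spectral norm, $\lambda \in \Lambda_{\epsilon}(A)$ is equivalent to $\sigma_{\min}(A - \lambda I) \leq \epsilon$, which gives a simple brute-force way to plot pseudospectra. A minimal sketch (the test matrix, grid and contour levels are chosen arbitrarily):

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)
n = 30
A = np.triu(np.random.randn(n, n))      # a non-normal example: random upper triangular

# Evaluate sigma_min(A - z*I) on a grid in the complex plane
x = np.linspace(-3, 3, 101)
y = np.linspace(-3, 3, 101)
sigmin = np.zeros((len(y), len(x)))
for i, yi in enumerate(y):
    for j, xj in enumerate(x):
        z = xj + 1j * yi
        sigmin[i, j] = np.linalg.svd(A - z * np.eye(n), compute_uv=False)[-1]

# Level lines of log10(sigma_min) are boundaries of eps-pseudospectra
plt.contour(x, y, np.log10(sigmin), levels=np.arange(-8, 1))
eigs = np.linalg.eigvals(A)
plt.plot(eigs.real, eigs.imag, 'k.')    # the eigenvalues themselves
plt.xlabel('Re'); plt.ylabel('Im')
plt.colorbar()
plt.show()
```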

Summary of today's lecture

Next lecture

Questions?