Lecture 17: Structured matrices, FFT, convolutions, Toeplitz matrices

Previous lecture

Other structured matrices

These structured matrices are directly connected to the convolution operation and the fast Fourier transform (FFT).

Convolution

$$(x * y)(t) = \int_{-\infty}^{\infty} x(\tau) y(t - \tau) d \tau.$$

Convolution theorem and Fourier transform

A well-known fact: a convolution in the time domain is a product in the frequency domain.

$$\widehat{x}(w) = (\mathcal{F}(x))(w) = \int_{-\infty}^{\infty} e^{i w t} x(t) dt.$$

$$\mathcal{F}(x * y) = \mathcal{F}(x) \mathcal{F}(y).$$

This gives a recipe for computing a convolution:

  1. Compute the Fourier transforms of $x(t)$ and $y(t)$.
  2. Compute their product.
  3. Compute the inverse Fourier transform of the product.
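The three steps can be sketched in a discrete setting as follows. This is only an illustration, assuming NumPy is available: the function name `fft_convolve` and the sample arrays are made up for the example, and the signals are zero-padded so that the periodic (circular) convolution computed by the FFT coincides with the ordinary one.

```python
import numpy as np

def fft_convolve(x, y):
    """Linear convolution of two 1-D arrays via the convolution theorem."""
    n = len(x) + len(y) - 1      # length of the full linear convolution
    X = np.fft.fft(x, n)         # step 1: transforms (zero-padded to length n)
    Y = np.fft.fft(y, n)
    Z = X * Y                    # step 2: pointwise product
    return np.fft.ifft(Z)        # step 3: inverse transform

# Hypothetical sample data
x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 0.5])
z = fft_convolve(x, y).real      # imaginary part is rounding noise for real input
```

The result can be compared with `np.convolve(x, y)`, which evaluates the same sum directly.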

Discrete convolution operation

$$(x * y)(t) = \int_{-\infty}^{\infty} x(\tau) y(t - \tau) d \tau.$$

Let us approximate the integral by a quadrature sum on a uniform grid, and store the signal at equidistant points.

Then we are left with the summation

$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j},$$

which is called the discrete convolution. It can be thought of as applying a filter with coefficients $x$ to a signal $y$.
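A direct evaluation of this sum might look like the sketch below. The names `direct_convolution`, `x_filter`, and `y_signal` are hypothetical, and the signal is assumed to be zero outside its index range.

```python
import numpy as np

def direct_convolution(x, y):
    """z_i = sum_j x_j * y_{i-j}, with y treated as zero outside its range."""
    n, m = len(x), len(y)
    z = np.zeros(n + m - 1)
    for i in range(n + m - 1):
        for j in range(n):
            if 0 <= i - j < m:   # skip terms where y_{i-j} is outside the signal
                z[i] += x[j] * y[i - j]
    return z

x_filter = np.array([0.25, 0.5, 0.25])      # a small averaging filter (example data)
y_signal = np.array([1.0, 2.0, 4.0, 8.0])   # example signal
z_direct = direct_convolution(x_filter, y_signal)
```

The double loop costs $\mathcal{O}(nm)$ operations, which is the cost the FFT-based approach is meant to reduce.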

There are different possible filters for different purposes, but they all utilize the shift-invariant structure.

Discrete convolution and Toeplitz matrices

A discrete convolution can be thought of as a matrix-by-vector product:

$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j} \quad \Leftrightarrow \quad z = Ax,$$

where the elements of the matrix $A$ are given by $a_{ij} = y_{i-j}$, i.e., they depend only on the difference between the row index and the column index.
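To make the matrix form concrete, here is a small sketch that assembles $A$ from a signal $y$ and checks that $Ax$ reproduces the convolution. The helper name `convolution_matrix` and the data are illustrative assumptions, and $y$ is taken to be zero outside its index range, so $A$ is rectangular.

```python
import numpy as np

def convolution_matrix(y, n):
    """Matrix A with a_ij = y_{i-j} (zero when i-j is outside the signal)."""
    m = len(y)
    A = np.zeros((n + m - 1, n))
    for i in range(n + m - 1):
        for j in range(n):
            if 0 <= i - j < m:
                A[i, j] = y[i - j]
    return A

x_coef = np.array([1.0, -1.0, 2.0])        # example filter coefficients
y_sig = np.array([3.0, 0.0, 1.0, 2.0])     # example signal
A = convolution_matrix(y_sig, len(x_coef))
z_mat = A @ x_coef                          # same values as the convolution sum
```

Every diagonal of `A` is constant, which is exactly the Toeplitz structure discussed next.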

Toeplitz matrices: definition

A matrix is called Toeplitz if its elements are defined as

$$a_{ij} = t_{i - j}.$$

Toeplitz and circulant matrices

A Toeplitz matrix is called circulant if

$$C_{ij} = c_{(i - j) \bmod n},$$

i.e., its diagonals wrap around periodically. For example, for $n = 4$:

$$C = \begin{bmatrix} c_0 & c_3 & c_2 & c_1 \\ c_1 & c_0 & c_3 & c_2 \\ c_2 & c_1 & c_0 & c_3 \\ c_3 & c_2 & c_1 & c_0 \\ \end{bmatrix}. $$
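The index rule $C_{ij} = c_{(i-j) \bmod n}$ translates almost literally into NumPy indexing. The function name `circulant` below is chosen for this sketch only.

```python
import numpy as np

def circulant(c):
    """Circulant matrix whose first column is c, built from C_ij = c[(i-j) mod n]."""
    c = np.asarray(c)
    n = len(c)
    i, j = np.indices((n, n))
    return c[(i - j) % n]

C4 = circulant([0, 1, 2, 3])   # integer labels stand in for c_0, ..., c_3
```

With these labels, `C4` has rows `[0, 3, 2, 1]`, `[1, 0, 3, 2]`, `[2, 1, 0, 3]`, `[3, 2, 1, 0]`, matching the $4 \times 4$ pattern shown above.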

Spectral theorem for circulant matrices

Theorem:

Any circulant matrix can be represented in the form

$$C = \frac{1}{n} F^* \Lambda F,$$

where $F$ is the Fourier matrix with the elements

$$F_{kl} = w_n^{kl}, \quad k, l = 0, \ldots, n-1, \quad w_n = e^{-\frac{2 \pi i}{n}},$$

and $\Lambda = \text{diag}(\lambda)$ is a diagonal matrix with

$$\lambda = F c, $$

where $c$ is the first column of the circulant matrix $C$.

The proof is given later; first we need to study the Fourier matrix.

Fourier matrix

The Fourier matrix is defined as:

$$ F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ \dots & \dots & \dots &\dots &\dots \\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$

or equivalently

$$ F_n = \{ w_n^{kl} \}_{k,l=0}^{n-1}, $$

where

$$w_n = e^{-\frac{2\pi i}{n}}.$$
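As a sanity check of this definition, the sketch below builds $F_n$ explicitly and compares a dense product $F_n x$ with NumPy's FFT, which uses the same sign convention. The helper name `fourier_matrix` is an assumption of this example.

```python
import numpy as np

def fourier_matrix(n):
    """Dense Fourier matrix with entries w_n^{kl}, w_n = exp(-2*pi*i/n)."""
    k, l = np.indices((n, n))
    return np.exp(-2j * np.pi * k * l / n)

n_f = 8
F8 = fourier_matrix(n_f)
x_f = np.arange(n_f, dtype=float)   # example vector
dense_result = F8 @ x_f             # O(n^2) matvec
fft_result = np.fft.fft(x_f)        # same quantity computed by the FFT
gram = F8.conj().T @ F8             # equals n * I, so F^{-1} = F^* / n
```

The last line illustrates why the factor $\frac{1}{n}$ appears in the spectral theorem for circulant matrices.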

Properties:

Fast Fourier transform (FFT)

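Here we consider a matrix interpretation of the standard Cooley-Tukey algorithm (1965), which is based on the divide-and-conquer idea. Note that software packages use more advanced versions.

Let $n$ be a power of 2. We first permute the rows of $F_n$ so that the even-numbered rows $0, 2, \dots, n-2$ come first and the odd-numbered rows $1, 3, \dots, n-1$ follow. This permutation is given by the matrix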

$$ P_n = \begin{pmatrix} 1 & 0 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 1 & 0 &\dots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 &\dots & 1 & 0 \\ \hline 0 & 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 0 & 1 &\dots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 &\dots & 0 & 1 \end{pmatrix}, $$

Hence,

$$ P_n F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ 1 & w^{4\cdot 1}_n & w^{4\cdot 2}_n & \dots & w^{4\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-2)\cdot 1}_n & w^{(n-2)\cdot 2}_n & \dots & w^{(n-2)\cdot (n-1)}_n\\ \hline 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{3\cdot 1}_n & w^{3\cdot 2}_n & \dots & w^{3\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$

Now let us split its rows and columns into two parts, each of size $n/2$.

As a result we get $2\times 2$ block matrix that has the following form

$$ P_n F_n = \begin{pmatrix} \left\{w^{2kl}_n\right\} & \left\{w_n^{2k\left(\frac{n}{2} + l\right)}\right\} \\ \left\{w_n^{(2k+1)l}\right\} & \left\{w_n^{(2k+1)\left(\frac{n}{2} + l\right)}\right\} \end{pmatrix}, \quad k,l = 0,\dots, \frac{n}{2}-1. $$

So far this does not look like something that works faster :) But we will see that in a minute. Let's take a closer look at the first block $\left\{w^{2kl}_n\right\}$:

$$ w^{2kl}_n = e^{-2kl\frac{2\pi i}{n}} = e^{-kl\frac{2\pi i}{n/2}} = w^{kl}_{n/2}. $$

So this block is exactly the Fourier matrix $F_{n/2}$ of half the size!

The block $\left\{w_n^{(2k+1)l}\right\}$ can be written as

$$ w_n^{(2k+1)l} = w_n^{2kl + l} = w_n^{l} w_n^{2kl} = w_n^{l} w_{n/2}^{kl}, $$

i.e., column $l$ of $F_{n/2}$ is scaled by $w_n^l$. Hence this block equals $F_{n/2}W_{n/2}$, where

$$W_{n/2} = \text{diag}(1,w_n,w_n^2,\dots,w_n^{n/2-1}).$$

Applying the same trick to the other blocks, we finally get

$$ P_n F_n = \begin{pmatrix} F_{n/2} & F_{n/2} \\ F_{n/2}W_{n/2} & -F_{n/2}W_{n/2} \end{pmatrix} = \begin{pmatrix} F_{n/2} & 0 \\ 0 & F_{n/2} \end{pmatrix} \begin{pmatrix} I_{n/2} & I_{n/2} \\ W_{n/2} & -W_{n/2} \end{pmatrix}. $$
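Applying this factorization recursively gives a radix-2 FFT. The sketch below is a direct, unoptimized transcription of the block formula, assuming the length is a power of 2; the function name `fft_recursive` is made up for the example and this is not how library FFTs are implemented.

```python
import numpy as np

def fft_recursive(x):
    """Compute F_n x using P_n F_n = blockdiag(F, F) [[I, I], [W, -W]]."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    if n == 0 or n % 2:
        raise ValueError("length must be a power of 2")
    half = n // 2
    top, bottom = x[:half], x[half:]
    w = np.exp(-2j * np.pi * np.arange(half) / n)   # diagonal of W_{n/2}
    even_rows = fft_recursive(top + bottom)          # F_{n/2} (x_top + x_bottom)
    odd_rows = fft_recursive(w * (top - bottom))     # F_{n/2} W_{n/2} (x_top - x_bottom)
    y = np.empty(n, dtype=complex)
    y[0::2] = even_rows                              # undo the row permutation P_n
    y[1::2] = odd_rows
    return y

rng = np.random.default_rng(0)
x_test = rng.standard_normal(16)
y_test = fft_recursive(x_test)
```

Each level does $\mathcal{O}(n)$ work on top of two half-size transforms, which leads to the familiar $\mathcal{O}(n \log n)$ total cost.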

Circulant matrices

The FFT makes multiplication by certain types of matrices fast. We start with a circulant matrix:

$$ C = \begin{pmatrix} c_0 & c_{n-1} & c_{n-2} & \dots & c_1 \\ c_{1} & c_{0} & c_{n-1} & \dots & c_2 \\ c_{2} & c_{1} & c_0 & \dots & c_3 \\ \dots & \dots & \dots & \dots & \dots \\ c_{n-1} & c_{n-2} & c_{n-3} & \dots & c_0 \end{pmatrix} $$

Theorem. Let $C$ be a circulant matrix of size $n\times n$ and let $c$ be its first column. Then

$$ C = \frac{1}{n} F_n^* \text{diag}(F_n c) F_n $$

Proof.

$$\lambda (\omega) = c_0 + \omega c_1 + \dots + \omega^{n-1} c_{n-1},$$

where $\omega$ is any number such that $\omega^n=1$.

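Multiplying $\lambda$ by $1, \omega, \dots, \omega^{n-1}$ and using $\omega^n = 1$, we get

$$ \begin{split} \lambda & = c_0 + \omega c_1 + \dots + \omega^{n-1} c_{n-1},\\ \lambda\omega & = c_{n-1} + \omega c_0 + \dots + \omega^{n-1} c_{n-2},\\ \lambda\omega^2 & = c_{n-2} + \omega c_{n-1} + \dots + \omega^{n-1} c_{n-3},\\ &\dots\\ \lambda\omega^{n-1} & = c_{1} + \omega c_{2} + \dots + \omega^{n-1} c_{0}. \end{split} $$

In matrix form,

$$ \lambda(\omega) \cdot \begin{pmatrix} 1&\omega & \dots& \omega^{n-1} \end{pmatrix} = \begin{pmatrix} 1&\omega&\dots& \omega^{n-1} \end{pmatrix} \cdot C. $$

Writing this identity for $\omega = 1, w_n, \dots, w_n^{n-1}$ and stacking the resulting rows, with $\Lambda = \text{diag}\left(\lambda(1), \lambda(w_n), \dots, \lambda(w_n^{n-1})\right)$, we obtain

$$ \Lambda F_n = F_n C $$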

and finally

$$ C = \frac{1}{n} F^*_n \Lambda F_n, \quad \text{where}\quad \Lambda = \text{diag}(F_nc) \qquad\blacksquare $$
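The statement is easy to check numerically on a small random example. The sketch below uses dense matrices only for verification; the variable names and the size `n = 6` are arbitrary choices for this illustration.

```python
import numpy as np

n = 6
rng = np.random.default_rng(0)
c = rng.standard_normal(n)               # first column of C (random example)

i, j = np.indices((n, n))
C = c[(i - j) % n]                       # circulant matrix C_ij = c_{(i-j) mod n}
F = np.exp(-2j * np.pi * i * j / n)      # Fourier matrix F_n

Lam = np.diag(F @ c)                     # Lambda = diag(F_n c)
C_rebuilt = F.conj().T @ Lam @ F / n     # (1/n) F^* Lambda F
residual = np.linalg.norm(C - C_rebuilt) # should be at rounding-error level
```

The intermediate identity $\Lambda F_n = F_n C$ from the proof can be checked the same way with `Lam @ F` and `F @ C`.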

Fast matvec with circulant matrix

$$ Cx = \frac{1}{n} F_n^* \text{diag}(F_n c) F_n x = \text{ifft}\left( \text{fft}(c) \circ \text{fft}(x)\right) $$

where $\circ$ denotes elementwise product (Hadamard product) of two vectors (since $\text{diag}(a)b = a\circ b$) and ifft denotes inverse Fourier transform $F^{-1}_n$.
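In NumPy this formula is a one-liner. The sketch below compares it with a dense product on a random example; `circulant_matvec` is a name introduced here for illustration.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x in O(n log n)."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))

rng = np.random.default_rng(1)
n_c = 8
c_col = rng.standard_normal(n_c)
x_vec = rng.standard_normal(n_c)

i, j = np.indices((n_c, n_c))
C_dense = c_col[(i - j) % n_c]               # dense circulant, for comparison only
y_dense = C_dense @ x_vec                    # O(n^2)
y_fast = circulant_matvec(c_col, x_vec).real # real input gives a real result
```

Note that the matrix itself is never formed in the fast version; only its first column is needed.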

Toeplitz matrices

Now we get back to Toeplitz matrices!

$$ T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}& \dots & t_{1-n} \\ t_{1} & t_{0} & t_{-1} & t_{-2}& \dots & t_{2-n} \\ t_{2} & t_{1} & t_0 & t_{-1} &\dots & t_{3-n} \\ t_{3} & t_{2} & t_1 & t_0 & \dots & t_{4-n} \\ \dots & \dots & \dots & \dots & \dots & \dots\\ t_{n-1} & t_{n-2} & t_{n-3} & t_{n-4} &\dots &t_0 \end{pmatrix}, $$

or equivalently $T_{ij} = t_{i-j}$.

Matvec operation can be written as

$$ y_i = \sum_{j=1}^n t_{i-j} x_j, $$

which can be interpreted as a discrete convolution of filter $t_i$ and signal $x_i$. For simplicity the size of the filter $t$ is such that the sizes of the input and output signals are the same. Generally, filter size can be arbitrary.

Fast convolution has a variety of applications, for example in signal processing and in the solution of partial differential and integral equations. A typical example is the smoothing of a signal.
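A minimal smoothing sketch, under assumptions chosen purely for illustration (a noisy sine wave, a Gaussian filter of width 31 samples, zero padding at the boundaries), could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
t = np.linspace(0.0, 1.0, n)
clean = np.sin(2 * np.pi * 3 * t)               # underlying signal (example)
noisy = clean + 0.3 * rng.standard_normal(n)    # signal with additive noise

# Normalized Gaussian filter centered in a window of m samples
m = 31
s = np.arange(m) - m // 2
kernel = np.exp(-0.5 * (s / 5.0) ** 2)
kernel /= kernel.sum()

# Linear convolution via real FFTs of zero-padded arrays
size = n + m - 1
full = np.fft.irfft(np.fft.rfft(noisy, size) * np.fft.rfft(kernel, size), size)
smoothed = full[m // 2 : m // 2 + n]            # keep the part aligned with the input
```

The slice at the end keeps the output the same length as the input, which corresponds to `mode="same"` in `np.convolve`.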

Fast matvec with Toeplitz matrix

Key point: the multiplication by a Toeplitz matrix can be reduced to the multiplication by a circulant.

Indeed, every $n \times n$ Toeplitz matrix $T$ can be embedded into a circulant matrix $C$ of size $(2n-1) \times (2n-1)$:

$$ C = \begin{pmatrix} T & \dots \\ \dots & \dots \end{pmatrix}. $$

For example, a $3 \times 3$ Toeplitz matrix with entries $t_{-2}, \dots, t_2$ is embedded into the $5 \times 5$ circulant

$$ C = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{2} & t_{1}\\ t_{1} & t_{0} & t_{-1} & t_{-2} & t_{2} \\ t_{2} & t_{1} & t_0 & t_{-1} & t_{-2} \\ t_{-2}& t_{2} & t_{1} & t_0 & t_{-1} \\ t_{-1} & t_{-2} & t_{2} & t_{1} & t_0 \end{pmatrix}. $$

The product $y = Tx$ is then read off from the first $n$ entries of a circulant matvec with the zero-padded vector:

$$ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \star \\ \star \end{pmatrix} = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{2} & t_{1}\\ t_{1} & t_{0} & t_{-1} & t_{-2} & t_{2} \\ t_{2} & t_{1} & t_0 & t_{-1} & t_{-2} \\ t_{-2}& t_{2} & t_{1} & t_0 & t_{-1} \\ t_{-1} & t_{-2} & t_{2} & t_{1} & t_0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix}= \text{ifft}(\text{fft}(\begin{pmatrix} t_0 \\ t_{1} \\ t_{2} \\ t_{-2} \\ t_{-1} \end{pmatrix})\circ \text{fft}(\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix})). $$
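The embedding can be sketched in a few lines of NumPy. The function name `toeplitz_matvec` and the convention of passing the first column and first row separately are assumptions made for this example.

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """y = T x, where col = (t_0, ..., t_{n-1}) and row = (t_0, t_{-1}, ..., t_{1-n})."""
    n = len(x)
    # First column of the (2n-1) x (2n-1) circulant: t_0..t_{n-1}, t_{1-n}..t_{-1}
    c = np.concatenate([col, row[:0:-1]])
    x_padded = np.concatenate([x, np.zeros(n - 1)])
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x_padded))
    return y[:n]                       # the remaining entries are discarded

rng = np.random.default_rng(2)
n_t = 5
col_t = rng.standard_normal(n_t)
row_t = rng.standard_normal(n_t)
row_t[0] = col_t[0]                    # both start with t_0
x_t = rng.standard_normal(n_t)

i, j = np.indices((n_t, n_t))
T_dense = np.where(i >= j, col_t[np.abs(i - j)], row_t[np.abs(i - j)])
y_toeplitz = toeplitz_matvec(col_t, row_t, x_t).real
```

The dense matrix `T_dense` is built only to compare against `T_dense @ x_t`; the fast version needs just the $2n-1$ numbers that define $T$.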

Multilevel Toeplitz matrices

The 2-dimensional convolution is defined as

$$ y_{i_1i_2} = \sum_{j_1,j_2=1}^n t_{i_1-j_1, i_2-j_2} x_{j_1 j_2}. $$

Note that $x$ and $y$ are 2-dimensional arrays and $T$ is 4-dimensional. To reduce this expression to matrix-by-vector product we have to reshape $x$ and $y$ into long vectors:

$$ \text{vec}(x) = \begin{pmatrix} x_{11} \\ \vdots \\ x_{1n} \\ \hline \\ \vdots \\ \hline \\ x_{n1} \\ \vdots \\ x_{nn} \end{pmatrix}, \quad \text{vec}(y) = \begin{pmatrix} y_{11} \\ \vdots \\ y_{1n} \\ \hline \\ \vdots \\ \hline \\ y_{n1} \\ \vdots \\ y_{nn} \end{pmatrix}. $$

In this case the matrix $T$ is block Toeplitz with Toeplitz blocks (BTTB):

$$ T = \begin{pmatrix} T_0 & T_{-1} & T_{-2} & \dots & T_{1-n} \\ T_{1} & T_{0} & T_{-1} & \dots & T_{2-n} \\ T_{2} & T_{1} & T_0 & \dots & T_{3-n} \\ \dots & \dots & \dots & \dots & \dots\\ T_{n-1} & T_{n-2} & T_{n-3} &\dots &T_0 \end{pmatrix}, \quad \text{where} \quad T_k = t_{k, i_2 - j_2}\quad \text{are Toeplitz matrices} $$

Fast matvec with multilevel Toeplitz matrix

To get a fast matvec, we embed the block Toeplitz matrix with Toeplitz blocks into a block circulant matrix with circulant blocks. The analog of

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \star \\ \star \end{pmatrix} = \text{ifft}(\text{fft}(\begin{pmatrix} t_0 \\ t_{1} \\ t_{2} \\ t_{-2} \\ t_{-1} \end{pmatrix})\circ\text{fft}(\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix}))$$

will look like

$$ \begin{pmatrix} y_{11} & y_{12} & y_{13} & \star & \star \\ y_{21} & y_{22} & y_{23} & \star & \star \\ y_{31} & y_{32} & y_{33} & \star & \star \\ \star & \star & \star & \star & \star \\ \star & \star & \star & \star & \star \\ \end{pmatrix} = \text{ifft2d}(\text{fft2d}(\begin{pmatrix} t_{0,0} & t_{0,1} & t_{0,2} & t_{0,-2} & t_{0,-1} \\ t_{1,0} & t_{1,1} & t_{1,2} & t_{1,-2} & t_{1,-1} \\ t_{2,0} & t_{2,1} & t_{2,2} & t_{2,-2} & t_{2,-1} \\ t_{-2,0} & t_{-2,1} & t_{-2,2} & t_{-2,-2} & t_{-2,-1} \\ t_{-1,0} & t_{-1,1} & t_{-1,2} & t_{-1,-2} & t_{-1,-1} \end{pmatrix}) \circ \text{fft2d}(\begin{pmatrix}x_{11} & x_{12} & x_{13} & 0 & 0 \\ x_{21} & x_{22} & x_{23} & 0 & 0 \\ x_{31} & x_{32} & x_{33} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{pmatrix})),$$

where the first index of $t$ runs along the rows (matching the first index of $x$ and $y$), and fft2d is the 2-dimensional FFT, which consists of one-dimensional transforms applied first to rows and then to columns (or vice versa).
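A two-dimensional version of the embedding can be sketched with `np.fft.fft2`. In this illustration the kernel is stored as a hypothetical centered array `t` of shape $(2n-1) \times (2n-1)$ with `t[a + n - 1, b + n - 1]` holding $t_{a,b}$, so that `np.fft.ifftshift` moves $t_{0,0}$ to position $(0, 0)$ and wraps the negative indices to the end.

```python
import numpy as np

def bttb_matvec(t, x):
    """2-D convolution y[i1,i2] = sum t_{i1-j1, i2-j2} x[j1,j2] via 2-D FFTs."""
    n = x.shape[0]                        # x is assumed to be n x n
    size = 2 * n - 1
    kernel = np.fft.ifftshift(t)          # index order 0, 1, ..., n-1, 1-n, ..., -1
    x_padded = np.zeros((size, size))
    x_padded[:n, :n] = x                  # zero-pad the input
    y = np.fft.ifft2(np.fft.fft2(kernel) * np.fft.fft2(x_padded))
    return y[:n, :n].real                 # keep the leading n x n block

rng = np.random.default_rng(3)
n2 = 3
t2 = rng.standard_normal((2 * n2 - 1, 2 * n2 - 1))   # random example kernel
x2 = rng.standard_normal((n2, n2))
y2 = bttb_matvec(t2, x2)
```

The cost is dominated by three 2-D FFTs of size $(2n-1) \times (2n-1)$, compared with $\mathcal{O}(n^4)$ for the direct sum.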

Solving linear systems with Toeplitz matrix

$$T x = f.$$

For circulant matrices we have the spectral theorem

$$C = \frac{1}{n}F^* \Lambda F, \quad C^{-1} = \frac{1}{n}F^* \Lambda^{-1} F,$$

but for general Toeplitz matrices this is not a trivial question.
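For the circulant case, the formula for $C^{-1}$ gives a direct solver: divide by the eigenvalues in the Fourier domain. A minimal sketch, assuming all eigenvalues $\lambda = Fc$ are nonzero and using a made-up example column:

```python
import numpy as np

def circulant_solve(c, f):
    """Solve C x = f for the circulant C with first column c (nonsingular C assumed)."""
    lam = np.fft.fft(c)                       # eigenvalues of C
    return np.fft.ifft(np.fft.fft(f) / lam)   # x = F^{-1} Lambda^{-1} F f

c_sys = np.array([4.0, 1.0, 0.0, 0.0, 0.0, 1.0])   # eigenvalues 4 + 2 cos(...) >= 2
rng = np.random.default_rng(4)
f_rhs = rng.standard_normal(len(c_sys))
x_sol = circulant_solve(c_sys, f_rhs).real

i, j = np.indices((len(c_sys), len(c_sys)))
C_sys = c_sys[(i - j) % len(c_sys)]                 # dense matrix, for checking only
```

Multiplying `C_sys @ x_sol` should reproduce `f_rhs` up to rounding error.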

Iterative methods

Circulant preconditioner

$$C = \arg \min_P \Vert P - T \Vert_F,$$

where the minimum is taken over all circulant matrices $P$.
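Because the circulant matrices form a linear subspace, the Frobenius-norm minimizer is an orthogonal projection: each entry $c_k$ is the average of the entries of $T$ lying on the wrapped diagonal $(i - j) \bmod n = k$. The sketch below computes it that way; the function name and the sample data are assumptions of this illustration, not part of the lecture.

```python
import numpy as np

def optimal_circulant_column(T):
    """First column of the circulant matrix closest to T in the Frobenius norm."""
    n = T.shape[0]
    i, j = np.indices((n, n))
    k = (i - j) % n                        # label of the wrapped diagonal
    # Each wrapped diagonal contains exactly n entries, so the mean is sum / n
    return np.bincount(k.ravel(), weights=T.ravel(), minlength=n) / n

# Example Toeplitz matrix given by its first column and first row
col = np.array([2.0, -1.0, 0.5, 0.0, 0.1])
row = np.array([2.0, 0.3, 0.0, 0.2, -0.4])
ii, jj = np.indices((5, 5))
T = np.where(ii >= jj, col[np.abs(ii - jj)], row[np.abs(ii - jj)])

c_opt = optimal_circulant_column(T)
C_opt = c_opt[(ii - jj) % 5]               # the circulant preconditioner itself
```

For a Toeplitz $T$ the averages reduce to $c_k = \frac{(n-k)\, t_k + k\, t_{k-n}}{n}$, so the preconditioner can be formed in $\mathcal{O}(n)$ operations and applied with the FFT.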

Convolutions in neural networks

What is convolution in NN?

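In neural networks, the operation called convolution is usually the cross-correlation

$$ (x \star y)(t) = \int_{-\infty}^{+\infty} x(\tau)y(\tau + t)d \tau, $$

whereas the convolution defined earlier is

$$(x * y)(t) = \int_{-\infty}^{+\infty} x(\tau) y(t - \tau) d \tau.$$

The two are related by a flip of the first argument: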


$$ x(t) \star y(t) = x(-t) * y(t) $$
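The discrete counterpart of this identity can be checked with NumPy's `correlate` and `convolve`. The arrays are arbitrary examples; in `"full"` mode both results are indexed by the same range of lags.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])         # filter (example data)
y = np.array([0.0, 1.0, 0.5, -1.0])   # signal (example data)

# Cross-correlation: sum over tau of x[tau] * y[tau + t]
cross_corr = np.correlate(y, x, mode="full")

# Convolution of the flipped filter with the signal
conv_flipped = np.convolve(x[::-1], y, mode="full")
```

Both arrays contain the same values, so a layer that learns its filter coefficients can use either operation; the learned filter is simply flipped.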


Additional remarks about this operation

What the trained filters and the resulting image representation look like

Image transformations

Take home message