Lecture 14: Structured matrices, FFT, convolutions, Toeplitz matrices

Previous lecture

  • Krylov methods: Arnoldi relation, CG, GMRES
  • Preconditioners
    • Jacobi
    • Gauss-Seidel
    • SSOR
    • ILU and its modifications

Other structured matrices

  • Up to now, we discussed preconditioning only for sparse matrices
  • But iterative methods work well for any matrices that have fast black-box matrix-by-vector product
  • Important class of such matrices are Toeplitz matrices (and Hankel matrices) and their multilevel variants

They are directly connected to the convolution operation and Fast Fourier Transform.

Convolution

  • One of the key operation in signal processing/machine learning is the convolution of two functions.

  • Let $x(t)$ and $y(t)$ be two given functions. Their convolution is defined as

$$(x * y)(t) = \int_{-\infty}^{\infty} x(\tau) y(t - \tau) d \tau.$$

Convolution theorem and Fourier transform

A well-known fact: a convolution in the time domain is a product in the frequency domain.

  • Time-frequency transformation is given by the Fourier transform:
$$\widehat{x}(w) = (\mathcal{F}(x))(w) = \int_{-\infty}^{\infty} e^{i w t} x(t) dt.$$
  • Then,
$$\mathcal{F}(x * y) = \mathcal{F}(x) \mathcal{F}(y).$$
  • Thus, the "algorithm" for the computation of the convolution can be:
  1. Compute Fourier transform of $x(t)$ and $y(t)$.
  2. Compute their product
  3. Compute inverse Fourier transform

Discrete convolution operation

$$(x * y)(t) = \int_{-\infty}^{\infty} x(\tau) y(t - \tau) d \tau.$$

Let us approximate the integral by a quadrature sum on a uniform grid, and store the signal at equidistant points.

Then we are left with the summation

$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j},$$

which is called discrete convolution. This can be thought as an application of a filter with coefficients $x$ to a signal $y$.

There are different possible filters for different purposes, but they all utilize the shift-invariant structure.

Discrete convolution and Toeplitz matrices

A discrete convolution can be thought as a matrix-by-vector product:

$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j}, \Leftrightarrow z = Ax$$

where the matrix $A$ elements are given as $a_{ij} = y_{i-j}$, i.e., they depend only on the difference between the row index and the column index.

Toeplitz matrices: definition

A matrix is called Toeplitz if its elements are defined as

$$a_{ij} = t_{i - j}.$$
  • A Toeplitz matrix is completely defined by its first column and first row (i.e., $2n-1$ parameters).

  • It is a dense matrix, however it is a structured matrix (i.e., defined by $\mathcal{O}(n)$ parameters).

  • And the main operation in the discrete convolution is the product of Toeplitz matrix by vector.

  • Can we compute it faster than $\mathcal{O}(n^2)$?

Toeplitz and circulant matrix

  • For a special class of Toeplitz matrices, named circulant matrices the fast matrix-by-vector product can be done.

  • A matrix $C$ is called circulant, if

$$C_{ij} = c_{i - j \mod n},$$

i.e. it periodicaly wraps

$$C = \begin{bmatrix} c_0 & c_1 & c_2 & c_3 \\ c_3 & c_0 & c_1 & c_2 \\ c_2 & c_3 & c_0 & c_1 \\ c_1 & c_2 & c_3 & c_0 \\ \end{bmatrix}. $$
  • These matrices have the same eigenvectors, given by the Discrete Fourier Transform (DFT).

Spectral theorem for circulant matrices

Theorem:

Any circulant matrix can be represented in the form

$$C = \frac{1}{n} F^* \Lambda F,$$

where $F$ is the Fourier matrix with the elements

$$F_{kl} = w_n^{kl}, \quad k, l = 0, \ldots, n-1, \quad w_n = e^{-\frac{2 \pi i}{n}},$$

and matrix $\Lambda = \text{diag}(\lambda)$ is the diagonal matrix and

$$\lambda = F c, $$

where $c$ is the first column of the circulant matrix $C$.

The proof will be later: now we need to study the FFT matrix.

Fourier matrix

The Fourier matrix is defined as:

$$ F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ \dots & \dots & \dots &\dots &\dots \\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$

or equivalently

$$ F_n = \{ w_n^{kl} \}_{k,l=0}^{n-1}, $$

where

$$w_n = e^{-\frac{2\pi i}{n}}.$$

Properties:

  • Symmetric (not Hermitian!)
  • Unitary up to a scaling factor: $F_n^* F_n = F_n F_n^* = nI$ (check this fact). Therefore $F_n^{-1} = \frac{1}{n}F^*_n$
  • Can be multiplied by a vector (called dicrete Fourier transform or DFT) with $\mathcal{O}(n \log n)$ complexity (called fast Fourier transform or FFT)! FFT helps to analyze spectrum of a signal and, as we will see later, helps to do fast mutiplications with certain types of matrices.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np


N = 1000
dt = 1.0 / 800.0
x = np.linspace(0.0, N*dt, N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x) + 0.2*np.sin(300.0 * 2.0*np.pi*x)
plt.plot(x, y)
plt.xlabel('Time')
plt.ylabel('Signal')
plt.title('Initial signal')
Out[1]:
Text(0.5,1,'Initial signal')
In [2]:
yf = np.fft.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*dt), N//2)
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2])) #Note: N/2 to N will give negative frequencies
plt.xlabel('Frequency')
plt.ylabel('Amplitude')
plt.title('Discrete Fourier transform')
Out[2]:
Text(0.5,1,'Discrete Fourier transform')

Fast Fourier transform (FFT)

Here we consider a matrix interpretation of the standard Cooley-Tukey algorithm (1965), which has underlying divide and conquer idea. Note that in packages more advanced versions are used.

  • Let $n$ be a power of 2.
  • First of all we permute the rows of the Fourier matrix such that the first $n/2$ rows of the new matrix had row numbers $1,3,5,\dots,n-1$ and the last $n/2$ rows had row numbers $2,4,6\dots,n$.

  • This permutation can be expressed in terms of multiplication by permutation matrix $P_n$:

$$ P_n = \begin{pmatrix} 1 & 0 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 1 & 0 &\dots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 &\dots & 1 & 0 \\ \hline 0 & 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 0 & 1 &\dots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 &\dots & 0 & 1 \end{pmatrix}, $$

Hence,

$$ P_n F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ 1 & w^{4\cdot 1}_n & w^{4\cdot 2}_n & \dots & w^{4\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-2)\cdot 1}_n & w^{(n-2)\cdot 2}_n & \dots & w^{(n-2)\cdot (n-1)}_n\\ \hline 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{3\cdot 1}_n & w^{3\cdot 2}_n & \dots & w^{3\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$

Now let us imagine that we separated its columns and rows by two parts each of size $n/2$.

As a result we get $2\times 2$ block matrix that has the following form

$$ P_n F_n = \begin{pmatrix} \left\{w^{2kl}_n\right\} & \left\{w_n^{2k\left(\frac{n}{2} + l\right)}\right\} \\ \left\{w_n^{(2k+1)l}\right\} & \left\{w_n^{(2k+1)\left(\frac{n}{2} + l\right)}\right\} \end{pmatrix}, \quad k,l = 0,\dots, \frac{n}{2}-1. $$

So far it does not look like something that works faster :) But we will see that in a minute. Lets have a more precise look at the first block $\left\{w^{2kl}_n\right\}$:

$$ w^{2kl}_n = e^{-2kl\frac{2\pi i}{n}} = e^{-kl\frac{2\pi i}{n/2}} = w^{kl}_{n/2}. $$

So this block is exactly twice smaller Fourier matrix $F_{n/2}$!

<!--- Now we can write

$$ \begin{pmatrix} F_{n/2} & \left\{w_n^{2k\left(\frac{n}{2} + l\right)}\right\} \\ \left\{w_n^{(2k+1)l}\right\} & \left\{w_n^{(2k+1)\left(\frac{n}{2} + l\right)}\right\} \end{pmatrix} $$

--> The block $\left\{w_n^{(2k+1)l}\right\}$ can be written as

$$ w_n^{(2k+1)l} = w_n^{2kl + l} = w_n^{l} w_n^{2kl} = w_n^{l} w_{n/2}^{kl}, $$

which can be written as $W_{n/2}F_{n/2}$, where

$$W_{n/2} = \text{diag}(1,w_n,w_n^2,\dots,w_n^{n/2-1}).$$

Doing the same tricks for the other blocks we will finally get

$$ P_n F_n = \begin{pmatrix} F_{n/2} & F_{n/2} \\ F_{n/2}W_{n/2} & -F_{n/2}W_{n/2} \end{pmatrix} = \begin{pmatrix} F_{n/2} & 0 \\ 0 & F_{n/2} \end{pmatrix} \begin{pmatrix} I_{n/2} & I_{n/2} \\ W_{n/2} & -W_{n/2} \end{pmatrix}. $$
  • Thus, we reduced multiplication by $F_n$ to 2 multiplications by $F_{n/2}$ and cheap multiplications by diagonal matrices.
  • If we apply the obtained expressions recursively to $F_{n/2}$, we will get $\mathcal{O}(n\log n)$ complexity.
In [2]:
#FFT vs full matvec
import time
import numpy as np
import scipy as sp
import scipy.linalg

n = 10000
F = sp.linalg.dft(n)
x = np.random.randn(n)

y_full = F.dot(x)

full_mv_time = %timeit -q -o F.dot(x)
print('Full matvec time =', full_mv_time.average)

y_fft = np.fft.fft(x)
fft_mv_time = %timeit -q -o np.fft.fft(x)
print('FFT time =', fft_mv_time.average)

print('Relative error =', (np.linalg.norm(y_full - y_fft)) / np.linalg.norm(y_full))
Full matvec time = 0.06917304271482862
FFT time = 3.3967071485988395e-05
Relative error = 1.550139758681484e-12

Circulant matrices

FFT helps to multiply fast by certain types of matrices. We start from a circulant matrix:

$$ C = \begin{pmatrix} c_0 & c_{n-1} & c_{n-2} & \dots & c_1 \\ c_{1} & c_{0} & c_{n-1} & \dots & c_2 \\ c_{2} & c_{1} & c_0 & \dots & c_3 \\ \dots & \dots & \dots & \dots & \dots \\ c_{n-1} & c_{n-2} & c_{n-3} & \dots & c_0 \end{pmatrix} $$

Theorem. Let $C$ be a circulant matrix of size $n\times n$ and let $c$ be it's first column , then

$$ C = \frac{1}{n} F_n^* \text{diag}(F_n c) F_n $$

Proof.

  • Consider a number
$$\lambda (\omega) = c_0 + \omega c_1 + \dots + \omega^{n-1} c_{n-1},$$

where $\omega$ is any number such that $\omega^n=1$.

  • Lets multiply $\lambda$ by $1,\omega,\dots, \omega^{n-1}$:
$$ \begin{split} \lambda & = c_0 &+& \omega c_1 &+& \dots &+& \omega^{n-1} c_{n-1},\\ \lambda\omega & = c_{n-1} &+& \omega c_0 &+& \dots &+& \omega^{n-1} c_{n-2},\\ \lambda\omega^2 & = c_{n-2} &+& \omega c_{n-1} &+& \dots &+& \omega^{n-1} c_{n-3},\\ &\dots\\ \lambda\omega^{n-1} & = c_{1} &+& \omega c_{2} &+& \dots &+& \omega^{n-1} c_{0}. \end{split} $$
  • Therefore,
$$ \lambda(\omega) \cdot \begin{pmatrix} 1&\omega & \dots& \omega^{n-1} \end{pmatrix} = \begin{pmatrix} 1&\omega&\dots& \omega^{n-1} \end{pmatrix} \cdot C. $$
  • Writing this for $\omega = 1,w_n, \dots, w_n^{n-1}$ we get
$$ \Lambda F_n = F_n C $$

and finally

$$ C = \frac{1}{n} F^*_n \Lambda F_n, \quad \text{where}\quad \Lambda = \text{diag}(F_nc) \qquad\blacksquare $$

Fast matvec with circulant matrix

  • Representation $C = \frac{1}{n} F^* \text{diag}(F_n c) F_n $ gives us an explicit way to multiply a vector $x$ by $C$ in $\mathcal{O}(n\log n)$ operations.
  • Indeed,
$$ Cx = \frac{1}{n} F_n^* \text{diag}(F_n c) F_n x = \text{ifft}\left( \text{fft}(c) \circ \text{fft}(x)\right) $$

where $\circ$ denotes elementwise product (Hadamard product) of two vectors (since $\text{diag}(a)b = a\circ b$) and ifft denotes inverse Fourier transform $F^{-1}_n$.

In [3]:
import numpy as np
import scipy as sp
import scipy.linalg

def circulant_matvec(c, x):
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))

n = 5000
c = np.random.random(n)
C = sp.linalg.circulant(c)
x = np.random.randn(n)


y_full = C.dot(x)
full_mv_time = %timeit -q -o C.dot(x)
print('Full matvec time =', full_mv_time.average)


y_fft = circulant_matvec(c, x)
fft_mv_time = %timeit -q -o circulant_matvec(c, x)
print('FFT time =', fft_mv_time.average)

print('Relative error =', (np.linalg.norm(y_full - y_fft)) / np.linalg.norm(y_full))
Full matvec time = 0.011294817562946783
FFT time = 0.00012443668489951442
Relative error = 7.183826224607303e-16

Toeplitz matrices

Now we get back to Toeplitz matrices!

$$ T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}& \dots & t_{1-n} \\ t_{1} & t_{0} & t_{-1} & t_{-2}& \dots & t_{2-n} \\ t_{2} & t_{1} & t_0 & t_{-1} &\dots & t_{3-n} \\ t_{3} & t_{2} & t_1 & t_0 & \dots & t_{4-n} \\ \dots & \dots & \dots & \dots & \dots & \dots\\ t_{n-1} & t_{n-2} & t_{n-3} & t_{n-4} &\dots &t_0 \end{pmatrix}, $$

or equivalently $T_{ij} = t_{i-j}$.

Matvec operation can be written as

$$ y_i = \sum_{j=1}^n t_{i-j} x_j, $$

which can be interpreted as a discrete convolution of filter $t_i$ and signal $x_i$. For simplicity the size of the filter $t$ is such that the sizes of the input and output signals are the same. Generally, filter size can be arbitrary.

Fast convolution computation has a variety of applications, for instance, in signal processing or partial differential and integral equations. For instance, here is the smoothing of a signal:

In [5]:
from scipy import signal
%matplotlib inline
import matplotlib.pyplot as plt

alpha = 0.01
sig = np.repeat([0., 1., 0.], 100)
filt = np.exp(-alpha * (np.arange(100)-50)**2)
filtered = signal.convolve(sig, filt, mode='same') / sum(filt)

fig, (ax_orig, ax_filt, ax_filtered) = plt.subplots(3, 1, sharex=True)
ax_orig.plot(sig)
ax_orig.margins(0, 0.1)
ax_filt.plot(filt)
ax_filt.margins(0, 0.1)
ax_filtered.plot(filtered)
ax_filtered.margins(0, 0.1)

ax_orig.set_title('Original signal')
ax_filt.set_title('Filter')
ax_filtered.set_title('Convolution')

fig.tight_layout()

Fast matvec with Toeplitz matrix

Key point: the multiplication by a Toeplitz matrix can be reduced to the multiplication by a circulant.

  • Indeed, every Toeplitz matrix of size $n\times n$ can be embedded into a Circulant matrix $C$ of size $2n \times 2n$:
$$ C = \begin{pmatrix} T & \dots \\ \dots & \dots \end{pmatrix}. $$
  • The $3\times 3$ matrix $T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} \\ t_{1} & t_{0} & t_{-1} \\ t_{2} & t_{1} & t_0 \\ \end{pmatrix}$ can be embedded as follows
$$ C = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{2} & t_{1}\\ t_{1} & t_{0} & t_{-1} & t_{-2} & t_{2} \\ t_{2} & t_{1} & t_0 & t_{-1} & t_{-2} \\ t_{-2}& t_{2} & t_{1} & t_0 & t_{-1} \\ t_{-1} & t_{-2} & t_{2} & t_{1} & t_0 \end{pmatrix}. $$
  • For matvec $ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}= \begin{pmatrix} t_0 & t_{-1} & t_{-2} \\ t_{1} & t_{0} & t_{-1} \\ t_{2} & t_{1} & t_0 \\ \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ we pad vector $x$ with zeros:
$$ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \star \\ \star \end{pmatrix} = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{2} & t_{1}\\ t_{1} & t_{0} & t_{-1} & t_{-2} & t_{2} \\ t_{2} & t_{1} & t_0 & t_{-1} & t_{-2} \\ t_{-2}& t_{2} & t_{1} & t_0 & t_{-1} \\ t_{-1} & t_{-2} & t_{2} & t_{1} & t_0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix}= \text{ifft}(\text{fft}(\begin{pmatrix} t_0 \\ t_{1} \\ t_{2} \\ t_{-2} \\ t_{-1} \end{pmatrix})\circ \text{fft}(\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix})). $$
  • Note that you do not need to form and store the whole matrix $T$

Multilevel Toeplitz matrices

The 2-dimensional convolution is defined as

$$ y_{i_1i_2} = \sum_{j_1,j_2=1}^n t_{i_1-j_1, i_2-j_2} x_{j_1 j_2}. $$

Note that $x$ and $y$ are 2-dimensional arrays and $T$ is 4-dimensional. To reduce this expression to matrix-by-vector product we have to reshape $x$ and $y$ into long vectors:

$$ \text{vec}(x) = \begin{pmatrix} x_{11} \\ \vdots \\ x_{1n} \\ \hline \\ \vdots \\ \hline \\ x_{n1} \\ \vdots \\ x_{nn} \end{pmatrix}, \quad \text{vec}(y) = \begin{pmatrix} y_{11} \\ \vdots \\ y_{1n} \\ \hline \\ \vdots \\ \hline \\ y_{n1} \\ \vdots \\ y_{nn} \end{pmatrix}. $$

In this case matrix $T$ is block Toeplitz with Toeplitz blocks: (BTTB)

$$ T = \begin{pmatrix} T_0 & T_{-1} & T_{-2} & \dots & T_{1-n} \\ T_{1} & T_{0} & T_{-1} & \dots & T_{2-n} \\ T_{2} & T_{1} & T_0 & \dots & T_{3-n} \\ \dots & \dots & \dots & \dots & \dots\\ T_{n-1} & T_{n-2} & T_{n-3} &\dots &T_0 \end{pmatrix}, \quad \text{where} \quad T_k = t_{k, i_2 - j_2}\quad \text{are Toeplitz matrices} $$

Fast matvec with multilevel Toeplitz matrix

To get fast matvec we need to embed block Toeplitz matrix with Toeplitz blocks into the block circulant matrix with circulant blocks. The analog of $$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \star \\ \star \end{pmatrix} = \text{ifft}(\text{fft}(\begin{pmatrix} t_0 \\ t_{1} \\ t_{2} \\ t_{-2} \\ t_{-1} \end{pmatrix})\circ\text{fft}(\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix})).$$ will look like $$ \begin{pmatrix} y_{11} & y_{12} & y_{13} & \star & \star \\ y_{21} & y_{22} & y_{23} & \star & \star \\ y_{31} & y_{32} & y_{33} & \star & \star \\ \star & \star & \star & \star & \star \\ \star & \star & \star & \star & \star \\ \end{pmatrix} = \text{ifft2d}(\text{fft2d}(\begin{pmatrix} t_{0,0} & t_{1,0} & t_{2,0} & t_{-2,0} & t_{-1,0} \\ t_{0,1} & t_{1,1} & t_{2,1} & t_{-2,1} & t_{-1,1} \\ t_{0,2} & t_{1,2} & t_{2,2} & t_{-2,2} & t_{-1,2} \\ t_{0,-2} & t_{1,-2} & t_{2,-2} & t_{-2,-2} & t_{-1,-2} \\ t_{0,-1} & t_{1,-1} & t_{2,-1} & t_{-2,-1} & t_{-1,-1} \end{pmatrix}) \circ \text{fft2d}(\begin{pmatrix}x_{11} & x_{12} & x_{13} & 0 & 0 \\ x_{21} & x_{22} & x_{23} & 0 & 0 \\ x_{31} & x_{32} & x_{33} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{pmatrix})),$$ where fft2d is 2-dimensional fft that consists of one-dimensional transforms, applied first to rows and and then to columns (or vice versa).

In [6]:
# Blurring and Sharpening Lena by convolution

from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
%matplotlib  inline
from scipy import misc
import imageio

filter_size = 3
filter_blur = np.ones((filter_size, filter_size)) / filter_size**2
lena = imageio.imread('../files/lena512.jpg')
#lena = misc.face()
#lena = lena[:, :, 0]
blurred = signal.convolve2d(lena, filter_blur, boundary='symm', mode='same')

fig, ax = plt.subplots(2, 2, figsize=(8, 8))
ax[0, 0].imshow(lena[200:300, 200:300], cmap='gray')
ax[0, 0].set_title('Original Lena')
ax[0, 1].imshow(blurred[200:300, 200:300], cmap='gray')
ax[0, 1].set_title('Blurred Lena')
ax[1, 0].imshow((lena - blurred)[200:300, 200:300], cmap='gray')
ax[1, 0].set_title('Lena $-$ Blurred Lena')
ax[1, 1].imshow(((lena - blurred)*3 + blurred)[200:300, 200:300], cmap='gray')
ax[1, 1].set_title('$3\cdot$(Lena $-$ Blurred Lena) + Blurred Lena')
fig.tight_layout()

Solving linear systems with Toeplitz matrix

  • Convolution is ok; but what about deconvolution, or solving linear systems with Toeplitz matrices?
$$T x = f.$$
  • For the periodic case, where $T = C$ is circulant,

we have the spectral theorem

$$C = \frac{1}{n}F^* \Lambda F, \quad C^{-1} = \frac{1}{n}F^* \Lambda^{-1} F,$$

but for a general Toeplitz matrices, it is not a trivial question.

Iterative methods

  • Not-a-bad recipe for Toeplitz linear system is to use iterative method (fast matvec is available).

  • A good choice for a preconditioner is a circulant matrix.

Circulant preconditioner

  • A natural idea is to use circulants as preconditioners, since they are easy to invert.

  • The first preconditioner was the preconditioner by Raymond Chan and Gilbert Strang, who proposed to take the first column of the matrix and use it to generate the circulant.

  • The second preconditioner is the Tony Chan preconditioner, which is also very natural:

$$C = \arg \min_P \Vert P - T \Vert_F.$$
  • A simple formula for the entries of $C$ can be derived.
In [9]:
import numpy as np
import scipy.linalg
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
n = 100
c = np.zeros(n)
c[0] = -2
c[1] = 1
Tm = sp.linalg.toeplitz(c, c)


c1 = sp.linalg.circulant(c) #Strang preconditioner
Fmat = 1.0/np.sqrt(n) * np.fft.fft(np.eye(n)) #Poor man's Fourier matrix

d2 = np.diag(Fmat.conj().dot(Tm).dot(Fmat))
c2 = Fmat.dot(np.diag(d2)).dot(Fmat.conj().T)


mat = np.linalg.inv(c1).dot(Tm)
ev = np.linalg.eigvals(mat).real
plt.plot(np.sort(ev), np.ones(n), 'o')
plt.xlabel('Eigenvalues for Strang preconditioner')
plt.gca().get_yaxis().set_visible(False)

mat = np.linalg.inv(c2).dot(Tm)
ev = np.linalg.eigvals(mat).real
plt.figure()
plt.plot(np.sort(ev), np.ones(n), 'o')
plt.xlabel('Eigenvalues for T. Chan Preconditioner')
plt.gca().get_yaxis().set_visible(False)

Direct methods for Toeplitz matrices

  • The idea with preconditioners works for 2D/3D convolutions, but much worse.

  • In 1D case it is also possible to find direct solvers for the Toeplitz matrix, based on the structure of the inverse.

  • But the inverse of a Toeplitz matrix is not Toeplitz!

  • What to do?

Low displacement rank structure

  • The Toeplitz matrices belong to a more general class of low displacement rank matrices.

  • Define the scaled periodic shift matrix $Z_e$ that takes the vector $x$ and transforms it to a vector

$$Z_e x = \begin{bmatrix} e x_{n-1} \\ x_1 \\ x_2 \\ \vdots \\ x_{n-2} \end{bmatrix} $$
  • What is the matrix form of this linear operator?

Shift matrices, displacement matrices and Toeplitz matrices

Given a Toeplitz matrix $T$, we select any $e, f$ such that $ef \ne 1$ and define the displacement operator as

$$L(T) = Z_e T - T Z_f.$$

For a Toeplitz matrix, $L(T)$ has rank 2 (it has only first row and last column non-zero)

In [10]:
import numpy as np
import scipy.linalg
n = 5 
c = np.zeros(n)
c[0] = -2
c[1] = 1
T = sp.linalg.toeplitz(c, c)
e = 0.5
f = 0.5
def Z_shift(e):
    return  np.diag(np.ones(n-1), -1)  + e * np.diag(np.ones(1), n-1)

Z1 = Z_shift(e)
Z2 = Z_shift(f)

print(Z1.dot(T) - T.dot(Z2))
[[-1.   0.   0.   0.5  0. ]
 [ 0.   0.   0.   0.  -0.5]
 [ 0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   0. ]
 [ 0.   0.   0.   0.   1. ]]

What about the inverse?

It is also of rank $2$ !

In [11]:
import numpy as np
import scipy.linalg
n = 5 
c = np.zeros(n)
c[0] = -2
c[1] = 1
T = sp.linalg.toeplitz(c, c)
e = 0.5
f = 0.5
def Z_shift(e):
    return  np.diag(np.ones(n-1), -1)  + e * np.diag(np.ones(1), n-1)

Z1 = Z_shift(e)
Z2 = Z_shift(f)

Tinv = np.linalg.inv(T)

p1 = Z1.dot(Tinv) - Tinv.dot(Z2)
np.linalg.svd(p1)[1]
Out[11]:
array([1.02062073e+00, 1.02062073e+00, 2.60439840e-16, 6.95949729e-17,
       2.21601201e-17])

Small displacement rank: definition

  • The matrix is said to be of displacement rank $r$ with respect to the pair of generators $Z_e, Z_f$, if
$$L(T) = Z_e T - T Z_f = GH^{\top},$$

where $G$ is $n \times r$ and $H$ is $n \times r$.

  • It is similar to "discrete derivative"

Theorem on the inverse structure

Let $T$ satisfy

$$Z_e T - T Z_f = GH ^{\top},$$

and let it be invertible.

Then we have

$$T^{-1} (Z_e T - T Z_f) T^{-1} = T^{-1} Z_e - Z_f T^{-1} = T^{-1} G H^{\top} T^{-1},$$

i.e. the inverse has small displacement rank with the reversed pair of generators $Z_e, Z_f$ (why?).

Recovering a matrix from the displacement representation

  • Does the low-rank representation after the displacement operator describe the structure?

  • I.e. we need to solve the equation of the form

$$Z_e T - T Z_f = GH^{\top} = B$$

for a given right-hand side.

  • This is a linear system of equations in disguise! (Do you see that this is a linear system? What is the size of this linear system?)

Sylvester equation

  • The equation above is a special case of the Sylvester matrix equation which has the general form
$$AX - X B = C,$$

with $A$, $B$ and $C$ given.

  • In general, this is a linear system with $\mathcal{O}(n^2)$ unknowns, and the computational cost is expected to be $n^6$.

  • For Sylvester equation we can solve it in $\mathcal{O}(n^3)$ (will see in the next part on matrix equations)

  • But we can do better for specific $A, B$.

Back to the particular case

For the particular case, we have

$$Z_e T - T Z_f = GH^{\top} = B,$$

and the solution is given by

$$ (e - f) T = \sum_{j = 1}^r Z_e(g_j) Z_f( J h_j), $$

where $Z_e$ is the e-scaled circulant generated by the vector, and $g_j$, and $h_j$ are the columns of the matrices $G$ and $H$, and $J$ is the per-identity matrix (which has ones on the anti-diagonal).

Gohberg-Semencul formula

Based on this idea and for the special case when $e = 0, f \rightarrow \inf$, we get the following famous formula for the inverse of the Toeplitz matrices as a sum of two products of triangular Toeplitz matrices:

Let $A$ be a Toeplitz matrix, and

$$A \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}=\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad A \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix}=\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} $$

then

$$A^{-1} = \frac{1}{x_0} \begin{bmatrix} x_0 & 0 & \ldots & 0 \\ x_1 & x_0 & 0 & \ldots \\ \ldots & \ldots \\ x_n & \ldots & \ldots & x_0 \end{bmatrix}\begin{bmatrix} u_0 & u_1 & \ldots & 0 \\ 0 & u_0 & u_1 & \ldots \\ \ldots & \ldots \\ 0 & \ldots & \ldots & u_0 \end{bmatrix}-\frac{1}{x_0} \begin{bmatrix} 0 & 0 & \ldots & 0 \\ y_0 & 0 & 0 & \ldots \\ y_1 & y_0 & \ldots \\ \ldots & \ldots & \\ y_{n-1} & \ldots & y_0 & 0 \end{bmatrix}\begin{bmatrix} 0 & v_0 & \ldots & 0 \\ 0 & 0 & v_0 & v_1 \\ \ldots & \ldots \\ \ldots & \ldots & \ldots & v_0 \\ 0 & \ldots & \ldots & 0\end{bmatrix},$$

where $u_y = y_{n-i}, \quad v_i = x_{n-i}$.

The main meaning: the inverse can be recovered from the first column and the last column of the matrix.

For details, I refer to the book by Tyrtyshnikov "Brief introduction to numerical analysis".

Fast and superfast direct methods

  • For many years, these formulas gave rise to the fast $\mathcal{O}(n^2)$ and superfast $\mathcal{O}(n \log n)$ methods for Toeplitz matrices.

  • The basic idea is that you use augmentation method.

  • Consider that you have computed the inverse of a $(n-1) \times (n-1)$ block of the Toeplitz matrix.

  • You only need to store two vectors to represent the inverse.

  • Then, the bigger matrix can be written in the block form.

$$T_n = \begin{bmatrix} T_{n-1} & a \\ b^{\top} & c \end{bmatrix}.$$
  • We only need to recompute the last and the first column!

Recomputing the last and the first columns

  • Let us split the vector as
$$x = \begin{bmatrix} x_1 & x_2 \end{bmatrix}.$$
  • Then,
$$T_{n-1} x_1 + a x_2 = e_1, \quad b^{\top} x_1 + c x_2 = 0.$$
  • Or,
$$ x_1 = T^{-1}_{n-1} e_1 - T^{-1}_n a x_2.$$
  • Application of $T^{-1}_{n-1}$ costs $\mathcal{O}(n \log n)$ operations, thus $x_2$ will be recovered in the same number of operations as well. The total complexity is then $\mathcal{O}(n^2 \log n)$ operations.

  • Superfast algorithms are obtained in terms of reducing the problem to the computations with polynomials and using block elimination (in the Fourier style).

Other types of low-displacement rank matrices

  • Hankel matrices
  • Cauchy matrices
  • Vandermonde matrices

Take home message

  • Toeplitz matrices, Circulant matrices, Spectral theorem, FFT
  • Multilevel Toeplitz matrices