They are directly connected to the convolution operation and Fast Fourier Transform.
One of the key operation in signal processing/machine learning is the convolution of two functions.
Let $x(t)$ and $y(t)$ be two given functions. Their convolution is defined as
$$(x * y)(t) = \int_{-\infty}^{\infty} x(\tau) y(t - \tau) d \tau.$$
A well-known fact: a convolution in the time domain is a product in the frequency domain. Time-frequency transformation is given by the Fourier transform:
$$\widehat{x}(w) = (\mathcal{F}(x))(w) = \int_{-\infty}^{\infty} e^{i w t} x(t) dt.$$
Then,
$$\mathcal{F}(x * y) = \mathcal{F}(x) \mathcal{F}(y).$$
Thus, the "algorithm" for the computation of the convolution can be:
$$(x * y)(t) = \int_{-\infty}^{\infty} x(\tau) y(t - \tau) d \tau.$$
Let us approximate the integral by a quadrature sum on a uniform grid, and store the signal at equidistant points.
Then we are left with the summation
$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j},$$
which is called discrete convolution. This can be thought as an application of a filter with coefficients $x$ to a signal $y$.
There are different possible filters for different purposes, but they all utilize the shift-invariant structure.
A discrete convolution can be thought as a matrix-by-vector product:
$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j}, \Leftrightarrow z = Ax$$
where the matrix $A$ elements are given as $a_{ij} = y_{i-j}$, i.e., they depend only on the difference between the row index and the column index.
A matrix is called Toeplitz if its elements are defined as
$$a_{ij} = t_{i - j}.$$
A Toeplitz matrix is completely defined by its first column and first row (i.e., $2n-1$ parameters).
It is a dense matrix, however it is a structured matrix (i.e., defined by $\mathcal{O}(n)$ parameters).
And the main operation in the discrete convolution is the product of Toeplitz matrix by vector.
Can we compute it faster than $\mathcal{O}(n^2)$?
For a special class of Toeplitz matrices, named circulant matrices the fast matrix-by-vector product can be done.
A matrix $C$ is called circulant, if
$$C_{ij} = c_{i - j \mod n},$$
i.e. it periodicaly wraps
$$C = \begin{bmatrix} c_0 & c_1 & c_2 & c_3 \\ c_3 & c_0 & c_1 & c_2 \\ c_2 & c_3 & c_0 & c_1 \\ c_1 & c_2 & c_3 & c_0 \\ \end{bmatrix}. $$
These matrices have the same eigenvectors, given by the Discrete Fourier Transform (DFT).
Theorem:
Any circulant matrix can be represented in the form
$$C = \frac{1}{n} F^* \Lambda F,$$
where $F$ is the Fourier matrix with the elements
$$F_{kl} = w_n^{kl}, \quad k, l = 0, \ldots, n-1, \quad w_n = e^{-\frac{2 \pi i}{n}},$$
and matrix $\Lambda = \text{diag}(\lambda)$ is the diagonal matrix and
$$\lambda = F c, $$
where $c$ is the first column of the circulant matrix $C$.
The proof will be later: now we need to study the FFT matrix.
The Fourier matrix is defined as: $$ F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ \dots & \dots & \dots &\dots &\dots \\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$ or equivalently $$ F_n = \{ w_n^{kl} \}_{k,l=0}^{n-1}, $$ where $$w_n = e^{-\frac{2\pi i}{n}}.$$
Properties:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
N = 1000
dt = 1.0 / 800.0
x = np.linspace(0.0, N*dt, N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x) + 0.2*np.sin(300.0 * 2.0*np.pi*x)
plt.plot(x, y)
plt.xlabel('Time')
plt.ylabel('Signal')
plt.title('Initial signal')
plt.show()
yf = np.fft.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*dt), N//2)
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2])) #Note: N/2 to N will give negative frequencies
plt.xlabel('Frequency')
plt.ylabel('Amplitude')
plt.title('Discrete Fourier transform')
plt.show()
Here we consider a matrix interpretation of the standard Cooley-Tukey algorithm (1965), which has underlying divide and conquer idea. Note that in packages more advanced versions are used.
Let $n$ be a power of 2. First of all we permute the rows of the Fourier matrix such that the first $n/2$ rows of the new matrix had row numbers $1,3,5,\dots,n-1$ and the last $n/2$ rows had row numbers $2,4,6\dots,n$. This permutation can be expressed in terms of multiplication by permutation matrix $P_n$: $$ P_n = \begin{pmatrix} 1 & 0 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 1 & 0 &\dots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 &\dots & 1 & 0 \\ \hline 0 & 1 & 0 & 0 & \dots & 0 & 0 \\ 0 & 0 & 0 & 1 &\dots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 &\dots & 0 & 1 \end{pmatrix}, $$
Hence, $$ P_n F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ 1 & w^{4\cdot 1}_n & w^{4\cdot 2}_n & \dots & w^{4\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-2)\cdot 1}_n & w^{(n-2)\cdot 2}_n & \dots & w^{(n-2)\cdot (n-1)}_n\\ \hline 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{3\cdot 1}_n & w^{3\cdot 2}_n & \dots & w^{3\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$ Now let us imagine that we separated its columns and rows by two parts each of size $n/2$.
As a result we get $2\times 2$ block matrix that has the following form
$$ P_n F_n = \begin{pmatrix} \left\{w^{2kl}_n\right\} & \left\{w_n^{2k\left(\frac{n}{2} + l\right)}\right\} \\ \left\{w_n^{(2k+1)l}\right\} & \left\{w_n^{(2k+1)\left(\frac{n}{2} + l\right)}\right\} \end{pmatrix}, \quad k,l = 0,\dots, \frac{n}{2}-1. $$
So far it does not look like something that works faster :) But we will see that in a minute. Lets have a more precise look at the first block $\left\{w^{2kl}_n\right\}$:
$$ w^{2kl}_n = e^{-2kl\frac{2\pi i}{n}} = e^{-kl\frac{2\pi i}{n/2}} = w^{kl}_{n/2}. $$
So this block is exactly twice smaller Fourier matrix $F_{n/2}$!
The block $\left\{w_n^{(2k+1)l}\right\}$ can be written as $$ w_n^{(2k+1)l} = w_n^{2kl + l} = w_n^{l} w_n^{2kl} = w_n^{l} w_{n/2}^{kl}, $$ which can be written as $W_{n/2}F_{n/2}$, where $$W_{n/2} = \text{diag}(1,w_n,w_n^2,\dots,w_n^{n/2-1}).$$
Doing the same tricks for the other blocks we will finally get $$ P_n F_n = \begin{pmatrix} F_{n/2} & F_{n/2} \\ F_{n/2}W_{n/2} & -F_{n/2}W_{n/2} \end{pmatrix} = \begin{pmatrix} F_{n/2} & 0 \\ 0 & F_{n/2} \end{pmatrix} \begin{pmatrix} I_{n/2} & I_{n/2} \\ W_{n/2} & -W_{n/2} \end{pmatrix}. $$ Thus, we reduced multiplication by $F_n$ to 2 multiplications by $F_{n/2}$ and cheap multiplications by diagonal matrices. If we apply the obtained expressions recursively to $F_{n/2}$, we will get $\mathcal{O}(n\log n)$ complexity.
#FFT vs full matvec
import time
import numpy as np
import scipy as sp
import scipy.linalg
n = 10000
F = sp.linalg.dft(n)
x = np.random.randn(n)
t = time.time()
y_full = F.dot(x)
print('Full matvec time =', time.time() - t)
t = time.time()
y_fft = np.fft.fft(x)
print('FFT time =', time.time() - t)
print('Relative error =', (np.linalg.norm(y_full - y_fft)) / np.linalg.norm(y_full))
Full matvec time = 0.09186005592346191 FFT time = 0.012014150619506836 Relative error = 1.5268250537857339e-12
FFT helps to multiply fast by certain types of matrices. We start from a circulant matrix:
$$ C = \begin{pmatrix} c_0 & c_{n-1} & c_{n-2} & \dots & c_1 \\ c_{1} & c_{0} & c_{n-1} & \dots & c_2 \\ c_{2} & c_{1} & c_0 & \dots & c_3 \\ \dots & \dots & \dots & \dots & \dots \\ c_{n-1} & c_{n-2} & c_{n-3} & \dots & c_0 \end{pmatrix} $$
Theorem. Let $C$ be a circulant matrix of size $n\times n$ and let $c$ be it's first column , then $$ C = \frac{1}{n} F_n^* \text{diag}(F_n c) F_n $$
Proof. Consider a number $$\lambda (\omega) = c_0 + \omega c_1 + \dots + \omega^{n-1} c_{n-1},$$ where $\omega$ is any number such that $\omega^n=1$. Lets multiply $\lambda$ by $1,\omega,\dots, \omega^{n-1}$: $$ \begin{split} \lambda & = c_0 &+& \omega c_1 &+& \dots &+& \omega^{n-1} c_{n-1},\\ \lambda\omega & = c_{n-1} &+& \omega c_0 &+& \dots &+& \omega^{n-1} c_{n-2},\\ \lambda\omega^2 & = c_{n-2} &+& \omega c_{n-1} &+& \dots &+& \omega^{n-1} c_{n-3},\\ &\dots\\ \lambda\omega^{n-1} & = c_{1} &+& \omega c_{2} &+& \dots &+& \omega^{n-1} c_{0}. \end{split} $$ Therefore, $$ \lambda(\omega) \cdot \begin{pmatrix} 1&\omega & \dots& \omega^{n-1} \end{pmatrix} = \begin{pmatrix} 1&\omega&\dots& \omega^{n-1} \end{pmatrix} \cdot C. $$ Writing this for $\omega = 1,w_n, \dots, w_n^{n-1}$ we get $$ \Lambda F_n = F_n C $$ and finally $$ C = \frac{1}{n} F^*_n \Lambda F_n, \quad \text{where}\quad \Lambda = \text{diag}(F_nc) \qquad\blacksquare $$
Representation $C = \frac{1}{n} F^* \text{diag}(F_n c) F_n $ gives us an explicit way to multiply a vector $x$ by $C$ in $\mathcal{O}(n\log n)$ operations. Indeed,
$$ Cx = \frac{1}{n} F_n^* \text{diag}(F_n c) F_n x = \text{ifft}\left( \text{fft}(c) \circ \text{fft}(x)\right) $$
where $\circ$ denotes elementwise product (Hadamard product) of two vectors (since $\text{diag}(a)b = a\circ b$) and ifft denotes inverse Fourier transform $F^{-1}_n$.
import time
import numpy as np
import scipy as sp
import scipy.linalg
def circulant_matvec(c, x):
return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))
n = 5000
c = np.random.random(n)
C = sp.linalg.circulant(c)
x = np.random.randn(n)
t = time.time()
y_full = C.dot(x)
print('Full matvec time =', time.time() - t)
t = time.time()
y_fft = circulant_matvec(c, x)
print('FFT time =', time.time() - t)
print('Relative error =', (np.linalg.norm(y_full - y_fft)) / np.linalg.norm(y_full))
Full matvec time = 0.01159811019897461 FFT time = 0.009469985961914062 Relative error = 9.40432066429003e-16
Now we get back to Toeplitz matrices! $$ T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}& \dots & t_{1-n} \\ t_{1} & t_{0} & t_{-1} & t_{-2}& \dots & t_{2-n} \\ t_{2} & t_{1} & t_0 & t_{-1} &\dots & t_{3-n} \\ t_{3} & t_{2} & t_1 & t_0 & \dots & t_{4-n} \\ \dots & \dots & \dots & \dots & \dots & \dots\\ t_{n-1} & t_{n-2} & t_{n-3} & t_{n-4} &\dots &t_0 \end{pmatrix}, $$ or equivalently $T_{ij} = t_{i-j}$.
Matvec operation can be written as $$ y_i = \sum_{j=1}^n t_{i-j} x_j, $$ which can be interpreted as a discrete convolution of filter $t_i$ and signal $x_i$. For simplicity the size of the filter $t$ is such that the sizes of the input and output signals are the same. Generally, filter size can be arbitrary.
Fast convolution computation has a variety of applications, for instance, in signal processing or partial differential and integral equations. For instance, here is the smoothing of a signal:
from scipy import signal
alpha = 0.01
sig = np.repeat([0., 1., 0.], 100)
filt = np.exp(-alpha * (np.arange(100)-50)**2)
filtered = signal.convolve(sig, filt, mode='same') / sum(filt)
fig, (ax_orig, ax_filt, ax_filtered) = plt.subplots(3, 1, sharex=True)
ax_orig.plot(sig)
ax_orig.margins(0, 0.1)
ax_filt.plot(filt)
ax_filt.margins(0, 0.1)
ax_filtered.plot(filtered)
ax_filtered.margins(0, 0.1)
ax_orig.set_title('Original signal')
ax_filt.set_title('Filter')
ax_filtered.set_title('Convolution')
fig.tight_layout()
Key point: the multiplication by a Toeplitz matrix can be reduced to the multiplication by a circulant. Indeed, every Toeplitz matrix of size $n\times n$ can be embedded into a Circulant matrix $C$ of size $2n \times 2n$: $$ C = \begin{pmatrix} T & \dots \\ \dots & \dots \end{pmatrix}. $$ The $3\times 3$ matrix $T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} \\ t_{1} & t_{0} & t_{-1} \\ t_{2} & t_{1} & t_0 \\ \end{pmatrix}$ can be embedded as follows $$ C = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{2} & t_{1}\\ t_{1} & t_{0} & t_{-1} & t_{-2} & t_{2} \\ t_{2} & t_{1} & t_0 & t_{-1} & t_{-2} \\ t_{-2}& t_{2} & t_{1} & t_0 & t_{-1} \\ t_{-1} & t_{-2} & t_{2} & t_{1} & t_0 \end{pmatrix}. $$ For matvec $ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}= \begin{pmatrix} t_0 & t_{-1} & t_{-2} \\ t_{1} & t_{0} & t_{-1} \\ t_{2} & t_{1} & t_0 \\ \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ we pad vector $x$ with zeros: $$ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \star \\ \star \end{pmatrix} = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{2} & t_{1}\\ t_{1} & t_{0} & t_{-1} & t_{-2} & t_{2} \\ t_{2} & t_{1} & t_0 & t_{-1} & t_{-2} \\ t_{-2}& t_{2} & t_{1} & t_0 & t_{-1} \\ t_{-1} & t_{-2} & t_{2} & t_{1} & t_0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix}= \text{ifft}(\text{fft}(\begin{pmatrix} t_0 \\ t_{1} \\ t_{2} \\ t_{-2} \\ t_{-1} \end{pmatrix})\circ \text{fft}(\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix})). $$ Note that you do not need to form and store the whole matrix $T$.
The 2-dimensional convolution is defined as $$ y_{i_1i_2} = \sum_{j_1,j_2=1}^n t_{i_1-j_1, i_2-j_2} x_{j_1 j_2}. $$ Note that $x$ and $y$ are 2-dimensional arrays and $T$ is 4-dimensional. To reduce this expression to matrix-by-vector product we have to reshape $x$ and $y$ into long vectors: $$ \text{vec}(x) = \begin{pmatrix} x_{11} \\ \vdots \\ x_{1n} \\ \hline \\ \vdots \\ \hline \\ x_{n1} \\ \vdots \\ x_{nn} \end{pmatrix}, \quad \text{vec}(y) = \begin{pmatrix} y_{11} \\ \vdots \\ y_{1n} \\ \hline \\ \vdots \\ \hline \\ y_{n1} \\ \vdots \\ y_{nn} \end{pmatrix}. $$ In this case matrix $T$ is block Toeplitz with Toeplitz blocks: (BTTB) $$ T = \begin{pmatrix} T_0 & T_{-1} & T_{-2} & \dots & T_{1-n} \\ T_{1} & T_{0} & T_{-1} & \dots & T_{2-n} \\ T_{2} & T_{1} & T_0 & \dots & T_{3-n} \\ \dots & \dots & \dots & \dots & \dots\\ T_{n-1} & T_{n-2} & T_{n-3} &\dots &T_0 \end{pmatrix}, \quad \text{where} \quad T_k = t_{k, i_2 - j_2}\quad \text{are Toeplitz matrices} $$
To get fast matvec we need to embed block Toeplitz matrix with Toeplitz blocks into the block circulant matrix with circulant blocks. The analog of $$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \star \\ \star \end{pmatrix} = \text{ifft}(\text{fft}(\begin{pmatrix} t_0 \\ t_{1} \\ t_{2} \\ t_{-2} \\ t_{-1} \end{pmatrix})\circ\text{fft}(\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix})).$$ will look like $$ \begin{pmatrix} y_{11} & y_{12} & y_{13} & \star & \star \\ y_{21} & y_{22} & y_{23} & \star & \star \\ y_{31} & y_{32} & y_{33} & \star & \star \\ \star & \star & \star & \star & \star \\ \star & \star & \star & \star & \star \\ \end{pmatrix} = \text{ifft2d}(\text{fft2d}(\begin{pmatrix} t_{0,0} & t_{1,0} & t_{2,0} & t_{-2,0} & t_{-1,0} \\ t_{0,1} & t_{1,1} & t_{2,1} & t_{-2,1} & t_{-1,1} \\ t_{0,2} & t_{1,2} & t_{2,2} & t_{-2,2} & t_{-1,2} \\ t_{0,-2} & t_{1,-2} & t_{2,-2} & t_{-2,-2} & t_{-1,-2} \\ t_{0,-1} & t_{1,-1} & t_{2,-1} & t_{-2,-1} & t_{-1,-1} \end{pmatrix}) \circ \text{fft2d}(\begin{pmatrix}x_{11} & x_{12} & x_{13} & 0 & 0 \\ x_{21} & x_{22} & x_{23} & 0 & 0 \\ x_{31} & x_{32} & x_{33} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{pmatrix})),$$ where fft2d is 2-dimensional fft that consists of one-dimensional transforms, applied first to rows and and then to columns (or vice versa).
# Blurring and Sharpening Lena by convolution
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy import misc
import imageio
filter_size = 3
filter_blur = np.ones((filter_size, filter_size)) / filter_size**2
lena = imageio.imread('./pics/lena512.jpg')
#lena = misc.face()
#lena = lena[:, :, 0]
blurred = signal.convolve2d(lena, filter_blur, boundary='symm', mode='same')
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
ax[0, 0].imshow(lena[200:300, 200:300], cmap='gray')
ax[0, 0].set_title('Original Lena')
ax[0, 1].imshow(blurred[200:300, 200:300], cmap='gray')
ax[0, 1].set_title('Blurred Lena')
ax[1, 0].imshow((lena - blurred)[200:300, 200:300], cmap='gray')
ax[1, 0].set_title('Lena $-$ Blurred Lena')
ax[1, 1].imshow(((lena - blurred)*3 + blurred)[200:300, 200:300], cmap='gray')
ax[1, 1].set_title('$3\cdot$(Lena $-$ Blurred Lena) + Blurred Lena')
fig.tight_layout()
Convolution is ok; but what about deconvolution, or solving linear systems with Toeplitz matrices?
$$T x = f.$$
For the periodic case, where $T = C$ is circulant,
we have the spectral theorem
$$C = \frac{1}{n}F^* \Lambda F, \quad C^{-1} = \frac{1}{n}F^* \Lambda^{-1} F,$$
but for a general Toeplitz matrices, it is not a trivial question.
Not-a-bad recipe for Toeplitz linear system is to use iterative method (fast matvec is available).
A good choice for a preconditioner is a circulant matrix.
A natural idea is to use circulants as preconditioners, since they are easy to invert.
The first preconditioner was the preconditioner by Raymond Chan and Gilbert Strang, who proposed to take the first column of the matrix and use it to generate the circulant.
The second preconditioner is the Tony Chan preconditioner, which is also very natural:
$$C = \arg \min_P \Vert P - T \Vert_F.$$
import numpy as np
import scipy.linalg
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
n = 100
c = np.zeros(n)
c[0] = -2
c[1] = 1
Tm = sp.linalg.toeplitz(c, c)
c1 = sp.linalg.circulant(c) #Strang preconditioner
Fmat = 1.0/np.sqrt(n) * np.fft.fft(np.eye(n)) #Poor man's Fourier matrix
d2 = np.diag(Fmat.conj().dot(Tm).dot(Fmat))
c2 = Fmat.dot(np.diag(d2)).dot(Fmat.conj().T)
mat = np.linalg.inv(c1).dot(Tm)
ev = np.linalg.eigvals(mat).real
plt.plot(np.sort(ev), np.ones(n), 'o')
plt.xlabel('Eigenvalues for Strang preconditioner')
plt.gca().get_yaxis().set_visible(False)
mat = np.linalg.inv(c2).dot(Tm)
ev = np.linalg.eigvals(mat).real
plt.figure()
plt.plot(np.sort(ev), np.ones(n), 'o')
plt.xlabel('Eigenvalues for T. Chan Preconditioner')
plt.gca().get_yaxis().set_visible(False)
The idea with preconditioners works for 2D/3D convolutions, but much worse.
In 1D case it is also possible to find direct solvers for the Toeplitz matrix, based on the structure of the inverse.
But the inverse of a Toeplitz matrix is not Toeplitz!
What to do?
The Toeplitz matrices belong to a more general class of low displacement rank matrices.
Define the scaled periodic shift matrix
$Z_e$ that takes the vector $x$ and transforms it to a vector
$$Z_e x = \begin{bmatrix} e x_{n-1} \\ x_1 \\ x_2 \\ \vdots \\ x_{n-2} \end{bmatrix} $$
What is the matrix form of this linear operator?
Given a Toeplitz matrix $T$, we select any $e, f$ such that $ef \ne 1$ and define the displacement operator as
$$L(T) = Z_e T - T Z_f.$$
For a Toeplitz matrix, $L(T)$ has rank 2 (it has only first row and last column non-zero)
import numpy as np
import scipy.linalg
n = 5
c = np.zeros(n)
c[0] = -2
c[1] = 1
T = sp.linalg.toeplitz(c, c)
e = 0.5
f = 0.5
def Z_shift(e):
return np.diag(np.ones(n-1), -1) + e * np.diag(np.ones(1), n-1)
Z1 = Z_shift(e)
Z2 = Z_shift(f)
print(Z1.dot(T) - T.dot(Z2))
[[-1. 0. 0. 0.5 0. ] [ 0. 0. 0. 0. -0.5] [ 0. 0. 0. 0. 0. ] [ 0. 0. 0. 0. 0. ] [ 0. 0. 0. 0. 1. ]]
What about the inverse?
It is also of rank $2$ !
import numpy as np
import scipy.linalg
n = 5
c = np.zeros(n)
c[0] = -2
c[1] = 1
T = sp.linalg.toeplitz(c, c)
e = 0.5
f = 0.5
def Z_shift(e):
return np.diag(np.ones(n-1), -1) + e * np.diag(np.ones(1), n-1)
Z1 = Z_shift(e)
Z2 = Z_shift(f)
Tinv = np.linalg.inv(T)
p1 = Z1.dot(Tinv) - Tinv.dot(Z2)
np.linalg.svd(p1)[1]
array([1.02062073e+00, 1.02062073e+00, 2.60439840e-16, 6.95949729e-17, 2.21601201e-17])
The matrix is said to be of displacement rank $r$ with respect to the pair of generators
$Z_e, Z_f$, if
$$L(T) = Z_e T - T Z_f = GH^{\top},$$
where $G$ is $n \times r$ and $H$ is $n \times r$.
It is similar to "discrete derivative"
Let $T$ satisfy
$$Z_e T - T Z_f = GH ^{\top},$$
and let it be invertible.
Then we have
$$T^{-1} (Z_e T - T Z_f) T^{-1} = T^{-1} Z_e - Z_f T^{-1} = T^{-1} G H^{\top} T^{-1},$$
i.e. the inverse has small displacement rank with the reversed pair of generators $Z_e, Z_f$ (why?).
Does the low-rank representation after the displacement operator describe the structure?
I.e. we need to solve the equation of the form
$$Z_e T - T Z_f = GH^{\top} = B$$
For a given right-hand side.
This is a linear system of equations in disguise! (Do you see that this is a linear system? What is the size of this linear system?)
The equation above is a special case of the Sylvester matrix equation which has the general form
$$AX - X B = C,$$
with $A$, $B$ and $C$ given.
In general, this is a linear system with $\mathcal{O}(n^2)$ unknowns, and the computational cost is expected to be $n^6$.
For Sylvester equation we can solve it in $\mathcal{O}(n^3)$ (Thursday on matrix equations)
But we can do better for specific $A, B$.
For the particular case, we have $$Z_e T - T Z_f = GH^{\top} = B,$$
and the solution is given by
$$ (e - f) T = \sum_{j = 1}^r Z_e(g_j) Z_f( J h_j), $$
where $Z_e$ is the e-scaled circulant generated by the vector, and $g_j$, and $h_j$ are the columns of the matrices $G$ and $H$,
and $J$ is the per-identity matrix (which has ones on the anti-diagonal).
For details, see the paper by Victor Pan et. al.
Based on this idea and for the special case when $e = 0, f \rightarrow \inf$, we get the following famous formula for the inverse of the Toeplitz matrices as a sum of two products of triangular Toeplitz matrices:
Let $A$ be a Toeplitz matrix, and
$$A \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}=\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad A \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix}=\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} $$
then
$$A^{-1} = \frac{1}{x_0} \begin{bmatrix} x_0 & 0 & \ldots & 0 \\ x_1 & x_0 & 0 & \ldots \\ \ldots & \ldots \\ x_n & \ldots & \ldots & x_0 \end{bmatrix}\begin{bmatrix} u_0 & u_1 & \ldots & 0 \\ 0 & u_0 & u_1 & \ldots \\ \ldots & \ldots \\ 0 & \ldots & \ldots & u_0 \end{bmatrix}-\frac{1}{x_0} \begin{bmatrix} 0 & 0 & \ldots & 0 \\ y_0 & 0 & 0 & \ldots \\ y_1 & y_0 & \ldots \\ \ldots & \ldots & \\ y_{n-1} & \ldots & y_0 & 0 \end{bmatrix}\begin{bmatrix} 0 & v_0 & \ldots & 0 \\ 0 & 0 & v_0 & v_1 \\ \ldots & \ldots \\ \ldots & \ldots & \ldots & v_0 \\ 0 & \ldots & \ldots & 0\end{bmatrix},$$
where $u_y = y_{n-i}, \quad v_i = x_{n-i}$.
The main meaning: the inverse can be recovered from the first column and the last column of the matrix.
For details, I refer to the book by Tyrtyshnikov "Brief introduction to numerical analysis".
For many years, these formulas gave rise to the fast $\mathcal{O}(n^2)$ and superfast $\mathcal{O}(n \log n)$ methods for Toeplitz matrices.
The basic idea is that you use augmentation method.
Consider that you have computed the inverse of a $(n-1) \times (n-1)$ block of the Toeplitz matrix.
You only need to store two vectors to represent the inverse.
Then, the bigger matrix can be written in the block form.
$$T_n = \begin{bmatrix} T_{n-1} & a \\ b^{\top} & c \end{bmatrix}.$$
We only need to recompute the last and the first column!
Let us split the vector as
$$x = \begin{bmatrix} x_1 & x_2 \end{bmatrix}.$$
Then,
$$T_{n-1} x_1 + a x_2 = e_1, \quad b^{\top} x_1 + c x_2 = 0.$$
Or,
$$ x_1 = T^{-1}_{n-1} e_1 - T^{-1}_n a x_2.$$
Application of $T^{-1}_{n-1}$ costs $\mathcal{O}(n \log n)$ operations,
thus $x_2$ will be recovered in the same number of operations as well. The total complexity is then $\mathcal{O}(n^2 \log n)$ operations.
Superfast algorithms are obtained in terms of reducing the problem to the computations with polynomials and using block elimination
(in the Fourier style).