They are directly connected to the convolution operation and Fast Fourier Transform.
One of the key operation in signal processing/machine learning is the convolution of two functions.
Let $x(t)$ and $y(t)$ be two given functions. Their convolution is defined as
A well-known fact: a convolution in the time domain is a product in the frequency domain.
Let us approximate the integral by a quadrature sum on a uniform grid, and store the signal at equidistant points.
Then we are left with the summation
$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j},$$which is called discrete convolution. This can be thought as an application of a filter with coefficients $x$ to a signal $y$.
There are different possible filters for different purposes, but they all utilize the shift-invariant structure.
A discrete convolution can be thought as a matrix-by-vector product:
$$z_i = \sum_{j=0}^{n-1} x_j y_{i - j}, \Leftrightarrow z = Ax$$where the matrix $A$ elements are given as $a_{ij} = y_{i-j}$, i.e., they depend only on the difference between the row index and the column index.
A matrix is called Toeplitz if its elements are defined as
$$a_{ij} = t_{i - j}.$$A Toeplitz matrix is completely defined by its first column and first row (i.e., $2n-1$ parameters).
It is a dense matrix, however it is a structured matrix (i.e., defined by $\mathcal{O}(n)$ parameters).
And the main operation in the discrete convolution is the product of Toeplitz matrix by vector.
Can we compute it faster than $\mathcal{O}(n^2)$?
For a special class of Toeplitz matrices, named circulant matrices the fast matrix-by-vector product can be done.
A matrix $C$ is called circulant, if
i.e. it periodicaly wraps
$$C = \begin{bmatrix} c_0 & c_3 & c_2 & c_1 \\ c_1 & c_0 & c_3 & c_2 \\ c_2 & c_1 & c_0 & c_3 \\ c_3 & c_2 & c_1 & c_0 \\ \end{bmatrix}. $$Theorem:
Any circulant matrix can be represented in the form
$$C = \frac{1}{n} F^* \Lambda F,$$where $F$ is the Fourier matrix with the elements
$$F_{kl} = w_n^{kl}, \quad k, l = 0, \ldots, n-1, \quad w_n = e^{-\frac{2 \pi i}{n}},$$and matrix $\Lambda = \text{diag}(\lambda)$ is the diagonal matrix and
$$\lambda = F c, $$where $c$ is the first column of the circulant matrix $C$.
The proof will be later: now we need to study the FFT matrix.
The Fourier matrix is defined as:
$$ F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ \dots & \dots & \dots &\dots &\dots \\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$or equivalently
$$ F_n = \{ w_n^{kl} \}_{k,l=0}^{n-1}, $$where
$$w_n = e^{-\frac{2\pi i}{n}}.$$Properties:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
N = 1000
dt = 1.0 / 800.0
x = np.linspace(0.0, N*dt, N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x) + 0.2*np.sin(300.0 * 2.0*np.pi*x)
plt.plot(x, y)
plt.xlabel('Time')
plt.ylabel('Signal')
plt.title('Initial signal')
Text(0.5,1,'Initial signal')
yf = np.fft.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*dt), N//2)
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2])) #Note: N/2 to N will give negative frequencies
plt.xlabel('Frequency')
plt.ylabel('Amplitude')
plt.title('Discrete Fourier transform')
Text(0.5,1,'Discrete Fourier transform')
Here we consider a matrix interpretation of the standard Cooley-Tukey algorithm (1965), which has underlying divide and conquer idea. Note that in packages more advanced versions are used.
First of all we permute the rows of the Fourier matrix such that the first $n/2$ rows of the new matrix had row numbers $1,3,5,\dots,n-1$ and the last $n/2$ rows had row numbers $2,4,6\dots,n$.
This permutation can be expressed in terms of multiplication by permutation matrix $P_n$:
Hence,
$$ P_n F_n = \begin{pmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & w^{2\cdot 1}_n & w^{2\cdot 2}_n & \dots & w^{2\cdot (n-1)}_n\\ 1 & w^{4\cdot 1}_n & w^{4\cdot 2}_n & \dots & w^{4\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-2)\cdot 1}_n & w^{(n-2)\cdot 2}_n & \dots & w^{(n-2)\cdot (n-1)}_n\\ \hline 1 & w^{1\cdot 1}_n & w^{1\cdot 2}_n & \dots & w^{1\cdot (n-1)}_n\\ 1 & w^{3\cdot 1}_n & w^{3\cdot 2}_n & \dots & w^{3\cdot (n-1)}_n\\ \vdots & & & & \vdots\\ 1 & w^{(n-1)\cdot 1}_n & w^{(n-1)\cdot 2}_n & \dots & w^{(n-1)\cdot (n-1)}_n\\ \end{pmatrix}, $$Now let us imagine that we separated its columns and rows by two parts each of size $n/2$.
As a result we get $2\times 2$ block matrix that has the following form
$$ P_n F_n = \begin{pmatrix} \left\{w^{2kl}_n\right\} & \left\{w_n^{2k\left(\frac{n}{2} + l\right)}\right\} \\ \left\{w_n^{(2k+1)l}\right\} & \left\{w_n^{(2k+1)\left(\frac{n}{2} + l\right)}\right\} \end{pmatrix}, \quad k,l = 0,\dots, \frac{n}{2}-1. $$So far it does not look like something that works faster :) But we will see that in a minute. Lets have a more precise look at the first block $\left\{w^{2kl}_n\right\}$:
$$ w^{2kl}_n = e^{-2kl\frac{2\pi i}{n}} = e^{-kl\frac{2\pi i}{n/2}} = w^{kl}_{n/2}. $$So this block is exactly twice smaller Fourier matrix $F_{n/2}$!
<!--- Now we can write
$$ \begin{pmatrix} F_{n/2} & \left\{w_n^{2k\left(\frac{n}{2} + l\right)}\right\} \\ \left\{w_n^{(2k+1)l}\right\} & \left\{w_n^{(2k+1)\left(\frac{n}{2} + l\right)}\right\} \end{pmatrix} $$--> The block $\left\{w_n^{(2k+1)l}\right\}$ can be written as
$$ w_n^{(2k+1)l} = w_n^{2kl + l} = w_n^{l} w_n^{2kl} = w_n^{l} w_{n/2}^{kl}, $$which can be written as $W_{n/2}F_{n/2}$, where
$$W_{n/2} = \text{diag}(1,w_n,w_n^2,\dots,w_n^{n/2-1}).$$Doing the same tricks for the other blocks we will finally get
$$ P_n F_n = \begin{pmatrix} F_{n/2} & F_{n/2} \\ F_{n/2}W_{n/2} & -F_{n/2}W_{n/2} \end{pmatrix} = \begin{pmatrix} F_{n/2} & 0 \\ 0 & F_{n/2} \end{pmatrix} \begin{pmatrix} I_{n/2} & I_{n/2} \\ W_{n/2} & -W_{n/2} \end{pmatrix}. $$#FFT vs full matvec
import time
import numpy as np
import scipy as sp
import scipy.linalg
n = 10000
F = sp.linalg.dft(n)
x = np.random.randn(n)
y_full = F.dot(x)
full_mv_time = %timeit -q -o F.dot(x)
print('Full matvec time =', full_mv_time.average)
y_fft = np.fft.fft(x)
fft_mv_time = %timeit -q -o np.fft.fft(x)
print('FFT time =', fft_mv_time.average)
print('Relative error =', (np.linalg.norm(y_full - y_fft)) / np.linalg.norm(y_full))
Full matvec time = 0.06917304271482862 FFT time = 3.3967071485988395e-05 Relative error = 1.550139758681484e-12
FFT helps to multiply fast by certain types of matrices. We start from a circulant matrix:
$$ C = \begin{pmatrix} c_0 & c_{n-1} & c_{n-2} & \dots & c_1 \\ c_{1} & c_{0} & c_{n-1} & \dots & c_2 \\ c_{2} & c_{1} & c_0 & \dots & c_3 \\ \dots & \dots & \dots & \dots & \dots \\ c_{n-1} & c_{n-2} & c_{n-3} & \dots & c_0 \end{pmatrix} $$Theorem. Let $C$ be a circulant matrix of size $n\times n$ and let $c$ be it's first column , then
$$ C = \frac{1}{n} F_n^* \text{diag}(F_n c) F_n $$Proof.
where $\omega$ is any number such that $\omega^n=1$.
and finally
$$ C = \frac{1}{n} F^*_n \Lambda F_n, \quad \text{where}\quad \Lambda = \text{diag}(F_nc) \qquad\blacksquare $$where $\circ$ denotes elementwise product (Hadamard product) of two vectors (since $\text{diag}(a)b = a\circ b$) and ifft
denotes inverse Fourier transform $F^{-1}_n$.
import numpy as np
import scipy as sp
import scipy.linalg
def circulant_matvec(c, x):
return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))
n = 5000
c = np.random.random(n)
C = sp.linalg.circulant(c)
x = np.random.randn(n)
y_full = C.dot(x)
full_mv_time = %timeit -q -o C.dot(x)
print('Full matvec time =', full_mv_time.average)
y_fft = circulant_matvec(c, x)
fft_mv_time = %timeit -q -o circulant_matvec(c, x)
print('FFT time =', fft_mv_time.average)
print('Relative error =', (np.linalg.norm(y_full - y_fft)) / np.linalg.norm(y_full))
Full matvec time = 0.011294817562946783 FFT time = 0.00012443668489951442 Relative error = 7.183826224607303e-16
Now we get back to Toeplitz matrices!
$$ T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{-3}& \dots & t_{1-n} \\ t_{1} & t_{0} & t_{-1} & t_{-2}& \dots & t_{2-n} \\ t_{2} & t_{1} & t_0 & t_{-1} &\dots & t_{3-n} \\ t_{3} & t_{2} & t_1 & t_0 & \dots & t_{4-n} \\ \dots & \dots & \dots & \dots & \dots & \dots\\ t_{n-1} & t_{n-2} & t_{n-3} & t_{n-4} &\dots &t_0 \end{pmatrix}, $$or equivalently $T_{ij} = t_{i-j}$.
Matvec operation can be written as
$$ y_i = \sum_{j=1}^n t_{i-j} x_j, $$which can be interpreted as a discrete convolution of filter $t_i$ and signal $x_i$. For simplicity the size of the filter $t$ is such that the sizes of the input and output signals are the same. Generally, filter size can be arbitrary.
Fast convolution computation has a variety of applications, for instance, in signal processing or partial differential and integral equations. For instance, here is the smoothing of a signal:
from scipy import signal
%matplotlib inline
import matplotlib.pyplot as plt
alpha = 0.01
sig = np.repeat([0., 1., 0.], 100)
filt = np.exp(-alpha * (np.arange(100)-50)**2)
filtered = signal.convolve(sig, filt, mode='same') / sum(filt)
fig, (ax_orig, ax_filt, ax_filtered) = plt.subplots(3, 1, sharex=True)
ax_orig.plot(sig)
ax_orig.margins(0, 0.1)
ax_filt.plot(filt)
ax_filt.margins(0, 0.1)
ax_filtered.plot(filtered)
ax_filtered.margins(0, 0.1)
ax_orig.set_title('Original signal')
ax_filt.set_title('Filter')
ax_filtered.set_title('Convolution')
fig.tight_layout()
Key point: the multiplication by a Toeplitz matrix can be reduced to the multiplication by a circulant.
The 2-dimensional convolution is defined as
$$ y_{i_1i_2} = \sum_{j_1,j_2=1}^n t_{i_1-j_1, i_2-j_2} x_{j_1 j_2}. $$Note that $x$ and $y$ are 2-dimensional arrays and $T$ is 4-dimensional. To reduce this expression to matrix-by-vector product we have to reshape $x$ and $y$ into long vectors:
$$ \text{vec}(x) = \begin{pmatrix} x_{11} \\ \vdots \\ x_{1n} \\ \hline \\ \vdots \\ \hline \\ x_{n1} \\ \vdots \\ x_{nn} \end{pmatrix}, \quad \text{vec}(y) = \begin{pmatrix} y_{11} \\ \vdots \\ y_{1n} \\ \hline \\ \vdots \\ \hline \\ y_{n1} \\ \vdots \\ y_{nn} \end{pmatrix}. $$In this case matrix $T$ is block Toeplitz with Toeplitz blocks: (BTTB)
$$ T = \begin{pmatrix} T_0 & T_{-1} & T_{-2} & \dots & T_{1-n} \\ T_{1} & T_{0} & T_{-1} & \dots & T_{2-n} \\ T_{2} & T_{1} & T_0 & \dots & T_{3-n} \\ \dots & \dots & \dots & \dots & \dots\\ T_{n-1} & T_{n-2} & T_{n-3} &\dots &T_0 \end{pmatrix}, \quad \text{where} \quad T_k = t_{k, i_2 - j_2}\quad \text{are Toeplitz matrices} $$To get fast matvec we need to embed block Toeplitz matrix with Toeplitz blocks into the block circulant matrix with circulant blocks. The analog of $$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \star \\ \star \end{pmatrix} = \text{ifft}(\text{fft}(\begin{pmatrix} t_0 \\ t_{1} \\ t_{2} \\ t_{-2} \\ t_{-1} \end{pmatrix})\circ\text{fft}(\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ 0 \\ 0 \end{pmatrix})).$$ will look like $$ \begin{pmatrix} y_{11} & y_{12} & y_{13} & \star & \star \\ y_{21} & y_{22} & y_{23} & \star & \star \\ y_{31} & y_{32} & y_{33} & \star & \star \\ \star & \star & \star & \star & \star \\ \star & \star & \star & \star & \star \\ \end{pmatrix} = \text{ifft2d}(\text{fft2d}(\begin{pmatrix} t_{0,0} & t_{1,0} & t_{2,0} & t_{-2,0} & t_{-1,0} \\ t_{0,1} & t_{1,1} & t_{2,1} & t_{-2,1} & t_{-1,1} \\ t_{0,2} & t_{1,2} & t_{2,2} & t_{-2,2} & t_{-1,2} \\ t_{0,-2} & t_{1,-2} & t_{2,-2} & t_{-2,-2} & t_{-1,-2} \\ t_{0,-1} & t_{1,-1} & t_{2,-1} & t_{-2,-1} & t_{-1,-1} \end{pmatrix}) \circ \text{fft2d}(\begin{pmatrix}x_{11} & x_{12} & x_{13} & 0 & 0 \\ x_{21} & x_{22} & x_{23} & 0 & 0 \\ x_{31} & x_{32} & x_{33} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{pmatrix})),$$ where fft2d is 2-dimensional fft that consists of one-dimensional transforms, applied first to rows and and then to columns (or vice versa).
# Blurring and Sharpening Lena by convolution
from scipy import signal
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy import misc
import imageio
filter_size = 5
filter_blur = np.ones((filter_size, filter_size)) / filter_size**2
lena = imageio.imread('./lena512.jpg')
#lena = misc.face()
#lena = lena[:, :, 0]
blurred = signal.convolve2d(lena, filter_blur, boundary='symm', mode='same')
fig, ax = plt.subplots(2, 2, figsize=(8, 8))
ax[0, 0].imshow(lena[200:300, 200:300], cmap='gray')
ax[0, 0].set_title('Original Lena')
ax[0, 1].imshow(blurred[200:300, 200:300], cmap='gray')
ax[0, 1].set_title('Blurred Lena')
ax[1, 0].imshow((lena - blurred)[200:300, 200:300], cmap='gray')
ax[1, 0].set_title('Lena $-$ Blurred Lena')
ax[1, 1].imshow(((lena - blurred)*3 + blurred)[200:300, 200:300], cmap='gray')
ax[1, 1].set_title('$3\cdot$(Lena $-$ Blurred Lena) + Blurred Lena')
fig.tight_layout()
we have the spectral theorem
$$C = \frac{1}{n}F^* \Lambda F, \quad C^{-1} = \frac{1}{n}F^* \Lambda^{-1} F,$$but for a general Toeplitz matrices, it is not a trivial question.
Not-a-bad recipe for Toeplitz linear system is to use iterative method (fast matvec is available).
A good choice for a preconditioner is a circulant matrix.
A natural idea is to use circulants as preconditioners, since they are easy to invert.
The first preconditioner was the preconditioner by Raymond Chan and Gilbert Strang, who proposed to take the first column of the matrix and use it to generate the circulant.
The second preconditioner is the Tony Chan preconditioner, which is also very natural:
import numpy as np
import scipy.linalg
%matplotlib inline
import matplotlib.pyplot as plt
import scipy as sp
n = 100
c = np.zeros(n)
c[0] = -2
c[1] = 1
Tm = sp.linalg.toeplitz(c, c)
c1 = sp.linalg.circulant(c) #Strang preconditioner
Fmat = 1.0/np.sqrt(n) * np.fft.fft(np.eye(n)) #Poor man's Fourier matrix
d2 = np.diag(Fmat.conj().dot(Tm).dot(Fmat))
c2 = Fmat.dot(np.diag(d2)).dot(Fmat.conj().T)
mat = np.linalg.inv(c1).dot(Tm)
ev = np.linalg.eigvals(mat).real
plt.plot(np.sort(ev), np.ones(n), 'o')
plt.xlabel('Eigenvalues for Strang preconditioner')
plt.gca().get_yaxis().set_visible(False)
mat = np.linalg.inv(c2).dot(Tm)
ev = np.linalg.eigvals(mat).real
plt.figure()
plt.plot(np.sort(ev), np.ones(n), 'o')
plt.xlabel('Eigenvalues for T. Chan Preconditioner')
plt.gca().get_yaxis().set_visible(False)
Source is here
Source of gif is here
import torchvision.models as models
vgg16 = models.vgg16(pretrained=True)
print(vgg16)
VGG( (features): Sequential( (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU(inplace=True) (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU(inplace=True) (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (6): ReLU(inplace=True) (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (8): ReLU(inplace=True) (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (11): ReLU(inplace=True) (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (13): ReLU(inplace=True) (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (15): ReLU(inplace=True) (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (18): ReLU(inplace=True) (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (20): ReLU(inplace=True) (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (22): ReLU(inplace=True) (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (25): ReLU(inplace=True) (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (27): ReLU(inplace=True) (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (29): ReLU(inplace=True) (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) (avgpool): AdaptiveAvgPool2d(output_size=(7, 7)) (classifier): Sequential( (0): Linear(in_features=25088, out_features=4096, bias=True) (1): ReLU(inplace=True) (2): Dropout(p=0.5, inplace=False) (3): Linear(in_features=4096, out_features=4096, bias=True) (4): ReLU(inplace=True) (5): Dropout(p=0.5, inplace=False) (6): Linear(in_features=4096, out_features=1000, bias=True) ) )
for ch in vgg16.children():
features = ch
break
print(features)
for name, param in features.named_parameters():
print(name, param.shape)
Sequential( (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): ReLU(inplace=True) (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (3): ReLU(inplace=True) (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (6): ReLU(inplace=True) (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (8): ReLU(inplace=True) (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (11): ReLU(inplace=True) (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (13): ReLU(inplace=True) (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (15): ReLU(inplace=True) (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (18): ReLU(inplace=True) (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (20): ReLU(inplace=True) (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (22): ReLU(inplace=True) (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (25): ReLU(inplace=True) (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (27): ReLU(inplace=True) (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (29): ReLU(inplace=True) (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False) ) 0.weight torch.Size([64, 3, 3, 3]) 0.bias torch.Size([64]) 2.weight torch.Size([64, 64, 3, 3]) 2.bias torch.Size([64]) 5.weight torch.Size([128, 64, 3, 3]) 5.bias torch.Size([128]) 7.weight torch.Size([128, 128, 3, 3]) 7.bias torch.Size([128]) 10.weight torch.Size([256, 128, 3, 3]) 10.bias torch.Size([256]) 12.weight torch.Size([256, 256, 3, 3]) 12.bias torch.Size([256]) 14.weight torch.Size([256, 256, 3, 3]) 14.bias torch.Size([256]) 17.weight torch.Size([512, 256, 3, 3]) 17.bias torch.Size([512]) 19.weight torch.Size([512, 512, 3, 3]) 19.bias torch.Size([512]) 21.weight torch.Size([512, 512, 3, 3]) 21.bias torch.Size([512]) 24.weight torch.Size([512, 512, 3, 3]) 24.bias torch.Size([512]) 26.weight torch.Size([512, 512, 3, 3]) 26.bias torch.Size([512]) 28.weight torch.Size([512, 512, 3, 3]) 28.bias torch.Size([512])
plt.figure(figsize=(20, 17))
for i, filter in enumerate(features[2].weight):
plt.subplot(16, 16, i+1)
plt.imshow(filter[0, :, :].detach(), cmap='gray')
plt.axis('off')
plt.show()
from PIL import Image
img = Image.open("./tiger.jpeg")
plt.imshow(img)
plt.show()
from torchvision import transforms
transform = transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
img_ = transform(img)
img_ = img_.unsqueeze(0)
print(img_.size())
plt.imshow(img_[0].permute(1, 2, 0))
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
torch.Size([1, 3, 224, 224])
<matplotlib.image.AxesImage at 0x7ff46a0adf28>
# After the first convolutional layer
output1 = features[0](img_)
print(output1.size())
plt.figure(figsize=(20, 17))
for i, f_map in enumerate(output1[0]):
plt.subplot(8, 8, i+1)
plt.imshow(f_map.detach(), cmap="gray")
plt.axis('off')
plt.show()
torch.Size([1, 64, 224, 224])
# After the second convolutional layer
output2 = features[2](features[1](output1))
print(output2.size())
plt.figure(figsize=(20, 17))
for i, f_map in enumerate(output2[0]):
plt.subplot(8, 8, i+1)
plt.imshow(f_map.detach(), cmap="gray")
plt.axis('off')
plt.show()
torch.Size([1, 64, 224, 224])
output3 = features[5](features[4](features[3](output2)))
print(output3.size())
plt.figure(figsize=(20, 20))
for i, f_map in enumerate(output3[0]):
if i + 1 == 65:
break
plt.subplot(8, 8, i+1)
plt.imshow(f_map.detach(), cmap="gray")
plt.axis('off')
plt.show()
torch.Size([1, 128, 112, 112])
feature
block one gets 512 channels (we start from 3 (RGB)) and significant reduction of the spatial size of imagefrom IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()