We provide signatures of the functions that you have to implement. Make sure you follow the signatures defined, otherwise your coding solutions will not be graded.
Please submit a single Jupyter Notebook file, where only Python and Markdown/$\LaTeX$ are used. Any hand-written solutions inserted as photos or in any other way are prohibited and will not be graded. If you have any questions about using Markdown, ask them!
Let $U$ be an $n \times r$ orthogonal matrix, $n \geq r$. It is well known that such a matrix has an $r \times r$ non-singular submatrix $\hat{U}$. To measure the "non-singularity" of this submatrix one can use the minimal singular value, or the norm of the inverse $\Vert \hat{U}^{-1} \Vert.$
Suppose we select the submatrix that has the smallest possible norm of the inverse among all $r \times r$ submatrices of $U$. What is the maximal value of such a norm? Mathematically, we need to estimate
$$t(n, r) = \max_U \min_{\hat{U}} \Vert \hat{U}^{-1} \Vert_2.$$
Is it true that $t(n, r) \leq \sqrt{n}$?
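Although this is a proof question, the bound can be probed numerically for small sizes. The brute-force sketch below (all names and sizes are our illustrative choices) enumerates every $r \times r$ submatrix of a random matrix with orthonormal columns, using $\Vert \hat{U}^{-1} \Vert_2 = 1/\sigma_{\min}(\hat{U})$; random sampling can only probe the bound, not prove it.

import numpy as np
from itertools import combinations

def min_inverse_norm(U):
    # Smallest ||Uhat^{-1}||_2 = 1/sigma_min(Uhat) over all r x r submatrices of U
    # (brute force over row subsets, feasible only for small n and r).
    n, r = U.shape
    best = np.inf
    for rows in combinations(range(n), r):
        sigma_min = np.linalg.svd(U[list(rows), :], compute_uv=False)[-1]
        if sigma_min > 0:
            best = min(best, 1.0 / sigma_min)
    return best

# Empirically probe t(n, r) <= sqrt(n) on random matrices with orthonormal columns
rng = np.random.default_rng(0)
n, r = 7, 3
worst_observed = 0.0
for _ in range(200):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    worst_observed = max(worst_observed, min_inverse_norm(Q[:, :r]))
print(worst_observed, np.sqrt(n))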
One way to find the eigenvalues of a real matrix $A$ of size $(n,n)$ is the QR algorithm. Starting from $A_0 = A$, each iteration of the algorithm consists of two main steps: compute the QR factorization $A_k = Q_k R_k$, then set $A_{k+1} = R_k Q_k$.
From the lecture materials you have seen that the algorithm complexity is in general $O(Nn^3)$ ($N$ is the number of iterations). But we can make an improvement by transforming the matrix to upper Hessenberg form before running the algorithm.
A nice property of a Hessenberg matrix is that its structure is preserved during a QR algorithm iteration. In other words, if $A_{k}$ is a Hessenberg matrix, then $A_{k+1}$ will also be a Hessenberg matrix. We can use this property to reduce the complexity of each iteration. Since we only need to zero out the first lower subdiagonal for the QR factorization (the first step of each iteration), we can use Givens rotations ($O(n^2)$) instead of Householder reflections ($O(n^3)$). The last question is how to obtain the Hessenberg matrix: use Householder reflections. Since this is done only once (at the initialization stage), the overall complexity of the algorithm is not increased.
Final algorithm:
I. Initialization
- Transform the original matrix $A$ to Hessenberg form $H_0 = U^TAU$ using Householder reflections.
II. Loop
for k in range(N):
Compute the QR factorization using Givens rotations: find matrices $Q_k, R_k$ such that $H_k = Q_kR_k$
Update matrix $H_{k+1} = R_k Q_k$
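To make the $O(n^2)$ claim concrete, here is a rough sketch of a Givens-based QR factorization of an upper Hessenberg matrix (an illustration only, not the graded QR_givens implementation; the function name and details are ours): each rotation touches only two rows, and only the $n-1$ subdiagonal entries need to be zeroed.

import numpy as np

def givens_qr_hessenberg_sketch(H):
    # Sketch: QR of an upper Hessenberg matrix with n-1 Givens rotations.
    # Each rotation acts on two rows only, so the factorization costs O(n^2).
    n = H.shape[0]
    R = H.astype(float).copy()
    Q = np.eye(n)
    for k in range(n - 1):
        a, b = R[k, k], R[k + 1, k]
        rnorm = np.hypot(a, b)
        if rnorm == 0.0:
            continue
        c, s = a / rnorm, b / rnorm
        G = np.array([[c, s], [-s, c]])          # 2x2 rotation zeroing R[k+1, k]
        R[k:k + 2, k:] = G @ R[k:k + 2, k:]      # update only two rows of R
        Q[:, k:k + 2] = Q[:, k:k + 2] @ G.T      # accumulate Q so that H = Q @ R
    return Q, R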
Show that a Hessenberg matrix preserves its structure under the QR algorithm: if $A_k$ is a Hessenberg matrix, then $A_{k+1}$ is Hessenberg as well. (5 pts)
(Bonus task) For a real square matrix $A$ whose leading minors are all nondegenerate, prove that the QR algorithm converges to an upper triangular matrix: $\lim\limits_{k\rightarrow \infty} A_k = T$, where $T$ is an upper triangular matrix.
Implement the function Hessenberg_Transform which takes a real square matrix $A$ and returns the transformed matrix in Hessenberg form $H$ (use Householder reflections). (5 pts)
Implement the function QR_givens which takes a real square matrix in Hessenberg form $H$ and returns its QR factorization: matrices $Q$ and $R$. (5 pts)
Implement the function QR_algorithm, which takes a real square matrix $A$ and returns $H_{N}$, the result of $N$ iterations of the QR algorithm. Use the previously implemented Hessenberg_Transform to get the Hessenberg form of the matrix $A$, and QR_givens to compute the QR factorization on each iteration. (2 pts)
Varying the parameter $n$, generate random matrices of size $(n,n)$ and measure the computation time of the QR_algorithm function. Plot the observed data: computation time versus $n$. Use a logarithmic scale for both axes. Explain the obtained results. (3 pts)
Compare the eigenvalues computed using your implemented algorithm with standard numpy functionality for some random matrix. Try different numbers of iterations for the algorithm. Explain the results. (2 pts)
your solution
import numpy as np

def Hessenberg_Transform(A):
    # your solution
    H = ...
    return H

def QR_givens(H):
    # your solution
    Q = ...
    R = ...
    return Q, R

def QR_algorithm(A, N_iterations):
    # your solution
    H = ...
    return H
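A minimal sketch of the timing and comparison experiments, assuming the three stubs above have been implemented; the sizes, iteration counts, and seeds are arbitrary illustrative choices.

import time
import numpy as np
import matplotlib.pyplot as plt

sizes = [16, 32, 64, 128, 256]                      # illustrative sizes
times = []
for n in sizes:
    A = np.random.randn(n, n)
    start = time.perf_counter()
    QR_algorithm(A, N_iterations=50)                # assumes the stub above is implemented
    times.append(time.perf_counter() - start)

plt.loglog(sizes, times, "o-")                      # slope on a log-log plot reveals the power of n
plt.xlabel("n")
plt.ylabel("computation time, s")
plt.show()

# Comparison with numpy: for a converged H_N real eigenvalues appear on the diagonal;
# complex conjugate pairs show up as 2x2 blocks, so the diagonal comparison is only approximate.
A = np.random.randn(30, 30)
H_N = QR_algorithm(A, N_iterations=500)
print(np.sort(np.diag(H_N)))
print(np.sort(np.linalg.eigvals(A).real))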
In this problem we consider stochastic estimation of the trace of implicitly-defined matrices.
Implicitly-defined matrices are such that matrix-vector multiplication is easy to perform, but the matrix entries are not easy to obtain (examples are given by integral kernels, products of sparse matrices, products of a set of circulants, low-rank matrices given in a compressed form, etc.).
In such cases it is often possible to approximate the trace with $k\ll N$ matrix-vector products, where $N$ is the size of the matrix.
One popular way to obtain a stochastic estimate is the Hutchinson algorithm.
Let $u$ be a random vector from $\mathbb{R}^{n}$ with independent identically distributed entries $u_i$, each having zero mean and variance $\sigma^2$.
Let $B$ be a symmetric matrix from $\mathbb{R}^{n\times n}$.
Hutchinson algorithm is roughly as follows:
trace_estimate = 0
for i=1:N do
u = random_vector
trace_estimate += (u,Bu)
end do
trace_estimate /= N
So the Hutchinson algorithm uses the empirical mean of $u^\top B u$ as a trace estimate. Below we ask you to find properties of this estimator.
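For reference, a minimal numpy sketch of the estimator above; the matrix-free matvec interface, the unit-variance probes, and the sample counts are our illustrative choices.

import numpy as np

def hutchinson_trace(matvec, n, num_samples=100, distribution="rademacher", rng=None):
    # Stochastic trace estimate using only matrix-vector products B @ u
    # with unit-variance probes (Rademacher or standard normal).
    rng = np.random.default_rng() if rng is None else rng
    estimate = 0.0
    for _ in range(num_samples):
        if distribution == "rademacher":
            u = rng.choice([-1.0, 1.0], size=n)
        else:
            u = rng.standard_normal(n)
        estimate += u @ matvec(u)
    return estimate / num_samples

# Example: trace of an explicitly known symmetric matrix
B = np.random.randn(200, 200)
B = (B + B.T) / 2
print(hutchinson_trace(lambda v: B @ v, 200), np.trace(B))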
Subproblem 1 (5 pts)
Find the variance for the case when:
a. each $u_{i}\sim \mathcal{N}(0, \sigma^2)$, i.e., normally distributed with zero mean and variance $\sigma^2$;
b. each $u_{i}$ follows the Rademacher distribution;
Why variance matters.
Let $z_1, \dots, z_N$ be independent identically distributed samples of a random variable with mean $\mu$ and variance $\sigma^2$, and let $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} z_{i}$ be the standard Monte Carlo estimator for the mean.
a. Show that $\mathbb{E}\left[\hat{\mu}\right] = \mu$ and $\text{var}\left[\hat{\mu}\right] = \sigma^2 \big/ N$.
b. Use the Chebyshev inequality to find the number of samples $N$ you need to guarantee that the estimate $\hat{\mu}$ deviates from $\mu$ by more than $\epsilon\ll 1$ with probability at most $\delta\ll 1$.
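As a reminder (a standard fact, stated here only for convenience), Chebyshev's inequality applied to $\hat{\mu}$ reads
$$\text{Pr}\left(|\hat{\mu} - \mu| \geq \epsilon\right) \leq \frac{\text{var}\left[\hat{\mu}\right]}{\epsilon^2},$$
so the requested bound follows from the variance computed in point a.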
Does the variance of the estimator matter?
a. Adapt the bounds from Subproblem 1 point 4.b for the Hutchinson algorithm with normal and Rademacher distributions.
b. Empirically evaluate the variances from Subproblem 1 point 2 using several sparse, low-rank and circulant matrices (see the sketch after this list).
c. Comment on the practicality of the Chebyshev bounds. What number of iterations do you need to perform to achieve a reasonably small error with a reasonably large probability according to this bound?
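As a starting point for point b, a sketch of how the one-sample variances could be evaluated empirically; the test matrices, sample counts, and seeds are illustrative choices (a circulant test matrix can be added in the same way).

import numpy as np
import scipy.sparse as sp

def one_sample_variance(matvec, n, distribution, num_samples=5000, rng=None):
    # Empirical variance of the one-sample estimate u^T B u for unit-variance probes.
    rng = np.random.default_rng(0) if rng is None else rng
    samples = []
    for _ in range(num_samples):
        if distribution == "rademacher":
            u = rng.choice([-1.0, 1.0], size=n)
        else:
            u = rng.standard_normal(n)
        samples.append(u @ matvec(u))
    return np.var(samples)

n = 300
B_sparse = sp.random(n, n, density=0.01, random_state=0)
B_sparse = (B_sparse + B_sparse.T) / 2                      # symmetric sparse test matrix
G = np.random.randn(n, 5)
B_lowrank = G @ G.T                                         # rank-5 symmetric test matrix
for name, mv in [("sparse", lambda v: B_sparse @ v), ("low-rank", lambda v: B_lowrank @ v)]:
    for dist in ["gaussian", "rademacher"]:
        print(name, dist, one_sample_variance(mv, n, dist))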
Subproblem 2 (5 pts)
Subproblem 3 (5 pts)
Is it possible to obtain a better bound? Here we study how to do that for a simplified case.
Let $A$ be the $N\times N$ matrix with all entries equal to one, i.e., $A_{ij}=1$. We consider a variant of the Hutchinson algorithm with standard normal variables, i.e., $u_{i} \sim \mathcal{N}(0, 1)$.
It is possible to generalize the bound from Subproblem 3 point 3, and the result can be found in the paper by Avron and Toledo (2010).
Another similar bound that we state without proof involves the Frobenius norm.
For the Hutchinson estimate with $l$ steps, denoted $T_l$, and $\delta \in (0, 1/2]$, for fixed constants $c$ and $C$:
\begin{equation} \text{Pr}\left(\left|T_{l} - \text{tr}(A)\right|\leq C \sqrt{\frac{\log(1/\delta)}{l}}\left\|A\right\|_{F}\right) \geq 1 - \delta. \end{equation}
There are a lot of improved versions of the Hutchinson algorithm.
One fruitful idea is to represent the matrix $A$ as a sum $A = A_1 + A_2$ in such a way that $A_1$ has an easy-to-compute trace and $\left\|A_2\right\|_F^2 \ll \left\|A_1\right\|_F^2$, or $\text{tr}(A_2) \ll \text{tr}(A_1)$ if both $A_1$ and $A_2$ are SPD matrices.
If such a splitting can be constructed, one gets a smaller one-sample variance or a better $(\epsilon, \delta)$-estimator.
Below we ask you to prove a simple theoretical result on the systematic construction of such splittings.
Subproblem 4 (5 pts)
Let $A\in\mathbb{R}^{N\times N}$ be a symmetric positive definite matrix and let $Q\in\mathbb{R}^{N\times k}$ contain the $k$ eigenvectors corresponding to the top $k$ eigenvalues, i.e., the eigenvalues are ordered $\lambda_1 \leq \lambda_2\leq\dots\leq \lambda_{N}$ and the columns $Q_{\star 1}, \dots, Q_{\star k}$ are eigenvectors with eigenvalues $\lambda_{N-k+1},\dots,\lambda_{N}$.
The simple analysis above suggests the following improved Hutchinson algorithm:
trace_estimate = 0
Q = []
for i=1:m/3 do
u = random_vector
Q = [Q, u] # stack vectors
end do
Q = orthonormal basis for columns of AQ
tr_1 = trace(Q^T A Q) # compute exactly
tr_2 = approximate tr((I - QQ^T) A (I - QQ^T)) with Hutchinson algorithm, m/3 steps
tr = tr_1 + tr_2
Observe that we substituted the true eigenvectors with an orthonormal basis. You may consider this a one-sweep power-iteration estimate of the dominant eigenvectors.
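For illustration, a minimal numpy sketch of this improved estimator (a Hutch++-style scheme); the $m/3$ budget split follows the pseudocode above, while the Gaussian probes, function names, and the test matrix are our illustrative choices.

import numpy as np

def improved_hutchinson(matvec, n, m, rng=None):
    # m is the total matvec budget: ~m/3 to form A S, ~m/3 to form A Q (exact trace part),
    # and the rest for Hutchinson on the deflated matrix.
    rng = np.random.default_rng() if rng is None else rng
    k = m // 3
    S = rng.standard_normal((n, k))
    AS = np.column_stack([matvec(S[:, j]) for j in range(k)])
    Q, _ = np.linalg.qr(AS)                      # orthonormal basis for the columns of A S
    AQ = np.column_stack([matvec(Q[:, j]) for j in range(k)])
    tr1 = np.trace(Q.T @ AQ)                     # exact trace of Q^T A Q
    tr2 = 0.0
    for _ in range(m - 2 * k):                   # Hutchinson on (I - QQ^T) A (I - QQ^T)
        u = rng.standard_normal(n)
        u = u - Q @ (Q.T @ u)                    # project the probe out of range(Q)
        tr2 += u @ matvec(u)
    tr2 /= max(m - 2 * k, 1)
    return tr1 + tr2

# Usage on a low-rank-plus-noise test matrix
n = 500
G = np.random.randn(n, 10)
A = G @ G.T + 0.01 * np.eye(n)
print(improved_hutchinson(lambda v: A @ v, n, m=90), np.trace(A))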
Subproblem 5 (5 pts)
Subproblem 6 (5 pts)
Produce plots (y-axis: relative trace estimation error, x-axis: number of matrix-vector products) for the original Hutchinson algorithm and for the improved version for the following cases:
Suppose we have some classification model $f: X \to Y$ which takes a vector $x \in \mathbb{R}^n$ and outputs some label $y$. It appears that deep neural networks are vulnerable to small imperceptible perturbations called adversarial attacks. Formally, an adversarial attack is a vector $\varepsilon \in \mathbb{R}^n$ that leads to misclassification: $y(x) \neq y(x + \varepsilon)$.
However, it was shown that there exist universal adversarial perturbations: a single $\varepsilon \in \mathbb{R}^n$ that leads to misclassification $y(x) \neq y(x + \varepsilon)$ for most of the inputs.
The hypothesis is that a perturbation of a hidden layer caused by an attack will propagate further through the network, changing the predicted label of $x$.
Let $f_i$ be the output of the $i$-th hidden layer and $J_i(x)$ its Jacobian at $x$; then $$f_i(x + \varepsilon) - f_i(x) \approx J_i(x)\varepsilon.$$ So, to find an attack we need to solve the following optimization problem: $$ \max_{\|\varepsilon\|_p = 1} \sum\limits_{x \in \text{batch}}\|J_i(x)\varepsilon\|_q^q $$
import torch
import torch.nn as nn
import requests
def download_file_from_google_drive(id, destination):
URL = "https://docs.google.com/uc?export=download"
session = requests.Session()
response = session.get(URL, params = { 'id' : id }, stream = True)
token = get_confirm_token(response)
if token:
params = { 'id' : id, 'confirm' : token }
response = session.get(URL, params = params, stream = True)
save_response_content(response, destination)
def get_confirm_token(response):
for key, value in response.cookies.items():
if key.startswith('download_warning'):
return value
return None
def save_response_content(response, destination):
CHUNK_SIZE = 32768
with open(destination, "wb") as f:
for chunk in response.iter_content(CHUNK_SIZE):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
class CifarNet(nn.Module):
def __init__(self):
super(CifarNet, self).__init__()
self.conv1 = nn.Conv2d(3, 64, kernel_size=3)
self.conv2 = nn.Conv2d(64, 64, kernel_size=3)
self.conv3 = nn.Conv2d(64, 128, kernel_size=3)
self.conv4 = nn.Conv2d(128, 128, kernel_size=3)
self.pool = nn.MaxPool2d(2, 2)
self.relu = nn.ReLU(inplace=True)
self.fc1 = nn.Linear(3200, 256)
self.dropout = nn.Dropout(0.5)
self.fc2 = nn.Linear(256, 256)
self.fc3 = nn.Linear(256, 10)
def forward(self, x):
x = self.relu(self.conv1(x))
x = self.relu(self.conv2(x))
x = self.pool(x)
x = self.relu(self.conv3(x))
x = self.relu(self.conv4(x))
x = self.pool(x)
x = x.view(-1, 3200)
x = self.relu(self.fc1(x))
x = self.dropout(x)
x = self.relu(self.fc2(x))
x = self.fc3(x)
return x
device = "cuda:0" if torch.cuda.is_available() else "cpu"
file_id = "1qrNvr3eLYjvbkVg5jY2sYA2rqK9XkTXs"
download_file_from_google_drive(file_id, "./cifar_checkpoint.pth")
model = CifarNet().to(device)
model.load_state_dict(torch.load("./cifar_checkpoint.pth", map_location="cpu"))
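With the checkpoint loaded, here is a minimal sketch of the $p = q = 2$ case of the problem above: the objective becomes $\varepsilon^\top \big(\sum_x J_i(x)^\top J_i(x)\big)\varepsilon$, so the attack is the top eigenvector of that matrix, which can be approximated by power iteration using Jacobian-vector and vector-Jacobian products. The choice of the hidden layer, the random batch, and all parameter values are illustrative assumptions, not the required solution.

import torch

def hidden_output(net, x):
    # Output of an intermediate layer of CifarNet (here: after the first pooling);
    # which layer to attack is an assumption made for illustration.
    h = net.relu(net.conv1(x))
    h = net.relu(net.conv2(h))
    h = net.pool(h)
    return h.flatten(start_dim=1)

def universal_perturbation_l2(net, batch, n_iters=10):
    # Power iteration for max_{||eps||_2 = 1} sum_x ||J_i(x) eps||_2^2,
    # i.e. the top eigenvector of sum_x J_i(x)^T J_i(x), via JVP/VJP products.
    eps = torch.randn(1, *batch.shape[1:], device=batch.device)
    eps /= eps.norm()
    fn = lambda inp: hidden_output(net, inp)
    for _ in range(n_iters):
        acc = torch.zeros_like(eps)
        for x in batch:
            x = x.unsqueeze(0)
            _, jv = torch.autograd.functional.jvp(fn, (x,), (eps,))   # J_i(x) eps
            _, vj = torch.autograd.functional.vjp(fn, (x,), jv)       # J_i(x)^T J_i(x) eps
            acc += vj[0]
        eps = acc / acc.norm()
    return eps

model.eval()                                        # disable dropout for a deterministic Jacobian
batch = torch.randn(8, 3, 32, 32, device=device)    # stand-in for a batch of CIFAR images
eps_star = universal_perturbation_l2(model, batch)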
Image restoration is the task where we need to eliminate blurring and random noise to recover the ideal image. In practice, images obtained in real life can be represented in the following way:
$$g = f * h + n$$
where $g$ - obtained image of size $(N,N)$, $h$ - blur kernel of size $(3,3)$, $n$ - an additive zero-mean Gaussian white noise, and $f$ - ideal image (deblurred and denoised). In our work we will use a Gaussian blur kernel with window size 3:
$$h = \frac{1}{16}\begin{pmatrix}1 & 2 & 1\\ 2 & 4 & 2\\ 1 & 2 & 1\end{pmatrix}$$
Here $f*h$ is a 2-d convolution of the ideal image with the blur kernel, which can be rewritten in matrix form:
$$f * h = H \mathrm{vec}(f)$$
where $H$ - a block Toeplitz with Toeplitz blocks matrix which corresponds to the 2-d convolution with kernel $h$, $\mathrm{vec}(\cdot)$ - the vectorization operation.
So, let us write the task of finding $f$ as an optimization problem:
$$\min\limits_{f,u} \|H\mathrm{vec}(f) - \mathrm{vec}(g)\|_2^2 + \alpha_1\|\mathrm{vec}(f-u)\|_2^2 + \alpha_2\|u\|_{\mathrm{TV}}$$
where $\|u\|_{\mathrm{TV}} = \sum\limits_{1\leq j,k\leq N}\|\nabla u_{j,k}\|_2 = \sum\limits_{1\leq j,k\leq N}\sqrt{(u_{j+1,k}-u_{j,k})^2 + (u_{j,k+1}-u_{j,k})^2}$; here we compute gradients (pixel differences) along the x and y image dimensions.
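For concreteness, the TV term can be computed with simple forward differences; in this sketch the differences at the last row and column are taken as zero, which is an assumption about the boundary handling.

import numpy as np

def tv_norm(u):
    # Isotropic total variation matching the definition above (forward differences)
    dx = np.diff(u, axis=0, append=u[-1:, :])    # u[j+1, k] - u[j, k], zero at the last row
    dy = np.diff(u, axis=1, append=u[:, -1:])    # u[j, k+1] - u[j, k], zero at the last column
    return np.sqrt(dx ** 2 + dy ** 2).sum()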
One can notice that the problem can be split into two:
$$\min\limits_{u}\min\limits_f \{\|H\mathrm{vec}(f) - \mathrm{vec}(g)\|_2^2 + \alpha_1\|\mathrm{vec}(f-u)\|_2^2\} + \alpha_2\|u\|_{\mathrm{TV}}$$
To find the solution we will use an iterative method:
$$\begin{cases} f^{(i)} = \arg\min\limits_f \|H\mathrm{vec}(f) - \mathrm{vec}(g)\|_2^2 + \alpha_1\|\mathrm{vec}(f-u^{(i-1)})\|_2^2 & (a)\\ u^{(i)} = \arg\min\limits_u \alpha_1\|\mathrm{vec}(f^{(i)}-u)\|_2^2 + \alpha_2\|u\|_{\mathrm{TV}} & (b) \end{cases}$$
Taking the derivative of the minimization functional in subproblem (a) with respect to $\mathrm{vec}(f)$, we get the system:
$$(H^TH + \alpha_1I)\mathrm{vec}(f) = H^T\mathrm{vec}(g) + \alpha_1\mathrm{vec}(u^{(i-1)})$$
This system can be solved by the conjugate gradient method. Also, taking into account that $H$ is block Toeplitz with Toeplitz blocks (BTTB), we can perform fast matrix-by-vector multiplication.
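A minimal sketch of step (a): the matrix $H$ is never formed, $H\,\mathrm{vec}(f)$ is applied as a 2-d convolution (here with zero-padded "same" boundary handling, which is our assumption), and the normal equations are solved with scipy's CG. Since the kernel $h$ above is symmetric, $H$ is symmetric under this boundary choice, so $H^T$ acts as the same convolution.

import numpy as np
from scipy.signal import convolve2d
from scipy.sparse.linalg import LinearOperator, cg

h = np.array([[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]], dtype=float) / 16.0      # blur kernel from the statement

def H_matvec(vec_f, N):
    # H @ vec(f): 2-d 'same' convolution with zero padding (boundary handling is an assumption)
    return convolve2d(vec_f.reshape(N, N), h, mode="same", boundary="fill").ravel()

def f_step(g, u_prev, alpha1, maxiter=200):
    # Solve (H^T H + alpha1 I) vec(f) = H^T vec(g) + alpha1 vec(u_prev) by CG,
    # using only matrix-vector products with H (H^T = H for this symmetric kernel).
    N = g.shape[0]
    op = LinearOperator((N * N, N * N),
                        matvec=lambda x: H_matvec(H_matvec(x, N), N) + alpha1 * x)
    rhs = H_matvec(g.ravel(), N) + alpha1 * u_prev.ravel()
    f_vec, info = cg(op, rhs, maxiter=maxiter)
    return f_vec.reshape(N, N)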
Subproblem (b) can be solved using any solver of your choice.
So, your task is to:
from PIL import Image, ImageOps
!wget --no-check-certificate \
"https://github.com/oseledets/nla2022/blob/main/hw2/lena.png?raw=true" \
-O "/lena.png"
orig_image = ImageOps.grayscale(Image.open("/lena.png"))
orig_image