The complexity of computing an LU decomposition of a dense $n\times n$ matrix is $\mathcal{O}(n^3)$. A significant reduction in complexity can be achieved if the matrix has a special structure, e.g. it is sparse. In the following task we consider an important example of LU decomposition for a special type of sparse matrices: band matrices with bandwidth $m$ equal to 3 or 5, which are called tridiagonal and pentadiagonal, respectively.
Implement a function band_lu(diag_broadcast, n) which computes the LU decomposition of a tridiagonal or pentadiagonal matrix with the given diagonal values. For example, the input parameters (diag_broadcast = [4,-2,1], n = 4) mean that we need to find the LU decomposition of the corresponding $4 \times 4$ tridiagonal matrix. As output you should return L and U - 2D arrays representing the diagonals of the factors $L$ (L[0] keeps the first lower diagonal, L[1] keeps the second lower diagonal, ...) and $U$ (U[:,0] keeps the main diagonal, U[:,1] keeps the first upper diagonal, ...).
Compare the execution time of a standard LU factorization from scipy, i.e. one which takes the whole matrix and does not know about its special structure, with the band decomposition of your implementation. Comment on the results.
from scipy.sparse import diags # can be used with broadcasting of scalars if desired dimensions are large
import numpy as np
# INPUT : diag_broadcast - list of diagonal values to broadcast, length equal to 3 or 5; n - integer, band matrix shape.
# OUTPUT : L - 2D np.ndarray, L.shape[0] depends on bandwidth, L.shape[1] = n-1; do not store the main diagonal, which is all ones; pad rows with zeros on the right to handle the changing length of diagonals.
#          U - 2D np.ndarray, U.shape[0] = n, U.shape[1] depends on bandwidth;
#          pad columns with zeros at the bottom to handle the changing length of diagonals.
def band_lu(diag_broadcast, n):
# enter your code here
raise NotImplementedError()
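For reference, here is a minimal sketch of the tridiagonal case only (bandwidth 3). It assumes the ordering diag_broadcast = [lower, main, upper] and returns plain diagonal arrays rather than the padded layout described above, so treat it as an illustration, not the required implementation.
import numpy as np

# Sketch: LU of a Toeplitz tridiagonal matrix with constant diagonals, no pivoting.
# Assumes diag_broadcast = [lower, main, upper]; the pentadiagonal case is not covered.
def band_lu_tridiag_sketch(diag_broadcast, n):
    lower, main, upper = (float(d) for d in diag_broadcast)
    u_main = np.zeros(n)               # main diagonal of U
    u_upper = np.full(n - 1, upper)    # first upper diagonal of U
    l_lower = np.zeros(n - 1)          # first lower diagonal of L (unit diagonal not stored)
    u_main[0] = main
    for i in range(1, n):
        l_lower[i - 1] = lower / u_main[i - 1]      # elimination multiplier
        u_main[i] = main - l_lower[i - 1] * upper   # pivot update
    return l_lower, u_main, u_upper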
# Your solution is here
Let $A = \begin{pmatrix} \varepsilon & 1 & 0\\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$
# Your solution is here
Let $A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$ be a block matrix. The goal is to solve the linear system
$$ \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix}, $$
where $X$ is a nonsingular square matrix.
# Your solution is here
In the lecture we provided a naive implementation of LU factorization with loops and an elementwise update of the factors. In this subproblem we ask you to provide a more efficient implementation of LU factorization and explain how you derived it (the main ideas and how you use them in this particular case).
NumPy and JAX are both acceptable in this subproblem, but please use a single library for all implementations.
# Your solution is here
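One standard way to speed up the naive version is to replace the two innermost elementwise loops with a single vectorized rank-1 update of the trailing submatrix at every elimination step. A minimal NumPy sketch (no pivoting, illustration only, not necessarily the intended solution):
import numpy as np

# Sketch: LU without pivoting; the inner loops are replaced by a vectorized
# rank-1 (Schur complement) update of the trailing submatrix.
def lu_vectorized_sketch(A):
    n = A.shape[0]
    L = np.eye(n)
    U = np.array(A, dtype=float)
    for k in range(n - 1):
        L[k + 1:, k] = U[k + 1:, k] / U[k, k]              # elimination multipliers
        U[k + 1:, k:] -= np.outer(L[k + 1:, k], U[k, k:])  # rank-1 update
    return L, U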
(5 pts) Prove that a normal matrix is Hermitian iff its eigenvalues are real. Prove that a normal matrix is unitary iff its eigenvalues satisfy $|\lambda| = 1$.
(5 pts) The following problem illustrates the instability of the Jordan form. Find theoretically the eigenvalues of the perturbed Jordan block
$$ J(\varepsilon) = \begin{bmatrix} \lambda & 1 & & & \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ \varepsilon & & & & \lambda \end{bmatrix} \in \mathbb{C}^{n \times n}. $$
Comment on how the eigenvalues of $J(0)$ are perturbed for large $n$.
# Your solution is here
# INPUT: G - np.ndarray or sparse matrix
# OUTPUT: A - np.ndarray (of size G.shape) or sparse matrix
def pagerank_matrix(G):
# enter your code here
return A
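A minimal sketch of one common construction is given below. It assumes the column-stochastic convention $A = G^\top D^{-1}$, where $D$ is the diagonal matrix of out-degrees, handles only the dense case, and leaves columns of dangling nodes (zero out-degree) as zeros; the convention intended in this task may differ.
import numpy as np

# Sketch (dense case): column-stochastic PageRank matrix A = G^T D^{-1}.
# Columns corresponding to dangling nodes (zero out-degree) stay zero.
def pagerank_matrix_sketch(G):
    G = np.asarray(G, dtype=float)
    out_deg = G.sum(axis=1)  # out-degree of every node (row sums)
    inv_deg = np.divide(1.0, out_deg, out=np.zeros_like(out_deg), where=out_deg > 0)
    A = (G * inv_deg[:, None]).T  # normalize each row of G, then transpose
    return A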
Implement the power method for a given matrix $A$, an initial guess x0 and a number of iterations num_iter. It should be organized as a function power_method(A, x0, num_iter) that outputs an approximation to the eigenvector $x$, the eigenvalue $\lambda$ and the history of residuals $\{\|Ax_k - \lambda_k x_k\|_2\}$. Make sure that the method converges to the correct solution on the matrix $\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$, which is known to have the largest eigenvalue equal to $3$.
# INPUT: A - np.ndarray (2D), x0 - np.ndarray (1D), num_iter - integer (positive)
# OUTPUT: x - np.ndarray (of size x0), l - float, res - np.ndarray (of size num_iter + 1 [include initial guess])
def power_method(A, x0, num_iter): # 5 pts
# enter your code here
return x, l, res
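A minimal sketch (using the Rayleigh quotient as the eigenvalue estimate and recording the residual at every step, including the initial guess) might look as follows:
import numpy as np

# Sketch of the power method with Rayleigh-quotient eigenvalue estimates.
def power_method_sketch(A, x0, num_iter):
    x = x0 / np.linalg.norm(x0)
    res = np.zeros(num_iter + 1)
    l = x @ (A @ x)                         # Rayleigh quotient
    res[0] = np.linalg.norm(A @ x - l * x)  # residual of the initial guess
    for k in range(1, num_iter + 1):
        x = A @ x
        x = x / np.linalg.norm(x)
        l = x @ (A @ x)
        res[k] = np.linalg.norm(A @ x - l * x)
    return x, l, res

# Quick check on the matrix [[2, -1], [-1, 2]], whose largest eigenvalue is 3:
# x, l, res = power_method_sketch(np.array([[2., -1.], [-1., 2.]]), np.array([1., 0.]), 50)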
Run the power method with num_iter=100 and a random initial guess x0. Explain the absence of convergence.
Run the power method with num_iter=100 for 10 different initial guesses and print/plot the resulting approximated eigenvectors. Why do they depend on the initial guess?
In order to avoid this problem, Larry Page and Sergey Brin proposed to use the following regularization technique:
$$ A_d = dA + \frac{1-d}{N} \begin{pmatrix} 1 & \dots & 1 \\ \vdots & & \vdots \\ 1 & \dots & 1 \end{pmatrix}, $$
where $d$ is a small parameter in $[0,1]$ (typically $d=0.85$) called the damping factor, and $A$ is of size $N\times N$. Now $A_d$ is a matrix for which the multiplicity of the largest eigenvalue is equal to 1. Recall that computing the eigenvector of the PageRank matrix which corresponds to the largest eigenvalue has the following interpretation. Consider a person who stays in a random node of a graph (i.e. opens a random web page); at each step s/he follows one of the outgoing edges uniformly at random (i.e. opens one of the links). So the person randomly walks through the graph, and the eigenvector we are looking for is exactly his/her stationary distribution: for each node it tells you the probability of visiting this particular node. Therefore, if the person has started from a part of the graph which is not connected to the other part, s/he will never get there. In the regularized model, the person at each step follows one of the outgoing links with probability $d$ OR teleports to a random node of the whole graph with probability $(1-d)$.
(2 pts) Now, run the power method with $A_d$ and plot the residuals $\|A_d x_k - \lambda_k x_k\|_2$ as a function of $k$ for $d=0.97$, num_iter=100 and a random initial guess x0.
(5 pts) Find the second largest (in absolute value) eigenvalue of the obtained matrix $A_d$. How and why is it connected to the damping factor $d$? What is the convergence rate of the PageRank algorithm when the damping factor is used?
Usually, graphs that arise in various areas are sparse (social, web, road networks, etc.) and, thus, computation of a matrix-vector product for corresponding PageRank matrix $A$ is much cheaper than $\mathcal{O}(N^2)$. However, if $A_d$ is calculated directly, it becomes dense and, therefore, $\mathcal{O}(N^2)$ cost grows prohibitively large for big $N$.
(2 pts) Implement a fast matrix-vector product for $A_d$ as a function pagerank_matvec(A, d, x), which takes a PageRank matrix $A$ (in sparse format, e.g., csr_matrix), a damping factor $d$ and a vector $x$ as input and returns $A_dx$ as output.
(1 pt) Generate a random adjacency matrix of size $10000 \times 10000$ with only 100 non-zero elements and compare the performance of pagerank_matvec with the direct evaluation of $A_dx$.
# INPUT: A - np.ndarray (2D), d - float (from 0.0 to 1.0), x - np.ndarray (1D, size of A.shape[0/1])
# OUTPUT: y - np.ndarray (1D, size of x)
def pagerank_matvec(A, d, x): # 2 pts
# enter your code here
return y
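Since the regularization term is a rank-1 matrix of all ones scaled by $(1-d)/N$, its action on a vector is just a scaled sum of the vector's entries. A minimal sketch along these lines (assuming A supports the @ product, as csr_matrix does):
import numpy as np

# Sketch: A_d x = d * (A @ x) + (1 - d) / N * sum(x) * ones(N);
# the dense rank-1 term is never formed explicitly.
def pagerank_matvec_sketch(A, d, x):
    N = A.shape[0]
    return d * (A @ x) + (1.0 - d) / N * x.sum() * np.ones(N)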
Download the dataset from here, unzip it and put dblp_authors.npz and dblp_graph.npz in the same folder with this notebook. Each value (author name) from dblp_authors.npz corresponds to the row/column of the matrix from dblp_graph.npz. The value at row i and column j of the matrix from dblp_graph.npz corresponds to the number of times author i cited papers of author j. Let us now find the most significant scientists according to the PageRank model over the DBLP data.
Load the citation graph and the list of authors with the provided load_dblp(...) function. Print its density (fraction of nonzero elements). Find the top-10 most cited authors from the weighted adjacency matrix. Now, make all the weights of the adjacency matrix equal to 1 for simplicity (consider only the existence of a connection between authors, not its weight). Obtain the PageRank matrix $A$ from the adjacency matrix and verify that it is stochastic.
In order to pass pagerank_matvec to your power_method (without rewriting it) for fast calculation of $A_dx$, you can create a LinearOperator:
L = scipy.sparse.linalg.LinearOperator(A.shape, matvec=lambda x, A=A, d=d: pagerank_matvec(A, d, x))
Then L@x or L.dot(x) will result in a call to pagerank_matvec(A, d, x), and thus you can plug $L$ instead of the matrix $A$ into power_method directly. Note: though in the previous subtask the graph was very small (so you could get away without the fast matvec implementation), here it is very large (but sparse), so direct evaluation of $A_dx$ would require storing $\sim 10^{12}$ matrix elements - good luck with that (^_<).
from scipy.sparse import load_npz
import numpy as np
def load_dblp(path_auth, path_graph):
G = load_npz(path_graph).astype(float)
with np.load(path_auth) as data: authors = data['authors']
return G, authors
G, authors = load_dblp('dblp_authors.npz', 'dblp_graph.npz')
# Your code is here
# INPUT:
# A_init - square matrix,
# num_iter - number of iterations for QR algorithm
# OUTPUT:
# Ak - transformed matrix A_init given by QR algorithm,
# convergence - numpy array of shape (num_iter, ),
#               where we store the Chebyshev norm (maximal absolute value)
#               of the part of Ak below the main diagonal at every iteration
def qr_algorithm(A_init, num_iter): # 3 pts
# enter your code here
return Ak, convergence
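A minimal sketch (unshifted QR iteration via np.linalg.qr, with the largest absolute value below the main diagonal recorded as the convergence measure):
import numpy as np

# Sketch of the unshifted QR algorithm: A_{k+1} = R_k Q_k, where A_k = Q_k R_k.
def qr_algorithm_sketch(A_init, num_iter):
    Ak = np.array(A_init, dtype=float)
    convergence = np.zeros(num_iter)
    for it in range(num_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
        convergence[it] = np.abs(np.tril(Ak, k=-1)).max()  # below-diagonal magnitude
    return Ak, convergence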
Visualize the result with plt.spy(Ak, precision=1e-7).
# Your solution is here
Visualize the result with plt.spy(Ak, precision=1e-7). Is this matrix lower triangular? How does this correspond to the claim about convergence of the QR algorithm?
# Your solution is here
Let us redefine the scalar product for all $x, y \in \mathbb{C}^n$ in the following way:
$$ [x,y]_J = y^{*}Jx, \quad \text{s.t.}\ J = \text{diag}(j_{11}, j_{22}, \dots, j_{nn})\ \text{and}\ j_{ii} = \pm1\ \forall i \in [1,n].$$
Denote the rows of a matrix $V \in \mathbb{C}^{n \times n}$ as $v_1, v_2, \dots, v_n$. Then $V$ is called $\textbf{J-orthonormal}$ iff
$$[v_i, v_k]_J = \pm \delta_{ik}.$$
We will call a matrix $T \in \mathbb{C}^{n \times n}$ $\textbf{almost triangular}$ iff $T$ is upper triangular with diagonal blocks of order $1$ or $2$.
A matrix $A \in \mathbb{C}^{n \times n}$ is said to be $\textbf{J-decomposable}$ if there exist a J-orthonormal matrix $V$ and an upper triangular matrix $T$ such that
$$A = V T V^{-1}.$$
A matrix $A \in \mathbb{C}^{n \times n}$ is said to have a $\textbf{pseudoschur J-decomposition}$ if there exist a J-orthonormal matrix $V$ and an almost triangular matrix $T$ such that
$$A = V T V^{-1}.$$
This problem is meant to get you familiar with the fact that the two abovementioned decompositions do not exist for every square matrix with complex entries.
(8 pts) Given that $A \in \mathbb{C}^{n \times n}$ is diagonalizable, show that it has a pseudoschur J-decomposition for any $J$ of the form $J=\text{diag}(\pm 1, \dots, \pm 1)$. Note that in order to solve this subproblem you should first prove the following fact:
Let $S \in \mathbb{C}^{m \times n},\ m \ge n,\ J = \text{diag}(\pm 1).$ If $A = S^{*}JS$ and $\det(A) \ne 0$, then there exists a QR decomposition of $S$ with respect to $J$: $$S = P_1 QR P_2^{*} = P_1 Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix} P_2^{*}, \quad Q^{*} J^{'}Q = J^{'}, \quad J^{'} = P_1^{*}JP_1,$$ where $P_1$ and $P_2$ are permutation matrices, $Q$ is called $J^{'}$-unitary and $R_1$ is almost triangular.
# Your solution is here
The application that we are particularly interested in is the approximation of a given matrix by a low-rank matrix:
$$ A \approx UV^T, \quad A \in \mathbb{R}^{m \times n}, \ U \in \mathbb{R}^{m \times r}, \ V \in \mathbb{R}^{n \times r}.$$
It is well known that the best (in any unitarily invariant norm) low-rank approximation can be computed via the singular value decomposition (SVD). As an alternative, we can consider a skeleton decomposition of the form:
$$A \approx A_r = A(: , \mathcal{J})A(\mathcal{I} , \mathcal{J})^{-1}A(\mathcal{I} , :),$$
where $\mathcal{I,J}$ are some index sets of length $r$.
Below we consider the $N \times N$ matrix obtained by discretizing the following function on a uniform grid in the square $[0, 1] \times [0, 1]$: $f(x, y) = \exp(-\sqrt{x^2 + y^2})$. It means that $A = [a_{ij}]$, where $a_{ij} = f(x_i, x_j)$, $x_i = i / (N-1)$, $x_j = j / (N-1)$, and $i, j = 0,\ldots, N-1$.
Construct this matrix (np.meshgrid can help you).
# Your solution is here
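A minimal construction sketch, using an assumed grid size N for illustration:
import numpy as np

# Sketch: a_ij = exp(-sqrt(x_i^2 + x_j^2)) on the uniform grid x_i = i/(N-1).
N = 1000  # assumed grid size, replace with the value required by the task
x = np.linspace(0.0, 1.0, N)
X, Y = np.meshgrid(x, x, indexing='ij')  # X[i, j] = x_i, Y[i, j] = x_j
A = np.exp(-np.sqrt(X**2 + Y**2))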
(3 pts) Compute the skeleton approximation with a random selection of row and column indices for $r = 5$ (check that the submatrix at the intersection of the rows and columns is nonsingular). Average the relative error $\frac{\|A - A_r \|_F}{\|A\|_F}$ over $M$ samples of column/row indices. Check that $M$ is sufficiently large to provide a stable mean.
Note: extracting submatrices should be done with numpy indexing, e.g. A[index_set, :] to extract the selected rows and A[:, index_set] to extract the selected columns.
# Your solution is here
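A minimal sketch of this experiment (assuming the matrix A built above; M and the seed are illustrative choices):
import numpy as np

# Sketch: average the relative Frobenius error of A_r = A[:, J] inv(A[I, J]) A[I, :]
# over M random draws of the index sets I and J, skipping singular intersections.
def skeleton_error_sketch(A, r=5, M=100, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    norm_A = np.linalg.norm(A, 'fro')
    errors = []
    while len(errors) < M:
        I = rng.choice(n, size=r, replace=False)
        J = rng.choice(n, size=r, replace=False)
        A_hat = A[np.ix_(I, J)]
        if np.linalg.matrix_rank(A_hat) < r:      # nonsingularity check
            continue
        A_r = A[:, J] @ np.linalg.solve(A_hat, A[I, :])
        errors.append(np.linalg.norm(A - A_r, 'fro') / norm_A)
    return np.mean(errors)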
As you should know from the lecture, if $A$ is of rank $r$ and $\hat{A} = A(\mathcal{I} , \mathcal{J})$ is nonsingular, then the exact equality holds. In the approximate case, however, the quality of the approximation depends on the volume of the submatrix $\hat{A}$:
Theorem
If $\hat{A} = A_{max}$ has the maximal in modulus determinant among all $r \times r$ submatrices of $A$, the following error estimate holds:
$$ \|A - A_r\|_1 \leq (r+1)\sigma_{r+1}.$$
And the question is how to choose a good submatrix of nearly maximal volume in practice.
Definition: We call an $r \times r$ submatrix $A_{dom}$ of a rectangular $n \times r$ matrix $A$ of full rank dominant, if all the entries of $AA_{dom}^{-1}$ are not greater than $1$ in modulus.
The crucial theoretical result behind the scenes is that the volume of any dominant submatrix $A_{dom}$ cannot be much smaller than that of the maximum-volume submatrix $A_{max}$ (stated without proof).
We provide the following algorithm for constructing a dominant submatrix of a tall matrix.
Algorithm 1:
Given a matrix $A$ of size $n \times r$, find a dominant submatrix of size $r \times r$.
step 0. Start with an arbitrary nonsingular $r \times r$ submatrix $A_{dom}$. Reorder the rows of $A$ so that $A_{dom}$ occupies the first $r$ rows of $A$.
step 1. Compute $B = AA_{dom}^{-1}$ and find its maximum in modulus entry $b_{ij}$.
step 2. If $|b_{ij}| > 1 + \delta$, then:
Swap rows $i$ and $j$ in $B$ (and accordingly in $A$). By swapping the rows we have increased the volume of the upper submatrix in $B$, as well as in $A$ (why?). Let $A_{dom}$ be the new upper submatrix of $A$ and go to step 1.
otherwise ($|b_{ij}| \leq 1 + \delta$):
return $A_{dom}$.
Note: $\delta = 10^{-2}$ seems to be a good practical choice.
Implement Algorithm 1 as a function that returns a row_indices array, which can be used as A[row_indices, :] to extract the selected submatrix. Note that the matrix inverse $A_{dom}^{-1}$ has to be updated efficiently at every iteration using the Sherman-Morrison formula (inverse of a rank-1 update).
def dominant_submatrix_search(A):
# Your code is here
return row_indices
# check the convergence of your implementation on random data
row_indices = dominant_submatrix_search(np.random.rand(5000, 10))
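For reference, here is a plain sketch of Algorithm 1. It recomputes $A_{dom}^{-1}$ from scratch at every step instead of applying the Sherman-Morrison update, and it starts from the first $r$ rows (assumed nonsingular, which is fine for random data), so it only illustrates the swapping logic, not the efficient implementation asked for above.
import numpy as np

# Sketch of Algorithm 1 without the Sherman-Morrison update: the inverse of the
# current top r x r block is recomputed from scratch at every iteration.
def dominant_submatrix_search_sketch(A, delta=1e-2, max_iter=10000):
    n, r = A.shape
    row_indices = np.arange(n)   # current ordering; the first r rows form A_dom
    for _ in range(max_iter):
        A_perm = A[row_indices, :]
        B = A_perm @ np.linalg.inv(A_perm[:r, :])
        i, j = np.unravel_index(np.abs(B).argmax(), B.shape)
        if np.abs(B[i, j]) <= 1 + delta:
            break                # A_dom is dominant, stop
        # move row i into the top block by swapping it with row j
        row_indices[[i, j]] = row_indices[[j, i]]
    return row_indices[:r]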
Hint:
1) start with a random selection of columns;
2) if you transpose the matrix for which you find rows, then Algorithm 1 will give you an update for the columns of the initial matrix.
def skeleton(A, r):
# Your code is here
return row_indices, column_indices
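A minimal sketch following the hint, assuming your dominant_submatrix_search from the previous subtask is implemented (the number of alternations and the random seed are illustrative choices):
import numpy as np

# Sketch: start from random columns, then alternate between updating the rows
# (dominant rows of A[:, column_indices]) and the columns (dominant rows of
# A[row_indices, :].T, i.e. dominant columns of A[row_indices, :]).
def skeleton_sketch(A, r, n_alternations=5, seed=0):
    rng = np.random.default_rng(seed)
    column_indices = rng.choice(A.shape[1], size=r, replace=False)
    row_indices = None
    for _ in range(n_alternations):
        row_indices = dominant_submatrix_search(A[:, column_indices])
        column_indices = dominant_submatrix_search(A[row_indices, :].T)
    return row_indices, column_indices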
Compare the skeleton approximation algorithm with the SVD and discuss its pros and cons.
# Your solution is here