Sparse direct solvers:
Why sparse linear systems can be solved faster, what is the technique?
In the LU factorization of the matrix $A$ the factors $L$ and $U$ can be also sparse:
Note that the inverse matrix to sparse matrix is not sparse!
import numpy as np
import scipy.sparse as spsp
n = 7
ex = np.ones(n);
a = spsp.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
b = np.array(np.linalg.inv(a.toarray()))
print(a.toarray())
print(b)
np.linalg.svd(b[:3, 4:])[1]
[[-2. 1. 0. 0. 0. 0. 0.] [ 1. -2. 1. 0. 0. 0. 0.] [ 0. 1. -2. 1. 0. 0. 0.] [ 0. 0. 1. -2. 1. 0. 0.] [ 0. 0. 0. 1. -2. 1. 0.] [ 0. 0. 0. 0. 1. -2. 1.] [ 0. 0. 0. 0. 0. 1. -2.]] [[-0.875 -0.75 -0.625 -0.5 -0.375 -0.25 -0.125] [-0.75 -1.5 -1.25 -1. -0.75 -0.5 -0.25 ] [-0.625 -1.25 -1.875 -1.5 -1.125 -0.75 -0.375] [-0.5 -1. -1.5 -2. -1.5 -1. -0.5 ] [-0.375 -0.75 -1.125 -1.5 -1.875 -1.25 -0.625] [-0.25 -0.5 -0.75 -1. -1.25 -1.5 -0.75 ] [-0.125 -0.25 -0.375 -0.5 -0.625 -0.75 -0.875]]
array([1.75000000e+00, 7.67644213e-17, 5.66270826e-36])
$L$ and $U$ are typically sparse. In the tridiagonal case they are even bidiagonal!
from scipy.sparse.linalg import splu
T = splu(a.tocsc(), permc_spec="NATURAL")
print(T.L.toarray())
[[ 1. 0. 0. 0. 0. 0. 0. ] [-0.5 1. 0. 0. 0. 0. 0. ] [ 0. -0.66666667 1. 0. 0. 0. 0. ] [ 0. 0. -0.75 1. 0. 0. 0. ] [ 0. 0. 0. -0.8 1. 0. 0. ] [ 0. 0. 0. 0. -0.83333333 1. 0. ] [ 0. 0. 0. 0. 0. -0.85714286 1. ]]
Interesting to note that splu
without permc_spec
will produce permutations which will not yield the bidiagonal factor:
from scipy.sparse.linalg import splu
T = splu(a.tocsc())
print(T.L.todense())
print(T.perm_c)
[[ 1. 0. 0. 0. 0. 0. 0. ] [-0.5 1. 0. 0. 0. 0. 0. ] [ 0. -0.66666667 1. 0. 0. 0. 0. ] [ 0. 0. -0.75 1. 0. 0. 0. ] [ 0. 0. 0. 0. 1. 0. 0. ] [ 0. 0. 0. -0.8 -0.5 1. 0. ] [ 0. 0. 0. 0. -0.5 -0.71428571 1. ]] [0 1 2 3 5 4 6]
From a matrix that comes as a discretization of a two-dimensional problem everything is much worse:
import scipy as sp
import scipy.sparse as spsp
import scipy.sparse.linalg
import matplotlib.pyplot as plt
%matplotlib inline
n = 20
ex = np.ones(n);
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = spsp.csc_matrix(A)
T = spsp.linalg.splu(A)
plt.spy(T.L, marker='.', color='k', markersize=8)
#plt.spy(A, marker='.', color='k', markersize=8)
<matplotlib.lines.Line2D at 0x1192c9850>
For correct permutation in 2D case the number of nonzeros in $L$ factor grows as $\mathcal{O}(N \log N)$. But complexity is $\mathcal{O}(N^{3/2})$.
The number of nonzeros in LU decomposition has a deep connection to the graph theory.
networkx package
can be used to visualize graphs, given only the adjacency matrix.
It may even recover to some extend the graph structure.
import networkx as nx
n = 10
ex = np.ones(n);
lp1 = sp.sparse.spdiags(np.vstack((ex, -2*ex, ex)), [-1, 0, 1], n, n, 'csr');
e = sp.sparse.eye(n)
A = sp.sparse.kron(lp1, e) + sp.sparse.kron(e, lp1)
A = spsp.csc_matrix(A)
G = nx.Graph(A)
nx.draw(G, pos=nx.spectral_layout(G), node_size=50)
The fill-in of a matrix are those entries which change from an initial zero to a nonzero value during the execution of an algorithm.
The fill-in is different for different permutations. So, before factorization we need to find reordering which produces the smallest fill-in.
Example
$$A = \begin{bmatrix} * & * & * & * & *\\ * & * & 0 & 0 & 0 \\ * & 0 & * & 0 & 0 \\ * & 0 & 0& * & 0 \\ * & 0 & 0& 0 & * \end{bmatrix} $$If we eliminate elements from the top to the bottom, then we will obtain dense matrix. However, we could maintain sparsity if elimination was done from the bottom to the top.
Given matrix $A=A^*>0$ we calculate its Cholesky decomposition $A = LL^*$.
Factor $L$ can be dense even if $A$ is sparse:
$$ \begin{bmatrix} * & * & * & * \\ * & * & & \\ * & & * & \\ * & & & * \end{bmatrix} = \begin{bmatrix} * & & & \\ * & * & & \\ * & * & * & \\ * & * & * & * \end{bmatrix} \begin{bmatrix} * & * & * & * \\ & * & * & * \\ & & * & * \\ & & & * \end{bmatrix} $$How to make factors sparse, i.e. to minimize the fill-in?
We need to find a permutation of indices so that factors are sparse, i.e. we build Cholesky factorisation of $PAP^\top$, where $P$ is a permutation matrix.
For the example from the previous slide
$$ P \begin{bmatrix} * & * & * & * \\ * & * & & \\ * & & * & \\ * & & & * \end{bmatrix} P^\top = \begin{bmatrix} * & & & * \\ & * & & * \\ & & * & * \\ * & * & * & * \end{bmatrix} = \begin{bmatrix} * & & & \\ & * & & \\ & & * & \\ * & * & * & * \end{bmatrix} \begin{bmatrix} * & & & * \\ & * & & * \\ & & * & * \\ & & & * \end{bmatrix} $$where
$$ P = \begin{bmatrix} & & & 1 \\ & & 1 & \\ & 1 & & \\ 1 & & & \end{bmatrix} $$import numpy as np
import scipy.sparse as spsp
import scipy.sparse.linalg as spsplin
import scipy.linalg as splin
import matplotlib.pyplot as plt
%matplotlib inline
A = spsp.coo_matrix((np.random.randn(10), ([0, 0, 0, 0, 1, 1, 2, 2, 3, 3],
[0, 1, 2, 3, 0, 1, 0, 2, 0, 3])))
print("Original matrix")
plt.spy(A)
plt.show()
lu = spsplin.splu(A.tocsc(), permc_spec="NATURAL")
print("L factor")
plt.spy(lu.L)
plt.show()
print("U factor")
plt.spy(lu.U)
plt.show()
print("Column permutation:", lu.perm_c)
print("Row permutation:", lu.perm_r)
Original matrix
L factor
U factor
Column permutation: [0 1 2 3] Row permutation: [1 3 2 0]
then
$$ PAP^\top = \begin{bmatrix} A_{11} & 0 & 0 \\ 0 & A_{22} & 0 \\ A_{31} & A_{32} & A_{33} - A_{31}A_{11}^{-1} A_{13} - A_{32}A_{22}^{-1}A_{23} \end{bmatrix} \begin{bmatrix} I & 0 & A_{11}^{-1}A_{13} \\ 0 & I & A_{22}^{-1}A_{23} \\ 0 & 0 & I\end{bmatrix} $$Reordering the rows and the columns of the sparse matrix in order to reduce the number of nonzeros in $L$ and $U$ factors is called fill-in minimization.
Unfortunately, this paper by Rose and Tarjan in 1975 proves that fill-in minimization problem is NP-complete.
But, many heuristics exist:
Graphs of $\begin{bmatrix} * & * & * & * \\ * & * & & \\ * & & * & \\ * & & & * \end{bmatrix}$ and $\begin{bmatrix} * & & & * \\ & * & & * \\ & & * & * \\ * & * & * & * \end{bmatrix}$ have the following form:
and
The idea is to eliminate rows and/or columns with fewer non-zeros, update fill-in and then repeat. How it relates to Markowitz pivoting?
Efficient implementation is an issue (adding/removing elements).
Current champion is "approximate minimal degree" by Amestoy, Davis, Duff.
It is suboptimal even for 2D PDE problems
SciPy sparse package uses minimal ordering approach for different matrices ($A^{\top}A$, $A + A^{\top}$)
Definition. A separator in a graph $G$ is a set $S$ of vertices whose removal leaves at least two connected components.
Separator $S$ gives the following ordering for an $N$-vertex graph $G$:
Separator for the 2D Laplacian matrix
$$ A_{2D} = I \otimes A_{1D} + A_{1D} \otimes I, \quad A_{1D} = \mathrm{tridiag}(-1, 2, -1), $$is as follows
Once we have enumerated first indices in $\alpha$, then in $\beta$ and separators indices in $\sigma$ we get the following matrix
$$ PAP^\top = \begin{bmatrix} A_{\alpha\alpha} & & A_{\alpha\sigma} \\ & A_{\beta\beta} & A_{\beta\sigma} \\ A_{\sigma\alpha} & A_{\sigma\beta} & A_{\sigma\sigma}\end{bmatrix} $$which has arrowhrad structure.
For blocks $A_{\alpha\alpha}$, $A_{\beta\beta}$ we continue splitting recursively.
When the recursion is done, we need to eliminate blocks $A_{\sigma\alpha}$ and $A_{\sigma\beta}$.
This makes block in the position of $A_{\sigma\sigma}\in\mathbb{R}^{n\times n}$ dense.
Calculation of Cholesky of this block costs $\mathcal{O}(n^3) = \mathcal{O}(N^{3/2})$, where $N = n^2$ is the total number of nodes.
So, the complexity is $\mathcal{O}(N^{3/2})$
All of them have interfaces for C/C++, Fortran and Matlab
Computing separators is not a trivial task.
Graph partitioning heuristics has been an active research area for many years, often motivated by partitioning for parallel computation.
Existing approaches:
The idea of spectral partitioning goes back to Miroslav Fiedler, who studied connectivity of graphs (paper).
We need to split the vertices into two sets.
Consider +1/-1 labeling of vertices and the cost
and since we have +1/-1 labels, we have
$$\sum_i x^2_i = n \quad \Longleftrightarrow \quad \|x\|_2^2 = n.$$Cost $E_c$ can be written as (check why)
$$E_c = (Lx, x)$$where $L$ is the graph Laplacian, which is defined as a symmetric matrix with
$$L_{ii} = \mbox{degree of node $i$},$$$$L_{ij} = -1, \quad \mbox{if $i \ne j$ and there is an edge},$$and $0$ otherwise.
Minimization of $E_c$ with the mentioned constraints leads to a partitioning that tries to minimize number of edges in a separator, while keeping the partition balanced.
We now relax the integer quadratic programming to the continuous quadratic programming
Since $e$ is the eigenvector, corresponding to the smallest eigenvalue, on the space $x^\top e =0$ we get the second minimal eigevalue.
The sign $x_i$ indicates the partitioning.
In computations, we need to find out, how to find this second minimal eigenvalue –– we at least know about power method, but it finds the largest. We will discuss iterative methods for eigenvalue problems later in our course.
This is the main goal of the iterative methods for large-scale linear problems, and can be achieved via few matrix-by-vector products.
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import networkx as nx
kn = nx.read_gml('karate.gml')
print("Number of vertices = {}".format(kn.number_of_nodes()))
print("Number of edges = {}".format(kn.number_of_edges()))
nx.draw_networkx(kn, node_color="red") #Draw the graph
Number of vertices = 34 Number of edges = 78
import scipy.sparse.linalg as spsplin
Laplacian = nx.laplacian_matrix(kn).asfptype()
plt.spy(Laplacian, markersize=5)
plt.title("Graph laplacian")
plt.axis("off")
plt.show()
eigval, eigvec = spsplin.eigsh(Laplacian, k=2, which="SM")
print("The 2 smallest eigenvalues =", eigval)
The 2 smallest eigenvalues = [-4.14058418e-15 4.68525227e-01]
plt.scatter(np.arange(len(eigvec[:, 1])), np.sign(eigvec[:, 1]))
plt.show()
print("Sum of elements in Fiedler vector = {}".format(np.sum(eigvec[:, 1].real)))
Sum of elements in Fiedler vector = 8.743006318923108e-16
nx.draw_networkx(kn, node_color=np.sign(eigvec[:, 1]))
Definition. The algebraic connectivity of a graph is the second-smallest eigenvalue of the Laplacian matrix.
Claim. The algebraic connectivity of a graph is greater than 0 if and only if a graph is a connected graph.
Computing bisection recursively is expensive.
As an alternative, one typically computes multilevel bisection that consists of 3 phases.
Once the permutation has been computed, we need to implement the elimination, making use of efficient computational kernels.
If in the elemination we will be able to get the elements into blocks, we will be able to use BLAS-3 computations.
It is done by supernodal data structures:
If adjacent rows have the same sparsity structure, they can be stored in blocks:
Also, we can use such structure in efficient computations!
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()