# NLA and neural networks

## A neural network is a composition of functions

- These functions are typically either linear maps or simple non-linear functions
- You can represent the network as a computational graph
- This representation greatly simplifies [automatic differentiation](https://en.wikipedia.org/wiki/Automatic_differentiation)
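
To make the link between a computational graph and automatic differentiation concrete, here is a minimal sketch of reverse-mode differentiation in plain Python. The `Node` class and the `relu` and `backward` helpers are hypothetical names introduced only for this illustration; they are not part of PyTorch, and real frameworks are far more general.

```python
class Node:
    """A scalar value in a computational graph (hypothetical helper, not a PyTorch class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value, ((self, other.value), (other, self.value)))


def relu(node):
    # Simple non-linear function max(0, t); derivative is 1 for t > 0 and 0 otherwise
    return Node(max(0.0, node.value), ((node, 1.0 if node.value > 0 else 0.0),))


def backward(output):
    """Reverse-mode differentiation: process nodes in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            # Chain rule: accumulate d(output)/d(parent)
            parent.grad += local * node.grad


# y = relu(w * x + b): a linear function followed by a non-linear one
w, x, b = Node(2.0), Node(3.0), Node(-1.0)
y = relu(w * x + b)
backward(y)
print(y.value, w.grad, x.grad, b.grad)
```

Each operation records its inputs together with its local derivative, so a single backward sweep over the graph applies the chain rule to every node.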

## What is a gradient and how can we compute it automatically?

- If $f$ takes a vector or a matrix and returns a scalar, then its gradient is a vector or a matrix of the same shape, whose elements are the derivatives of the output w.r.t. the corresponding elements of the input
- Example: $f(x) = x^{\top}Ax + b^{\top}x$. What is the gradient of this simple quadratic function?
- Elementwise example: $f(x) = x^2$. What is its Jacobian matrix?
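
For reference, both answers can be derived by hand. The derivation below is a sketch that assumes $A \in \mathbb{R}^{n \times n}$ is not necessarily symmetric and writes $a_{ij}$ for its entries and $\delta_{ij}$ for the Kronecker delta.

$$
f(x) = x^{\top}Ax + b^{\top}x = \sum_{i,j} a_{ij} x_i x_j + \sum_i b_i x_i,
\qquad
\frac{\partial f}{\partial x_k} = \sum_j a_{kj} x_j + \sum_i a_{ik} x_i + b_k,
\qquad
\nabla f(x) = (A + A^{\top})x + b.
$$

For the elementwise function $f_i(x) = x_i^2$:

$$
\frac{\partial f_i}{\partial x_j} = 2 x_i \delta_{ij},
\qquad
J(x) = 2\,\mathrm{diag}(x).
$$

By the same argument, the scaled variant $\frac{1}{2} x^{\top}Ax - b^{\top}x$ used in the cells that follow has gradient $\frac{1}{2}(A + A^{\top})x - b$ w.r.t. $x$ and $\frac{1}{2} x x^{\top}$ w.r.t. $A$.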

In [18]:
import torch

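n = 5
A = torch.randn((n, n), requires_grad=True)
b = torch.randn((n,))
x = torch.randn((n,), requires_grad=True)

# f(x) = 0.5 * x^T A x - b^T x, a scaled variant of the quadratic example above
f = 0.5 * x @ A @ x - b @ x
f.backward()  # populates x.grad and A.grad
print(f)
print(f.item())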

tensor(-7.4333, grad_fn=<SubBackward0>)
-7.433262825012207


In [20]:
# Analytic gradient w.r.t. x; A is not symmetric, so both A and A^T appear
manual_grad_x = 0.5 * (A + A.t()) @ x - b

print(manual_grad_x.detach())
print(x.grad.detach())
print(x.grad)

tensor([-0.0997, -3.0563,  5.3020,  1.2093, -4.0723])
tensor([-0.0997, -3.0563,  5.3020,  1.2093, -4.0723])
tensor([-0.0997, -3.0563,  5.3020,  1.2093, -4.0723])


In [21]:
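# Analytic gradient w.r.t. A is the outer product 0.5 * x x^T
# (torch.outer replaces the deprecated torch.ger)
manual_grad_A = 0.5 * torch.outer(x, x)

print(manual_grad_A.detach())
print(A.grad.detach())
print(torch.norm(manual_grad_A.detach() - A.grad.detach()).item())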

tensor([[ 0.5078,  0.7830, -1.0070, -0.1017,  0.4298],
        [ 0.7830,  1.2074, -1.5528, -0.1568,  0.6628],
        [-1.0070, -1.5528,  1.9971,  0.2017, -0.8524],
        [-0.1017, -0.1568,  0.2017,  0.0204, -0.0861],
        [ 0.4298,  0.6628, -0.8524, -0.0861,  0.3639]])
tensor([[ 0.5078,  0.7830, -1.0070, -0.1017,  0.4298],
        [ 0.7830,  1.2074, -1.5528, -0.1568,  0.6628],
        [-1.0070, -1.5528,  1.9971,  0.2017, -0.8524],
        [-0.1017, -0.1568,  0.2017,  0.0204, -0.0861],
        [ 0.4298,  0.6628, -0.8524, -0.0861,  0.3639]])
0.0
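
The elementwise example $f(x) = x^2$ can be checked in the same way. The sketch below uses `torch.autograd.functional.jacobian` and compares the result with $2\,\mathrm{diag}(x)$; the test vector is an arbitrary choice made for illustration.

```python
import torch
from torch.autograd.functional import jacobian

x = torch.tensor([1.0, -2.0, 3.0])

# Elementwise square: output i depends only on input i
J = jacobian(lambda t: t ** 2, x)

# The Jacobian should be diagonal with entries 2 * x_i
print(J)
print(torch.allclose(J, 2 * torch.diag(x)))
```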


## How do these operations relate to neural networks?

- Supervised learning and other problem statements
- We have data, we have labels
- Neural network $\approx$ **complex** composition of **simple** functions, which predicts the label from the input data
- To minimize the target function, a.k.a. the loss function, one of the stochastic first-order methods is used
- Therefore, we need gradients!
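
As an illustration of a stochastic first-order method, here is a minimal sketch of mini-batch stochastic gradient descent on a linear least-squares problem. The synthetic data, the learning rate, and the batch size are assumptions chosen only for this example.

```python
import torch

torch.manual_seed(0)

# Synthetic supervised data (an assumption for illustration): labels y = X @ w_true
n_samples, n_features = 256, 3
X = torch.randn(n_samples, n_features)
w_true = torch.tensor([1.0, -2.0, 0.5])
y = X @ w_true

w = torch.zeros(n_features, requires_grad=True)  # trainable parameters
lr, batch_size = 0.1, 32  # hypothetical hyperparameters

for step in range(200):
    # Stochastic part: estimate the loss on a random mini-batch
    idx = torch.randint(0, n_samples, (batch_size,))
    loss = torch.mean((X[idx] @ w - y[idx]) ** 2)

    # First-order part: the update uses only the gradient of the loss
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
    w.grad.zero_()

print(w.detach())
```

Because the labels here are noise-free, the iterates should approach `w_true`; with real data the loss typically plateaus at a nonzero value.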

<img src="./pytorch_logo.png">

- PyTorch is a convenient framework for constructing and training neural networks
- It implements a dynamic computational graph
- You just implement a function that computes the required quantities and call it with some arguments; the gradient of this function at these arguments then comes for free!
- Sparse matrix support is still under development
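
The following sketch shows what a dynamic graph means in practice: the function uses an ordinary Python loop, so the graph is rebuilt on every call and its depth depends on a runtime argument. The function `f` and its parameter `n_steps` are hypothetical and serve only as an illustration.

```python
import torch


def f(x, n_steps):
    # Ordinary Python control flow: the graph is rebuilt on every call,
    # so its depth depends on the runtime argument n_steps
    y = x
    for _ in range(n_steps):
        y = torch.sin(y) * x
    return y.sum()


x = torch.tensor([0.5, 1.0, 1.5], requires_grad=True)

# Gradient of f at these particular arguments
(grad,) = torch.autograd.grad(f(x, n_steps=2), x)
print(grad)
```

Here `torch.autograd.grad` returns the gradient directly instead of accumulating it in `x.grad`, which is convenient when the same input is differentiated several times.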

### What operations are typically used as ingredients of a NN?

- Matrix-by-vector (or matrix-by-matrix) multiplications and additions - fully-connected layers
- Elementwise operations - non-linearities
- Local operations like [MaxPooling](https://deepai.org/machine-learning-glossary-and-terms/max-pooling)
- Various reduction operations that turn a vector into a scalar
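
The four ingredients above can be chained in a few lines. This is a minimal sketch with arbitrary sizes and random weights chosen for illustration, not a trained model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

x = torch.randn(8)       # input vector
W = torch.randn(6, 8)    # weights of a fully-connected layer
b = torch.randn(6)       # bias

h = W @ x + b            # matrix-by-vector multiplication and addition
a = torch.relu(h)        # elementwise non-linearity

# Local operation: max over non-overlapping windows of size 2.
# max_pool1d expects a (batch, channels, length) tensor, hence the reshape.
p = F.max_pool1d(a.view(1, 1, -1), kernel_size=2).view(-1)

s = p.sum()              # reduction: vector -> scalar
print(h.shape, a.shape, p.shape, s.shape)
```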

## Summary

- NLA is the basis of neural networks as well as of other computational techniques
- Modern frameworks (PyTorch, JAX, TF, etc.) compute gradients automatically
- [Colab](https://colab.research.google.com/) allows you to perform simple tests and improve your understanding of the topic