PyTorch¶

Why do we use it?¶

  1. Much research in Deep Learning (DL) is implemented in PyTorch
  2. It is a constantly evolving and expanding package (functorch)
  3. The package provides a wide variety of tools to make your code computationally efficient (GPU)

How to initialize vectors, matrices, and tensors (objects with more than two dimensions)?¶

Manual definition¶

In [3]:
import torch

x = torch.tensor([[0,1,0],
                  [1,2,3],
                  [2,4,6]
                  ])
x
Out[3]:
tensor([[0, 1, 0],
        [1, 2, 3],
        [2, 4, 6]])

From numpy¶

In [ ]:
import numpy as np

x = np.random.uniform(size=(3,3))
torch.from_numpy(x)
Out[ ]:
tensor([[0.4860, 0.0246, 0.7205],
        [0.4484, 0.6652, 0.0402],
        [0.9689, 0.5528, 0.7985]], dtype=torch.float64)

Zeros, ones, and ranges¶

Zeros¶

In [ ]:
dims = (4,)
x = torch.zeros(dims)
x
Out[ ]:
tensor([0., 0., 0., 0.])

Ones¶

In [ ]:
dims = (4,)
x = torch.ones(dims)
x
Out[ ]:
tensor([1., 1., 1., 1.])

Range¶

In [ ]:
x = torch.arange(10)
x
Out[ ]:
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Linspace¶

In [ ]:
x = torch.linspace(-1,1,10)
x
Out[ ]:
tensor([-1.0000, -0.7778, -0.5556, -0.3333, -0.1111,  0.1111,  0.3333,  0.5556,
         0.7778,  1.0000])

Matrices¶

Identity matrix¶

In [ ]:
x = torch.eye(3)
x
Out[ ]:
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])

Diagonal of matrix¶

In [ ]:
x.diag()
Out[ ]:
tensor([1., 1., 1.])

Diagonal matrix¶

In [ ]:
x = torch.diag(torch.arange(3))
x
Out[ ]:
tensor([[0, 0, 0],
        [0, 1, 0],
        [0, 0, 2]])

Random matrix (uniform distribution)¶

In [ ]:
dims = (3,3)
x = torch.rand(dims)
x
Out[ ]:
tensor([[0.1512, 0.5654, 0.4128],
        [0.4651, 0.1956, 0.7034],
        [0.7692, 0.8898, 0.7750]])

Random matrix (normal distribution)¶

In [ ]:
dims = (3,3)
x = torch.randn(dims)
x
Out[ ]:
tensor([[-0.2327, -0.0362,  0.5722],
        [ 0.9502,  1.0798,  0.4765],
        [-0.0577, -2.3726,  0.5873]])

How to calculate statistics on tensors¶

In [ ]:
x.mean(), x.std(), x.median(), x.min(), x.max()
Out[ ]:
(tensor(0.1074),
 tensor(1.0340),
 tensor(0.4765),
 tensor(-2.3726),
 tensor(1.0798))
In [ ]:
x.mean(dim=1), x.std(dim=1), x.median(dim=1)[0], x.min(dim=1)[0], x.max(dim=1)[0]
Out[ ]:
(tensor([ 0.1011,  0.8355, -0.6144]),
 tensor([0.4197, 0.3176, 1.5565]),
 tensor([-0.0362,  0.9502, -0.0577]),
 tensor([-0.2327,  0.4765, -2.3726]),
 tensor([0.5722, 1.0798, 0.5873]))
In [ ]:
x.median(dim=1)[1], x.min(dim=1)[1], x.max(dim=1)[1]
Out[ ]:
(tensor([1, 0, 0]), tensor([0, 2, 1]), tensor([2, 1, 2]))
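
Note that the `[0]`/`[1]` indexing above works because dimension-wise reductions such as `median`, `min`, and `max` return a `(values, indices)` named tuple; a minimal equivalent sketch:

In [ ]:
values, indices = x.max(dim=1)   # named tuple: maxima and their positions along dim=1
values, indices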

NumPy-style operations in PyTorch¶

Multiplication¶

In [ ]:
x = torch.randn((3,2))
y = torch.randn((2,3))
x @ y
Out[ ]:
tensor([[-0.0930, -1.2068, -0.9818],
        [ 0.2951,  1.0160,  0.2693],
        [-1.1653, -1.0365,  1.9469]])
In [ ]:
x = torch.randn((1,3))
y = torch.randn((3,1))
x
Out[ ]:
tensor([[-0.8451,  1.2279, -1.0831]])
In [ ]:
y
Out[ ]:
tensor([[0.3426],
        [0.9155],
        [0.2896]])
In [ ]:
x @ y
Out[ ]:
tensor([[0.5209]])
In [ ]:
y @ x
Out[ ]:
tensor([[-0.0292,  0.2806,  0.4464],
        [ 0.0248, -0.2386, -0.3797],
        [-0.0213,  0.2045,  0.3254]])
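
The `@` operator above is matrix multiplication; element-wise (Hadamard) multiplication uses `*` and broadcasts as in NumPy. A small added sketch:

In [ ]:
a = torch.arange(6).reshape(2, 3)
b = torch.ones(2, 3)
a * b        # element-wise product, shape (2, 3)
a * 2        # broadcasting with a scalar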

Reshaping and dimension permutation¶

In [ ]:
x = torch.randn((2,3,4))
x
Out[ ]:
tensor([[[ 1.4408,  1.0554, -0.4302,  2.4403],
         [-1.5278, -1.2142,  0.7821,  0.2796],
         [ 0.5764, -0.8881, -2.0423, -0.9508]],

        [[ 0.9360, -0.5286, -0.2148, -0.6397],
         [-0.5315,  0.3589, -0.6161, -1.6595],
         [ 0.2836, -0.8500,  1.2469, -0.2715]]])
In [ ]:
x.reshape(2*3,4)
Out[ ]:
tensor([[ 1.4408,  1.0554, -0.4302,  2.4403],
        [-1.5278, -1.2142,  0.7821,  0.2796],
        [ 0.5764, -0.8881, -2.0423, -0.9508],
        [ 0.9360, -0.5286, -0.2148, -0.6397],
        [-0.5315,  0.3589, -0.6161, -1.6595],
        [ 0.2836, -0.8500,  1.2469, -0.2715]])
In [ ]:
x.reshape(-1,4), x.reshape(-1,4).shape
Out[ ]:
(tensor([[ 1.4408,  1.0554, -0.4302,  2.4403],
         [-1.5278, -1.2142,  0.7821,  0.2796],
         [ 0.5764, -0.8881, -2.0423, -0.9508],
         [ 0.9360, -0.5286, -0.2148, -0.6397],
         [-0.5315,  0.3589, -0.6161, -1.6595],
         [ 0.2836, -0.8500,  1.2469, -0.2715]]), torch.Size([6, 4]))
In [ ]:
x.permute(1,0,2).shape
Out[ ]:
torch.Size([3, 2, 4])
In [ ]:
x.permute(0,2,1).reshape(-1,3).shape
Out[ ]:
torch.Size([8, 3])
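
A related caveat (added note): `view` requires a contiguous memory layout, while `reshape` copies when necessary, so after `permute` either prefer `reshape` or call `.contiguous()` first.

In [ ]:
x.permute(0, 2, 1).reshape(-1, 3).shape               # works: reshape copies if needed
x.permute(0, 2, 1).contiguous().view(-1, 3).shape     # equivalent, with an explicit copy
# x.permute(0, 2, 1).view(-1, 3)                      # would raise: view needs contiguous memory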

Decompositions¶

In [1]:
import torch
In [3]:
x = torch.randn((100,100))
In [6]:
U, S, V = torch.svd(x)
Q, R = torch.linalg.qr(x)
In [15]:
x = torch.randn((100,10)) @ torch.randn((10,100))
In [16]:
%timeit torch.svd(x)
714 µs ± 36.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [17]:
%timeit torch.svd_lowrank(x,q=10)
283 µs ± 8.97 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
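
As a sanity check (an addition, using the low-rank x above), the factors can be multiplied back together. Note that `torch.svd` is the older interface and returns V, while the newer `torch.linalg.svd` returns Vᴴ:

In [ ]:
U, S, V = torch.svd(x)
torch.norm(U @ torch.diag(S) @ V.T - x)        # reconstruction error, close to zero

U2, S2, Vh = torch.linalg.svd(x, full_matrices=False)
torch.norm(U2 @ torch.diag(S2) @ Vh - x)       # same check with the newer API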

How to transfer data to the GPU¶

In [ ]:
torch.cuda.is_available()
Out[ ]:
True
In [ ]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
Out[ ]:
device(type='cuda')
In [ ]:
%timeit x = torch.randn((100,100)).to(device)
106 µs ± 2.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [ ]:
%timeit x = torch.randn((100,100), device=device)
9.96 µs ± 114 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [ ]:
x = torch.randn((5,5), device=device)
print(x.get_device())
x = x.detach().cpu()
print(x.get_device())
0
-1
In [ ]:
x = torch.randn((100,1000),device="cpu")
y = torch.randn((1000,100),device="cpu")
%timeit x @ y
357 µs ± 5.17 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [ ]:
x = torch.randn((100,1000),device="cuda")
y = torch.randn((1000,100),device="cuda")
%timeit x @ y
21.8 µs ± 506 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
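
One caveat (added note): CUDA kernels are launched asynchronously, so `%timeit` on a GPU matmul may mostly measure the kernel launch. Synchronizing before the timer stops gives a fairer comparison; a minimal sketch:

In [ ]:
def matmul_sync(x, y):
    out = x @ y
    torch.cuda.synchronize()   # wait until the GPU kernel has actually finished
    return out

%timeit matmul_sync(x, y)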
In [ ]:
x = torch.randn((100,1000),device="cuda")
y = torch.randn((1000,100),device="cpu")
x @ y
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-18-22e087ccddd4> in <module>
      1 x = torch.randn((100,1000),device="cuda")
      2 y = torch.randn((1000,100),device="cpu")
----> 3 x @ y

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)
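
The fix is to move the operands onto a common device before multiplying, for example (a minimal sketch):

In [ ]:
(x @ y.to(x.device)).shape   # move y to the GPU where x lives
(x.cpu() @ y).shape          # or bring x back to the CPU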

Neural Networks¶

Simple network¶

In [ ]:
net = torch.nn.Linear(100,10)
x = torch.randn((5,100))
x.shape
Out[ ]:
torch.Size([5, 100])
In [ ]:
out = net(x)
out.shape
Out[ ]:
torch.Size([5, 10])
In [ ]:
out_manual = []
for x_one in x:
    out_manual.append(net(x_one[None]))
out_manual = torch.cat(out_manual, dim=0)
In [ ]:
torch.norm(out-out_manual)
Out[ ]:
tensor(7.7970e-07, grad_fn=<CopyBackwards>)
In [ ]:
x_one.shape, x_one[None].shape
Out[ ]:
(torch.Size([100]), torch.Size([1, 100]))
In [ ]:
x_one.unsqueeze(0).shape
Out[ ]:
torch.Size([1, 100])
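
Under the hood, `nn.Linear` stores a weight matrix and a bias and applies the affine map out = x @ Wᵀ + b; a short sketch verifying this for the layer above:

In [ ]:
net.weight.shape, net.bias.shape                       # (torch.Size([10, 100]), torch.Size([10]))
torch.norm(net(x) - (x @ net.weight.T + net.bias))     # close to zero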

More complex networks¶

In [ ]:
net = torch.nn.Sequential(
    torch.nn.Linear(100,50),
    torch.nn.ReLU(),
    torch.nn.Linear(50,10)
)
net
Out[ ]:
Sequential(
  (0): Linear(in_features=100, out_features=50, bias=True)
  (1): ReLU()
  (2): Linear(in_features=50, out_features=10, bias=True)
)
$$\mathrm{net}(x) = f_2(f_1(f_0(x))), \qquad f_0 = \mathrm{Linear}(100, 50), \quad f_1 = \mathrm{ReLU}, \quad f_2 = \mathrm{Linear}(50, 10)$$
In [ ]:
x = torch.randn((5,100))
In [ ]:
out = net(x)
In [ ]:
out_manual = x
for i in range(len(net)):
    out_manual = net[i](out_manual)
In [ ]:
torch.norm(out - out_manual)
Out[ ]:
tensor(0., grad_fn=<CopyBackwards>)
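
The same network can also be written as a subclass of `torch.nn.Module`, which is the usual pattern once the forward pass is more than a plain chain of layers; a minimal sketch:

In [ ]:
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(100, 50)
        self.fc2 = torch.nn.Linear(50, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

net2 = MLP()
net2(torch.randn(5, 100)).shape   # torch.Size([5, 10])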

How to calculate gradients and Jacobians? (functorch)¶

In [ ]:
net = torch.nn.Sequential(
    torch.nn.Linear(100,10),
    torch.nn.ReLU(),
    torch.nn.Linear(10,1)
).to(device)
In [ ]:
x = torch.randn((1,100), device="cuda")
x_var = torch.autograd.Variable(x,requires_grad=True)
In [ ]:
grads = torch.autograd.grad(outputs=net(x_var), inputs=x_var, grad_outputs=torch.ones_like(net(x)))[0]
In [ ]:
grads.shape
Out[ ]:
torch.Size([1, 100])
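
`torch.autograd.Variable` is kept only for backward compatibility; the same gradient can be obtained with `requires_grad_` and either `autograd.grad` or `backward`. A sketch with the same net and x:

In [ ]:
x_req = x.clone().requires_grad_(True)
out = net(x_req).sum()      # scalar, equivalent to grad_outputs=ones above
out.backward()
x_req.grad.shape            # torch.Size([1, 100])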

Calculating the Jacobian with standard tools¶

In [ ]:
net = torch.nn.Sequential(
    torch.nn.Linear(100,50),
    torch.nn.ReLU(),
    torch.nn.Linear(50,20)
).to(device)
x = torch.randn((1,100), device="cuda")
x_var = torch.autograd.Variable(x,requires_grad=True)
grads = torch.autograd.grad(outputs=net(x_var), inputs=x_var, grad_outputs=torch.ones_like(net(x)))[0]
In [ ]:
grads.shape, net(x).shape
Out[ ]:
(torch.Size([1, 100]), torch.Size([1, 20]))

A single `autograd.grad` call only returns a vector-Jacobian product of shape [1, 100]; the full Jacobian we want has shape [1, 20, 100] (batch, output dimension, input dimension).

In [ ]:
def slow_grad_calc(x):
    grads = []
    for i in range(20):
        x_var = torch.autograd.Variable(x,requires_grad=True)
        out = net(x_var)[:,i]
        grad = torch.autograd.grad(out,x_var,torch.ones_like(out))[0][:,None]
        grads.append(grad)
    return torch.cat(grads,dim=1)
In [ ]:
grads = slow_grad_calc(x)
grads.shape
Out[ ]:
torch.Size([1, 20, 100])
In [ ]:
%timeit slow_grad_calc(x)
9.51 ms ± 152 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

FuncTorch¶

In [ ]:
from functorch import jacfwd, vmap

jac_func = jacfwd(net)
jac_func(x).shape
Out[ ]:
torch.Size([1, 20, 1, 100])
In [ ]:
%timeit jac_func(x)
1.17 ms ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
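
functorch also provides the reverse-mode counterpart `jacrev`; roughly, `jacfwd` tends to be preferable when a function has few inputs relative to outputs, and `jacrev` in the opposite case. A sketch:

In [ ]:
from functorch import jacrev

jacrev(net)(x).shape   # torch.Size([1, 20, 1, 100]), same Jacobian as with jacfwd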

FuncTorch operations over a batch¶

In [ ]:
x = torch.randn((10,100),device=device)
In [ ]:
grads = jac_func(x)
grads.shape
Out[ ]:
torch.Size([10, 20, 10, 100])
In [ ]:
grads[0,:,0,:]
Out[ ]:
tensor([[-0.0077,  0.0243, -0.0019,  ...,  0.0293, -0.0044,  0.0351],
        [-0.0030,  0.0255,  0.0129,  ...,  0.0025, -0.0403,  0.0171],
        [ 0.0011, -0.0244,  0.0128,  ..., -0.0139,  0.0124,  0.0274],
        ...,
        [-0.0033, -0.0449, -0.0122,  ..., -0.0086,  0.0235, -0.0176],
        [-0.0164, -0.0018, -0.0123,  ...,  0.0323, -0.0443, -0.0130],
        [ 0.0219, -0.0086,  0.0348,  ..., -0.0059, -0.0297,  0.0355]],
       device='cuda:0', grad_fn=<SliceBackward0>)
In [ ]:
grads[0,:,1,:]
Out[ ]:
tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]], device='cuda:0',
       grad_fn=<SliceBackward0>)
In [ ]:
vmap(jac_func)(x.unsqueeze(1)).shape
Out[ ]:
torch.Size([10, 1, 20, 1, 100])
In [ ]:
x.unsqueeze(1).shape
Out[ ]:
torch.Size([10, 1, 100])
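
Since `nn.Linear` also accepts unbatched inputs, one can instead `vmap` directly over the batch dimension and avoid the extra singleton dimensions, getting one 20×100 Jacobian per sample (a sketch):

In [ ]:
vmap(jacfwd(net))(x).shape   # torch.Size([10, 20, 100])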
In [ ]:
# Hypothetical sketch: if the forward pass took an extra argument, e.g.
# net_indexed(x, index), freeze that argument in a lambda so that jacfwd
# differentiates with respect to x only.
def net_indexed(x, index):
    return net(x)[..., index]
i = 0
func = lambda x: net_indexed(x, index=i)
jacfwd(func)