Let $f(x): \mathbb{R}^n \to \mathbb{R}$. The vector containing all first-order partial derivatives, $$\nabla f(x) = \dfrac{df}{dx} = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix},$$ is called the gradient of $f(x)$. This vector points in the direction of steepest ascent, so $- \nabla f(x)$ gives the direction of steepest descent of the function at the point. Moreover, the gradient vector is always orthogonal to the contour line through the point.
Its transpose is the corresponding row vector: $$\nabla f(x)^T = \dfrac{df}{dx^T} = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)$$
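For instance, for the (purely illustrative) function $f(x) = x_1^2 + 3x_1x_2$ on $\mathbb{R}^2$: $$\nabla f(x) = \begin{pmatrix} 2x_1 + 3x_2 \\ 3x_1 \end{pmatrix}, \qquad \nabla f(x)^T = \left(2x_1 + 3x_2,\; 3x_1\right)$$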
Let $f(x): \mathbb{R}^n \to \mathbb{R}$. The matrix containing all second-order partial derivatives, $$f''(x) = \dfrac{d(\nabla f)}{dx^T} = \dfrac{d\left(\nabla f^T\right)}{dx} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix},$$ is called the Hessian of $f(x)$.
For a vector-valued function $f(x): \mathbb{R}^n \to \mathbb{R}^m$ the Hessian becomes a 3d tensor: each slice is the Hessian of the corresponding scalar component, $\left( H\left(f_1(x)\right), H\left(f_2(x)\right), \ldots, H\left(f_m(x)\right)\right)$.
The extension of the gradient to a vector-valued function $f(x): \mathbb{R}^n \to \mathbb{R}^m$ is the Jacobian matrix: $$f'(x) = \dfrac{df}{dx^T} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \dots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}$$
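To make the shapes concrete, here is a minimal NumPy sketch that assembles the Jacobian of an arbitrary illustrative map $f: \mathbb{R}^2 \to \mathbb{R}^3$ column by column from central finite differences; rows correspond to components $f_i$ and columns to variables $x_j$, as in the matrix above:

```python
import numpy as np

def f(x):
    # arbitrary map from R^2 to R^3, chosen only to illustrate shapes
    return np.array([x[0] * x[1], np.sin(x[0]), x[0] + x[1] ** 2])

def jacobian_fd(f, x, eps=1e-6):
    # column j holds the partial derivatives with respect to x_j
    cols = [(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(x.size)]
    return np.column_stack(cols)

x = np.array([0.7, -1.3])
J = jacobian_fd(f, x)
print(J.shape)   # (3, 2): m rows (outputs) by n columns (inputs)
print(J)
# analytic Jacobian for comparison:
# [[x2, x1], [cos x1, 0], [1, 2 x2]]
```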
$$f(x) : X \to Y; \;\;\;\;\;\;\;\; \frac{\partial f(x)}{\partial x} \in G$$
X | Y | G | Name |
---|---|---|---|
$\mathbb{R}$ | $\mathbb{R}$ | $\mathbb{R}$ | $f'(x)$ (derivative) |
$\mathbb{R}^n$ | $\mathbb{R}$ | $\mathbb{R}^n$ | $\dfrac{\partial f}{\partial x_i}$ (gradient) |
$\mathbb{R}^n$ | $\mathbb{R}^m$ | $\mathbb{R}^{n \times m}$ | $\dfrac{\partial f_i}{\partial x_j}$ (Jacobian) |
$\mathbb{R}^{m \times n}$ | $\mathbb{R}$ | $\mathbb{R}^{m \times n}$ | $\dfrac{\partial f}{\partial x_{ij}}$ |
First-order approximation $$ df(x) = \langle \nabla f(x), dx\rangle $$
Second-order approximation
$$ d^2f(x) = \langle \nabla^2 f(x) dx_1, dx_2\rangle = \langle H_f(x) dx_1, dx_2\rangle $$
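A minimal NumPy sketch (the test function and point are arbitrary choices, and $dx_1 = dx_2 = dx$) showing that the first-order model has an $O(t^2)$ error in the step size $t$, while adding the second-order term reduces it to $O(t^3)$:

```python
import numpy as np

# arbitrary smooth test function with hand-computed derivatives
f = lambda x: x[0] ** 3 + x[0] * x[1]
grad = lambda x: np.array([3 * x[0] ** 2 + x[1], x[0]])
hess = lambda x: np.array([[6 * x[0], 1.0], [1.0, 0.0]])

x = np.array([0.5, -0.2])
dx = np.array([0.3, 0.4])

for t in [1e-1, 1e-2, 1e-3]:
    step = t * dx
    first = f(x) + grad(x) @ step                      # first-order model
    second = first + 0.5 * step @ hess(x) @ step       # add second-order term
    print(t, abs(f(x + step) - first), abs(f(x + step) - second))
# the first error shrinks roughly like t^2, the second like t^3
```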
Calculate $\nabla f(x)$, if $f(x) = \dfrac{1}{2}x^TAx + b^Tx + c$

Solution:
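One way to fill in the computation is via the differential (no symmetry of $A$ is assumed):
$$ df(x) = \dfrac{1}{2}\langle A\,dx, x\rangle + \dfrac{1}{2}\langle Ax, dx\rangle + \langle b, dx\rangle = \left\langle \dfrac{1}{2}\left(A + A^\top\right)x + b, dx \right\rangle $$
$$ \nabla f(x) = \dfrac{1}{2}\left(A + A^\top\right)x + b, \qquad \text{which reduces to } Ax + b \text{ for symmetric } A $$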
Calculate $\nabla f(x), f''(x)$, if $f(x) = -e^{-x^Tx}$
Solution:

First, compute the gradient componentwise: $$g_k = \dfrac{\partial f(x)}{\partial x_k} = 2 x_k e^{-\sum\limits_i x_i^2}, \qquad \text{i.e. } \nabla f(x) = 2x e^{-x^Tx}$$ Applying exactly the same logic to each component $g_k$ gives the Hessian entries $H_{k,p} = \dfrac{\partial g_k}{\partial x_p}$: $$H_{k,p} = - \left( e ^{-\sum\limits_i x_i^2} \cdot 2x_p\right) 2x_k + 2 e ^{-\sum\limits_i x_i^2} \dfrac{\partial x_k}{\partial x_p} = 2 e ^{-\sum\limits_i x_i^2} \cdot \left( \dfrac{\partial x_k}{\partial x_p} - 2x_px_k\right)$$

Finally, $f''(x) = H_{f(x)} = 2e^{-x^Tx} \left( E - 2 xx^T\right)$, where $E$ is the identity matrix.
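A quick numerical sanity check of both formulas, written as a minimal NumPy sketch (the dimension, random point, and step size are arbitrary choices):

```python
import numpy as np

def f(x):
    return -np.exp(-x @ x)

def grad_analytic(x):
    return 2 * x * np.exp(-x @ x)

def hess_analytic(x):
    n = x.size
    return 2 * np.exp(-x @ x) * (np.eye(n) - 2 * np.outer(x, x))

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
eps = 1e-6

# central finite differences for the gradient
grad_fd = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])

# finite differences of the analytic gradient give the Hessian columns
hess_fd = np.column_stack([
    (grad_analytic(x + eps * e) - grad_analytic(x - eps * e)) / (2 * eps)
    for e in np.eye(x.size)
])

print(np.allclose(grad_fd, grad_analytic(x), atol=1e-5))   # True
print(np.allclose(hess_fd, hess_analytic(x), atol=1e-4))   # True
```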
Calculate $\nabla f(x)$, if $f(x) = \dfrac{1}{2} \|Ax - b\|^2_2$
Solution:
$$ f(x) = \dfrac{1}{2} \langle Ax-b, Ax-b\rangle $$
$$ d f(x) = \dfrac{1}{2} 2 \langle Ax-b, Adx\rangle $$
$$ d f(x) = \langle A^\top(Ax-b), dx\rangle \;\;\;\; \to \nabla f(x) = A^\top (Ax-b) $$
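As a sanity check, the directional derivative of $f$ along a random direction $v$ should match $\langle \nabla f(x), v\rangle$; a minimal NumPy sketch with arbitrary random data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)
v = rng.standard_normal(3)

f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad = A.T @ (A @ x - b)           # derived gradient

t = 1e-6
dir_deriv_fd = (f(x + t * v) - f(x - t * v)) / (2 * t)
print(np.isclose(dir_deriv_fd, grad @ v))   # True
```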
Calculate $\nabla f(x)$, if $f(x) = \ln \langle Ax, x \rangle$
Solution:
$$ df(x) = \dfrac{d\langle Ax, x \rangle}{\langle Ax, x \rangle} = \dfrac{\langle A\,dx, x \rangle + \langle Ax, dx \rangle}{\langle Ax, x \rangle} = \dfrac{\langle \left(A + A^\top\right)x, dx \rangle}{\langle Ax, x \rangle} \;\;\;\; \to \nabla f(x) = \dfrac{\left(A + A^\top\right)x}{\langle Ax, x \rangle} $$
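The same finite-difference check works here; the construction of $A$ below is an arbitrary choice that only guarantees $\langle Ax, x\rangle > 0$, so the logarithm is defined:

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))
A = B.T @ B + (C - C.T)      # positive quadratic form plus a skew part

x = rng.standard_normal(3)
v = rng.standard_normal(3)

f = lambda x: np.log(x @ (A @ x))
grad = (A + A.T) @ x / (x @ (A @ x))   # derived gradient

t = 1e-6
dir_deriv_fd = (f(x + t * v) - f(x - t * v)) / (2 * t)
print(np.isclose(dir_deriv_fd, grad @ v))   # True
```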
Calculate $\nabla f(X)$, if $f(X) = \text{tr } AX$
Solution:
$$ f(X) = \langle A^\top, X\rangle $$
$$ d f(X) = d \langle A^\top, X\rangle = \langle A^\top, dX\rangle $$
$$ \nabla f(X) = A^\top $$
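Since $f$ is linear in $X$, the identity $f(X+V) - f(X) = \langle \nabla f(X), V\rangle$ holds exactly, not just to first order; a minimal NumPy check with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 3))
V = rng.standard_normal((4, 3))

f = lambda X: np.trace(A @ X)
grad = A.T                     # derived gradient, same shape as X

lhs = f(X + V) - f(X)
rhs = np.sum(grad * V)         # Frobenius inner product <A^T, V>
print(np.isclose(lhs, rhs))    # True
```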
Calculate $\nabla f(X)$, if $f(X) = \langle S, X\rangle - \log \det X$
Solution:
$$ df(X) = \langle S, dX \rangle - \dfrac{d(\det X)}{\det X} $$
$$ df(X) = \langle S, dX \rangle - \dfrac{\det X \langle X^{-\top}, dX\rangle}{\det X} $$
Suppose that $\det X > 0$, so that the logarithm is defined and $X$ is invertible. Then
$$ df(X) = \langle S, dX \rangle - \langle X^{-\top}, dX\rangle = \langle S - X^{-\top} , dX\rangle \;\;\;\; \to \nabla f(X) = S - X^{-\top} $$
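A final numerical check, written as a minimal NumPy sketch (the symmetric positive definite $X$ is an arbitrary choice that keeps $\det X > 0$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
B = rng.standard_normal((n, n))
X = B @ B.T + n * np.eye(n)          # symmetric positive definite, so det X > 0
S = rng.standard_normal((n, n))
V = rng.standard_normal((n, n))      # direction in matrix space

f = lambda X: np.sum(S * X) - np.log(np.linalg.det(X))
grad = S - np.linalg.inv(X).T        # derived gradient S - X^{-T}

t = 1e-6
dir_deriv_fd = (f(X + t * V) - f(X - t * V)) / (2 * t)
print(np.isclose(dir_deriv_fd, np.sum(grad * V), atol=1e-5))   # True
```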