Advanced Statistical Inference
EURECOM

Example: matrix-matrix and matrix-vector products in NumPy (the multiplication rules are defined below).

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
v = np.array([1, 2])
C = A @ B   # matrix-matrix product
d = A @ v   # matrix-vector product
print(f"C={C}")
print(f"d={d}")

# Output:
# C=[[19 22]
#  [43 50]]
# d=[ 5 11]
Vectors \({\boldsymbol{v}}\in {\mathbb{R}}^n\) \[ {\boldsymbol{v}}= \left[ \begin{array}{c } v_1 \\ \vdots \\ v_n \end{array} \right] \]
Matrices \({\boldsymbol{A}}\in {\mathbb{R}}^{m \times n}\) \[ {\boldsymbol{A}}= \left[ \begin{array}{c c c c } A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \\ \end{array} \right] \]
Matrix addition: \(({\boldsymbol{A}}+ {\boldsymbol{B}})_{ij} = a_{ij} + b_{ij}\)
Scalar multiplication: \((\gamma {\boldsymbol{B}})_{ij} = \gamma b_{ij}\)
Matrix-vector multiplication: \[ ({\boldsymbol{A}}{\boldsymbol{v}})_i = \sum_{j=1}^n a_{ij} v_j \]
Matrix-matrix multiplication: \[ ({\boldsymbol{A}}{\boldsymbol{B}})_{ij} = \sum_{k=1}^n a_{ik} b_{kj} \]
Tip
In NumPy, use the @ operator for both matrix-vector and matrix-matrix multiplication; the operation performed is determined by the dimensions of the arrays (see the sketch below).
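As a sanity check on the two formulas above, here is a minimal sketch (the matrices are arbitrary toy examples) that evaluates the sums with explicit loops and compares the result with the @ operator.

import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
v = np.array([1., 2.])

m, n = A.shape
# (A B)_{ij} = sum_k a_{ik} b_{kj}
AB = np.zeros((m, B.shape[1]))
for i in range(m):
    for j in range(B.shape[1]):
        AB[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

# (A v)_i = sum_j a_{ij} v_j
Av = np.array([sum(A[i, j] * v[j] for j in range(n)) for i in range(m)])

print(np.allclose(AB, A @ B))   # True: same as the @ operator
print(np.allclose(Av, A @ v))   # True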
A square matrix is one where the number of rows equals the number of columns (e.g., \({\boldsymbol{A}}\in {\mathbb{R}}^{n \times n}\)).
Properties:
The inverse of a square matrix \({\boldsymbol{A}}\) is denoted \({\boldsymbol{A}}^{-1}\) and satisfies
\[ {\boldsymbol{A}}{\boldsymbol{A}}^{-1} = {\boldsymbol{A}}^{-1} {\boldsymbol{A}}= {\boldsymbol{I}} \]
Important
The inverse does not always exist. We will see later the conditions under which it does.
If \({\boldsymbol{A}}\) is invertible, then \({\boldsymbol{A}}\) is said to be non-singular. Otherwise, \({\boldsymbol{A}}\) is singular.
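As a quick illustration (a minimal sketch; the matrices are arbitrary examples), np.linalg.inv computes the inverse of a non-singular matrix and raises an error for a singular one.

import numpy as np

A = np.array([[1., 2.], [3., 4.]])          # non-singular (det = -2)
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))    # True: A A^{-1} = I

S = np.array([[1., 2.], [2., 4.]])          # singular (det = 0), no inverse exists
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError as err:
    print("singular matrix:", err)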
Additional properties:
The determinant of a square matrix \({\boldsymbol{A}}\) is denoted \(|{\boldsymbol{A}}|\) or \(\det({\boldsymbol{A}})\) and is a scalar value.
For a \(2 \times 2\) matrix \({\boldsymbol{A}}= \left[ \begin{array}{c c} a & b \\ c & d \end{array} \right]\), the determinant is \(ad - bc\).
For a general \(n \times n\) matrix, the determinant is defined recursively as
\[ \det({\boldsymbol{A}}) = \sum_{j=1}^n (-1)^{1+j} a_{1j} \det({\boldsymbol{A}}_{-1,-j}) \]
where \({\boldsymbol{A}}_{-1,-j}\) is the matrix \({\boldsymbol{A}}\) with the first row and \(j\)-th column removed. Note that \({\boldsymbol{A}}_{-1,-j}\) is an \((n-1) \times (n-1)\) matrix.
This is called the cofactor expansion of the determinant.
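The cofactor expansion translates directly into a recursive implementation; the sketch below (inefficient, for illustration only) checks it against np.linalg.det.

import numpy as np

def det_cofactor(A):
    # Cofactor expansion along the first row (0-indexed, so the sign is (-1)**j)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # remove first row, j-th column
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.random.randn(4, 4)
print(np.isclose(det_cofactor(A), np.linalg.det(A)))   # True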
Given a triangular matrix \({\boldsymbol{A}}= \left[ \begin{array}{c c c} a_{11} & \cdots & a_{1n} \\ 0 & \ddots & \vdots \\ 0 & 0 & a_{nn} \end{array} \right]\), then \(\det({\boldsymbol{A}}) = \prod_{i=1}^n a_{ii}\).
Determinant of a sum of matrices: in general, \(\det({\boldsymbol{A}}+ {\boldsymbol{B}}) \neq \det({\boldsymbol{A}}) + \det({\boldsymbol{B}})\)
Determinant of a product: \(\det({\boldsymbol{A}}{\boldsymbol{B}}) = \det({\boldsymbol{A}}) \det({\boldsymbol{B}})\)
Determinant of an inverse: \(\det({\boldsymbol{A}}^{-1}) = \det({\boldsymbol{A}})^{-1}\)
Determinant of a transpose: \(\det({\boldsymbol{A}}^{\top}) = \det({\boldsymbol{A}})\)
Determinant of a scalar multiple: \(\det(\gamma {\boldsymbol{A}}) = \gamma^n \det({\boldsymbol{A}})\)
A matrix is invertible if and only if its determinant is non-zero.
Determinant of an orthogonal matrix: \(\det({\boldsymbol{Q}}) = \pm 1\)
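The properties above are easy to check numerically; a minimal sketch with random matrices (an illustration, not a proof):

import numpy as np

n = 3
gamma = 2.5
A, B = np.random.randn(n, n), np.random.randn(n, n)

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))   # product
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A)))     # inverse
print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                        # transpose
print(np.isclose(np.linalg.det(gamma * A), gamma ** n * np.linalg.det(A)))     # scalar multiple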
A non-zero vector \({\boldsymbol{v}}\) is an eigenvector of \({\boldsymbol{A}}\) with eigenvalue \(\lambda\) if \[ {\boldsymbol{A}}{\boldsymbol{v}}= \lambda {\boldsymbol{v}} \]
Spectral theorem: a symmetric matrix \({\boldsymbol{A}}\) can be factorized as \[ {\boldsymbol{A}}= {\boldsymbol{Q}}{\boldsymbol{\Lambda}}{\boldsymbol{Q}}^{\top} \]
where \({\boldsymbol{Q}}\) is orthogonal (its columns \({\boldsymbol{q}}_i\) are the eigenvectors of \({\boldsymbol{A}}\)) and \({\boldsymbol{\Lambda}}\) is the diagonal matrix of eigenvalues, \({\boldsymbol{\Lambda}}= \left[ \begin{array}{c c c c } \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \\ \end{array} \right]\)
The determinant of a symmetric matrix is the product of its eigenvalues: \(\det({\boldsymbol{A}}) = \prod_{i=1}^n \lambda_i\).
Proof
By the spectral theorem, \({\boldsymbol{A}}= {\boldsymbol{Q}}{\boldsymbol{\Lambda}}{\boldsymbol{Q}}^{\top}\), so \(\det({\boldsymbol{A}}) = \det({\boldsymbol{Q}}{\boldsymbol{\Lambda}}{\boldsymbol{Q}}^{\top}) = \det({\boldsymbol{Q}}) \det({\boldsymbol{\Lambda}}) \det({\boldsymbol{Q}}^{\top}) = \det({\boldsymbol{\Lambda}}) = \prod_{i=1}^n \lambda_i\), where we used \(\det({\boldsymbol{Q}}) \det({\boldsymbol{Q}}^{\top}) = \det({\boldsymbol{Q}}{\boldsymbol{Q}}^{\top}) = \det({\boldsymbol{I}}) = 1\).
The trace of a symmetric matrix is the sum of its eigenvalues: \(\mathrm{Tr}({\boldsymbol{A}}) = \sum_{i=1}^n \lambda_i\).
Proof
By the spectral theorem, \({\boldsymbol{A}}= {\boldsymbol{Q}}{\boldsymbol{\Lambda}}{\boldsymbol{Q}}^{\top}\), so \(\mathrm{Tr}({\boldsymbol{A}}) = \mathrm{Tr}({\boldsymbol{Q}}{\boldsymbol{\Lambda}}{\boldsymbol{Q}}^{\top}) = \mathrm{Tr}({\boldsymbol{Q}}^{\top} {\boldsymbol{Q}}{\boldsymbol{\Lambda}}) = \mathrm{Tr}({\boldsymbol{\Lambda}}) = \sum_{i=1}^n \lambda_i\), using the cyclic property of the trace and \({\boldsymbol{Q}}^{\top} {\boldsymbol{Q}}= {\boldsymbol{I}}\).
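Both identities can be verified numerically with np.linalg.eigh, which returns the eigendecomposition of a symmetric matrix (a minimal sketch; the matrix is an arbitrary example):

import numpy as np

X = np.random.randn(4, 4)
A = X + X.T                        # a symmetric matrix
eigvals, Q = np.linalg.eigh(A)     # A = Q diag(eigvals) Q^T

print(np.allclose(Q @ np.diag(eigvals) @ Q.T, A))         # spectral decomposition
print(np.isclose(np.prod(eigvals), np.linalg.det(A)))     # det(A) = product of eigenvalues
print(np.isclose(np.sum(eigvals), np.trace(A)))           # Tr(A) = sum of eigenvalues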
The \(n \times n\) matrix \({\boldsymbol{A}}\) is said to be positive definite if \({\boldsymbol{y}}^{\top} {\boldsymbol{A}}{\boldsymbol{y}}> 0\) for all \({\boldsymbol{y}}\neq {\boldsymbol{0}}\).
The \(n \times n\) matrix \({\boldsymbol{A}}\) is said to be positive semidefinite if \({\boldsymbol{y}}^{\top} {\boldsymbol{A}}{\boldsymbol{y}}\geq 0\) for all \({\boldsymbol{y}}\neq {\boldsymbol{0}}\).
A symmetric matrix is positive definite if and only if all its eigenvalues are positive.
Proof
Let \({\boldsymbol{A}}= {\boldsymbol{Q}}{\boldsymbol{\Lambda}}{\boldsymbol{Q}}^{\top}\) be the spectral decomposition of \({\boldsymbol{A}}\). Then \({\boldsymbol{y}}^{\top} {\boldsymbol{A}}{\boldsymbol{y}}= {\boldsymbol{y}}^{\top} {\boldsymbol{Q}}{\boldsymbol{\Lambda}}{\boldsymbol{Q}}^{\top} {\boldsymbol{y}}= ({\boldsymbol{Q}}^{\top} {\boldsymbol{y}})^{\top} {\boldsymbol{\Lambda}}({\boldsymbol{Q}}^{\top} {\boldsymbol{y}}) = \sum_{i=1}^n \lambda_i ({\boldsymbol{q}}_i^{\top} {\boldsymbol{y}})^2\). If \(\lambda_i > 0\) for all \(i\), this sum is positive for every \({\boldsymbol{y}}\neq {\boldsymbol{0}}\), since \({\boldsymbol{Q}}^{\top} {\boldsymbol{y}}\neq {\boldsymbol{0}}\). Conversely, taking \({\boldsymbol{y}}= {\boldsymbol{q}}_i\) gives \({\boldsymbol{y}}^{\top} {\boldsymbol{A}}{\boldsymbol{y}}= \lambda_i\), so positive definiteness requires \(\lambda_i > 0\).
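This gives a simple numerical test for positive definiteness of a symmetric matrix: check that its smallest eigenvalue is positive. A minimal sketch (the helper name is ours, not a library function):

import numpy as np

def is_positive_definite(A):
    # Symmetric A is positive definite iff all its eigenvalues are > 0
    return np.linalg.eigvalsh(A).min() > 0

print(is_positive_definite(np.array([[2., 0.], [0., 3.]])))   # True
print(is_positive_definite(np.array([[1., 2.], [2., 1.]])))   # False (eigenvalues 3 and -1)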
Corollary
Let \({\boldsymbol{X}}\) be an \(n \times p\) matrix, \({\boldsymbol{y}}\) a \(p\)-dimensional vector, and \({\boldsymbol{z}}= {\boldsymbol{X}}{\boldsymbol{y}}\). Then
\[ \begin{aligned} {\boldsymbol{y}}^{\top} ({\boldsymbol{X}}^{\top} {\boldsymbol{X}}) {\boldsymbol{y}} & = ({\boldsymbol{X}}{\boldsymbol{y}})^{\top} ({\boldsymbol{X}}{\boldsymbol{y}}) \\ & = {\boldsymbol{z}}^{\top} {\boldsymbol{z}}\\ & = \sum_{i=1}^n z_i^2 \geq 0 \end{aligned} \]
so \({\boldsymbol{X}}^{\top} {\boldsymbol{X}}\) is always positive semidefinite.
To guarantee that \({\boldsymbol{X}}^{\top} {\boldsymbol{X}}\) is positive definite, we also need \({\boldsymbol{X}}\) to have full column rank, so that \({\boldsymbol{X}}{\boldsymbol{y}}\neq {\boldsymbol{0}}\) whenever \({\boldsymbol{y}}\neq {\boldsymbol{0}}\).
In practice:
Sometimes we need to ensure that \({\boldsymbol{X}}^{\top} {\boldsymbol{X}}\) is positive definite numerically. To do so, we can add a small jitter to the diagonal, \({\boldsymbol{X}}^{\top} {\boldsymbol{X}}+ \epsilon {\boldsymbol{I}}\), with \(\epsilon > 0\), as in the sketch below.
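A minimal sketch of the jitter trick (the value of epsilon below is an arbitrary choice for illustration): when X has fewer rows than columns, X^T X is rank-deficient, so the Cholesky factorization typically fails until the jitter is added.

import numpy as np

X = np.random.randn(5, 10)        # n < p, so X^T X (10 x 10) is rank-deficient
G = X.T @ X
try:
    np.linalg.cholesky(G)         # typically fails: G is only positive semidefinite
except np.linalg.LinAlgError:
    print("G is not positive definite")

eps = 1e-6                        # small jitter (arbitrary value for illustration)
L = np.linalg.cholesky(G + eps * np.eye(G.shape[0]))   # succeeds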
Given a symmetric positive definite matrix \({\boldsymbol{A}}\), define a lower triangular matrix \({\boldsymbol{L}}\)
\[ {\boldsymbol{L}}= \left[ \begin{array}{c c c c } L_{11} & 0 & \cdots & 0 \\ L_{21} & L_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{n1} & L_{n2} & \cdots & L_{nn} \\ \end{array} \right] \]
so that \({\boldsymbol{A}}= {\boldsymbol{L}}{\boldsymbol{L}}^{\top}\). This factorization is called the Cholesky decomposition of \({\boldsymbol{A}}\).
Question: How to go from \({\boldsymbol{A}}\) to \({\boldsymbol{L}}\)?
Example of the Cholesky algorithm for a \(3 \times 3\) matrix \({\boldsymbol{A}}\):
\[ \begin{aligned} {\boldsymbol{L}}{\boldsymbol{L}}^{\top} = \left[ \begin{array}{c c c} L_{11} & 0 & 0 \\ L_{21} & L_{22} & 0 \\ L_{31} & L_{32} & L_{33} \\ \end{array} \right] \left[ \begin{array}{c c c} L_{11} & L_{21} & L_{31} \\ 0 & L_{22} & L_{32} \\ 0 & 0 & L_{33} \\ \end{array} \right] = \left[ \begin{array}{c c c} L_{11}^2 & & \text{(symmetric)} \\ L_{21}L_{11} & L_{21}^2 + L_{22}^2 & \\ L_{31}L_{11} & L_{31}L_{21} + L_{32}L_{22} & L_{31}^2 + L_{32}^2 + L_{33}^2 \\ \end{array} \right] \end{aligned} \]
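Reading the product above entry by entry (top to bottom, left to right) gives the algorithm: \(L_{11} = \sqrt{A_{11}}\), \(L_{21} = A_{21}/L_{11}\), \(L_{22} = \sqrt{A_{22} - L_{21}^2}\), and so on. A minimal sketch of this column-by-column scheme, checked against np.linalg.cholesky (the test matrix is an arbitrary positive definite example):

import numpy as np

def cholesky(A):
    # Column-by-column Cholesky factorization of a symmetric positive definite matrix
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        L[j, j] = np.sqrt(A[j, j] - np.sum(L[j, :j] ** 2))
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j])) / L[j, j]
    return L

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
L = cholesky(A)
print(np.allclose(L @ L.T, A))                 # A = L L^T
print(np.allclose(np.linalg.cholesky(A), L))   # matches NumPy's implementation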
Properties of Cholesky decomposition:
Iterative algorithm costing \(\mathcal{O}(n^3)\) operations.
The determinant of \({\boldsymbol{A}}\) is \(\det({\boldsymbol{A}}) = \det({\boldsymbol{L}}{\boldsymbol{L}}^{\top}) = \det({\boldsymbol{L}})^2 = \left(\prod_{i=1}^n L_{ii}\right)^2\)
Useful for computing \({\boldsymbol{A}}^{-1} {\boldsymbol{b}}\) (e.g., when solving linear systems and in linear regression) via forward and back substitution, as sketched below.
The full inverse of \({\boldsymbol{A}}\) can be obtained in the same way, by solving for one column at a time.
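A minimal sketch of solving \({\boldsymbol{A}}{\boldsymbol{x}}= {\boldsymbol{b}}\) this way, assuming SciPy is available for the triangular solvers: first solve \({\boldsymbol{L}}{\boldsymbol{z}}= {\boldsymbol{b}}\) by forward substitution, then \({\boldsymbol{L}}^{\top}{\boldsymbol{x}}= {\boldsymbol{z}}\) by back substitution, so \({\boldsymbol{A}}^{-1}{\boldsymbol{b}}\) is obtained without ever forming the inverse explicitly.

import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])   # symmetric positive definite
b = np.array([1., 2., 3.])

L = np.linalg.cholesky(A)                    # A = L L^T
z = solve_triangular(L, b, lower=True)       # forward substitution: L z = b
x = solve_triangular(L.T, z, lower=False)    # back substitution: L^T x = z

print(np.allclose(A @ x, b))                 # True: x = A^{-1} b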

Simone Rossi - Advanced Statistical Inference - EURECOM