Proof of The Fundamental Theorem of Linear Algebra
[TOC]
Orthogonality of $\mathcal{R}(A), \mathcal{R}(A^T), \mathcal{N}(A), \mathcal{N}(A^T)$
Statement
$$ \mathcal{N}(A) \perp \mathcal{R}(A^T) \\ \mathcal{R}(A) \perp \mathcal{N}(A^T) $$
Proof
Strang's "The Fundamental Theorem of Linear Algebra" Page 2-3: $$ \begin{aligned} \mathcal{R}(A) &: \text{Column Space(Space Spanned by Column Vectors)} \\ \mathcal{R}(A^T) &: \text{Row Space(Space Spanned by Row Vectors)} \\ \mathcal{N}(A) &: \underline{x} \in \mathcal{N}(A), A\underline{x} = 0, \underline{x} \cdot (\forall \underline{r} \in \text{Row(A)}) = 0 \\ &\implies \mathcal{N}(A) \perp \mathcal{R}(A^T) \\ \mathcal{N}(A^T) &: \mathcal{N}(A^T) \perp \mathcal{R}(A) (\text{Based on above}) \end{aligned} $$ Thus we have: $$ \mathcal{N}(A) \perp \mathcal{R}(A^T) \\ \mathcal{R}(A) \perp \mathcal{N}(A^T) $$
Every Vector Space = A Subspace + Its Orthogonal Complement
Statement
Any finite-dimensional inner product space $V$ is the direct sum of any subspace $W$ and its orthogonal complement. $$ V = W \oplus W^{\bot} $$
Proof1
$$ \forall \underline{v} \in V, \exists \underline{w} = \sum_{i=1}^{r} \frac{\langle \underline{v} | \underline{w_i}\rangle}{||\underline{w_i}||^2} \underline{w_i} \in W, \\ (\{\underline{w_1}, \cdots, \underline{w_r}\} \text{ is an orthogonal basis of } W \text{, obtained via Gram-Schmidt}) \\ \underline{v} = (\underline{w}) + (\underline{v} - \underline{w}) \\ (\underline{v} = \text{orthogonal projection onto } W + \text{complement}) $$Then we can verify: $$ \langle \underline{v} - \underline{w} | \underline{w_i} \rangle = 0, \ \forall i \implies \underline{v} - \underline{w} \in W^{\bot} $$ Also, for any vector in the intersection: $$ \underline{u} \in W \cap W^{\bot} \implies \langle\underline{u} | \underline{u} \rangle = 0 \implies \underline{u} = \underline{0} \\ \implies W \cap W^{\bot} = \{\underline{0}\} $$ Therefore: $$ \underline{w} \in W, \underline{v} - \underline{w} \in W^{\bot}, W \cap W^{\bot} = \{\underline{0}\} \\ \implies V = W \oplus W^{\bot}\\ \implies dim(V) = dim(W) + dim(W^{\bot}) $$
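A minimal numerical sketch of this decomposition, assuming NumPy and using a QR factorization in place of an explicit Gram-Schmidt (the subspace $W$ and the vector $\underline{v}$ are arbitrary choices of mine):

```python
import numpy as np

# W is spanned by the two columns of M (assumed linearly independent);
# QR gives an orthonormal basis of W, standing in for Gram-Schmidt.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 2))
Q, _ = np.linalg.qr(M)            # orthonormal basis {w_1, w_2} of W

v = rng.standard_normal(4)
w = Q @ (Q.T @ v)                 # orthogonal projection of v onto W
complement = v - w                # the W-perp part of v

print(np.allclose(Q.T @ complement, 0))   # True: v - w is orthogonal to W
print(np.allclose(w + complement, v))     # True: v = w + (v - w)
```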
The Two Most Important Laws of Linear Algebra
Rank-Nullity Theorem
Statement
$$ dim(V) = rank(A) + nullity(A) $$
Proof: Theorem 3.4.11
Suppose a basis for $N(A)$: $$ \{\underline{v_1}, ..., \underline{v_k} \} $$
Extend it to a basis for $V$: $$ \{\underline{v_1}, ..., \underline{v_k}, ..., \underline{v_n} \} $$ Then we only need to show that: $$ \{ A(\underline{v_{k+1}}), \cdots, A(\underline{v_n})\} \text{ is a basis of } \mathcal{R}(A) $$
For $\mathcal{R}(A)$: $$ \begin{aligned} \forall \underline{w} \in \mathcal{R}(A), \exists \underline{v} &= \sum_{i=1}^{n}s_i\underline{v_i} \in V, \\ \underline{w} = A(\underline{v}) = \sum_{i=1}^{n}s_i A(\underline{v_i}) &= \sum_{i=k+1}^{n}s_i A(\underline{v_i}) \quad (A(\underline{v_i}) = \underline{0} \text{ for } i \le k) \\ \Downarrow \\ \{ A(\underline{v_{k+1}}), \cdots, A(\underline{v_n})\} &\text{ span } \mathcal{R}(A) \end{aligned} $$ Every vector in the range of $A$ can be represented as a linear combination of: $$ \{ A(\underline{v_{k+1}}), \cdots, A(\underline{v_n})\} $$ For linear independence, suppose: $$ \sum_{i=k+1}^{n}s_i A(\underline{v_i}) = \underline{0} \implies A\Big(\sum_{i=k+1}^{n}s_i \underline{v_i}\Big) = \underline{0} \implies \sum_{i=k+1}^{n}s_i \underline{v_i} \in \mathcal{N}(A) \\ \implies \sum_{i=k+1}^{n}s_i \underline{v_i} = \sum_{j=1}^{k}t_j \underline{v_j} \text{ for some } t_j \\ \implies \text{all } s_i = 0 \text{ (and all } t_j = 0 \text{), by linear independence of } \{\underline{v_1}, \cdots, \underline{v_n}\} \\ \implies \{A(\underline{v_{k+1}}), ..., A(\underline{v_n})\} \text{ are linearly independent} $$
This means every vector in $\mathcal{R}(A)$ has a unique representation as a linear combination of vectors in this set. According to the definition of "basis", these vectors form a basis for the range of $A$. This means: $$ rank(A) = dim(\mathcal{R}(A)) = n - k = dim(V) - dim(\mathcal{N}(A)) $$
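A small NumPy check of this count (the $3 \times 4$ matrix is an arbitrary example, and the $10^{-10}$ tolerance is just a convenient threshold):

```python
import numpy as np

# Arbitrary 3 x 4 example with rank 2 (the third row is the sum of the first two).
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])
n = A.shape[1]                        # dimension of the domain

rank = np.linalg.matrix_rank(A)       # dim R(A)
_, s, _ = np.linalg.svd(A)
nullity = n - int(np.sum(s > 1e-10))  # dim N(A)

print(rank, nullity, rank + nullity == n)   # 2 2 True
```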
Row Space and Column Space Have the Same Dimension
Statement
$$ dim \mathcal{R}(A) = dim\mathcal{R}(A^T) = r $$
One way to see this is through row reduction. Row reduction only recombines the rows linearly, so the row space does not change. Suppose there are $r$ pivots after row reduction: the $r$ pivot rows are linearly independent and span the row space, and the corresponding $r$ pivot columns of the original matrix are linearly independent and span the column space. Therefore the row space and the column space have the same number of linearly independent vectors in their bases.
Proof: Corollary 3.4.12
$$ \begin{aligned} \mathcal{R}(A)^{\bot} &= \{\underline{v} \in F^m | \underline{v} \cdot \underline{y} = 0, \forall \underline{y} \in \mathcal{R}(A) \} \\ \mathcal{R}(A^H)^{\bot} &= \{\underline{v} \in F^n | \underline{v}^H \cdot \underline{x} = 0, \forall \underline{x} \in \mathcal{R}(A^H) \} \\ &= \{\underline{v} \in F^n | \underline{v}^H (A^H \underline{y}) = 0, \forall \underline{y} \in F^m\} \\ &= \{\underline{v} \in F^n | \underline{v}^H A^H = 0\} \\ &= \{\underline{v} \in F^n | A\underline{v} = 0\} \\ &= \mathcal{N}(A) \end{aligned} $$Since every vector space is the direct sum of a subspace + its orthogonal complement:
$$ \begin{aligned} dim(F^n) &= dim(\mathcal{R}(A^H)) + dim(\mathcal{R}(A^H)^{\bot}) \\ n &= dim(\mathcal{R}(A^H)) + dim(\mathcal{N}(A)) \end{aligned} $$Also, by rank-nullity theorem:
$$ dim(F^n) = n = dim(\mathcal{R}(A)) +dim(\mathcal{N}(A)) $$Therefore we have:
$$ \begin{aligned} dim(\mathcal{R}(A^H)) = n - dim(\mathcal{N}(A)) = dim(\mathcal{R}(A)) \end{aligned} $$Thus the row space of $A$ has same dimension as its column space.
Dimensions
Statement
The Dimensions of $\mathcal{R}(A), \mathcal{R}(A^T), \mathcal{N}(A), \mathcal{N}(A^T)$.
Proof
First of all, recall from the previous result:
$$ dim(\mathcal{R}(A)) = dim(\mathcal{R}(A^T)) = r $$By rank-nullity theorem, we have:
$$ dim(\mathbb{R}^n) = n = dim(\mathcal{R}(A)) + dim(\mathcal{N}(A)) = r + (n-r) \\ dim(\mathbb{R}^m) = m = dim(\mathcal{R}(A^T)) + dim(\mathcal{N}(A^T)) = r + (m-r) $$Thus:
$$ dim(\mathcal{N}(A)) = n - r \\ dim(\mathcal{N}(A^T)) = m - r $$
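To illustrate these dimension counts, here is a hedged NumPy sketch: the test matrix is built as a product of random factors so that its rank is $r = 3$ by construction (my own example, not from the text):

```python
import numpy as np

# Build a 5 x 4 matrix of rank 3 as a product of random factors.
rng = np.random.default_rng(1)
m, n, r = 5, 4, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

rank = np.linalg.matrix_rank(A)               # dim R(A) = dim R(A^T) = r
dim_null_A = n - rank                         # dim N(A)   = n - r
dim_null_AT = m - np.linalg.matrix_rank(A.T)  # dim N(A^T) = m - r

print(rank, dim_null_A, dim_null_AT)          # 3 1 2
```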
Vector Space Decomposition
Statement
Every vector $\underline{x} \in \mathbb{R}^n$ can be decomposed as $\underline{x} = \underline{x_r} + \underline{x_n}$, where $\underline{x_r} \in \mathcal{R}(A^T)$ and $\underline{x_n} \in \mathcal{N}(A)$.
Proof
We only need to prove that $\mathbb{R}^n$ is the direct sum of the row space and the null space: $$ \mathbb{R}^n = \mathcal{R}(A^T) \oplus \mathcal{N}(A) $$
By the subspace/orthogonal-complement theorem above, it suffices to prove that these two subspaces are orthogonal complements of each other: $$ \mathcal{R}(A^T) = \mathcal{N}(A)^{\bot} \\\vee\\ \mathcal{N}(A) = \mathcal{R}(A^T)^{\bot} $$
We have proved the orthogonality of these spaces, but we have not yet proved that they are orthogonal complements of each other. Nevertheless, this is indeed the case:
$$ \begin{aligned} \mathcal{N}(A) &\perp \mathcal{R}(A^T), \\ \forall \underline{v} \in \mathcal{N}(A), \forall \underline{r} &\in \mathcal{R}(A^T), \\ \underline{v} &\perp \underline{r} \\ &\Downarrow \\ \underline{v} &\in \mathcal{R}(A^T)^{\bot} \\ &\Downarrow \\ \mathcal{N}(A) &\subseteq \mathcal{R}(A^T)^{\bot} \end{aligned} $$
And:
$$ \begin{aligned} \forall \underline{v} \in \mathcal{R}(A^T)^{\bot}, \forall \underline{r} &\in \mathcal{R}(A^T), \\ \underline{v} &\cdot \underline{r} = 0 \\ &\Downarrow \\ \underline{v} \perp \text{every row of } A \implies A\underline{v} = \underline{0} \implies \underline{v} &\in \mathcal{N}(A) \\ &\Downarrow \\ \mathcal{R}(A^T)^{\bot} &\subseteq \mathcal{N}(A) \end{aligned} $$
Thus:
$$ \mathcal{N}(A) = \mathcal{R}(A^T)^{\bot} $$
Therefore, according to the theorem "Every Vector Space = A Subspace + Its Orthogonal Complement" above:
$$ \begin{aligned} \mathbb{R}^n &= \mathcal{R}(A^T) \oplus \mathcal{R}(A^T)^{\bot} \\ &= \mathcal{R}(A^T) \oplus \mathcal{N}(A) \end{aligned} $$
Similarly, $$ \begin{aligned} \mathbb{R}^m &= \mathcal{R}(A) \oplus \mathcal{R}(A)^{\bot} \\ &= \mathcal{R}(A) \oplus \mathcal{N}(A^T) \end{aligned} $$
Finally, we write them together: $$ \mathbb{R}^n = \mathcal{R}(A^T) \oplus \mathcal{N}(A) \\ \mathbb{R}^m = \mathcal{R}(A) \oplus \mathcal{N}(A^T) $$
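The decomposition can also be computed explicitly. Assuming NumPy, one convenient (though not the only) way is the pseudoinverse-based projector $P = A^{+}A$ onto the row space; the matrix and vector below are arbitrary:

```python
import numpy as np

# Arbitrary 3 x 5 matrix and vector; P = pinv(A) @ A is the orthogonal
# projector onto the row space R(A^T).
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
x = rng.standard_normal(5)

P = np.linalg.pinv(A) @ A
x_r = P @ x            # row-space component
x_n = x - x_r          # null-space component

print(np.allclose(x_r + x_n, x))    # True: x = x_r + x_n
print(np.allclose(A @ x_n, 0))      # True: x_n lies in N(A)
print(np.allclose(A @ x, A @ x_r))  # True: only x_r contributes to A x
```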
Below are supplements to the original theorems.
Qualification for Basis
Statement
Any set of $n$ linearly independent vectors is a basis for a vector space $V$ with $\dim V = n$.
The proof proceeds in several steps.
Lemma I: Maximum Capacity of Linearly Independent Set
Statement
Any linearly independent set in $V$ ($\dim V = n$) contains no more than $n$ vectors.
Proof2
Suppose a linearly independent set in $V$ contains more than $n$ vectors: $$ A = \{ \underline{a_1}, \cdots, \underline{a_m}\} \text{ is a basis for } Span\{ \underline{a_1}, \cdots, \underline{a_m}\} \\ dim(Span\{ \underline{a_1}, \cdots, \underline{a_m}\}) = m \gt n $$
But the dimension of the space spanned by the set is at most $n$: $$ \text{Suppose } \{ \underline{v_1}, \cdots, \underline{v_n}\} \text{ is a basis for } V. \\ \text{Since } \underline{a_1}, \cdots, \underline{a_m} \in V, \text{ we have } Span\{ \underline{a_1}, \cdots, \underline{a_m}\} \subseteq V, \text{ so} \\ \forall \underline{a} \in Span\{ \underline{a_1}, \cdots, \underline{a_m}\}, \ \underline{a} = \sum_{i=1}^n s_i \underline{v_i} \\ \implies Span\{ \underline{a_1}, \cdots, \underline{a_m}\} \subseteq Span\{ \underline{v_1}, \cdots, \underline{v_n}\} = V \\ \implies dim(Span\{ \underline{a_1}, \cdots, \underline{a_m}\}) \le dim(V) = n \lt m $$
We have a contradiction here. Thus the assumption is wrong. Any linearly independent set in $V$ contains no more than $n$ vectors.
Lemma II: Is Basis?
Statement
Any set of $n$ linearly independent vectors is a basis for $V$ ($\dim V = n$).
Proof3
Suppose a linearly independent set of $n$ vectors: $$ A = \{ \underline{a_1}, \cdots, \underline{a_n}\} $$
By Lemma I, appending any further vector yields a linearly dependent set: $$ \forall \underline{b} \in V, \\ B = \{ \underline{a_1}, \cdots, \underline{a_n}, \underline{b}\} \text{ is linearly dependent (it contains } n+1 \text{ vectors)} $$ Because it is linearly dependent, we have: $$ \Big(\sum_{i=1}^{n} s_i \underline{a_i}\Big) + t\underline{b} = \underline{0} \text{ with the coefficients } s_i, t \text{ not all zero}\\ \Downarrow \\ t \neq 0 \ (\text{otherwise the independence of } A \text{ would force all } s_i = 0) \implies \underline{b} = -\sum_{i=1}^{n} \frac{s_i}{t} \underline{a_i} \\ \Downarrow \\ \text{Every vector in } V \text{ can be expressed as a linear combination of vectors in } A \text{, i.e., } A \text{ spans } V. $$
Because $A$ is linearly independent, every vector in V can be expressed as a unique linear combination of vectors in $A$. According to the definition of basis, $A$ is a basis for $V$.
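As a concrete illustration (a toy NumPy example of mine), three linearly independent vectors in $\mathbb{R}^3$ form a basis, so the coordinates of any $\underline{b}$ are recovered by solving a single linear system:

```python
import numpy as np

# Three linearly independent vectors in R^3 (columns of A) and an arbitrary b.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])
b = np.array([2., 3., 4.])

coords = np.linalg.solve(A, b)      # coordinates of b in the set {a_1, a_2, a_3}
print(np.allclose(A @ coords, b))   # True: the set spans R^3, hence it is a basis
```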
A straightforward corollary is the following:
Lemma III: Subspace = Vector Space?
Statement
Suppose $W$ is a subspace of $V$. If $\dim W = \dim V$, then $W = V$.
Proof
This follows because the basis of $W$, say $\{\underline{w_1}, \cdots, \underline{w_n}\}$, is a set of $n$ linearly independent vectors in $V$. Since $\dim V = n$, by Lemma II it is also a basis of $V$: $$ V \subseteq Span\{\underline{w_1}, \cdots, \underline{w_n}\} = W $$ Because $W$ is a subspace of $V$, we must have $W = V$.
Invertible Matrix Theorem
First, we prove two basic conclusions (Reference: UCLA, Theorem 3.3.18):
Left Invertible ⟹ Linearly Independent Columns
Statement
If a matrix is left invertible, then its column vectors are linearly independent.
Proof
$$ \text{Let } B \text{ be a left inverse of } A: \ BA = I. \\ \forall \underline{v}, \ A \underline{v} = \underline{0} \implies BA \underline{v} = \underline{v} = \underline{0} $$This indicates that the column vectors are linearly independent: the only linear combination of the columns equal to $\underline{0}$ is the trivial one.
Square + Linearly Independent Columns ⟹ Right Invertible
Statement
If a square matrix has linearly independent column vectors, then the matrix is right invertible.
Proof
$$ A = \begin{bmatrix} \underline{a_1} & \cdots & \underline{a_n} \end{bmatrix} $$The vector space $F^n$ has dimension $n$ (the row size), and there are $n$ linearly independent column vectors (the column size). Thus, by 5.2, the column vectors form a basis, so each standard basis vector can be expressed as: $$ \underline{e_j} = \sum_{i=1}^{n} s_{ij} \underline{a_i}, \qquad \sum_{i=1}^{n} s_{ij} (\underline{a_i})_k = \begin{cases} 0, & k \neq j \\ 1, & k = j \end{cases} $$ where $(\underline{a_i})_k$ denotes the $k$-th component of $\underline{a_i}$.
Then we can construct a matrix $B$ by the parameters $s$, such that $AB = I$:
$$ \begin{aligned} AB &= \begin{bmatrix} \underline{a_1} &\cdots & \underline{a_n} \end{bmatrix} \begin{bmatrix} s_{11} & \cdots & s_{1n} \\ \vdots & \ddots & \vdots\\ s_{n1} & \cdots & s_{nn} \\ \end{bmatrix} = \begin{bmatrix} \underline{e_1} &\cdots &\underline{e_n} \end{bmatrix} = I \end{aligned} $$Thus $A$ is right invertible.
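The construction above translates directly into code: solve $A\underline{s_j} = \underline{e_j}$ for each standard basis vector and stack the solutions as the columns of $B$. A short NumPy sketch with an arbitrary invertible example matrix:

```python
import numpy as np

# Arbitrary square matrix with linearly independent columns.
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
n = A.shape[0]

# Column j of B holds the coordinates s_ij of e_j in the basis {a_1, ..., a_n},
# i.e. it solves A s_j = e_j.
B = np.column_stack([np.linalg.solve(A, e) for e in np.eye(n)])
print(np.allclose(A @ B, np.eye(n)))   # True: AB = I, so B is a right inverse
```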
For Square Matrix, Left Invertible ⟺ Right Invertible
Statement
For square matrix, $AB = I \Leftrightarrow BA = I$.
Proof4
If $A$ is a square matrix and $AB = I$, then $B$ is a square matrix and is left invertible. By 6.1, $B$ has linearly independent columns, and by 6.2 it is therefore right invertible: $$ AB = I \overset{6.1}{\implies} B \text{ has linearly independent columns} \overset{6.2}{\implies} BC = I \text{ for some } C $$
We can also show that B has linearly independent rows:
$$ BC = I \Longleftrightarrow C^T B^T = I \overset{6.1}{\implies} B^T \text{ has linearly independent columns} $$
If $AB = I$, we have: $$ B = BI = B(AB) = (BA)B \\ B - (BA)B = 0 \\ (I - BA)B = 0 \\ B^T (I - BA)^T = 0 $$
By the linear independence of the row vectors of $B$ (the columns of $B^T$), $B^T X = 0$ implies $X = 0$, so we must have:
$$ (I - BA)^T = 0 \Longleftrightarrow (I - BA) = 0 \Longleftrightarrow BA = I $$
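Numerically, the right inverse constructed column by column (as in 6.2) also acts as a left inverse, in line with this theorem; the example reuses the same arbitrary matrix as the previous sketch:

```python
import numpy as np

# Same arbitrary square matrix as in the previous sketch.
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
n = A.shape[0]

# B is constructed only so that AB = I ...
B = np.column_stack([np.linalg.solve(A, e) for e in np.eye(n)])
print(np.allclose(A @ B, np.eye(n)))   # True: AB = I
# ... yet BA = I holds as well, as the theorem asserts.
print(np.allclose(B @ A, np.eye(n)))   # True: BA = I
```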
Invertible ⟺ Square + Column Vectors Linearly Independent
Statement
A matrix is invertible if and only if it is square and its column vectors are linearly independent.
Proof
Now we may begin the proof:
If a matrix is invertible, then it is left invertible, and thus its column vectors are linearly independent (by 6.1). Moreover, "$AB = BA = I$" forces the matrix to be square: for $A \in F^{m \times n}$, $AB = I_m$ gives $rank(A) \ge m$ and $BA = I_n$ gives $rank(A) \ge n$, while $rank(A) \le \min(m, n)$, so $m = n$.
If a square matrix $A$ has linearly independent column vectors, then it is right invertible: $$ AB = I $$ Then by 6.3, for a square matrix we also have: $$ BA = I $$ Thus the matrix $A$ is invertible. In addition, $A$ has linearly independent rows: $$ B^T A^T = (AB)^T = I^T = I \\ A^T \text{ is left invertible} \overset{6.1}{\implies} A^T \text{ has linearly independent columns} $$
Invertible Matrix Theorem
Statement
It describes a type of matrix that possesses the following characteristics all at once:
Invertible = Square = Linearly Independent Columns (which also form a basis for $\mathbb{R}^n$) & Rows = Full Rank = Non-Singular = Hermitian ($A^H$) Invertible
Proof
The key is theorem 6.4, "Invertible ⟺ Square + Column Vectors Linearly Independent".
(Non-singular: since the rank is $n$, rank-nullity gives nullity $= 0$, so the null space is $\{\underline{0}\}$ and $A\underline{x} = \underline{0}$ has only the trivial solution $\underline{x} = \underline{0}$.)
Positive-definite ⟹ Invertible
Statement
If a matrix is positive-definite, then it is invertible.
Proof: Definition 4.2.2
If $A$ is positive-definite, then we have: $$ \forall \underline{v} \in F^n - \{\underline{0}\}, \\ \underline{v}^H A \underline{v} \gt 0 \implies A\underline{v} \neq \underline{0} $$ Thus, by contraposition, $A$ has linearly independent columns: $$ A \underline{v} = \underline{0} \implies \underline{v} = \underline{0} $$
Since positive-definiteness is defined for square matrices, 6.4 can be applied: $A$ is a square matrix with linearly independent column vectors, thus $A$ is invertible.
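A quick numerical illustration, assuming NumPy: $M^T M + I$ is symmetric positive-definite by construction, all of its eigenvalues are positive, and inversion succeeds:

```python
import numpy as np

# M^T M + I is symmetric positive-definite by construction.
rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
A = M.T @ M + np.eye(4)

print(bool(np.all(np.linalg.eigvalsh(A) > 0)))  # True: all eigenvalues positive
A_inv = np.linalg.inv(A)                        # succeeds: A is invertible
print(np.allclose(A @ A_inv, np.eye(4)))        # True
```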
Linearly Independent Columns ⟺ Grammian Positive-definite
Statement
A has linearly independent columns if and only if $A^H A$ is positive-definite. ($A^H A$: Grammian Matrix G)
Proof
A Has Linearly Independent Columns ⟹ $A^H A$ Positive-definite: Positive Definiteness, Khan, 6.7
$$ \forall \underline{v} \in F^n - \{\underline{0}\}, \\ A\underline{v} \neq 0 \\ (A\underline{v})^H (A\underline{v}) = \underline{v}^H(A^H A)\underline{v} \gt 0 \\ (\text{by standard inner-product}) $$
Thus $A^H A$ is positive definite, and by 6.7 it is also invertible.
$A^H A$ Positive-definite ⟹ A Has Linearly Independent Columns: Theorem 4.2.3 $$ \forall \underline{v} \in F^n - \{\underline{0}\}, \\ \underline{v}^H (A^H A) \underline{v} \gt 0 \\ (A\underline{v})^H (A\underline{v}) \gt 0 \\ A\underline{v} \neq \underline{0} $$ Thus A has linearly independent columns.
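Both directions can be spot-checked numerically. In the NumPy sketch below (an arbitrary tall matrix whose columns are linearly independent with probability one), a successful Cholesky factorization of $G = A^H A$ serves as a practical certificate of positive-definiteness:

```python
import numpy as np

# Tall random matrix: its columns are linearly independent with probability one.
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
G = A.T @ A                        # Grammian (real case: A^H = A^T)

L = np.linalg.cholesky(G)          # would raise LinAlgError if G were not PD
v = rng.standard_normal(3)
print(bool(v @ G @ v > 0))         # True: v^H G v > 0 for this v != 0
print(np.allclose(L @ L.T, G))     # True: G = L L^T
```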
Best Approximation
Best Approximation Theorem
Best Approximation ⟺ Error ⟂ Entire Target Space
Normal Equation for Best Approximation
If $\underline{b}$ is not in $\mathcal{R}(A)$, then $A\underline{x} = \underline{b}$ cannot be solved exactly (because $\mathcal{R}(A)$ is, by definition, the set of all vectors that $A\underline{x}$ can reach).
Then we are interested in finding $\underline{p} = A\underline{x}$, where $\underline{p}$ is as close as possible to $\underline{b}$; in other words, we want the best approximation of $\underline{b}$ by vectors in $\mathcal{R}(A)$ (the column space of $A$): $$ \underline{p} = A \underline{x}, \\ ||\underline{b} - \underline{p}|| \le ||\underline{b} - \underline{w}||, \forall \underline{w} \in \mathcal{R}(A) $$
By the best approximation theorem, we have: $$ \langle \underline{b} - \underline{p} | \underline{v} \rangle = 0, \forall \underline{v} \in \mathcal{R}(A) $$ In particular, the error $\underline{b} - \underline{p}$ is perpendicular to every column vector of $A$: $$ A^T (\underline{b} - \underline{p}) = \underline{0} \\ A^T(\underline{b} - A\underline{x}) = \underline{0} \\ A^T \underline{b} = (A^T A) \underline{x} \\ (\text{This is the Normal Equation: } \underline{t} = G \underline{s}) $$
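To close, a short NumPy sketch of the normal equation in action: the data are random, $A^T A$ is invertible because the columns of $A$ are linearly independent with probability one, and the result is compared against np.linalg.lstsq as a sanity check:

```python
import numpy as np

# Random over-determined system; A^T A is invertible because the columns of A
# are linearly independent with probability one.
rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_hat = np.linalg.solve(A.T @ A, A.T @ b)        # normal equation A^T A x = A^T b
x_lsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # reference least-squares solution

print(np.allclose(x_hat, x_lsq))                 # True: same minimizer
residual = b - A @ x_hat
print(np.allclose(A.T @ residual, 0))            # True: the error is orthogonal to R(A)
```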