Proof of The Fundamental Theorem of Linear Algebra

[TOC]

Orthogonality of $\mathcal{R}(A), \mathcal{R}(A^T), \mathcal{N}(A), \mathcal{N}(A^T)$

Statement

$$ \mathcal{N}(A) \perp \mathcal{R}(A^T) \\ \mathcal{R}(A) \perp \mathcal{N}(A^T) $$

Proof

Strang's "The Fundamental Theorem of Linear Algebra" Page 2-3: $$ \begin{aligned} \mathcal{R}(A) &: \text{Column Space(Space Spanned by Column Vectors)} \\ \mathcal{R}(A^T) &: \text{Row Space(Space Spanned by Row Vectors)} \\ \mathcal{N}(A) &: \underline{x} \in \mathcal{N}(A), A\underline{x} = 0, \underline{x} \cdot (\forall \underline{r} \in \text{Row(A)}) = 0 \\ &\implies \mathcal{N}(A) \perp \mathcal{R}(A^T) \\ \mathcal{N}(A^T) &: \mathcal{N}(A^T) \perp \mathcal{R}(A) (\text{Based on above}) \end{aligned} $$ Thus we have: $$ \mathcal{N}(A) \perp \mathcal{R}(A^T) \\ \mathcal{R}(A) \perp \mathcal{N}(A^T) $$


Every Vector Space = A Subspace + Its Orthogonal Complement

Statement

Any finite-dimensional inner product space is the direct sum of a subspace and its orthogonal complement. $$ V = W \oplus W^{\bot} $$

Proof1

$$ \forall \underline{v} \in V, \exists \underline{w} = \sum_{i=1}^{r} \frac{\langle \underline{v} | \underline{w_i}\rangle}{||\underline{w_i}||^2} \underline{w_i} \in W, \\ (\text{every finite-dimensional subspace } W \text{ has an orthogonal basis } \{\underline{w_i}\}_{i=1}^{r}, \text{ e.g. via Gram–Schmidt}) \\ \underline{v} = (\underline{w}) + (\underline{v} - \underline{w}) \\ (\underline{v} = \text{orthogonal projection onto } W + \text{complement}) $$

Then we can verify: $$ \langle \underline{v} - \underline{w} | \underline{w_i} \rangle = 0 \ \forall i \implies \underline{v} - \underline{w} \in W^{\bot} $$ Also, for any vector in both subspaces: $$ \underline{u} \in W \cap W^{\bot} \implies \langle\underline{u} | \underline{u} \rangle = 0 \implies \underline{u} = \underline{0} \\ \implies W \cap W^{\bot} = \{\underline{0}\} $$ Therefore: $$ \underline{w} \in W, \ \underline{v} - \underline{w} \in W^{\bot}, \ W \cap W^{\bot} = \{\underline{0}\} \\ \implies V = W \oplus W^{\bot}\\ \implies dim(V) = dim(W) + dim(W^{\bot}) $$
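
Below is a small NumPy sketch of this decomposition (the subspace $W$ and the vector $\underline{v}$ are randomly generated for illustration; `np.linalg.qr` plays the role of Gram–Schmidt here):

```python
import numpy as np

rng = np.random.default_rng(1)
W_spanning = rng.standard_normal((5, 2))  # W = span of two vectors in R^5
Q, _ = np.linalg.qr(W_spanning)           # orthonormal basis of W (Gram-Schmidt in effect)
v = rng.standard_normal(5)

w = Q @ (Q.T @ v)   # orthogonal projection of v onto W: sum of <v, q_i> q_i
c = v - w           # the remaining component

print(np.allclose(Q.T @ c, 0))  # v - w is orthogonal to every basis vector of W, i.e. c is in W^perp
print(np.allclose(w + c, v))    # v = w + (v - w)
```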


The Two Most Important Laws of Linear Algebra

Rank-Nullity Theorem

Statement

$$ dim(V) = rank(A) + nullity(A), \quad \text{where } V \text{ is the domain of } A $$

Proof: Theorem 3.4.11

  1. Suppose a basis for $\mathcal{N}(A)$: $$ \{\underline{v_1}, ..., \underline{v_k} \} $$

  2. Extend it to a basis for $V$: $$ \{\underline{v_1}, ..., \underline{v_k}, \underline{v_{k+1}}, ..., \underline{v_n} \} $$ Then we only need to show that: $$ \{ A(\underline{v_{k+1}}), \cdots, A(\underline{v_n})\} \text{ is a basis of } \mathcal{R}(A) $$

  3. For $\mathcal{R}(A)$: $$ \begin{aligned} \forall \underline{w} \in \mathcal{R}(A), \exists \underline{v} &= \sum_{i=1}^{n}s_i\underline{v_i} \in V, \\ \underline{w} = A(\underline{v}) = \sum_{i=1}^{n}s_i A(\underline{v_i}) &= \sum_{i=k+1}^{n}s_i A(\underline{v_i}) \quad (A(\underline{v_i}) = \underline{0} \text{ for } i \le k) \\ \Downarrow \\ \{ A(\underline{v_{k+1}}), \cdots, A(\underline{v_n})\} &\text{ spans } \mathcal{R}(A) \end{aligned} $$ Every vector in the range of $A$ can be represented as a linear combination of: $$ \{ A(\underline{v_{k+1}}), \cdots, A(\underline{v_n})\} $$ For linear independence, suppose: $$ \sum_{i=k+1}^{n}s_i A(\underline{v_i}) = \underline{0} \implies A\Big(\sum_{i=k+1}^{n}s_i \underline{v_i}\Big) = \underline{0} \implies \sum_{i=k+1}^{n}s_i \underline{v_i} \in \mathcal{N}(A) \\ \implies \sum_{i=k+1}^{n}s_i \underline{v_i} = \sum_{i=1}^{k} t_i \underline{v_i} \text{ for some } t_i; \text{ since } \{\underline{v_1}, \cdots, \underline{v_n}\} \text{ is linearly independent, all } s_i = 0 \\ \implies \{A(\underline{v_{k+1}}), ..., A(\underline{v_n})\} \text{ is linearly independent} $$

  4. This means every vector in $\mathcal{R}(A)$ has a unique representation as a linear combination of vectors in this set. According to the definition of a basis, these vectors form a basis for the range of $A$ (a numerical sanity check follows below). This means: $$ rank(A) = dim(\mathcal{R}(A)) = n - k = dim(V) - dim(\mathcal{N}(A)) $$
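
A brief numerical sketch of rank-nullity (my own example, with a randomly generated rank-deficient matrix):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4)) @ rng.standard_normal((4, 7))  # a 5x7 matrix of rank (at most) 4

rank = np.linalg.matrix_rank(A)
s = np.linalg.svd(A, compute_uv=False)
nullity = A.shape[1] - int(np.sum(s > 1e-10))  # dim N(A): number of ~zero singular directions

print(rank, nullity, A.shape[1])     # e.g. 4 3 7
print(rank + nullity == A.shape[1])  # True: rank(A) + nullity(A) = dim of the domain
```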


Row Space and Column Space Have the Same Dimension

Statement

$$ dim \mathcal{R}(A) = dim\mathcal{R}(A^T) = r $$

One way to see this is through row reduction. Row reduction only forms linear recombinations of the row vectors, so the row space does not change. Suppose there are $r$ pivots after row reduction: the $r$ nonzero rows form a basis for the row space, and the $r$ pivot columns of the original matrix are linearly independent and span the column space. Therefore the row space and the column space have the same number of vectors in their bases, as the sketch below illustrates.
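
Here is a hedged SymPy sketch of that argument on a small hand-made matrix (the matrix is my own example): the number of pivots returned by `rref` equals both the row rank and the column rank.

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, 4],
            [2, 4, 6, 8],   # a multiple of row 1
            [1, 0, 1, 0]])

rref_form, pivot_cols = A.rref()  # reduced row echelon form and the pivot column indices
print(rref_form)
print(pivot_cols)                                 # (0, 1): two pivots
print(len(pivot_cols) == A.rank() == A.T.rank())  # pivots = row rank = column rank
```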

Proof: Corollary 3.4.12

$$ \begin{aligned} \mathcal{R}(A)^{\bot} &= \{\underline{v} \in F^m | \underline{v} \cdot \underline{y} = 0, \forall \underline{y} \in \mathcal{R}(A) \} \\ \mathcal{R}(A^H)^{\bot} &= \{\underline{v} \in F^n | \underline{v}^H \cdot \underline{x} = 0, \forall \underline{x} \in \mathcal{R}(A^H) \} \\ &= \{\underline{v} \in F^n | \underline{v}^H (A^H \underline{y}) = 0, \forall \underline{y} \in F^m\} \\ &= \{\underline{v} \in F^n | \underline{v}^H A^H = 0\} \\ &= \{\underline{v} \in F^n | A\underline{v} = 0\} \\ &= \mathcal{N}(A) \end{aligned} $$

Since every vector space is the direct sum of a subspace + its orthogonal complement:

$$ \begin{aligned} dim(F^n) &= dim(\mathcal{R}(A^H)) + dim(\mathcal{R}(A^H)^{\bot}) \\ n &= dim(\mathcal{R}(A^H)) + dim(\mathcal{N}(A)) \end{aligned} $$

Also, by rank-nullity theorem:

$$ dim(F^n) = n = dim(\mathcal{R}(A)) +dim(\mathcal{N}(A)) $$

Therefore we have:

$$ \begin{aligned} dim(\mathcal{R}(A^H)) = n - dim(\mathcal{N}(A)) = dim(\mathcal{R}(A)) \end{aligned} $$

Thus the row space of $A$ has the same dimension as its column space.


Dimensions

Statement

The Dimensions of $\mathcal{R}(A), \mathcal{R}(A^T), \mathcal{N}(A), \mathcal{N}(A^T)$.

Proof

First of all, recall from the previous section that:

$$ dim(\mathcal{R}(A)) = dim(\mathcal{R}(A^T)) = r $$

By the rank-nullity theorem, applied to $A \in \mathbb{R}^{m \times n}$ and to $A^T \in \mathbb{R}^{n \times m}$, we have:

$$ dim(\mathbb{R}^n) = n = dim(\mathcal{R}(A)) + dim(\mathcal{N}(A)) = r + (n-r) \\ dim(\mathbb{R}^m) = m = dim(\mathcal{R}(A^T)) + dim(\mathcal{N}(A^T)) = r + (m-r) $$

Thus:

$$ dim(\mathcal{N}(A)) = n - r \\ dim(\mathcal{N}(A^T)) = m - r $$
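
A numerical sketch of these four dimensions (my own construction of an $m \times n$ matrix with prescribed rank $r$):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 6, 8, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # an m x n matrix of rank r

def dim_null(M, tol=1e-10):
    s = np.linalg.svd(M, compute_uv=False)
    return M.shape[1] - int(np.sum(s > tol))

print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))  # r, r
print(dim_null(A), n - r)                                    # dim N(A)   = n - r
print(dim_null(A.T), m - r)                                  # dim N(A^T) = m - r
```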

Vector Space Decomposition

Statement

Every vector $\underline{x} \in \mathbb{R}^n$ can be decomposed as $\underline{x} = \underline{x_r} + \underline{x_n}$, where $\underline{x_r} \in \mathcal{R}(A^T)$ is the row-space component and $\underline{x_n} \in \mathcal{N}(A)$ is the null-space component.

Proof

  1. We only need to prove that $\mathbb{R}^n$ is the direct sum of the row space and the null space: $$ \mathbb{R}^n = \mathcal{R}(A^T) \oplus \mathcal{N}(A) $$

  2. This amounts to proving that the two spaces are orthogonal complements of each other (either equality below implies the other): $$ \mathcal{R}(A^T) = \mathcal{N}(A)^{\bot} \\\vee\\ \mathcal{N}(A) = \mathcal{R}(A^T)^{\bot} $$

    We have proved the orthogonality of these spaces, but we have not yet proved that they are orthogonal complements of each other. Nevertheless, this is indeed the case:

    $$ \begin{aligned} \mathcal{N}(A) &\perp \mathcal{R}(A^T), \\ \forall \underline{v} \in \mathcal{N}(A), \forall \underline{r} &\in \mathcal{R}(A^T), \\ \underline{v} &\perp \underline{r} \\ &\Downarrow \\ \underline{v} &\in \mathcal{R}(A^T)^{\bot} \\ &\Downarrow \\ \mathcal{N}(A) &\subseteq \mathcal{R}(A^T)^{\bot} \end{aligned} $$

    And:

    $$ \begin{aligned} \forall \underline{v} \in \mathcal{R}(A^T)^{\bot}, \forall \underline{r} &\in \mathcal{R}(A^T), \\ \underline{v} &\cdot \underline{r} = 0 \\ (\text{in particular, } \underline{v} \perp &\text{ every row of } A \implies A\underline{v} = \underline{0}) \\ &\Downarrow \\ \underline{v} &\in \mathcal{N}(A) \\ &\Downarrow \\ \mathcal{R}(A^T)^{\bot} &\subseteq \mathcal{N}(A) \end{aligned} $$

    Thus:

    $$ \mathcal{N}(A) = \mathcal{R}(A^T)^{\bot} $$

    Therefore, by step 2 and the fact that every vector space is the direct sum of a subspace and its orthogonal complement:

    $$ \begin{aligned} \mathbb{R}^n &= \mathcal{R}(A^T) \oplus \mathcal{R}(A^T)^{\bot} \\ &= \mathcal{R}(A^T) \oplus \mathcal{N}(A) \end{aligned} $$

  3. Similarly, $$ \begin{aligned} \mathbb{R}^m &= \mathcal{R}(A) \oplus \mathcal{R}(A)^{\bot} \\ &= \mathcal{R}(A) \oplus \mathcal{N}(A^T) \end{aligned} $$

  4. Finally, we write them together: $$ \mathbb{R}^n = \mathcal{R}(A^T) \oplus \mathcal{N}(A) \\ \mathbb{R}^m = \mathcal{R}(A) \oplus \mathcal{N}(A^T) $$
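
The decomposition of $\mathbb{R}^n$ can be carried out explicitly with NumPy; the sketch below (my own example) splits a random $\underline{x}$ into its row-space and null-space components.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 5))  # R^5 = R(A^T) (+) N(A)
x = rng.standard_normal(5)

Q, _ = np.linalg.qr(A.T)  # orthonormal basis for the row space R(A^T) (A has full row rank here)
x_r = Q @ (Q.T @ x)       # projection of x onto the row space
x_n = x - x_r             # the remainder

print(np.allclose(A @ x_n, 0))    # x_n lies in N(A)
print(np.allclose(x_r + x_n, x))  # x = x_r + x_n
print(np.allclose(x_r @ x_n, 0))  # the two components are orthogonal
```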

Fundamental Theorem of Linear Algebra


Below are supplements to the original theorems.

Qualification for Basis

Statement

Any set of $n$ linearly independent vectors in $V$ is a basis for the vector space $V$ ($\dim V = n$).

The proof is separated into multiple steps.

Lemma I: Maximum Capacity of Linearly Independent Set

Statement

Any linearly independent set in $V$ ($\dim V = n$) contains no more than $n$ vectors.

Proof2

  1. Suppose a linearly independent set in $V$ contains $m \gt n$ vectors: $$ A = \{ \underline{a_1}, \cdots, \underline{a_m}\}, \quad m \gt n $$

  2. Express each of these vectors in terms of a basis $\{ \underline{v_1}, \cdots, \underline{v_n}\}$ of $V$: $$ \underline{a_j} = \sum_{i=1}^n c_{ij} \underline{v_i}, \quad j = 1, \cdots, m \\ \text{The homogeneous system } \sum_{j=1}^m c_{ij} s_j = 0 \ (i = 1, \cdots, n) \text{ has } n \text{ equations in } m \gt n \text{ unknowns,} \\ \text{so it has a nontrivial solution } (s_1, \cdots, s_m) \neq \underline{0} \\ \implies \sum_{j=1}^m s_j \underline{a_j} = \sum_{i=1}^n \Big(\sum_{j=1}^m c_{ij} s_j\Big) \underline{v_i} = \underline{0}, \text{ with the } s_j \text{ not all zero} $$

  3. This contradicts the linear independence of the set. Thus the assumption is wrong: any linearly independent set in $V$ contains no more than $n$ vectors.


Lemma II: Is Basis?

Statement

Any set of $n$ linearly independent vectors is a basis for $V$ ($\dim V = n$).

Proof3

  1. Suppose a linearly independent set of $n$ vectors in $V$: $$ A = \{ \underline{a_1}, \cdots, \underline{a_n}\} $$

  2. Then, for any $\underline{b} \in V$, the set $$ B = \{ \underline{a_1}, \cdots, \underline{a_n}, \underline{b}\} $$ contains $n+1$ vectors and is therefore linearly dependent by Lemma I. Hence there exist coefficients, not all zero, with: $$ \Big(\sum_{i=1}^{n} s_i \underline{a_i}\Big) + t\underline{b} = \underline{0} \\ \Downarrow \\ t \neq 0 \ (\text{otherwise } \sum_{i=1}^{n} s_i \underline{a_i} = \underline{0} \text{ with some } s_i \neq 0, \text{ contradicting the independence of } A) \\ \Downarrow \\ \underline{b} = -\sum_{i=1}^{n} \frac{s_i}{t} \underline{a_i} \\ \Downarrow \\ \text{Every vector in } V \text{ can be expressed as a linear combination of vectors in } A; \text{ thus } A \text{ spans } V. $$

  3. Because $A$ is linearly independent, every vector in V can be expressed as a unique linear combination of vectors in $A$. According to the definition of basis, $A$ is a basis for $V$.
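
A small NumPy illustration of this lemma in $\mathbb{R}^n$ (the random matrix is my own example): $n$ linearly independent vectors span $\mathbb{R}^n$, so any $\underline{b}$ has unique coordinates with respect to them.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n))       # columns: n vectors in R^n (generically independent)
assert np.linalg.matrix_rank(A) == n  # confirm linear independence

b = rng.standard_normal(n)
coords = np.linalg.solve(A, b)        # the unique coordinates of b in this basis

print(np.allclose(A @ coords, b))     # b is a linear combination of the n vectors
```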


A straightforward conclusion is the following (Reference):

Lemma III: Subspace = Vector Space?

Statement

Suppose $W$ is a subspace of $V$. If $\dim W = \dim V$, then $W = V$.

Proof

This follows because a basis of $W$ is a set of $n$ linearly independent vectors in $V$. Since $\dim V = n$, by Lemma II this basis of $W$ also forms a basis of $V$: $$ V \subseteq Span\{\underline{w_1}, \cdots, \underline{w_n}\} = W $$ Because $W$ is a subspace of $V$, we must have $W = V$.


Invertible Matrix Theorem

First, we prove two basic conclusions (Reference: UCLA, Theorem 3.3.18):

Left Invertible ⟹ Linearly Independent Columns

Statement

If a matrix is left invertible, then its column vectors are linearly independent.

Proof

$$ \text{Let } B \text{ be a left inverse of } A, \text{ so } BA = I. \text{ Then:} \\ A \underline{v} = \underline{0} \implies BA \underline{v} = B\underline{0} = \underline{0} \implies \underline{v} = \underline{0} $$

This indicates that the column vectors are linearly independent: the only linear combination of the columns that gives $\underline{0}$ is the trivial one.


Square + Linearly Independent Columns ⟹ Right Invertible

Statement

If a square matrix has linearly independent column vectors, then the matrix is right invertible.

Proof

$$ A = \begin{bmatrix} \underline{a_1} & \cdots & \underline{a_n} \end{bmatrix} $$

The column vectors live in a vector space of dimension $n$ (the number of rows), and there are $n$ of them (the number of columns), linearly independent by assumption. Thus by 5.2, the column vectors form a basis. The standard basis vectors can therefore be expressed as: $$ \underline{e_j} = \sum_{i=1}^{n} s_{ij} \underline{a_i} \\ \sum_{i=1}^{n} s_{ij} a_{ik} = \begin{cases} 0, & k \neq j \\ 1, & k = j \end{cases} \\ (a_{ik} \text{ denotes the } k\text{-th entry of } \underline{a_i}) $$

Then we can construct a matrix $B$ from the coefficients $s_{ij}$, such that $AB = I$:

$$ \begin{aligned} AB &= \begin{bmatrix} \underline{a_1} &\cdots & \underline{a_n} \end{bmatrix} \begin{bmatrix} s_{11} & \cdots & s_{1n} \\ \vdots & \ddots & \vdots\\ s_{n1} & \cdots & s_{nn} \\ \end{bmatrix} = \begin{bmatrix} \underline{e_1} &\cdots &\underline{e_n} \end{bmatrix} = I \end{aligned} $$

Thus $A$ is right invertible.
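
The construction can be mirrored numerically: the sketch below (my own example) solves $A\underline{s_j} = \underline{e_j}$ for each standard basis vector, stacks the solutions into $B$, and checks $AB = I$.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
A = rng.standard_normal((n, n))       # square with (generically) independent columns
assert np.linalg.matrix_rank(A) == n

I = np.eye(n)
B = np.column_stack([np.linalg.solve(A, I[:, j])  # coordinates of e_j in the column basis of A
                     for j in range(n)])

print(np.allclose(A @ B, I))  # A is right invertible
print(np.allclose(B @ A, I))  # and, being square, the right inverse is also a left inverse
```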


For Square Matrix, Left Invertible ⟺ Right Invertible

Statement

For square matrices, $AB = I \Leftrightarrow BA = I$.

Proof4

  1. If $A$ is a square matrix and $AB = I$, then $B$ is a square matrix and is left invertible. By 6.1, $B$ has linearly independent columns, and by 6.2 it is right invertible: $$ AB = I \overset{6.1}{\implies} B \text{ has linearly independent columns} \overset{6.2}{\implies} BC = I $$

    We can also show that B has linearly independent rows:

    $$ BC = I \Longleftrightarrow C^T B^T = I \overset{6.1}{\implies} B^T \text{ has linearly independent columns} $$

  2. If $AB = I$, we have: $$ B = BI = B(AB) = (BA)B \\ B - (BA)B = 0 \\ (I - BA)B = 0 \\ B^T (I - BA)^T = 0 $$

    Since the row vectors of $B$ are linearly independent, the columns of $B^T$ are linearly independent, so $B^T \underline{x} = \underline{0}$ forces $\underline{x} = \underline{0}$. Applying this to each column of $(I - BA)^T$, we must have:

    $$ (I - BA)^T = 0 \Longleftrightarrow (I - BA) = 0 \Longleftrightarrow BA = I $$


Invertible ⟺ Square + Column Vectors Linearly Independent

Statement

A matrix is invertible if and only if it is square and its column vectors are linearly independent.

Proof

Now we may begin the proof:

  1. If a matrix is invertible, then it is left invertible; thus its column vectors are linearly independent. "$AB = BA = I$" also requires the matrix to be square.

  2. If a square matrix $A$ has linearly independent column vectors, then by 6.2 it is right invertible: $$ AB = I $$ Then by 6.3, for square matrices, we also have: $$ BA = I $$ Thus matrix $A$ is invertible. In addition, $A$ has linearly independent rows: $$ B^T A^T = (AB)^T = I^T = I \\ A^T \text{ is left invertible} \overset{6.1}{\implies} A^T \text{ has linearly independent columns, i.e. } A \text{ has linearly independent rows} $$


Invertible Matrix Theorem

Statement

It describes a type of matrix that possesses the following characteristics all at once:

For a square matrix: Invertible = Linearly Independent Columns (which also form a basis for $\mathbb{R}^n$) = Linearly Independent Rows = Full Rank = Non-Singular = $A^H$ Invertible

Proof

The key is the theorem "Invertible ⟺ Square + Column Vectors Linearly Independent" above.

(Non-singular because $\mathbb{R}^n = \mathcal{R}(A^T)(\dim = n) \oplus \mathcal{N}(A)(\dim = 0)$, as these two form a direct sum. Nullity $= 0$ means the null space is $\{\underline{0}\}$, so $A\underline{x} = \underline{0}$ only for $\underline{x} = \underline{0}$.)

Positive-definite ⟹ Invertible

Statement

If a matrix is positive-definite, then it is invertible.

Proof: Definition 4.2.2

  1. If $A$ is positive-definite, then we have: $$ \forall \underline{v} \in F^n - \{\underline{0}\}, \\ \underline{v}^H A \underline{v} \gt 0 \implies A\underline{v} \neq \underline{0} $$ Thus $A$ has linearly independent columns: $$ A \underline{v} = \underline{0} \implies \underline{v} = \underline{0} $$

  2. Since positive-definiteness is defined for square matrices, 6.4 can be applied: $A$ is a square matrix with linearly independent column vectors, thus $A$ is invertible.

Linearly Independent Columns ⟺ Grammian Positive-definite

Statement

A has linearly independent columns if and only if $A^H A$ is positive-definite. ($A^H A$: Grammian Matrix G)

Proof

  1. A Has Linearly Independent Columns ⟹ $A^H A$ Positive-definite: Positive Definiteness, Khan, 6.7

    $$ \forall \underline{v} \in F^n - \{\underline{0}\}, \\ A\underline{v} \neq 0 \\ (A\underline{v})^H (A\underline{v}) = \underline{v}^H(A^H A)\underline{v} \gt 0 \\ (\text{by standard inner-product}) $$

    Thus $A^H A$ is positive definite, and by 6.7 it is also invertible.

  2. $A^H A$ Positive-definite ⟹ A Has Linearly Independent Columns: Theorem 4.2.3 $$ \forall \underline{v} \in F^n - \{\underline{0}\}, \\ \underline{v}^H (A^H A) \underline{v} \gt 0 \\ (A\underline{v})^H (A\underline{v}) \gt 0 \\ A\underline{v} \neq \underline{0} $$ Thus A has linearly independent columns.
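
A numerical sketch of the equivalence (the matrices are my own examples): with independent columns the Grammian has strictly positive eigenvalues; with a repeated column it is only positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(7)
A_full = rng.standard_normal((6, 3))                                 # independent columns
A_def = np.column_stack([A_full[:, 0], A_full[:, 0], A_full[:, 1]])  # a repeated column

for A in (A_full, A_def):
    G = A.conj().T @ A               # the Grammian A^H A
    eigvals = np.linalg.eigvalsh(G)  # real eigenvalues (G is Hermitian)
    print(np.linalg.matrix_rank(A), eigvals.min() > 1e-10)
# Expected: "3 True" (positive-definite), then "2 False" (not positive-definite)
```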


Best Approximation

Best Approximation Theorem

Best Approximation ⟺ Error ⟂ Entire Target Space

Normal Equation for Best Approximation

  1. If $\underline{b}$ is not in $\mathcal{R}(A)$, then $A\underline{x} = \underline{b}$ cannot be solved. (Because $\mathcal{R}(A)$ is, by definition, the set of all vectors that $A\underline{x}$ can reach.)

  2. Then we are interested in finding a "$\underline{p} = A\underline{x}$", where $\underline{p}$ is as close as possible to $\underline{b}$. In other words, we want the best approximation of $\underline{b}$ by vectors in $\mathcal{R}(A)$ (the column space of $A$). $$ \underline{p} = A \underline{x}, \\ ||\underline{b} - \underline{p}|| \le ||\underline{b} - \underline{w}||, \forall \underline{w} \in \mathcal{R}(A) $$

  3. By the best approximation theorem, we have: $$ \langle \underline{b} - \underline{p} | \underline{v} \rangle = 0, \forall \underline{v} \in \mathcal{R}(A) $$ This means the error "$\underline{b} - \underline{p}$" is perpendicular to every column vector of $A$ (see the numerical sketch below): $$ A^T (\underline{b} - \underline{p}) = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = \underline{0} \\ A^T(\underline{b} - A\underline{x}) = \underline{0} \\ A^T \underline{b} = (A^T A) \underline{x} \\ (\text{This is the Normal Equation, } \underline{t} = G \underline{s}, \text{ where } G = A^T A \text{ and } \underline{t} = A^T \underline{b}) $$
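
A minimal NumPy sketch of the normal equation on random data (my own example): solving $(A^T A)\underline{x} = A^T \underline{b}$ agrees with `np.linalg.lstsq`, and the error is orthogonal to the column space.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((8, 3))  # tall matrix with (generically) independent columns
b = rng.standard_normal(8)       # generically not in R(A)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # normal equation: (A^T A) x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # library least-squares solution

print(np.allclose(x_normal, x_lstsq))            # both give the same best approximation
print(np.allclose(A.T @ (b - A @ x_normal), 0))  # the error b - Ax is orthogonal to every column
```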


Reference


  1. Reference - UCDavis

  2. Reference - StackExchange

  3. Reference - UPenn

  4. Reference - StackExchange

by Jon