Source coding is a way to compress the source, i.e., to describe the source output efficiently.
The source is specified by its alphabet $\mathcal{X}$ and by its statistics, i.e., the probability distribution $p_X$ of its output.
```mermaid
flowchart LR
    a["`Source`"]
    b@{ shape: text, label: "Blabla" }
    c["`Encoder`"]
    d@{ shape: text, label: "426c61626c61" }
    a --> b
    b --> c
    c --> d
```
The encoder is specified by:
- The input alphabet $\mathcal{X}$ (the source alphabet)
- the output alphabet $\mathcal{D} = \{0, 1, \dots, D-1\}$ (the code is then called $D$-ary)
- the codebook $\mathcal{C}$, which consists of finite sequences over the output alphabet
- the one-to-one encoding map $\Gamma : \mathcal{X} \to \mathcal{C}$
Example
But we want to avoid codes in which decoding can be ambiguous
Definition
We say that a code is uniquely decodable if every concatenation of codewords has a unique parsing into a sequence of codewords (i.e., the extension of the encoding map to sequences is one-to-one)
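Unique decodability can be tested mechanically. Below is a sketch of the Sardinas–Patterson test (not covered in these notes; the function name is my own): the code is uniquely decodable if and only if no "dangling suffix" is itself a codeword.

```python
def is_uniquely_decodable(codewords):
    """Sardinas-Patterson test: a code is uniquely decodable iff no
    dangling suffix is itself a codeword."""
    cw = set(codewords)
    # Initial dangling suffixes: leftovers when one codeword is a
    # proper prefix of another.
    suffixes = {b[len(a):] for a in cw for b in cw if a != b and b.startswith(a)}
    seen = set()
    while suffixes - seen:
        s = (suffixes - seen).pop()
        seen.add(s)
        if s in cw:
            return False  # an ambiguous parsing exists
        for c in cw:
            if c.startswith(s):
                suffixes.add(c[len(s):])  # suffix is a prefix of a codeword
            if s.startswith(c):
                suffixes.add(s[len(c):])  # codeword is a prefix of the suffix
    return True
```

For instance, {0, 01, 10} is not uniquely decodable (010 parses as 0·10 or 01·0), while {0, 01, 11} is.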
Example
The code is not uniquely decodable
Example
This code is uniquely decodable, because it is of fixed length.
Theorem
A fixed-length code is always uniquely decodable
Definition
A code is prefix-free if no codeword is a prefix of another codeword
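The prefix-free property is easy to check directly; here is a small sketch (the function name is my own):

```python
def is_prefix_free(codewords):
    """True iff no codeword is a prefix of another one."""
    words = sorted(codewords)
    # After lexicographic sorting, any prefix relation shows up
    # between adjacent entries.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))
```

For example, {0, 10, 11} is prefix-free, while {0, 01, 11} is not (0 is a prefix of 01).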
Theorem
- A prefix-free code is always uniquely decodable
- A uniquely decodable code is not necessarily prefix-free
Definition
A prefix-free code is also called an instantaneous code
Theorem, Kraft-McMillan's inequality
If a $D$-ary code is uniquely decodable, then its codeword lengths $l_1, \dots, l_n$ satisfy
$$\sum_{i=1}^{n} D^{-l_i} \le 1$$
So if Kraft’s inequality is not fulfilled, there is no uniquely decodable code with these codeword lengths (contrapositive)
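The left-hand side of the inequality is a one-liner (the function name is my own):

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft-McMillan inequality."""
    return sum(D ** -l for l in lengths)
```

Binary lengths (1, 2, 2) give a sum of exactly 1 (a full binary tree), while (1, 1, 2) give 1.25 > 1, so no uniquely decodable binary code with lengths (1, 1, 2) exists.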
If the positive integers $l_1, \dots, l_n$ satisfy Kraft's inequality for some integer $D \ge 2$, then there exists a $D$-ary prefix-free code (hence decodable) with these codeword lengths
This implies that any uniquely decodable code can be replaced by a prefix-free code with the same codeword lengths
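The converse direction of the theorem is constructive: processing the lengths in increasing order, the $i$-th codeword can be taken as the first $l_i$ $D$-ary digits of the cumulative Kraft sum $\sum_{j<i} D^{-l_j}$. A sketch (the function name is my own), assuming the lengths satisfy Kraft's inequality:

```python
from fractions import Fraction

def prefix_free_from_lengths(lengths, D=2):
    """Greedy Kraft construction: codeword i = first l_i D-ary digits
    of sum_{j < i} D**-l_j (lengths processed in increasing order).
    Fraction keeps the cumulative sum exact."""
    code, cum = [], Fraction(0)
    for l in sorted(lengths):
        x, digits = cum, []
        for _ in range(l):
            x *= D
            d = int(x)            # next D-ary digit of the expansion
            digits.append(str(d))
            x -= d
        code.append("".join(digits))
        cum += Fraction(1, D ** l)
    return code
```

For lengths (1, 2, 2) and $D = 2$ this yields the prefix-free code {0, 10, 11}.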
We will focus on prefix-free codes
- No loss of optimality
- A prefix-free codeword is recognized as soon as its last digit is seen, so there is less decoding delay
Definition
The average codeword-length is given by
$$\bar{L} = \sum_{x \in \mathcal{X}} p_X(x)\, l(x)$$
where $l(x)$ denotes the length of the codeword assigned to $x$.
Theorem
Let $\Gamma$ be the encoding map of a base-$D$ code for the random variable $X$. If the code is uniquely decodable, then the average codeword-length is lower bounded by the entropy of $X$:
$$\bar{L} \ge H_D(X)$$
Theorem
For every random variable $X$, and for every integer $D \ge 2$, there exists a prefix-free $D$-ary code for $X$ such that
$$l(x) = \left\lceil \log_D \frac{1}{p_X(x)} \right\rceil \quad \text{for every } x \in \mathcal{X}$$
We call those codes $D$-ary Shannon-Fano codes
But these codes seem complicated for nothing! Well no, they are actually rather good:
Theorem
The average codeword-length of a $D$-ary Shannon-Fano code for the random variable $X$ fulfills
$$H_D(X) \le \bar{L} < H_D(X) + 1$$
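A sketch of the Shannon–Fano length assignment together with a numerical check of the bound (the function names and the example distribution are my own):

```python
import math

def shannon_fano_lengths(probs, D=2):
    """l(x) = ceil(-log_D p(x)); these lengths always satisfy Kraft."""
    # A tiny epsilon guards against log() landing just above an
    # integer through floating-point error.
    return [math.ceil(-math.log(p, D) - 1e-12) for p in probs]

def entropy(probs, D=2):
    """H_D(X): the entropy of X measured in base D."""
    return -sum(p * math.log(p, D) for p in probs)

probs = [0.4, 0.3, 0.3]
lengths = shannon_fano_lengths(probs)              # [2, 2, 2]
avg = sum(p * l for p, l in zip(probs, lengths))   # 2.0
# The theorem's bound holds: entropy(probs) is about 1.571,
# and 1.571 <= 2.0 < 2.571.
```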
Sadly they are not optimal: examples can be constructed where another code has a smaller average length
Example
Let’s say we have a letter that has an absurdly small probability. Its codeword will be very long, and could probably be truncated, because the code tree has many empty nodes
Optimal solution: Huffman code
We start with all the leaves of the tree, weighted by their probabilities. We order them by weight and join the two nodes with the lowest weights into a new node whose weight is their sum. We keep going until a single root remains.
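The merging procedure above can be sketched with a heap (binary case; the function name is my own):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Binary Huffman construction: repeatedly merge the two
    lowest-weight nodes until one root remains, prepending a bit
    to every codeword in the merged subtrees."""
    tiebreak = count()  # unique counter: dicts are never compared on ties
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in enumerate(probs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)  # lowest weight
        w1, _, c1 = heapq.heappop(heap)  # second lowest
        merged = {s: "0" + cw for s, cw in c0.items()}
        merged.update({s: "1" + cw for s, cw in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]  # symbol index -> codeword
```

For probabilities (1/2, 1/4, 1/4) this produces codeword lengths (1, 2, 2), matching the entropy of 1.5 bits exactly.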
Lemma
The average path length of a tree with leaf probabilities is the sum of the probabilities of the intermediate nodes (including the root)
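A quick check of the lemma on a three-leaf tree (the probabilities are my own example): leaves 1/2, 1/4, 1/4 sit at depths 1, 2, 2, and the intermediate nodes are the root (weight 1) and the node merging the two 1/4-leaves (weight 1/2).

```python
probs, depths = [0.5, 0.25, 0.25], [1, 2, 2]
avg_path_length = sum(p * d for p, d in zip(probs, depths))  # 1.5
intermediate_node_sum = 1.0 + 0.5  # root + node joining the 1/4-leaves
assert avg_path_length == intermediate_node_sum
```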
Theorem
A Huffman code is guaranteed to be optimal: no uniquely decodable code for the same source has a smaller average codeword-length.