We explore the role of entropy in prediction and learning problems.

Example — Classify Images

For every image $x$, our neural network outputs a probability distribution $q(\cdot \mid x)$ over all possible labels $y \in \mathcal{Y}$.

For every image $x$, the true label distribution is $p(\cdot \mid x)$ (in practice, a one-hot distribution concentrated on the correct label).

Ideally, we want $q(y \mid x) = p(y \mid x)$ for every image-label pair $(x, y)$.

This will never happen!

Instead, we consider the cross-entropy loss.

Concretely, for every image $x$, we wish to minimize the loss $H\big(p(\cdot \mid x),\, q(\cdot \mid x)\big)$.
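A minimal numpy sketch of this setup, using the loss defined formally below. The 4-label problem, the `logits` array, and the true class are made up for illustration; the softmax step is one standard way a network turns raw scores into a distribution $q$.

```python
import numpy as np

# Hypothetical example: 4 possible labels, one image.
# `logits` stands in for the raw scores our network produces;
# the true label is class 2, so the target distribution p is one-hot.
logits = np.array([1.2, 0.3, 2.5, -0.8])
p = np.array([0.0, 0.0, 1.0, 0.0])   # true label distribution

# Softmax turns the scores into the network's distribution q over labels.
q = np.exp(logits - logits.max())
q /= q.sum()

# Cross-entropy loss for this image: -sum_y p(y) log q(y).
# With a one-hot p this reduces to -log q(true label).
loss = -np.sum(p * np.log(q))
print(loss)   # equals -np.log(q[2])
```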

Definition

The cross-entropy loss between two distributions $p$ and $q$ over the same label set $\mathcal{Y}$ is
$$H(p, q) = -\sum_{y \in \mathcal{Y}} p(y) \log q(y).$$
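One way to implement this definition directly; the `cross_entropy` helper and the example distributions are hypothetical. Terms with $p(y) = 0$ are skipped, following the convention $0 \log 0 = 0$.

```python
import numpy as np

def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """H(p, q) = -sum_y p(y) log q(y), in nats.

    Terms where p(y) == 0 contribute nothing, so we mask them out
    to avoid 0 * log(0) warnings.
    """
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(q[mask])))

# Two hypothetical distributions over three labels.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(cross_entropy(p, q))   # H(p, q)
print(cross_entropy(p, p))   # H(p, p) = H(p), the entropy of p
```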

Theorem

For a fixed probability distribution $p$, the minimum
$$\min_{q} H(p, q)$$
is attained iff we select $q = p$, and in this case the minimum is given by the entropy of $p$:
$$H(p, p) = H(p) = -\sum_{y} p(y) \log p(y).$$
(This follows from the decomposition $H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)$, since the KL divergence is nonnegative and vanishes iff $q = p$.)
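A quick numerical sanity check of the theorem, not a proof: sample many random candidate distributions $q$ and verify that $H(p, q)$ never drops below $H(p)$. The fixed $p$ and the Dirichlet sampling scheme are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])

def H(p, q):
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

entropy_p = H(p, p)   # H(p) = -sum_y p(y) log p(y)

# Sample random candidate distributions q over the 3 labels and check
# that H(p, q) >= H(p), the lower bound attained at q = p.
for _ in range(10000):
    q = rng.dirichlet(np.ones(3))
    assert H(p, q) >= entropy_p - 1e-12

print(f"H(p) = {entropy_p:.4f}  (lower bound attained at q = p)")
```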