Cross-Entropy Loss in ML

Inara Koppert-Anisimova · Published in unpack · Jan 4, 2021

Multi-class brain X-rays with different types of cancer

What is Entropy in ML?

Entropy is the number of bits required to transmit a randomly selected event from a probability distribution. A skewed distribution has low entropy, whereas a distribution where all events have equal probability has high entropy.
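
As a quick numeric check (a minimal sketch with base-2 logs, so entropy is measured in bits), a skewed distribution needs far fewer bits on average than a uniform one:

import numpy as np

def entropy(p):
    # H(P) = -sum of P(x) * log2(P(x)), in bits
    p = np.asarray(p)
    return -np.sum(p * np.log2(p))

print(entropy([0.9, 0.05, 0.05]))   # skewed distribution  -> about 0.57 bits (low entropy)
print(entropy([1/3, 1/3, 1/3]))     # uniform distribution -> about 1.58 bits (high entropy)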

There are two common variants: binary cross-entropy loss and multi-class cross-entropy loss.

Cross-entropy builds upon the idea of entropy from information theory and calculates the number of bits required to represent or transmit an average event from one distribution compared to another distribution.

When do we use it?

Cross-entropy loss is used when adjusting model weights during training. The aim is to minimize the loss, i.e., the smaller the loss the better the model. A perfect model has a cross-entropy loss of 0. It is normally used for multi-class and multi-label classification.

The distinction between multi-class and multi-label data

“… the cross entropy is the average number of bits needed to encode data coming from a source with distribution p when we use model q …”

The intuition for this definition is as follows: if we have a target (underlying) probability distribution P and an approximation Q of it, then the cross-entropy of Q from P is the average number of bits needed to represent an event when we encode it using Q instead of P; the additional bits on top of the entropy of P are the KL divergence between the two distributions.

The cross-entropy between two probability distributions, such as Q from P, can be stated formally as:

  • H(P, Q)

Where H() is the cross-entropy function, P is the target distribution and Q is the approximation of the target distribution.

Cross-entropy can be calculated using the probabilities of the events from P and Q, as follows:

  • H(P, Q) = −Σ_{x ∈ X} P(x) * log(Q(x))
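
As a minimal worked example (two made-up discrete distributions over three events, base-2 logs so the result is in bits), the gap between H(P, Q) and the entropy H(P) is the extra cost of encoding with Q instead of P:

import numpy as np

P = np.array([0.7, 0.2, 0.1])        # target distribution
Q = np.array([0.5, 0.3, 0.2])        # approximation of the target distribution

H_P  = -np.sum(P * np.log2(P))       # entropy of P           -> about 1.157 bits
H_PQ = -np.sum(P * np.log2(Q))       # cross-entropy H(P, Q)  -> about 1.280 bits
print(H_P, H_PQ, H_PQ - H_P)         # the roughly 0.123 extra bits are the penalty for using Q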

Usually an activation function (Sigmoid / Softmax) is applied to the raw scores before the CE loss computation. With Softmax, your model predicts a vector of probabilities such as [0.7, 0.2, 0.1]: the entries (70%, 20%, 10%) sum to 100%, and the first entry is the most likely.
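
A minimal sketch of that step in PyTorch (the logit values here are made up so the softmax comes out close to [0.7, 0.2, 0.1]):

import torch

logits = torch.tensor([2.0, 0.75, 0.05])   # raw, unnormalised scores from the model
probs  = torch.softmax(logits, dim=0)      # exp(logit) / sum of exp(logits)
print(probs)                               # tensor([0.6999, 0.2005, 0.0996]) -> sums to 1, first entry most likely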

What Softmax and CE in fact do is:

Input image source: Photo by Victor Grabarczyk on Unsplash. Diagram by author.

Softmax converts logits into probabilities. The purpose of cross-entropy is to take the output probabilities (P) and measure the distance from the truth values (as shown in the figure below).

Cross Entropy (L) (Source: Author). (S is the Softmax output, T the target)

How would we calculate cross-entropy loss? I found a nice image that shows visually what's going on with the tensors:

Cross-Entropy Function representation

S(y) is the output of our softmax function. It is a prediction, so we can also call it y_hat. L is the fact: the one-hot encoded label of the true class, where only one entry is one and the rest are zeroes. For each entry in your output vector:

  • Step 1: take the log of the entry. The entry is usually a number less than one, so its log is negative; for example log_base_2 of 0.7 is −0.5145731728297583 (and 2 to the −0.5145731728297583th power is 0.7).
  • Step 2: multiply the entry by the ground truth: log(0.7) * 1.
  • Step 3: do the same for the other entries: log(0.2) * 0, which is of course zero, and log(0.1) * 0, which is also zero.
  • Step 4: because of the big sigma (summing symbol) in front of L_i log(S_i), sum all the terms up, which gives −0.5145731728297583.
  • Step 5: multiply by −1 because of the big negative sign in front, turning the loss into a positive number: 0.5145731728297583.
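
The same five steps in code (a minimal sketch assuming the [0.7, 0.2, 0.1] softmax output and a one-hot target for the first class; base-2 logs are used to match the numbers above, while deep learning libraries typically use the natural log, which changes the value but not the idea):

import numpy as np

S = np.array([0.7, 0.2, 0.1])    # softmax output (the prediction, y_hat)
L = np.array([1, 0, 0])          # one-hot encoded label of the true class

logs     = np.log2(S)            # Step 1: log of each entry, e.g. log2(0.7) is about -0.51457
weighted = L * logs              # Steps 2-3: multiply by the ground truth, zero entries drop out
total    = weighted.sum()        # Step 4: sum everything up -> -0.5145731728297583
loss     = -total                # Step 5: flip the sign     ->  0.5145731728297583

print(loss)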

F.cross_entropy(acts, targ)
tensor(0.51457)

For multi-label classification problems, Binary Cross-Entropy is the best choice; it is essentially mnist_loss along with a log. Each activation is compared to its target in each column, so we can get predictions for which labels are on the image, as in the sketch below.
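
A minimal sketch of that idea (assuming one hypothetical image and three labels, of which the first and third are present; PyTorch's nn.BCEWithLogitsLoss applies the sigmoid itself and compares every activation to its target column):

import torch
import torch.nn as nn

acts    = torch.tensor([[2.0, -1.5, 0.3]])   # raw activations: one row per image, one column per label
targets = torch.tensor([[1.0,  0.0, 1.0]])   # multi-hot target: labels 1 and 3 are on the image

loss_fn = nn.BCEWithLogitsLoss()             # sigmoid + binary cross-entropy, averaged over all columns
print(loss_fn(acts, targets))

preds = torch.sigmoid(acts) > 0.5            # threshold each label independently to decide which are present
print(preds)                                 # tensor([[ True, False,  True]])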

Sources:

  1. Machine Learning: A Probabilistic Perspective, 2012.
  2. https://machinelearningmastery.com/cross-entropy-for-machine-learning/
  3. https://medium.com/data-science-bootcamp/understand-cross-entropy-loss-in-minutes-9fb263caee9a
  4. https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e
  5. https://gombru.github.io/2018/05/23/cross_entropy_loss/
  6. Chapter 6, Deep Learning for Coders with fastai and PyTorch, Jeremy Howard & Sylvain Gugger
