KL divergence upper limit

QUESTION: Hi Clyde,

I hope you are doing very well.

The Kullback-Leibler divergence (KLD) for continuous distributions has a lower limit of 0, attained when the two probability density functions being compared are identical. If they are not identical, is there an upper limit for the divergence? Someone told me that if, for instance, I use a base-2 logarithm to obtain units of bits, then the KLD cannot be bigger than 1 bit, no matter what. I have done a few tests and I think it can be bigger, but I am not sure whether I am missing something. Is what I was told accurate?

If you could also recommend a textbook where I could learn more, I would be very grateful.

Many thanks in advance,

ANSWER: The Wikipedia page here:

https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

gives a definition of this divergence measure as:

D(P,Q) = integral( P(x) ln( P(x) / Q(x) ) dx ) for x from -infinity to +infinity

There is a caveat: wherever Q(x) is zero, P(x) must be zero as well, and the integrand at such points is taken to be zero (this is more of a concern for discrete distributions).
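As a rough illustration of the definition and the caveat above, here is a minimal Python sketch of the discrete analogue (a sum instead of an integral). The function name `kl_divergence` and the example probabilities are hypothetical, chosen only for illustration. Returning infinity when P puts mass where Q has none is a common convention, assumed here rather than taken from the answer.

```python
import math

def kl_divergence(p, q, base=math.e):
    """Discrete analogue of D(P,Q): sum of p_i * log(p_i / q_i)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p_i = 0 is taken to contribute 0
        if qi == 0.0:
            return math.inf  # assumed convention: P has mass where Q has none
        total += pi * math.log(pi / qi, base)
    return total

# Hypothetical two-outcome distributions that are nearly mirror images.
p = [0.99, 0.01]
q = [0.01, 0.99]

same = kl_divergence(p, p, base=2)      # identical distributions give 0
far = kl_divergence(p, q, base=2)       # about 6.5 with a base-2 logarithm
unsupported = kl_divergence([0.5, 0.5], [1.0, 0.0], base=2)

print(same, far, unsupported)
```

Even with only two outcomes, the base-2 value here is well above 1, so the number of outcomes (or of distributions being compared) does not cap the divergence at 1.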

Consider the two probability densities (P is the standard Cauchy density and Q is the standard logistic density):

P(x) = (1/pi) (1/(x^2+1))

Q(x) = e^x / (e^x+1)^2

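You can compute, using a numerical integrator, that D(Q,P) is approximately 0.137564.

However, the integral that defines D(P,Q) does not converge: for large |x|, ln( P(x) / Q(x) ) grows like |x|, so the integrand behaves like 1/(pi |x|) and the integral diverges to +infinity.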

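(Don't forget, D(P,Q) and D(Q,P) are not the same, nor even required to be finite simultaneously. Since the defining integral need not converge, there is no finite upper limit on the KLD in general.)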
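The asymmetry in this example can be checked by evaluating the defining integral in both orders. The sketch below uses only the Python standard library and a hand-written composite Simpson rule; the helper names, cutoffs, and step size are assumptions made for illustration. With the definition above (the first argument is the density that weights the logarithm), the logistic-weighted integral settles near 0.1376, while the Cauchy-weighted integral keeps growing as the integration range widens.

```python
import math

def log_cauchy(x):
    # log of P(x) = (1/pi) * 1/(x^2 + 1)
    return -math.log(math.pi) - math.log1p(x * x)

def log_logistic(x):
    # log of Q(x) = e^x / (e^x + 1)^2, rewritten to avoid overflow
    a = abs(x)
    return -a - 2.0 * math.log1p(math.exp(-a))

def simpson(f, a, b, n):
    # Composite Simpson's rule on [a, b]; n must be even.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def truncated_kl(log_f, log_g, cutoff, step=0.01):
    # Integral of f * ln(f / g) over [-cutoff, cutoff].
    # Both densities are even, so integrate over [0, cutoff] and double.
    n = 2 * max(1, round(cutoff / (2 * step)))

    def integrand(x):
        return math.exp(log_f(x)) * (log_f(x) - log_g(x))

    return 2.0 * simpson(integrand, 0.0, cutoff, n)

# Logistic-weighted integral: the tails beyond |x| = 50 are negligible.
kl_logistic_weighted = truncated_kl(log_logistic, log_cauchy, cutoff=50.0)
print(f"logistic-weighted integral: {kl_logistic_weighted:.6f}")

# Cauchy-weighted integral: the value keeps increasing with the cutoff.
cauchy_weighted = {}
for cutoff in (10.0, 100.0, 1000.0):
    cauchy_weighted[cutoff] = truncated_kl(log_cauchy, log_logistic, cutoff)
    print(f"Cauchy-weighted integral on [-{cutoff:g}, {cutoff:g}]: "
          f"{cauchy_weighted[cutoff]:.4f}")
```

A truncated integral that grows by a roughly constant amount each time the cutoff is multiplied by 10 is the numerical signature of a logarithmically divergent integral, so a general-purpose integrator may report either a warning or a misleading finite number for it.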

Wikipedia has several links that may provide more information.

---------- FOLLOW-UP ----------

QUESTION: Dear Clyde,

Many thanks for your reply. My question was more concerned with the highest value that the KLD can take. I am asking because I am computing the KLD between two lognormal densities (red and blue in the attached plot) through numerical integration and, when they are very different, I get KLD values over 1 bit, sometimes more than 5 bits (as in the image). This worried me because someone told me that the KLD measures information between two possible states (the two densities themselves?) and therefore you cannot have more than 1 bit of information in such a case. I am following the definition of the KLD, and I checked that my densities (blue and red) integrate to 1. In the attached image I also plot the log2 of the ratio of the two densities in green, and the product of this and the blue density in magenta. The area under the magenta curve could indeed be a little above 5, so my result seems correct. What do you think? Is it logical that I am getting a KLD over 1 bit in this setting?

Many thanks again.

Best wishes,

Javier

P.S. This is not homework, it is part of my job. I just have this little gap in my knowledge.

ANSWER: First of all, I am quite confused by your use of the term "bit" here. I am going to set the units aside and read "1 bit" as "the real number 1." (The values below use the natural logarithm, as in the definition in my previous response; dividing by ln 2 converts them to base 2.)

Now, the major difference between what you are saying in this follow-up and what I said in my previous response is that you have strictly limited yourself to log-normal distributions.

However, even then, it is entirely possible to have two log-normal distributions that have KLD greater than 1. For example, take P with μ=σ=1 and Q with μ=4, σ=2. A quick numerical integration gives D(P,Q) as approximately 1.44315.
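This value can be cross-checked without numerical integration. The KLD is unchanged by an invertible change of variable, so the divergence between two log-normal densities equals the divergence between the normal densities of ln x, for which a closed form exists. The sketch below assumes that μ and σ are the mean and standard deviation of ln x (the usual log-normal parameterization); the function name `kl_lognormal` is hypothetical.

```python
import math

def kl_lognormal(mu_p, sigma_p, mu_q, sigma_q):
    """D(P,Q) in natural-log units for log-normal P and Q.

    Assumes mu and sigma describe ln(x), so the result equals the
    closed-form divergence between the two underlying normal densities.
    """
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

forward = kl_lognormal(1.0, 1.0, 4.0, 2.0)   # about 1.4431
reverse = kl_lognormal(4.0, 2.0, 1.0, 1.0)   # about 5.3069
forward_base2 = forward / math.log(2.0)      # about 2.08 with log base 2

print(f"D(P,Q) = {forward:.5f}")
print(f"D(Q,P) = {reverse:.5f}")
print(f"D(P,Q) with log base 2 = {forward_base2:.5f}")

# Moving the second distribution further away increases the value without limit.
for mu_q in (4.0, 10.0, 40.0):
    print(mu_q, kl_lognormal(1.0, 1.0, mu_q, 2.0))
```

The squared term (μ_P − μ_Q)² in the closed form shows directly that, even within the log-normal family, the divergence can be made as large as desired by separating the two distributions.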

