Probability & Statistics/KL divergence upper limit
QUESTION: Hi Clyde,
I hope you are doing very well.
The continuous case Kullback-Leibler divergence (KLD) has a lower limit of 0, attained when the two probability density functions compared are identical. If they are not, is there an upper limit for the divergence? Someone told me that if I use a base-2 logarithm, to obtain units in bits, then the KLD cannot be bigger than 1 (bit), no matter what. I have done a few tests and I think it can be bigger, but I am not sure if I am missing something. Is what I was told accurate?
If you could also recommend a textbook where I could learn more, I would be very grateful.
Many thanks in advance,
ANSWER: On the Wikipedia page for the Kullback-Leibler divergence, you can find a definition of this divergence measure (for the continuous case) as:
D(P,Q) = integral( P(x) ln( P(x) / Q(x) ) dx ) for x from -infinity to +infinity
There is a caveat: wherever Q(x) is zero, P(x) must be zero as well (otherwise the divergence is infinite), and any 0 ln( 0 / Q(x) ) term in the integrand is taken to be zero (this is more of a concern for discrete distributions).
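In code, that convention amounts to zeroing the integrand wherever P vanishes. A minimal numpy sketch (the function name is mine, not standard):

```python
import numpy as np

def kl_integrand(p, q):
    """Pointwise KLD integrand p * ln(p/q), using the convention
    that 0 * ln(0 / q) = 0 wherever p is zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    out = np.zeros_like(p)
    mask = p > 0          # only evaluate the log where p is positive
    out[mask] = p[mask] * np.log(p[mask] / q[mask])
    return out
```

Integrating (or, in the discrete case, summing) this quantity over the support gives the divergence.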
Consider the two probability distributions:
P(x) = (1/pi) (1/(x^2+1))   (the standard Cauchy density)
Q(x) = e^x / (e^x+1)^2   (the standard logistic density)
Which direction is finite depends on the tails. Using a numerical integrator, you can compute that D(Q,P), the integral of Q(x) ln( Q(x) / P(x) ), is approximately 0.137564.
However, the integral that defines D(P,Q) diverges: for large |x| its integrand behaves like 1/(pi |x|), because the Cauchy density P has far heavier tails than the logistic density Q.
(Don't forget, D(P,Q) and D(Q,P) are not the same, and one can be finite while the other is infinite, as happens here.)
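You can check this with quadrature. A sketch (the helper names are mine; the log-densities are written stably to avoid overflow in the tails):

```python
import numpy as np
from scipy import integrate

def logp(x):
    # log of the standard Cauchy density: -ln(pi) - ln(1 + x^2)
    return -np.log(np.pi) - np.log1p(x * x)

def logq(x):
    # log of the standard logistic density; the density is symmetric,
    # so e^x/(1+e^x)^2 = e^{-|x|}/(1+e^{-|x|})^2, which is overflow-safe
    ax = np.abs(x)
    return -ax - 2.0 * np.log1p(np.exp(-ax))

# D(Q,P) = integral of Q(x) ln( Q(x) / P(x) ) dx converges: the logistic
# tails decay exponentially while the Cauchy density decays only like 1/x^2.
d_qp, _ = integrate.quad(lambda x: np.exp(logq(x)) * (logq(x) - logp(x)),
                         -np.inf, np.inf)
print(d_qp)  # about 0.1376

# D(P,Q), with the roles reversed, has an integrand behaving like
# |x| * (1/(pi x^2)) = 1/(pi |x|) for large |x|, so that integral diverges.
```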
Wikipedia has several links that may provide more information.
---------- FOLLOW-UP ----------
QUESTION: Dear Clyde,
Many thanks for your reply. My question, though, was more about the highest value the KLD can take. The reason I am asking is that I am computing the KLD between two lognormal densities (red and blue in the attached plot) by numerical integration and, when they are very different, I get KLD values over 1 bit, sometimes more than 5 bits (as in the image). This worried me because someone told me that the KLD measures information between two possible states (the two densities themselves?) and that you therefore cannot have more than 1 bit of information in such a case. I am following the definition of the KLD, and I checked that my densities (blue and red) integrate to 1. In the attached image I also plot the log2 of the ratio of the two densities (green) and the product of this ratio with the blue density (magenta). The area under the magenta curve could indeed be a little above 5, so my result seems consistent. What do you think? Is it reasonable that I am getting a KLD over 1 bit in this setting?
Many thanks again.
P.S. This is not homework, it is part of my job. I just have this little gap in my knowledge.
First of all, a note on the term "bit": if you evaluate the integral with a base-2 logarithm, the result is indeed measured in bits, but nothing caps it at 1 bit. The "at most 1 bit" rule you were told about applies to the entropy of a single binary outcome (one yes/no question), not to a divergence between two densities. In what follows I use the natural logarithm, so the values are in nats; divide by ln 2 to convert to bits.
Now, the major difference between what you are saying in this follow-up and what I said in my previous response is that you have strictly limited yourself to log-normal distributions.
However, even then, it is entirely possible for two log-normal distributions to have a KLD greater than 1. For example, take one with μ=σ=1 and another with μ=4, σ=2. A quick numerical integration gives the KLD as approximately 1.44315 nats, which is about 2.08 bits.
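That number can be cross-checked without quadrature: the KLD is invariant under the monotone change of variables x -> e^x, so the divergence between two log-normals equals the divergence between the underlying normal distributions, which has a closed form. A sketch (the function names and parameter labels are mine):

```python
import numpy as np
from scipy import integrate, stats

def kl_lognormal(m1, s1, m2, s2):
    """KLD (in nats) from LogNormal(m1, s1) to LogNormal(m2, s2).

    Equals the KLD between the underlying normals:
        ln(s2/s1) + (s1^2 + (m1 - m2)^2) / (2 s2^2) - 1/2
    """
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def kl_numeric(m1, s1, m2, s2):
    """Direct numerical integration of p(x) ln( p(x)/q(x) ) over (0, inf),
    mirroring the approach described in the question."""
    p = stats.lognorm(s=s1, scale=np.exp(m1))
    q = stats.lognorm(s=s2, scale=np.exp(m2))
    f = lambda x: np.exp(p.logpdf(x)) * (p.logpdf(x) - q.logpdf(x))
    val, _ = integrate.quad(f, 0, np.inf)
    return val

nats = kl_lognormal(1, 1, 4, 2)
print(nats, nats / np.log(2))  # about 1.44315 nats, i.e. about 2.08 bits
```

With μ1=σ1=1 and μ2=4, σ2=2 the closed form gives ln 2 + 10/8 − 1/2 ≈ 1.443147 nats, matching the numerical integration, and it clearly grows without bound as the two parameter sets move apart.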