Probability & Statistics/Re-formulated question
QUESTION: Dear Clyde,
I am the guy of the rabbits again. I am very sorry for not phrasing my question precisely. Please let me try again.
The problem I have is estimating the mean and variance of the weight of a wild rabbit found within a given piece of land. Since they are wild rabbits they can enter and exit the piece of land whenever they want, and so their numbers change all the time. I assume that, when one goes to collect the data one is able to catch all rabbits within the piece of land. Then, one would measure the weight of each and then with this data estimate a mean and variance of the weight of a rabbit for such a collection round. Now, with the aim of increasing the reliability of the estimates, one would like to collect the data in this way several times. The population is assumed 'stable' over time (no sustained decline or increase in average rabbit numbers), collection rounds are assumed independent of each other and the weight of one rabbit is assumed independent of the weight of another one. So one would end up with n samples of rabbit weights, with different numbers of rabbits within each. Each of these samples has a sample mean of the weight of a rabbit and a corresponding variance. My intuition is that the estimator of the mean weight of a rabbit given this set of samples is simply the average of all the n sample means (is that correct?). The other problem is that I am not sure how to estimate in an analogous way the variance of the weight of a rabbit. I hope I managed to explain the question better now.
Many thanks in advance again.
ANSWER: In this case, although the mean can be computed as a "weighted" figure, the variance cannot.
First, I know you already know this, but the weighted mean can be computed somewhat easily. If you make N measurements, x1, x2, ..., xN, then the mean is:
(x1 + x2 + ... + xN) / N
If you have partial means like this:
m1 = (x1 + x2 + ... + xN) / N
m2 = (y1 + y2 + ... + yM) / M
then you can combine them as:
m = (N*m1 + M*m2) / (N+M)
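This is a weighted mean: each partial mean counts in proportion to its sample size. It matches the plain average of the sample means that you describe only when every sample has the same size.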
In general, if you have m1, m2, ... mk (for k different partial means) with sample sizes N1, N2, ... Nk, you can find the overall mean as:
m = ( N1*m1 + N2*m2 + ... + Nk*mk ) / ( N1 + N2 + ... + Nk )
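As a rough illustration of the formula above, here is a minimal Python sketch. The function name `combined_mean` and the rabbit weights are made up for the example; they are not from any real data set.

```python
def combined_mean(means, sizes):
    """Overall mean from per-round means and per-round sample sizes."""
    if len(means) != len(sizes):
        raise ValueError("means and sizes must have the same length")
    total = sum(sizes)
    if total == 0:
        raise ValueError("at least one round must contain a measurement")
    # Each partial mean is weighted by the number of rabbits behind it.
    return sum(n * m for n, m in zip(sizes, means)) / total


# Hypothetical weights (kg) from two collection rounds of different size.
round1 = [1.0, 2.0, 3.0]
round2 = [4.0, 6.0]
m1 = sum(round1) / len(round1)  # 2.0
m2 = sum(round2) / len(round2)  # 5.0

m = combined_mean([m1, m2], [len(round1), len(round2)])
pooled = sum(round1 + round2) / len(round1 + round2)  # mean of all raw data
unweighted = (m1 + m2) / 2                            # plain average of the means

print(m, pooled)   # 3.2 3.2
print(unweighted)  # 3.5
```

With unequal round sizes, the size-weighted figure agrees with the mean of all the raw measurements, while the plain average of the two means does not.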
Once you have this mean m, the variance is something you can find. Normally variance is:
var = [ (x1-m)^2 + (x2-m)^2 + ... + (xN-m)^2 ] / N
Suppose you have just two samples and compute the variance within each one, using that sample's own mean:
var1 = [ (x1-m1)^2 + (x2-m1)^2 + ... + (xN-m1)^2 ] / N
var2 = [ (y1-m2)^2 + (y2-m2)^2 + ... + (yM-m2)^2 ] / M
Unlike last time, though, it is NOT correct to write:
var = (N*var1 + M*var2) / (N+M)
This would be equal to:
[ (x1-m1)^2 + (x2-m1)^2 + ... + (xN-m1)^2 + (y1-m2)^2 + (y2-m2)^2 + ... + (yM-m2)^2 ] / (N+M)
whereas the actual variance is:
[ (x1-m)^2 + (x2-m)^2 + ... + (xN-m)^2 + (y1-m)^2 + (y2-m)^2 + ... + (yM-m)^2 ] / (N+M)
You have to use the mean of the whole combined sample to get the variance to work out. Although you can combine all the means to get a mean of the total sample, you cannot just combine the variances in the same way, because each sample's variance is measured around its own mean, and those means differ.
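A small numeric check makes the gap visible. This is only a sketch with invented numbers; the helper `mean_sq_dev` is a hypothetical name, and it uses the divide-by-N definition of variance given above.

```python
def mean_sq_dev(values, center):
    """Mean squared deviation of the values around a given center."""
    return sum((v - center) ** 2 for v in values) / len(values)


xs = [1.0, 2.0, 3.0]  # hypothetical sample 1
ys = [4.0, 6.0]       # hypothetical sample 2
N, M = len(xs), len(ys)

m1 = sum(xs) / N
m2 = sum(ys) / M
m = (N * m1 + M * m2) / (N + M)  # overall mean

var1 = mean_sq_dev(xs, m1)  # variance of sample 1 around m1
var2 = mean_sq_dev(ys, m2)  # variance of sample 2 around m2

naive = (N * var1 + M * var2) / (N + M)  # weighted average of the variances
actual = mean_sq_dev(xs + ys, m)         # variance of all data around m

print(naive)   # approximately 0.8
print(actual)  # approximately 2.96
```

The two samples sit far apart (means 2.0 and 5.0), so the weighted average of their variances badly understates the spread of the combined data.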
Obviously, if you keep track of all the data, you can compute the overall variance from scratch. Nothing will stop you from doing that. But you can't compute it as a weighted average of the other variances. That figure leaves out how far each sample's mean sits from the overall mean, so it can never exceed the true variance and is usually smaller. When the sample means are close together it may serve as a "quick and dirty" approximation, but it won't be correct.
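If only the per-round summaries were kept, the overall variance can still be recovered by adding a second term to the weighted average: the size-weighted squared distance of each sample mean from the overall mean. The sketch below assumes the divide-by-N variance used in this answer (the divide-by-(N-1) version needs adjusted factors); the function name `combined_variance` and the numbers are illustrative only.

```python
import statistics


def combined_variance(means, variances, sizes):
    """Overall divide-by-N variance from per-sample means, variances, sizes."""
    total = sum(sizes)
    m = sum(n * mi for n, mi in zip(sizes, means)) / total
    # Spread inside each sample: the weighted average of the variances.
    within = sum(n * v for n, v in zip(sizes, variances)) / total
    # Spread of the sample means around the overall mean.
    between = sum(n * (mi - m) ** 2 for n, mi in zip(sizes, means)) / total
    return within + between


# Hypothetical raw data, used here only to produce summaries and a cross-check.
samples = [[1.0, 2.0, 3.0], [4.0, 6.0]]
sizes = [len(s) for s in samples]
means = [statistics.fmean(s) for s in samples]
variances = [statistics.pvariance(s) for s in samples]

from_summaries = combined_variance(means, variances, sizes)
from_scratch = statistics.pvariance([x for s in samples for x in s])

print(from_summaries, from_scratch)  # both approximately 2.96
```

`statistics.pvariance` is the standard-library population variance, so the from-scratch value serves as an independent check on the combined figure.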
---------- FOLLOW-UP ----------
Thank you very much. Just a little thing more. I plan to study this kind of problem further, however, I think I don't know some of the technical terms, so it would be really useful if you can give me a couple of keywords (e.g. the technical name of this variance estimator, as in the case of "weighted average") or even a recommendation of a textbook where I can learn more about this.
Many thanks again for the fast and accurate response.
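ANSWER: You can begin by looking into a statistics text or course about "sample mean" and "sample variance." Any sample you take has its own mean and variance, and learning how those are used to estimate the mean and variance of the whole population will tell you more about this topic. For combining several samples in particular, "pooled variance" and "law of total variance" are also useful search terms.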