Probability & Statistics/Statistics: comparing two population proportions or means
QUESTION: I have been in a debate with some friends about the process of comparing means/proportions in two populations. For example, consider the question of whether the same proportion of males and of females like ice cream. A typical way of dealing with this question seems to be to take an independent random sample from each of the two populations and then perform some statistical test, usually either a z or a t test, on the two samples.
The question I have is what is the appropriate null hypothesis to test? Typically the null hypothesis used seems to be that "the two means/percents are the same". Here is my issue. If the two populations are large, or if the random variable is continuous, then the likelihood (in the colloquial sense) that the two populations have the same mean/percent is either extremely close to zero, or zero. One doesn't need a statistical test to argue this. It is a priori obvious. If one does perform a stat test and doesn't detect a difference, all that means is the power of the test was too weak to detect what we already know. However, it seems to me that one would be a fool to be agnostic as to whether the two means/proportions are in fact different.
I think that what this points to is that "the means/percents of ... in the 2 populations are the same" is the wrong null hypothesis to test. Let's say we are comparing some percent in two populations, and let's say that we detect a difference at the 5 sigma significance level. Do we really care when the two percents differ at the 9th decimal point? It depends on the question we are trying to answer, but probably not. I argue that instead of this null hypothesis, what should really be tested is the null hypothesis which states "the difference in the percent of ... between the two populations is greater than 5%" or something like that. Then if you get a statistically significant result, you also have a practically significant result.
Is my logic sound, or am I missing something?
ANSWER: No -- your logic is not sound.
You are really running up against three issues, which I can illustrate by example.
Human height is a random variable. It is affected by many factors, and is a very complex thing, but for our purposes, we can assume human height is determined by some random process that (in theory) leads to a normal distribution.
However, there are not infinitely many humans, nor could such humans fill in a continuous distribution since height is measured only to a particular refinement (inches, mm, whatever).
So there is a difference between the actual distribution of humans, which is literally the set of all heights of all humans, and the ideal or theoretical distribution that we assert would happen if the number of humans approached infinity (that is, the limit distribution).
There is also a difference between the mean and deviation of all human heights (which should match, up to some small error, the limit distribution) and that of a sample. Samples only tell us what these are (for the whole population or a sub-population) when the sample is big enough, and only to some margin of error.
The null hypothesis (and its rejection) is always a statement about the entire population. Obviously, any observation in the sample is literally true of the sample. But is that difference meaningful in the whole population and/or the theoretical limit distribution? That question is the same as deciding whether to reject the null hypothesis.
In particular, this part of your argument is flawed:
"If the two populations are large, or if the random variable is continuous, then the likelihood(colloquial) that the two population have the same mean/percent is either extremely close to zero, or zero. One doesn't need a statistical test to argue this. It is a priori obvious. If one does perform a stat test and doesn't detect a difference, all that means is the power of the test was too weak to detect what we already know."
A random variable has a probability distribution. The parameters of that distribution are not random variables (at least, not necessarily -- they could be, but that's two layers of randomness). If a population has mean "7", that means the mean of the (theoretical) normal distribution governing its real-life distribution is, literally, 7. The fact that some other distribution may or may not have a similar mean is not a random event!
Further, if you have two populations and you take a sample from each, you could observe that one sample has mean 7.01 and the other has mean 7.00. There are two unrelated questions. The first is whether these are statistically meaningful differences. That requires you to assume the null hypothesis (which, in sub-population comparisons, is almost always that the two are the same) and then to reject it if the distinction is statistically meaningful. If you had, for example, sampled the entire population, and the population were very large, the difference would be meaningful because, literally, you know that the two means of the populations are different. If your sample is small, this small difference may not be statistically significant.
Whether a difference of 0.01 is factually significant is another thing. If 7.00 vs. 7.01 is the average hourly income of two sub-populations, I don't consider 1 cent per hour factually significant, even if it is statistically true.
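To make the two questions concrete, here is a minimal sketch in Python (standard library only). It assumes a known common standard deviation of 5.0 and equal sample sizes; both are made-up values for illustration, and `two_sample_z` is a hypothetical helper name, not something from the discussion above. It shows how the same 0.01 difference goes from statistically undetectable to overwhelmingly significant as the sample grows, while staying 1 cent per hour throughout.

```python
import math

def two_sample_z(mean_a, mean_b, sd, n):
    """z statistic and two-sided p-value for two samples of size n each,
    assuming (for illustration) a known common standard deviation sd."""
    se = sd * math.sqrt(2.0 / n)              # standard error of the difference
    z = (mean_a - mean_b) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))    # two-sided normal tail probability
    return z, p

# Same observed difference (7.01 vs 7.00), increasing sample sizes.
for n in (100, 10_000, 1_000_000, 100_000_000):
    z, p = two_sample_z(7.01, 7.00, sd=5.0, n=n)
    print(f"n={n:>11,}  z={z:7.3f}  p={p:.4g}")
```

Under these assumptions the difference is nowhere near significant at n = 100 and far beyond any usual threshold at n = 100,000,000. The practical size of the difference never changes.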
---------- FOLLOW-UP ----------
QUESTION: Thank you for your response.
It appears that I wasn't being very clear in my description. Let me address some of the points that you made, with the aim of clarifying the position I am arguing. Let's ignore measurement resolution/errors in heights etc. to keep things simple and focused on the statistical aspects of the problem we are discussing. If you like, we can consider all height measurements to be accurate to, let us say, an inch for the purposes of this discussion.
First off, I don't see how your discussion on limit distributions relates to the question. I am talking about two finite populations. Regardless of how they were created (deterministic, random, ...), both populations have means which can in principle be measured exactly by simply measuring every person in the two populations and doing some arithmetic.
I understand that the null hypothesis is a statement about the entire population, and not the samples taken. I also understand that the sample mean and population mean will, in general, be different. I also understand that the sample mean gives a point estimator of the population mean, and that the sample standard deviation can be used as an estimator of the population standard deviation. One can then use the sample mean and sample standard deviation to construct approximate p values and confidence intervals, which can be used to determine whether a certain null hypothesis can be rejected at a given alpha.
I also agree that human height is affected "by some random process that (in theory) leads to a normal distribution". It might not be normal, but that is beside the point. My point is that if one were to measure the heights of EVERY person (not a sub-sample) in two disjoint populations, and then calculate the two POPULATION means, they would almost certainly be different. This is precisely BECAUSE heights are determined, at least partially, by random processes.
Consider the following example:
Consider two groups of men with 100,000 men each, selected, via a random process, from the total population of men on earth. Let's label these groups A and B. In this example, the POPULATIONS under consideration are group A and group B. We are NOT considering group A and group B to be sub-samples of the total population of men on earth (though they of course are) and are NOT using them to estimate any parameters of the total population of men on the earth. We are considering them to be two complete populations in and of themselves (which they also are).
Now we can ask some questions about these two populations. One question is whether the mean heights of the two populations are the same or not. If we have access to all the men in both populations, we can directly measure the means, without resorting to any statistics. We simply measure the heights of all 200,000 men and do a little bit of math. We then know the two population means EXACTLY.
Now suppose we are only allowed to measure 40 men from each group. Let's do a stat test on these two populations to compare their means. We take a random sample of 40 people from each of the two populations of 100,000. Then we calculate the sample means and standard deviations of the two samples we have collected, and use these values in a Z or T test against the hypothesis "The mean height of the men in group A is the same as the mean height of the men in group B".
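The procedure just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the two groups are simulated as normal heights with mean 69 inches and standard deviation 3 inches (invented numbers), the statistic is Welch's t (one common choice for a two-sample t test), and `welch_t` is a hypothetical helper name. The standard library has no t distribution, so the sketch compares |t| with 2.0, which is close to the usual two-sided 5% critical value at around 78 degrees of freedom.

```python
import math
import random
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na   # squared standard error, group A
    vb = statistics.variance(sample_b) / nb   # squared standard error, group B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Assumed populations: 100,000 simulated heights (inches) in each group.
group_a = [rng.gauss(69.0, 3.0) for _ in range(100_000)]
group_b = [rng.gauss(69.0, 3.0) for _ in range(100_000)]

# The full-population means can be computed directly, with no statistics.
print("population means:", statistics.mean(group_a), statistics.mean(group_b))

# The test only sees 40 men from each group.
t, df = welch_t(rng.sample(group_a, 40), rng.sample(group_b, 40))
print(f"t = {t:.3f}, df = {df:.1f}, |t| > 2.0: {abs(t) > 2.0}")
```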
Since our sample size is pretty small, it is likely that we will NOT be able to reject the null hypothesis, but this is just due to the fact that we have a weak test due to a small sample size. We know that the means in populations A and B almost certainly have to be different, without even measuring them, simply BECAUSE the heights of the individuals in the two populations are, in part, determined by random events. If I flip a coin 200,000 times, the probability that the number of heads in the first 100,000 flips is the same as the number of heads in the second 100,000 flips is almost zero. More specific to this example, we could calculate the probability that two random samples of 100,000 men, taken from the population of the earth, would have the same sample means. We can do this because we know that our populations A and B are in fact random samples taken from the population of all men on earth. I haven't done this calculation, but I am certain that the probability is exceedingly small.
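The coin-flip probability mentioned above can be computed exactly. For two independent batches of n fair flips, the chance of equal head counts is the sum over k of C(n,k)^2 / 4^n, which by Vandermonde's identity equals C(2n,n) / 4^n, approximately 1/sqrt(pi*n) for large n. A minimal Python sketch (the function name `prob_equal_heads` is just an illustrative choice) evaluates this through log-gamma to avoid enormous integers:

```python
import math

def prob_equal_heads(n):
    """P(two independent batches of n fair coin flips show the same number
    of heads) = C(2n, n) / 4**n, evaluated in log space."""
    log_binom = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)  # log C(2n, n)
    return math.exp(log_binom - 2 * n * math.log(2.0))

n = 100_000
print(prob_equal_heads(n))             # exact value (up to floating point)
print(1.0 / math.sqrt(math.pi * n))    # large-n approximation
```

For n = 100,000 this comes out to about 0.0018, i.e. roughly 1 chance in 560. For a continuous variable the corresponding probability of an exact tie is zero.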
So my question then is: What is the point of testing the null hypothesis that "The mean height of the men in population A is the same as the mean height of the men in population B"? We know with almost certainty that is false, even if we can't reject it based on our statistical test on our two samples of 40 people. The reason we know is that we know that the populations were constructed via a random process.
In the above example, the random process which created our two populations was that they were both random sub-populations of the total population of men on earth. This is admittedly contrived. A more realistic example would be comparing the mean height of men in California to that of men in Ohio. Here the random processes affecting our two populations would be primarily genetic (copy errors, etc.) with some input from random environmental factors (cosmic rays, etc.).
While the above example is contrived, I hope that it illustrates my point: if two large populations are constructed, at least partially, via random processes, then the null hypothesis "the two populations have the same mean" will almost always be false, and hence is the wrong hypothesis to test. Much more useful, I argue, is the hypothesis "The means of population A and B differ by more than ...".
Also, note that if I am actually interested in the question "do the mean heights of the two groups differ by more than 4 inches?", then knowing that I can reject "The mean heights of the two groups are the same" at the .05 level doesn't tell me whether I can be 95% confident that the two means differ by more than 4 inches. To determine that, I need to reject the null hypothesis "the population means in the two groups differ by less than 4 inches" at the .05 level. Thus, to me, it DOES seem to make sense to incorporate the "practical" difference resolution one is interested in into the null hypothesis being tested.
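The distinction drawn in the last paragraph can be illustrated with a small Python sketch. It is a simplified version under stated assumptions: the direction of the difference is fixed in advance (A taller than B), so the null is "mu_A - mu_B <= delta", a normal approximation is used, and the summary numbers (an observed difference of 5 inches with standard error 0.7) are invented for illustration. The name `min_difference_z_test` is hypothetical.

```python
import math

def min_difference_z_test(mean_a, mean_b, se, delta):
    """One-sided z test of H0: mu_A - mu_B <= delta vs H1: mu_A - mu_B > delta.
    se is the standard error of the difference in sample means."""
    z = (mean_a - mean_b - delta) / se
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail normal probability
    return z, p

# Assumed summary statistics: sample means 74 and 69 inches, se = 0.7.
z0, p0 = min_difference_z_test(74.0, 69.0, se=0.7, delta=0.0)
z4, p4 = min_difference_z_test(74.0, 69.0, se=0.7, delta=4.0)
print(f"delta=0: z={z0:.2f}, p={p0:.3g}")   # boundary "no difference"
print(f"delta=4: z={z4:.2f}, p={p4:.3g}")   # boundary "differ by 4 inches"
```

With these made-up numbers the "no difference" null is rejected easily, while the "differ by at most 4 inches" null is not rejected at the .05 level, which is the gap between the two conclusions described above.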
ANSWER: You are viewing the problem in the wrong way.
"What is the point of testing the null hypothesis that "The mean height of the men in population A is the same as the mean height of the men in population B"? We know with almost certainty that is false, even if we can't reject it based on our statistical test on our two samples of 40 people. The reason we know is that we know that the populations were constructed via a random process."
The means of these populations are not random variables. You cannot measure the "probability" that the means of the populations are equal. You can only measure probabilities of random events. To say "the probability of the means of the two populations being equal is zero" is nonsense. The population means are not random variables.
The sample means are random variables, but the null hypothesis is not a statement about the sample means. It is a statement about the population means.
What you test is whether the difference between the samples' means is statistically significant. (If it is, you may infer that the two population means are likely to be different.) If the two sample means are not statistically significantly different, you do not reject the null hypothesis, and the two population means are statistically indistinguishable on that evidence. The null hypothesis does not assert that the sample means are exactly equal.
Keep in mind, if you literally just computed the two population means and said "these means differ by 2 inches (or 0.002 inches)," then you have indeed proved that the two population means are different. However, this is not really doing statistics.
As for the "random process" creating the sample, that's not just some "random process," it's the sampling of the two populations. The population is not the same as the sample. The population is not random, nor is the mean of that population. The sample is random, and its mean is a random variable, but if the sample is large, it is close to the population mean. How accurate that sample mean is depends on the size of the sample.
If your samples are small (40 is pretty small), the samples' means may be very different even when the population means are close, because samples that small are too noisy to give statistically significant results. If your samples are larger, the sample means will be very close to the population means. They may not be equal, but they will be close to equal. (Or, if the means are far apart and you have a large enough sample, you'd reject the null hypothesis.)
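How much accuracy a given sample size buys can be sketched with the usual standard-error formula, sd / sqrt(n). The Python snippet below assumes a height standard deviation of 3 inches (an invented value) and ignores the finite-population correction; `standard_error` is an illustrative helper name.

```python
import math

def standard_error(sd, n):
    """Standard error of a sample mean for a simple random sample of size n."""
    return sd / math.sqrt(n)

# Assumed standard deviation of heights: 3 inches.
for n in (40, 400, 4_000, 40_000):
    se = standard_error(3.0, n)
    # 1.96 * se is the half-width of an approximate 95% confidence interval.
    print(f"n={n:>6,}  se={se:.4f}  approx. 95% margin = +/-{1.96 * se:.4f}")
```

Each hundredfold increase in sample size shrinks the margin of error by a factor of ten, so a sample of 40 pins the mean down far more loosely than a sample of 4,000.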
As for your four inches question, the null hypothesis is not what you think it is. To quote Wikipedia, which sums it up well enough:
"In statistical inference of observed data of a scientific experiment, the null hypothesis refers to a general or default position: that there is no relationship between two measured phenomena, or that a potential medical treatment has no effect. Rejecting or disproving the null hypothesis – and thus concluding that there are grounds for believing that there is a relationship between two phenomena or that a potential treatment has a measurable effect – is a central task in the modern practice of science, and gives a precise sense in which a claim is capable of being proven false."
The null hypothesis is "These are not different." It does not allow for some quantitative parameter. Whether the difference between two samples' means is 4 inches or 0.04 inches is a separate question from whether that difference is statistically significant.