Probability & Statistics/Statistical significance
I have some regression results in Excel that I am trying to understand. I have about 1600 predicted and actual observations, and I have divided the predicted observations into deciles and quartiles. I believe that the quartiles should be more statistically significant since they have more observations (400 vs. 160), but I am not sure how to calculate a measure that demonstrates this.
Without a substantially greater amount of information, the best I can do is tell you that you are asking not about significance but about confidence.
The significance is simply the correlation between predictions and observations, which may have nothing to do with the number of data points you are looking at, especially when you are subdividing one large, fixed set of data.
However, if you look at 1/10 of this data and make some statistical observation, it will not be as strong as the same observation with 1/4 of the data. That is not significance (the observation may be "this is significant" or the observation may be "this is not significant"). But with more data comes more confidence.
If you have two sets of data, possibly from the same larger set, possibly overlapping, it is still the case, speaking very generically and without specifics, that more data points will give you a higher confidence.
You may look at some examples on Wikipedia. Here, and in some limited sense, the width of a confidence interval is inversely proportional to the square root of the sample size. So if you have 400 vs 160 samples, the interval is narrower by a factor of about √(400)/√(160) ≈ 1.58.
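The square-root scaling can be checked with a few lines of arithmetic. This is only a minimal sketch, assuming the statistic of interest is a sample mean whose standard error is sigma/sqrt(n); the value sigma = 1.0 is a hypothetical placeholder and cancels out of the ratio.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 1.0  # hypothetical standard deviation; it cancels in the ratio below

se_decile = standard_error(sigma, 160)    # one decile of 1600 observations
se_quartile = standard_error(sigma, 400)  # one quartile of 1600 observations

# How many times wider the decile's interval is than the quartile's
ratio = se_decile / se_quartile
print(round(ratio, 2))  # 1.58
```

The ratio depends only on the two sample sizes, so the same factor applies whatever the actual spread of the data turns out to be.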
However, at that point, why not use all the data?
If you have a large chunk of the data and you are testing your ability to predict something (looking for a correlation presumably), then you need to use all the data for the highest confidence. Whether or not your predictions correlate with real life in a significant way, you will draw the right conclusion (or at least, you are more likely to get the right conclusion) if you use all of it.
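One way to put a number on this is an approximate confidence interval for the correlation itself. The sketch below uses the Fisher z-transformation, which is a standard approximation and not something the original poster specified; the correlation r = 0.3 is a hypothetical value chosen purely for illustration, and the function name correlation_ci is likewise made up here.

```python
import math

def correlation_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a correlation r from n pairs,
    using the Fisher z-transformation (standard error 1 / sqrt(n - 3))."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)

r = 0.3  # hypothetical observed correlation, for illustration only

# Compare one decile, one quartile, and the full data set
for n in (160, 400, 1600):
    low, high = correlation_ci(r, n)
    print(f"n={n:5d}: interval ({low:.3f}, {high:.3f}), width {high - low:.3f}")
```

The interval shrinks as n grows, so under these assumptions the full 1600 observations give the tightest estimate.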
If you are hacking your data into pieces in order to find chunks where your predictions are more accurate, those chunks will have less confidence (and combined, the confidence decreases: if you have 60% confidence in each decile, you have very poor confidence in all ten deciles).
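The size of that drop can be estimated roughly. This sketch assumes the ten deciles behave as independent tests, which is a simplification that may not hold for chunks cut from one data set; the function name joint_confidence is hypothetical.

```python
def joint_confidence(per_chunk: float, chunks: int) -> float:
    """Probability that every chunk's conclusion holds at once,
    assuming the chunks are independent (a simplifying assumption)."""
    return per_chunk ** chunks

joint = joint_confidence(0.60, 10)
print(f"{joint:.4f}")  # 0.0060, i.e. well under 1%
```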