# Probability & Statistics/Statistical significance

Question
I have some regression results in Excel that I am trying to understand. I have about 1600 predicted and actual observations, and I have divided the predicted observations into deciles and quartiles. I believe that the quartiles should be more statistically significant since they have more observations each (400 vs. 160), but I am not sure how to calculate a measure that demonstrates this.

Without substantially more information, the best I can do is tell you that you are asking not about significance but about confidence.

The significance is simply the correlation between predictions and observations, which may have nothing to do with the number of data points you are looking at, especially when you are subdividing one large, fixed set of data.

However, if you look at 1/10 of this data and make some statistical observation, it will not be as strong as the same observation with 1/4 of the data. That is not significance (the observation may be "this is significant" or the observation may be "this is not significant"). But with more data comes more confidence.

If you have two sets of data, possibly from the same larger set, possibly overlapping, it is still the case, speaking very generically and without specifics, that more data points will give you a higher confidence.

You may look at some examples on Wikipedia. In a limited sense, the uncertainty of an estimate (the width of its confidence interval) is inversely proportional to the square root of the sample size. So if you have 400 vs. 160 samples, the estimate from 400 is about √(400)/√(160) ≈ 1.58 times more precise.
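To make the square-root scaling concrete, here is a minimal Python sketch. It assumes the statistic of interest is a Pearson correlation between predictions and observations and uses the standard Fisher z approximation for its confidence interval. The function names and the correlation value `r_assumed = 0.30` are made up for illustration; they are not from the question.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    estimated from n pairs, using the Fisher z-transform."""
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def precision_ratio(n_small, n_large):
    """How many times narrower the interval gets (roughly) when the
    sample grows from n_small to n_large."""
    return math.sqrt(n_large / n_small)


r_assumed = 0.30  # hypothetical correlation, chosen only for illustration

for n in (160, 400, 1600):
    lo, hi = fisher_ci(r_assumed, n)
    print(f"n={n:5d}  95% CI for r: ({lo:.3f}, {hi:.3f})  width={hi - lo:.3f}")

print(f"precision ratio, 400 vs 160: {precision_ratio(160, 400):.2f}")
```

The interval narrows as n grows from a decile (160) to a quartile (400) to the full set (1600), which is the sense in which more data gives more confidence.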

However, at that point, why not use all the data?

If you have a large chunk of the data and you are testing your ability to predict something (looking for a correlation presumably), then you need to use all the data for the highest confidence. Whether or not your predictions correlate with real life in a significant way, you will draw the right conclusion (or at least, you are more likely to get the right conclusion) if you use all your data.

If you are hacking your data into pieces in order to find chunks where your predictions are more accurate, each chunk gives you less confidence, and the confidence drops further when you combine them: if you have 60% confidence in each decile, you have very poor confidence in a claim about all ten deciles at once (roughly 0.6^10 ≈ 0.6% if the deciles were independent).
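The compounding effect can be sketched in a few lines of Python. This assumes the ten per-decile claims are independent, which is a simplification made here for illustration; the Bonferroni-style adjustment is a standard, conservative way to budget confidence across several claims and is not something the question asked for. Both function names are hypothetical.

```python
def joint_confidence(per_claim, k):
    """Confidence that k independent claims all hold at once,
    given the same confidence in each one."""
    return per_claim ** k


def per_claim_level_needed(overall, k):
    """Bonferroni-style per-claim confidence level needed so that
    k claims jointly hold with at least `overall` confidence."""
    return 1 - (1 - overall) / k


# 60% confidence in each of ten deciles, assumed independent
print(f"joint confidence over 10 deciles: {joint_confidence(0.60, 10):.4f}")

# Per-chunk confidence required for 95% overall confidence
print(f"needed per decile:   {per_claim_level_needed(0.95, 10):.4f}")
print(f"needed per quartile: {per_claim_level_needed(0.95, 4):.4f}")
```

The more pieces the data is split into, the higher the bar each piece has to clear, while each piece also has fewer observations to clear it with.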

Volunteer

#### Clyde Oliver

##### Expertise

I can answer all questions up to, and including, graduate level mathematics. I do not have expertise in statistics (I can answer questions about the mathematical foundations of statistics). I am very much proficient in probability. I am not inclined to answer questions that appear to be homework, nor questions that are not meaningful or advanced in any way.

##### Experience

I am a PhD educated mathematician working in research at a major university.

Organizations
AMS

Publications
Various research journals of mathematics. Various talks & presentations (some short, some long), about either interesting classical material or about research work.

Education/Credentials
BA mathematics & physics, PhD mathematics from a top 20 US school.

Awards and Honors
Various honors related to grades, various fellowships & scholarships, awards for contributions to mathematics and education at my schools, etc.

Past/Present Clients
In the past, and as my career progresses, I have worked and continue to work as an educator and mentor to students of varying age levels, skill levels, and educational levels.