Probability & Statistics/Is CORREL enough?
QUESTION: I'm researching whether rain in drought-affected California causes earthquakes there. I'm not an expert in any related field, just a guy with a hunch, access to data, and the ability to use the CORREL function in Excel 2013 to compute correlation coefficients for the data.
The data compared are the highest daily rainfall amounts in California and the surrounding drought-affected regions, and the number of quakes in the same regions. My hypothesis is that if rain in drought-affected regions triggers earthquakes, then, since California has a steady rate of daily earthquakes, rain on parched land there should increase the number of quakes.
I expected to find correlations because I watched some happen in my preliminary assessment of whether this research would be worthwhile. However, I did not expect to find such strong correlations, with many instances showing hundreds of days with very high correlation values. So I went over the data item by item to confirm they're truly there. Then, to strengthen my argument, I ran the regression function in Excel, which proved not to work well with this data. There are strong correlations between rain and upticks in the number of quakes, and in some instances even between amounts of rain and numbers of quakes. But I think that for many of them, the magnitudes may be needed to make predictive models work, if they will work at all.
My question is does Excel have other data mining functions I can use to cross analyze this kind of data? If not, do you have any way to gauge whether what I have is enough? If I make a case based solely on CORREL results that have been exhaustively applied from every possible direction, is that enough to prove the hypothesis?
ANSWER: I'm afraid that you may have fallen into the classic trap: "Correlation does not imply causation." It is, in fact, easy to find very scientifically-minded essays and even scholarly work arguing that both rain and drought cause earthquakes --
You can see, though, that the last article from a local CBS affiliate makes a very relevant comment -- that a correlation between drought and earthquakes may exist, but the cause could be an intermediate factor (in this case, a man-made factor, groundwater removal).
While scientific evidence frequently indicates that extreme weather conditions of one type often exacerbate those of another type, it is never so easy as saying "this one causes that one" without more evidence. We know, for example, that rain causes flooding. We know that earthquakes cause catastrophic waves in the oceans. But we understand these not just through some observed correlation, but through a scientific examination of how these pairs of events are directly related. In the case of heavy rainfall (the first two links) there seem to be some hypotheses about how rainfall causes a temporary increase in observable seismic activity. I do not see such a hypothesis directly linking drought to earthquakes. It is also important to note that these relationships are often supported by statistics, argued by some scientific model, but could be refuted or changed significantly as the science advances.
It is important to note that rainfall and earthquakes are each governed by their own set of very complex dynamics. Earthquakes, especially long-term trends, are governed by very slow-moving tectonic plates, while rain is governed by the dynamics of water in the ecosystem. It is virtually impossible to exclude some intermediate cause or effect and conclude that any correlation between the two indicates a meaningful relationship (let alone determines what that relationship is).
Although you can certainly find correlations between two sets of data, statistically speaking the only conclusion you can draw is "there is a correlation between these two sets of data." You must then argue from other principles that this correlation demonstrates a meaningful relationship, rather than a statistically significant but otherwise unexplained phenomenon.
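To make the point about intermediate factors concrete, here is a minimal Python sketch using simulated numbers, not real rain or earthquake data. The hidden factor `z`, the two series `a` and `b`, and the noise levels are all assumptions chosen for illustration. Neither series influences the other, yet they correlate strongly because both depend on `z`. Once the effect of `z` is removed, the correlation largely disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient (the quantity Excel's CORREL computes)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical hidden factor that drives both observed series.
z = [rng.gauss(0, 1) for _ in range(n)]

# Two observed series: each is the hidden factor plus its own independent noise.
# By construction, a has no effect on b and b has no effect on a.
a = [zi + rng.gauss(0, 0.5) for zi in z]
b = [zi + rng.gauss(0, 0.5) for zi in z]

r_ab = pearson(a, b)

# Remove the hidden factor's contribution. Here its coefficient is known to be 1
# because we built the data; with real data it would have to be estimated.
resid_a = [ai - zi for ai, zi in zip(a, z)]
resid_b = [bi - zi for bi, zi in zip(b, z)]
r_resid = pearson(resid_a, resid_b)

print(f"correlation of a and b:          {r_ab:.3f}")
print(f"correlation after removing z:    {r_resid:.3f}")
```

In a real study the hard part is that the hidden factor is usually not measured, or not even known, which is why the correlation alone cannot settle the question.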
To give you an example that is well known in the news media and current affairs, consider the so-called anti-vaccination movement, which argues that there is a correlation between the administration of vaccines to children and the onset of developmental disorders such as autism.
This reasoning is entirely faulty, however. Vaccines are administered at a particular age, and that age is -- coincidentally -- the age at which symptoms of these developmental disorders occur. The fact that there is a very strong correlation is irrelevant. You might as well argue that eating dinner causes the sun to go down, due to a similar correlation. That is, of course, false. If we as a species stopped eating dinner, I am quite sure the sun would continue to move across the sky as it has for quite a long time.
---------- FOLLOW-UP ----------
QUESTION: Thank you for the in-depth answer and links; it's all very helpful. However, my initial question was to you as a statistician, to wit "My question is does Excel have other data mining functions I can use to cross analyze this kind of data?" Judging from your response, I'm assuming the answer is no.
As for your responses as an expert in whatever field of geology you also work in, I have no expertise of any kind. I have a lot of theories as to how rain in drought areas causes earthquakes, but I don't see how I could ever prove any of them other than the obvious. So all I'm trying to do is come up with a strong enough case to get people a lot more qualified and resourced than me to figure that part out.
ANSWER: Generally, the answer to both of your follow-up questions is the same: Excel doesn't have advanced statistical functions that can do what you want, because what you need really requires intensive scientific study and modeling.
Excel certainly has a number of statistical tests that it can do besides the correlation computation you have tried, but in many cases it would be ineffective or irresponsible to just throw all possible statistical tests at some data hoping they are valid. Depending on the type of data and the type of relationship you want to test, some other tests may be useful as well, but only if you can specifically determine that the qualities of your data are suitable for that test and that the test could lead you to a statistically valid conclusion. Even at that point, you only have statistical evidence; you would need more or different scientific methods to determine what the actual causal relationship is (if any) between the two phenomena.
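As a rough illustration of why applying the same computation "from every possible direction" can mislead, here is a small Python sketch on simulated data. The two series are independent random numbers that merely stand in for daily rain and quake counts, and the 1000-day length and 10-day window are arbitrary assumptions. It scans every 10-day window and reports the strongest correlation found, alongside the correlation over the full period.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient (the quantity Excel's CORREL computes)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable
days = 1000
window = 10  # arbitrary window length, chosen only for illustration

# Two unrelated series: by construction there is no relationship at all.
rain = [rng.random() for _ in range(days)]
quakes = [rng.random() for _ in range(days)]

# Correlation over the whole period.
overall = pearson(rain, quakes)

# Strongest correlation (in absolute value) over every sliding window.
best = 0.0
for start in range(days - window + 1):
    r = pearson(rain[start:start + window], quakes[start:start + window])
    best = max(best, abs(r))

print(f"correlation over all {days} days:       {overall:.3f}")
print(f"strongest |r| in any {window}-day window:   {best:.3f}")
```

Short windows picked out after the fact can show strong correlations even when the data are pure noise, so a handful of striking windows is not by itself evidence of a relationship.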