Math and Science Solutions for Businesses/Interaction Terms
I'm taking an introductory Statistics course for my Engineering degree and am having a little trouble understanding interaction terms. I was given a set of data in the tutorial to input into Minitab, which I have done, but the results are not making sense to me.
To get the regression I used Y for the response and the predictors x1, x2 and t. The equation I get is Y = 55.0 + 5.25x1 + 5.51x2 + 0.398t.
The p-values for that equation are 0.000 for the constant, 0.002 for x1, 0.000 for x2 and 0.000 for t.
Then I was asked to add t^2 as a predictor and got the following equation: Y = 56.8 + 5.37x1 - 6.13x2 + 0.109t + 0.00929t^2, with p-values of 0.000 for the constant, 0.002 for x1, 0.000 for x2, 0.733 for t and 0.354 for t^2. Why is that?
Also, is there any point in testing whether an interaction term could be a predictor if one of the predictors is significant and the other is not? For example, if predictor c has a p-value of 0.001 and predictor t has a p-value of 0.475, is there any reason to believe t could influence c? Thanks a million in advance.
ANSWER: I am not familiar with Minitab and so am not exactly sure of the notation. The documentation on the internet is not great but perhaps I can give you some help.
The equations you give for your regression do not seem to have interaction terms. That is, there are no terms like x1·x2 or x2·t, etc. Is this what you intended?
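In case it helps, an interaction term is just an extra predictor column formed by multiplying two existing predictors. Here is a minimal sketch in Python with numpy rather than Minitab (simulated data; every coefficient and variable here is invented for illustration, not taken from your worksheet):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.integers(0, 2, n).astype(float)   # binary indicator (e.g. a control on/off)
x2 = rng.integers(0, 2, n).astype(float)   # a second binary indicator
t = rng.uniform(0.0, 10.0, n)              # a continuous predictor

# Simulated response with a genuine x1*x2 interaction (all numbers invented)
y = 55.0 + 5.0*x1 + 5.5*x2 + 0.4*t + 3.0*(x1*x2) + rng.normal(0.0, 1.0, n)

# Design matrix WITH an interaction column x1*x2 as the last column
X = np.column_stack([np.ones(n), x1, x2, t, x1*x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(2))   # estimates should land near [55, 5, 5.5, 0.4, 3]
```

The point is that the fitted coefficient on the x1*x2 column recovers the interaction effect; in Minitab the same thing happens when you add an interaction term to the model.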
The 1st regression is linear in the variables x1, x2 and t. I assume that the p-values refer to the coefficients (multi-dimensional "slopes") of the variables. The p-values are all small which I believe means that the linear fit to these variables is very unlikely or at least not at all good (less than the usual 5% confidence level).
The 2nd regression has a quadratic term in t and so is a non-linear regression. The p-values for the coefficients of t and t^2 are relatively large and indicate that the quadratic fit in t is good (the small p-values for the constant, x1 and x2 indicate that they are not contributing significantly to the fit).
My interpretation of the difference in the 2 regression results is that the data show a significant non-linear (in particular, a quadratic) fit, which is perfectly reasonable. Do you have a plot of the Y response vs. t? It could visually confirm this.
I believe that an interaction term can be a good predictor even if one or the other single predictor has a small p-value. What this would mean is that a single predictor may not be a good linear predictor on its own, but in combination with another variable it could model the data pretty well.
Can you give me a little more context as to the nature of the data? Also, a definition of x1, x2 and t would be very useful.
Hope this helps.
---------- FOLLOW-UP ----------
QUESTION: So sorry if I wasn't clear enough. I've attached the worksheet and the Minitab outputs which might give you a better understanding of what I'm asking. Thanks a million in advance.
OK Anna, thanks for the additional info. Here is how I interpret the results. First of all, from what I gather, really small p-values mean that the predictor IS significant, since the null hypothesis is that the true coefficient is 0, i.e. that the predictor makes no difference. Another way to say this is that a small p-value means the coefficient IS significantly different from 0 and the predictor does make a difference. (I got it backwards in my previous response.)
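For what it's worth, the p-value reported for each coefficient comes from its t-statistic: the estimate divided by its standard error, compared against the null hypothesis that the true coefficient is 0. A rough sketch in Python with numpy (simulated data, numbers invented; the normal approximation stands in for the exact t distribution, which is fine at this sample size):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(0.0, 10.0, n)
y = 3.0 + 0.5*x + rng.normal(0.0, 1.0, n)    # true slope 0.5 (made up)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - 2)                  # residual variance estimate
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())

# t-statistic for H0: slope = 0, then a two-sided p-value via the
# normal approximation to the t distribution
tstat = beta[1] / se[1]
p = math.erfc(abs(tstat) / math.sqrt(2.0))
print(tstat, p)   # large |t|, tiny p: the slope IS significant
```

A tiny p like this is what your 0.000 entries mean: the data are very unlikely under "this coefficient is really 0."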
Thus, the 1st regression shows that both x1 and x2 make a difference when they are on. The variable t is also significant, though its coefficient is small, meaning, I think, that the slope of y vs. t is shallow, which reflects the fact that the y values are pretty flat as t increases. The significance of t is mentioned in paragraph (b).
For the 2nd regression, the coefficients for t and t^2 are both small, the one for t being significantly smaller than in the 1st regression (a factor of about 4). I assume that the regression with x1 = x2 = 0 uses t and t^2 together to fit the values of Y. They do a good job of showing that Y is in actuality very flat with a slight quadratic bend, so flat in fact that the coefficients of t and t^2 cannot be distinguished from 0 at the 0.05 level (or pick your favorite small significance level), since their p-values are large. The fact that the coefficients for x1 and x2 differ slightly from their values in the first regression is a bit of a mystery; I would have thought they would be the same if the coefficients for t and t^2 were set to 0, so apparently they are not. This gets into how the model parameters (coefficients) are calculated when the categorical variables, x1 and x2, are combined with a so-called continuous variable, t. Perhaps your study materials address this.
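One mechanical point that may explain why the p-values for t and t^2 both jumped: over a range of positive values, t and t^2 are strongly correlated, and correlated predictors inflate each other's standard errors, so the individual coefficients become hard to distinguish from 0 even when the pair fits well together. A small simulated sketch in Python with numpy (numbers invented, not your worksheet data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.linspace(1.0, 20.0, n)                # hypothetical positive t values
y = 55.0 + 0.4*t + rng.normal(0.0, 2.0, n)   # truly linear in t (numbers invented)

def ols_se(X, y):
    """OLS coefficient estimates and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    return beta, np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())

print(np.corrcoef(t, t**2)[0, 1])            # very high: t and t^2 move together

_, se_lin = ols_se(np.column_stack([np.ones(n), t]), y)
_, se_quad = ols_se(np.column_stack([np.ones(n), t, t**2]), y)
print(se_lin[1], se_quad[1])                 # SE of the t coefficient inflates
```

The inflated standard error is what pushes the individual p-values for t and t^2 toward values like your 0.733 and 0.354, even when t clearly matters on its own.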
The 3rd regression appears to show that the interaction of x1 and x2 is not significant. This would indicate that neither control significantly amplifies or reduces the effect of the other. Per your question from last time, I still haven't come across a definitive statement as to whether an interaction term can be significant when one of its component predictors is not. In fact, I came across a statement that said this issue is still somewhat controversial.
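That said, it is at least possible to construct data where an interaction matters even though one component does not. Here is a simulated sketch in Python with numpy (all names and numbers invented for illustration): the response depends on t only through its interaction with c, mirroring your c (significant) vs. t (not significant) example.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
n = 300
c = rng.integers(0, 2, n).astype(float)   # significant predictor (on/off)
t = rng.uniform(0.0, 1.0, n)              # no effect on its own
# t matters ONLY through its interaction with c (all numbers invented)
y = 10.0 + 2.0*c + 0.0*t + 3.0*(c*t) + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), c, t, c*t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
tstats = beta / se
# two-sided p-values via the normal approximation
pvals = [math.erfc(abs(ts) / math.sqrt(2.0)) for ts in tstats]
print(pvals)   # p for t alone is typically large; p for c*t is small
```

So a non-significant main effect by itself does not rule out a significant interaction, which is consistent with the "still controversial" advice about when to test one.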
Again, hope this helps.