Significance contradiction in linear regression: significant t-test for a coefficient vs. non-significant overall F-test











I'm fitting a multiple linear regression model between 4 categorical variables (with 4 levels each) and a numerical output. My dataset has 43 observations.



Regression gives me the following $p$-values from the $t$-tests for the slope coefficients: $.15, .67, .27, .02$. Thus, the coefficient for the 4th predictor is significant at the $\alpha = .05$ significance level.



On the other hand, the regression gives me a $p$-value from an overall $F$-test of the null hypothesis that all my slope coefficients are equal to zero. For my dataset, this $p$-value is $.11$.



My question: how should I interpret these results? Which $p$-value should I use and why? Is the coefficient for the 4th variable significantly different from $0$ at the $\alpha = .05$ significance level?



I've seen a related question, $F$ and $t$ statistics in a regression, but there was an opposite situation: high $t$-test $p$-values and low $F$-test $p$-value. Honestly, I don't quite understand why we would need an $F$-test in addition to a $t$-test to see if linear regression coefficients are significantly different from zero.
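To make the two kinds of test concrete, here is a minimal R sketch with simulated data (the variable names are made up, not the asker's dataset): summary(lm(...)) reports one $t$-test per slope coefficient, each testing whether that single coefficient is zero, plus one overall $F$-test of the joint null hypothesis that every slope is zero; the answers below discuss why the two can disagree.

    # Hypothetical illustration: four predictors entered as numeric codes 1..4
    # and a numeric response y, with n = 43 as in the question.
    set.seed(1)
    dat <- data.frame(x1 = sample(1:4, 43, replace = TRUE),
                      x2 = sample(1:4, 43, replace = TRUE),
                      x3 = sample(1:4, 43, replace = TRUE),
                      x4 = sample(1:4, 43, replace = TRUE))
    dat$y <- 0.5 * dat$x4 + rnorm(43)

    s <- summary(lm(y ~ x1 + x2 + x3 + x4, data = dat))
    s$coefficients   # one t-test per slope: H0 is beta_j = 0
    s$fstatistic     # overall F statistic and its degrees of freedom
                     # (the overall p-value summary() prints comes from these)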







regression hypothesis-testing multiple-comparisons multiple-regression t-test






– Leo (asked Mar 15 '12 at 19:56, edited Oct 18 at 17:54 by ttnphns)








  • If you have 4 categorical variables with 4 levels each, you should have 3*4=12 coefficients for your independent variables (plus the intercept)...
    – boscovich
    Mar 15 '12 at 20:01

  • @andrea: I've decided to treat them as numerical variables.
    – Leo
    Mar 15 '12 at 20:12

  • 0.02 is barely significant (especially if you consider the fact that you have five tests in total) and 0.11 is not very high. A generous interpretation would be that with a little more power the overall F-test would also be significant (and perhaps the first coefficient as well). A more conservative interpretation is that you shouldn't have much confidence in any of these results (including the coefficient with a .02 p-value). Either way, you shouldn't read too much into the difference between .02 and .11.
    – Gala
    Mar 15 '12 at 20:34

  • For a discussion of the opposite case, you can also see here: how can a regression be significant yet all predictors be non-significant, in addition to the question linked above.
    – gung
    Sep 13 '12 at 15:17
























3 Answers
I'm not sure that multicollinearity is what's going on here. It certainly could be, but from the information given I can't conclude that, and I don't want to start there. My first guess is that this might be a multiple comparisons issue. That is, if you run enough tests, something will show up, even if there's nothing there.



One of the issues that I harp on is that the problem of multiple comparisons is always discussed in terms of examining many pairwise comparisons, e.g., running t-tests on every unique pairing of levels. (For a humorous treatment of multiple comparisons, look here.) This leaves people with the impression that that is the only place the problem shows up. But this is simply not true: the problem of multiple comparisons shows up everywhere. For instance, if you run a regression with 4 explanatory variables, the same issues exist. In a well-designed experiment, IVs can be orthogonal, but people routinely worry about using Bonferroni corrections on sets of a priori, orthogonal contrasts, and don't think twice about factorial ANOVAs. To my mind this is inconsistent.



The global $F$-test is what's called a 'simultaneous' test. It tests the null hypothesis that all of your predictors are jointly unrelated to the response variable. The simultaneous test provides some protection against the problem of multiple comparisons without having to go the power-losing Bonferroni route. Unfortunately, my interpretation of what you report is that you have a null finding.



Several things militate against this interpretation. First, with only 43 observations, you almost certainly don't have much power. It's quite possible that there is a real effect, but you just can't resolve it without more data. Second, like both @andrea and @Dimitriy, I worry about the appropriateness of treating 4-level categorical variables as numeric. This may well not be appropriate, and could have any number of effects, including diminishing your ability to detect what is really there. Lastly, I'm not sure that significance testing is quite as important as people believe. A $p$ of $.11$ is kind of low; is there really something going on there? Maybe! Who knows? There's no 'bright line' at .05 that demarcates real effects from mere appearance.






– gung (answered Mar 16 '12 at 4:12, edited Oct 18 at 16:48)
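(Not part of the original answer: the multiple-comparisons point can be illustrated with the four $t$-test $p$-values reported in the question. A Bonferroni adjustment with the base R function p.adjust leaves none of them below $.05$, which is consistent with reading the non-significant overall $F$-test as a null finding.)

    # The four per-coefficient p-values reported in the question
    p_vals <- c(.15, .67, .27, .02)

    # Bonferroni multiplies each p-value by the number of tests (4) and caps at 1;
    # .02 becomes .08, so none of the adjusted values falls below .05.
    p.adjust(p_vals, method = "bonferroni")
    # => 0.60 1.00 1.00 0.08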













    I would like to suggest that this phenomenon (of a non-significant overall test despite a significant individual variable) can be understood as a kind of aggregate "masking effect" and that although it conceivably could arise from multicollinear explanatory variables, it need not do that at all. It also turns out not to be due to multiple comparison adjustments, either. Thus this answer is adding some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.



    To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal explanatory variables (as non-collinear as possible) and a dependent variable that is determined solely by the first of them (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as





    set.seed(17)
    p <- 5 # Number of explanatory variables
    x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
    y <- x[,1] + rnorm(2^p, mean=0, sd=2)


    It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can check by inspecting their correlations to make sure the code is working as expected. Indeed, the correlation matrix is interesting: the small coefficients suggest y has little to do with any of the variables except the first (which is by design), and the off-diagonal zeros confirm the orthogonality of the explanatory variables:



    > cor(cbind(x,y))
    Var1 Var2 Var3 Var4 Var5 y
    Var1 1.00 0.000 0.000 0.000 0.00 0.486
    Var2 0.00 1.000 0.000 0.000 0.00 0.088
    Var3 0.00 0.000 1.000 0.000 0.00 0.044
    Var4 0.00 0.000 0.000 1.000 0.00 -0.014
    Var5 0.00 0.000 0.000 0.000 1.00 -0.167
    y 0.49 0.088 0.044 -0.014 -0.17 1.000


    Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:



    >temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))

    # Estimate Std. Error t value Pr(>|t|)
    1 x[, 1:i] 0.898 0.294 3.05 0.0048 **
    F-statistic: 9.29 on 1 and 30 DF, p-value: 0.00478

    2 x[, 1:i]Var1 0.898 0.298 3.01 0.0053 **
    F-statistic: 4.68 on 2 and 29 DF, p-value: 0.0173

    3 x[, 1:i]Var1 0.8975 0.3029 2.96 0.0062 **
    F-statistic: 3.05 on 3 and 28 DF, p-value: 0.0451

    4 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0072 **
    F-statistic: 2.21 on 4 and 27 DF, p-value: 0.095

    5 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0073 **
    F-statistic: 1.96 on 5 and 26 DF, p-value: 0.118


    Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., applying Bonferroni by multiplying the nominal p-value by the number of explanatory variables), (b) the coefficient of the first variable barely changes, but (c) the p-value of the overall F-test grows rapidly, soon exceeding the conventional level of significance.



    I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall p-value of the regression. When the new variables are orthogonal to existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here are because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.



    I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.






    – whuber (answered Aug 21 '12 at 20:48, edited Aug 21 '12 at 21:57)
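    Not part of the original answer: if you want to pull the overall F-test p-value out of each fit programmatically rather than reading it off the printed summaries, one way (reusing the x, y, and p defined in the code above) is to work from summary()$fstatistic, which holds the F value and its two degrees of freedom:

        # Overall F-test p-value for each nested model, computed from
        # summary()$fstatistic = c(value, numdf, dendf); assumes x, y, p from above.
        overall_p <- sapply(1:p, function(i) {
          f <- summary(lm(y ~ x[, 1:i]))$fstatistic
          pf(f[1], f[2], f[3], lower.tail = FALSE)
        })
        round(overall_p, 4)
        # should reproduce the p-values printed above: 0.00478, 0.0173, 0.0451, 0.095, 0.118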























    • +1, I agree w/ this analysis. FWIW, this is the explanation I was hinting at (perhaps not well) in my discussion about power in my answer to the other question. I do have 1 question about your version here, why do you use 32 as the mean of your error term? Is that a typo, or is it important in some way?
      – gung
      Aug 21 '12 at 20:59












    • @gung Where do you see 32? If you're referring to rnorm(2^p, sd=2), please note that the first argument is the number of terms, not the mean. The mean by default is zero and therefore has not been explicitly specified.
      – whuber
      Aug 21 '12 at 21:23










    • Oh, sorry. I guess I was confusing rnorm() w/ $\mathcal{N}(\mu, \sigma)$.
      – gung
      Aug 21 '12 at 21:52










    • @gung I am grateful for the opportunity to clarify the code and therefore have edited the offending line.
      – whuber
      Aug 21 '12 at 21:58































    You frequently have this happen when you have a high degree of collinearity among your explanatory variables. The ANOVA F is a joint test of the null hypothesis that all the regressors are jointly uninformative. When your Xs contain similar information, the model cannot attribute the explanatory power to any one regressor, but their combination can explain much of the variation in the response variable.



    Also, the fact that you seem to be treating your categorical variables as if they were continuous may be problematic. You are explicitly imposing restrictions such as: bumping $x_{1}$ from 1 to 2 has the same effect on $y$ as bumping it from 3 to 4. Sometimes that's OK, but often it's not.
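    (An editorial sketch, not part of this answer: one way to check both concerns in R. The data frame name dat and the names y, x1..x4 are hypothetical. Comparing a model that uses the numeric 1-4 codes against one that uses factor() coding tests the equal-spacing restriction, and the car package's vif() implements the VIF check suggested in the comment below.)

        # Hypothetical data frame `dat` with response y and predictors x1..x4 coded 1..4.

        # Numeric coding: 4 slope parameters, imposes the equal-spacing restriction
        fit_num <- lm(y ~ x1 + x2 + x3 + x4, data = dat)

        # Factor coding: 3 dummies per predictor, 12 slope parameters, no such restriction
        fit_fac <- lm(y ~ factor(x1) + factor(x2) + factor(x3) + factor(x4), data = dat)

        # The numeric model is nested in the factor model, so this F-test asks whether
        # the 8 extra parameters buy anything, i.e. whether linearity in the codes is tenable.
        anova(fit_num, fit_fac)

        # Variance inflation factors (install.packages("car") if needed); values well
        # above 10 are a common rule-of-thumb warning sign for collinearity.
        car::vif(fit_num)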



























    • If collinearity is a problem, then you will have high standard errors and perhaps implausibly large coefficients, maybe even with the wrong signs. To make sure that this is what is happening, calculate the variance inflation factors (VIFs) after your regression. A reasonable rule of thumb is that collinearity is a problem if the largest VIF is greater than 10. If so, you really have two options here. One is to re-specify the model to reduce the near-linear dependence by dropping some of your variables. The second is to get a larger and/or better (less homogeneous) sample.
      – Dimitriy V. Masterov
      Mar 15 '12 at 20:36






    • (+1) This explanation is a good one, but it is unnecessary to attribute the phenomenon to multicollinearity: the key distinction is between jointly informative and individually informative. Including additional uncorrelated regressors (which avoids any multicollinearity) lowers the former while leaving the latter unchanged.
      – whuber
      Aug 22 '12 at 16:44













    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "65"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f24720%2fsignificance-contradiction-in-linear-regression-significant-t-test-for-a-coeffi%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    36
    down vote













    I'm not sure that multicollinearity is what's going on here. It certainly could be, but from the information given I can't conclude that, and I don't want to start there. My first guess is that this might be a multiple comparisons issue. That is, if you run enough tests, something will show up, even if there's nothing there.



    One of the issues that I harp on is that the problem of multiple comparisons is always discussed in terms of examining many pairwise comparisons—e.g., running t-tests on every unique pairing of levels. (For a humorous treatment of multiple comparisons, look here.) This leaves people with the impression that that is the only place this problem shows up. But this is simply not true—the problem of multiple comparisons shows up everywhere. For instance, if you run a regression with 4 explanatory variables, the same issues exist. In a well-designed experiment, IV's can be orthogonal, but people routinely worry about using Bonferroni corrections on sets of a-priori, orthogonal contrasts, and don't think twice about factorial ANOVA's. To my mind this is inconsistent.



    The global F test is what's called a 'simultaneous' test. This checks to see if all of your predictors are unrelated to the response variable. The simultaneous test provides some protection against the problem of multiple comparisons without having to go the power-losing Bonferroni route. Unfortunately, my interpretation of what you report is that you have a null finding.



    Several things mitigate against this interpretation. First, with only 43 data, you almost certainly don't have much power. It's quite possible that there is a real effect, but you just can't resolve it without more data. Second, like both @andrea and @Dimitriy, I worry about the appropriateness of treating 4-level categorical variables as numeric. This may well not be appropriate, and could have any number of effects, including diminishing your ability to detect what is really there. Lastly, I'm not sure that significance testing is quite as important as people believe. A $p$ of $.11$ is kind of low; is there really something going on there? maybe! who knows?—there's no 'bright line' at .05 that demarcates real effects from mere appearance.






    share|cite|improve this answer



























      up vote
      36
      down vote













      I'm not sure that multicollinearity is what's going on here. It certainly could be, but from the information given I can't conclude that, and I don't want to start there. My first guess is that this might be a multiple comparisons issue. That is, if you run enough tests, something will show up, even if there's nothing there.



      One of the issues that I harp on is that the problem of multiple comparisons is always discussed in terms of examining many pairwise comparisons—e.g., running t-tests on every unique pairing of levels. (For a humorous treatment of multiple comparisons, look here.) This leaves people with the impression that that is the only place this problem shows up. But this is simply not true—the problem of multiple comparisons shows up everywhere. For instance, if you run a regression with 4 explanatory variables, the same issues exist. In a well-designed experiment, IV's can be orthogonal, but people routinely worry about using Bonferroni corrections on sets of a-priori, orthogonal contrasts, and don't think twice about factorial ANOVA's. To my mind this is inconsistent.



      The global F test is what's called a 'simultaneous' test. This checks to see if all of your predictors are unrelated to the response variable. The simultaneous test provides some protection against the problem of multiple comparisons without having to go the power-losing Bonferroni route. Unfortunately, my interpretation of what you report is that you have a null finding.



      Several things mitigate against this interpretation. First, with only 43 data, you almost certainly don't have much power. It's quite possible that there is a real effect, but you just can't resolve it without more data. Second, like both @andrea and @Dimitriy, I worry about the appropriateness of treating 4-level categorical variables as numeric. This may well not be appropriate, and could have any number of effects, including diminishing your ability to detect what is really there. Lastly, I'm not sure that significance testing is quite as important as people believe. A $p$ of $.11$ is kind of low; is there really something going on there? maybe! who knows?—there's no 'bright line' at .05 that demarcates real effects from mere appearance.






      share|cite|improve this answer

























        up vote
        36
        down vote










        up vote
        36
        down vote









        I'm not sure that multicollinearity is what's going on here. It certainly could be, but from the information given I can't conclude that, and I don't want to start there. My first guess is that this might be a multiple comparisons issue. That is, if you run enough tests, something will show up, even if there's nothing there.



        One of the issues that I harp on is that the problem of multiple comparisons is always discussed in terms of examining many pairwise comparisons—e.g., running t-tests on every unique pairing of levels. (For a humorous treatment of multiple comparisons, look here.) This leaves people with the impression that that is the only place this problem shows up. But this is simply not true—the problem of multiple comparisons shows up everywhere. For instance, if you run a regression with 4 explanatory variables, the same issues exist. In a well-designed experiment, IV's can be orthogonal, but people routinely worry about using Bonferroni corrections on sets of a-priori, orthogonal contrasts, and don't think twice about factorial ANOVA's. To my mind this is inconsistent.



        The global F test is what's called a 'simultaneous' test. This checks to see if all of your predictors are unrelated to the response variable. The simultaneous test provides some protection against the problem of multiple comparisons without having to go the power-losing Bonferroni route. Unfortunately, my interpretation of what you report is that you have a null finding.



        Several things mitigate against this interpretation. First, with only 43 data, you almost certainly don't have much power. It's quite possible that there is a real effect, but you just can't resolve it without more data. Second, like both @andrea and @Dimitriy, I worry about the appropriateness of treating 4-level categorical variables as numeric. This may well not be appropriate, and could have any number of effects, including diminishing your ability to detect what is really there. Lastly, I'm not sure that significance testing is quite as important as people believe. A $p$ of $.11$ is kind of low; is there really something going on there? maybe! who knows?—there's no 'bright line' at .05 that demarcates real effects from mere appearance.






        share|cite|improve this answer














        I'm not sure that multicollinearity is what's going on here. It certainly could be, but from the information given I can't conclude that, and I don't want to start there. My first guess is that this might be a multiple comparisons issue. That is, if you run enough tests, something will show up, even if there's nothing there.



        One of the issues that I harp on is that the problem of multiple comparisons is always discussed in terms of examining many pairwise comparisons—e.g., running t-tests on every unique pairing of levels. (For a humorous treatment of multiple comparisons, look here.) This leaves people with the impression that that is the only place this problem shows up. But this is simply not true—the problem of multiple comparisons shows up everywhere. For instance, if you run a regression with 4 explanatory variables, the same issues exist. In a well-designed experiment, IV's can be orthogonal, but people routinely worry about using Bonferroni corrections on sets of a-priori, orthogonal contrasts, and don't think twice about factorial ANOVA's. To my mind this is inconsistent.



        The global F test is what's called a 'simultaneous' test. This checks to see if all of your predictors are unrelated to the response variable. The simultaneous test provides some protection against the problem of multiple comparisons without having to go the power-losing Bonferroni route. Unfortunately, my interpretation of what you report is that you have a null finding.



        Several things mitigate against this interpretation. First, with only 43 data, you almost certainly don't have much power. It's quite possible that there is a real effect, but you just can't resolve it without more data. Second, like both @andrea and @Dimitriy, I worry about the appropriateness of treating 4-level categorical variables as numeric. This may well not be appropriate, and could have any number of effects, including diminishing your ability to detect what is really there. Lastly, I'm not sure that significance testing is quite as important as people believe. A $p$ of $.11$ is kind of low; is there really something going on there? maybe! who knows?—there's no 'bright line' at .05 that demarcates real effects from mere appearance.







        share|cite|improve this answer














        share|cite|improve this answer



        share|cite|improve this answer








        edited Oct 18 at 16:48

























        answered Mar 16 '12 at 4:12









        gung

        105k34255519




        105k34255519
























            up vote
            23
            down vote













            I would like to suggest that this phenomenon (of a non-significant overall test despite a significant individual variable) can be understood as a kind of aggregate "masking effect" and that although it conceivably could arise from multicollinear explanatory variables, it need not do that at all. It also turns out not to be due to multiple comparison adjustments, either. Thus this answer is adding some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.



            To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal variables--just as non-collinear as possible--and a dependent variable that explicitly is determined solely by the first of the explanands (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as





            set.seed(17)
            p <- 5 # Number of explanatory variables
            x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
            y <- x[,1] + rnorm(2^p, mean=0, sd=2)


            It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can check to make sure the code is working as expected, which can be done by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients suggest y has little to do with any of the variables except the first (which is by design) and the off-diagonal zeros confirm the orthogonality of the explanatory variables:



            > cor(cbind(x,y))
            Var1 Var2 Var3 Var4 Var5 y
            Var1 1.00 0.000 0.000 0.000 0.00 0.486
            Var2 0.00 1.000 0.000 0.000 0.00 0.088
            Var3 0.00 0.000 1.000 0.000 0.00 0.044
            Var4 0.00 0.000 0.000 1.000 0.00 -0.014
            Var5 0.00 0.000 0.000 0.000 1.00 -0.167
            y 0.49 0.088 0.044 -0.014 -0.17 1.000


            Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:



            >temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))

            # Estimate Std. Error t value Pr(>|t|)
            1 x[, 1:i] 0.898 0.294 3.05 0.0048 **
            F-statistic: 9.29 on 1 and 30 DF, p-value: 0.00478

            2 x[, 1:i]Var1 0.898 0.298 3.01 0.0053 **
            F-statistic: 4.68 on 2 and 29 DF, p-value: 0.0173

            3 x[, 1:i]Var1 0.8975 0.3029 2.96 0.0062 **
            F-statistic: 3.05 on 3 and 28 DF, p-value: 0.0451

            4 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0072 **
            F-statistic: 2.21 on 4 and 27 DF, p-value: 0.095

            5 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0073 **
            F-statistic: 1.96 on 5 and 26 DF, p-value: 0.118


            Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., apply Bonferroni by multiplying the nominal p-value by the number of explanatory variables), (b) the coefficient of the first variable barely changes, but (c) the overall significance grows exponentially, quickly inflating to a non-significant level.



            I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall p-value of the regression. When the new variables are orthogonal to existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here are because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.



            I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.






            share|cite|improve this answer























            • +1, I agree w/ this analysis. FWIW, this is the explanation I was hinting at (perhaps not well) in my discussion about power in my answer to the other question. I do have 1 question about your version here, why do you use 32 as the mean of your error term? Is that a typo, or is it important in some way?
              – gung
              Aug 21 '12 at 20:59












            • @gung Where do you see 32? If you're referring to rnorm(2^p, sd=2), please note that the first argument is the number of terms, not the mean. The mean by default is zero and therefore has not been explicitly specified.
              – whuber
              Aug 21 '12 at 21:23










            • Oh, sorry. I guess I was confusing rnorm() w/ $mathcal N(mu, sigma)$.
              – gung
              Aug 21 '12 at 21:52










            • @gung I am grateful for the opportunity to clarify the code and therefore have edited the offending line.
              – whuber
              Aug 21 '12 at 21:58















            up vote
            23
            down vote













            I would like to suggest that this phenomenon (of a non-significant overall test despite a significant individual variable) can be understood as a kind of aggregate "masking effect" and that although it conceivably could arise from multicollinear explanatory variables, it need not do that at all. It also turns out not to be due to multiple comparison adjustments, either. Thus this answer is adding some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.



            To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal variables--just as non-collinear as possible--and a dependent variable that explicitly is determined solely by the first of the explanands (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as





            set.seed(17)
            p <- 5 # Number of explanatory variables
            x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
            y <- x[,1] + rnorm(2^p, mean=0, sd=2)


            It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can check to make sure the code is working as expected, which can be done by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients suggest y has little to do with any of the variables except the first (which is by design) and the off-diagonal zeros confirm the orthogonality of the explanatory variables:



            > cor(cbind(x,y))
            Var1 Var2 Var3 Var4 Var5 y
            Var1 1.00 0.000 0.000 0.000 0.00 0.486
            Var2 0.00 1.000 0.000 0.000 0.00 0.088
            Var3 0.00 0.000 1.000 0.000 0.00 0.044
            Var4 0.00 0.000 0.000 1.000 0.00 -0.014
            Var5 0.00 0.000 0.000 0.000 1.00 -0.167
            y 0.49 0.088 0.044 -0.014 -0.17 1.000


            Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:



            >temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))

            # Estimate Std. Error t value Pr(>|t|)
            1 x[, 1:i] 0.898 0.294 3.05 0.0048 **
            F-statistic: 9.29 on 1 and 30 DF, p-value: 0.00478

            2 x[, 1:i]Var1 0.898 0.298 3.01 0.0053 **
            F-statistic: 4.68 on 2 and 29 DF, p-value: 0.0173

            3 x[, 1:i]Var1 0.8975 0.3029 2.96 0.0062 **
            F-statistic: 3.05 on 3 and 28 DF, p-value: 0.0451

            4 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0072 **
            F-statistic: 2.21 on 4 and 27 DF, p-value: 0.095

            5 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0073 **
            F-statistic: 1.96 on 5 and 26 DF, p-value: 0.118


            Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., apply Bonferroni by multiplying the nominal p-value by the number of explanatory variables), (b) the coefficient of the first variable barely changes, but (c) the overall significance grows exponentially, quickly inflating to a non-significant level.



            I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall p-value of the regression. When the new variables are orthogonal to existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here are because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.



            I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.






            share|cite|improve this answer























            • +1, I agree w/ this analysis. FWIW, this is the explanation I was hinting at (perhaps not well) in my discussion about power in my answer to the other question. I do have 1 question about your version here, why do you use 32 as the mean of your error term? Is that a typo, or is it important in some way?
              – gung
              Aug 21 '12 at 20:59












            • @gung Where do you see 32? If you're referring to rnorm(2^p, sd=2), please note that the first argument is the number of terms, not the mean. The mean by default is zero and therefore has not been explicitly specified.
              – whuber
              Aug 21 '12 at 21:23










            • Oh, sorry. I guess I was confusing rnorm() w/ $mathcal N(mu, sigma)$.
              – gung
              Aug 21 '12 at 21:52










            • @gung I am grateful for the opportunity to clarify the code and therefore have edited the offending line.
              – whuber
              Aug 21 '12 at 21:58













            up vote
            23
            down vote










            up vote
            23
            down vote









            I would like to suggest that this phenomenon (of a non-significant overall test despite a significant individual variable) can be understood as a kind of aggregate "masking effect" and that although it conceivably could arise from multicollinear explanatory variables, it need not do that at all. It also turns out not to be due to multiple comparison adjustments, either. Thus this answer is adding some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.



            To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal variables--just as non-collinear as possible--and a dependent variable that explicitly is determined solely by the first of the explanands (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as





            set.seed(17)
            p <- 5 # Number of explanatory variables
            x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
            y <- x[,1] + rnorm(2^p, mean=0, sd=2)


            It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can check to make sure the code is working as expected, which can be done by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients suggest y has little to do with any of the variables except the first (which is by design) and the off-diagonal zeros confirm the orthogonality of the explanatory variables:



            > cor(cbind(x,y))
            Var1 Var2 Var3 Var4 Var5 y
            Var1 1.00 0.000 0.000 0.000 0.00 0.486
            Var2 0.00 1.000 0.000 0.000 0.00 0.088
            Var3 0.00 0.000 1.000 0.000 0.00 0.044
            Var4 0.00 0.000 0.000 1.000 0.00 -0.014
            Var5 0.00 0.000 0.000 0.000 1.00 -0.167
            y 0.49 0.088 0.044 -0.014 -0.17 1.000


            Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:



            >temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))

            # Estimate Std. Error t value Pr(>|t|)
            1 x[, 1:i] 0.898 0.294 3.05 0.0048 **
            F-statistic: 9.29 on 1 and 30 DF, p-value: 0.00478

            2 x[, 1:i]Var1 0.898 0.298 3.01 0.0053 **
            F-statistic: 4.68 on 2 and 29 DF, p-value: 0.0173

            3 x[, 1:i]Var1 0.8975 0.3029 2.96 0.0062 **
            F-statistic: 3.05 on 3 and 28 DF, p-value: 0.0451

            4 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0072 **
            F-statistic: 2.21 on 4 and 27 DF, p-value: 0.095

            5 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0073 **
            F-statistic: 1.96 on 5 and 26 DF, p-value: 0.118


            Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., apply Bonferroni by multiplying the nominal p-value by the number of explanatory variables), (b) the coefficient of the first variable barely changes, but (c) the overall significance grows exponentially, quickly inflating to a non-significant level.



            I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall p-value of the regression. When the new variables are orthogonal to existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here are because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.



            I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.






            share|cite|improve this answer














            I would like to suggest that this phenomenon (of a non-significant overall test despite a significant individual variable) can be understood as a kind of aggregate "masking effect" and that although it conceivably could arise from multicollinear explanatory variables, it need not do that at all. It also turns out not to be due to multiple comparison adjustments, either. Thus this answer is adding some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.



            To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal variables--just as non-collinear as possible--and a dependent variable that explicitly is determined solely by the first of the explanands (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as





            set.seed(17)
            p <- 5 # Number of explanatory variables
            x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
            y <- x[,1] + rnorm(2^p, mean=0, sd=2)


            It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can check to make sure the code is working as expected, which can be done by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients suggest y has little to do with any of the variables except the first (which is by design) and the off-diagonal zeros confirm the orthogonality of the explanatory variables:



            > cor(cbind(x,y))
            Var1 Var2 Var3 Var4 Var5 y
            Var1 1.00 0.000 0.000 0.000 0.00 0.486
            Var2 0.00 1.000 0.000 0.000 0.00 0.088
            Var3 0.00 0.000 1.000 0.000 0.00 0.044
            Var4 0.00 0.000 0.000 1.000 0.00 -0.014
            Var5 0.00 0.000 0.000 0.000 1.00 -0.167
            y 0.49 0.088 0.044 -0.014 -0.17 1.000


            Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:



            >temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))

            # Estimate Std. Error t value Pr(>|t|)
            1 x[, 1:i] 0.898 0.294 3.05 0.0048 **
            F-statistic: 9.29 on 1 and 30 DF, p-value: 0.00478

            2 x[, 1:i]Var1 0.898 0.298 3.01 0.0053 **
            F-statistic: 4.68 on 2 and 29 DF, p-value: 0.0173

            3 x[, 1:i]Var1 0.8975 0.3029 2.96 0.0062 **
            F-statistic: 3.05 on 3 and 28 DF, p-value: 0.0451

            4 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0072 **
            F-statistic: 2.21 on 4 and 27 DF, p-value: 0.095

            5 x[, 1:i]Var1 0.8975 0.3084 2.91 0.0073 **
            F-statistic: 1.96 on 5 and 26 DF, p-value: 0.118


            Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., apply Bonferroni by multiplying the nominal p-value by the number of explanatory variables), (b) the coefficient of the first variable barely changes, but (c) the overall significance grows exponentially, quickly inflating to a non-significant level.



            I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall p-value of the regression. When the new variables are orthogonal to existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here are because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.



            I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited Aug 21 '12 at 21:57

























            answered Aug 21 '12 at 20:48









            whuber

            200k32433804




            200k32433804












            • +1, I agree w/ this analysis. FWIW, this is the explanation I was hinting at (perhaps not well) in my discussion about power in my answer to the other question. I do have 1 question about your version here, why do you use 32 as the mean of your error term? Is that a typo, or is it important in some way?
              – gung
              Aug 21 '12 at 20:59












            • @gung Where do you see 32? If you're referring to rnorm(2^p, sd=2), please note that the first argument is the number of terms, not the mean. The mean by default is zero and therefore has not been explicitly specified.
              – whuber
              Aug 21 '12 at 21:23










            • Oh, sorry. I guess I was confusing rnorm() w/ $mathcal N(mu, sigma)$.
              – gung
              Aug 21 '12 at 21:52










            • @gung I am grateful for the opportunity to clarify the code and therefore have edited the offending line.
              – whuber
              Aug 21 '12 at 21:58


















            up vote
            11
            down vote













            You frequently see this happen when there is a high degree of collinearity among your explanatory variables. The ANOVA F is a joint test of the null hypothesis that the regressors are jointly uninformative, i.e. that all the slope coefficients are zero. When your Xs contain similar information, the model cannot attribute the explanatory power to one regressor or another, but their combination can explain much of the variation in the response variable, so the individual t-tests and the joint F-test need not agree.



            Also, the fact that you seem to be treating your categorical variables as if they were continuous may be problematic. You are explicitly imposing restrictions such as "bumping $x_{1}$ from 1 to 2 has the same effect on $y$ as bumping it from 3 to 4." Sometimes that's OK, but often it's not.
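
            To make this concrete, here is a minimal R sketch on made-up data (x1–x4 and y below are hypothetical stand-ins, not the OP's dataset), contrasting the numeric coding with an explicit factor coding and showing where the per-coefficient t-tests and the overall F-test appear:

                set.seed(42)

                # Hypothetical data: 43 observations, four 4-level categorical
                # predictors coded 1-4, and a numeric response.
                n  <- 43
                x1 <- sample(1:4, n, replace = TRUE)
                x2 <- sample(1:4, n, replace = TRUE)
                x3 <- sample(1:4, n, replace = TRUE)
                x4 <- sample(1:4, n, replace = TRUE)
                y  <- 0.5 * x4 + rnorm(n)

                # Numeric coding: one slope per predictor; equal spacing between levels is imposed.
                fit_num <- lm(y ~ x1 + x2 + x3 + x4)
                summary(fit_num)        # per-coefficient t-tests; overall F-test on the last line

                # Factor coding: three dummy coefficients per predictor, no spacing assumption.
                fit_fac <- lm(y ~ factor(x1) + factor(x2) + factor(x3) + factor(x4))
                summary(fit_fac)
                anova(fit_num, fit_fac) # F-test of whether relaxing the equal-spacing restriction helps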






            share|cite|improve this answer

            answered Mar 15 '12 at 20:24

            Dimitriy V. Masterov
            • If collinearity is a problem, then you will have high standard errors and perhaps implausibly large coefficients, maybe even with the wrong signs. To make sure that this is what is happening, calculate the variance inflation factors (VIFs) after your regression. A reasonable rule of thumb is that collinearity is a problem if the largest VIF is greater than 10. If so, you really have two options here. One is to re-specify the model to reduce the near-linear dependence by dropping some of your variables. The second is to get a larger and/or better (less homogeneous) sample.
              – Dimitriy V. Masterov
              Mar 15 '12 at 20:36






            • 1




              (+1) This explanation is a good one, but it is unnecessary to attribute the phenomenon to multicollinearity: the key distinction is between jointly informative and individually informative. Including additional uncorrelated regressors (which avoids any multicollinearity) lowers the former while leaving the latter unchanged.
              – whuber
              Aug 22 '12 at 16:44
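
            A minimal sketch of the VIF check suggested in the first comment, assuming the car package is available and a fitted model such as fit_num from the sketch above:

                # install.packages("car")  # if the car package is not already installed
                library(car)

                vif(fit_num)  # one VIF per predictor; values much above 10 flag strong collinearity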
















