Why is PCA sensitive to outliers?











There are many posts on this SE that discuss robust approaches to Principal Component Analysis (PCA), but I cannot find a single good explanation of why PCA is sensitive to outliers in the first place.










machine-learning pca outliers






asked 2 days ago, edited 2 days ago
Psi (new contributor)
  • 4
    Because the L2-norm contribution of outliers is very high. When minimizing the L2 norm (which is what PCA tries to do), those points pull harder on the fit than points closer to the middle do.
    – mathreadler
    yesterday










  • This answer tells you everything you need. Just picture an outlier and read attentively.
    – Stephan Kolassa
    yesterday














1 Answer























29 votes · accepted










One of the reasons is that PCA can be thought of as a low-rank decomposition of the data that minimizes the sum of squared $L_2$ norms of the residuals of the decomposition. I.e., if $Y$ is your data ($m$ vectors of $n$ dimensions) and $X$ is the PCA basis ($k$ vectors of $n$ dimensions), then the decomposition minimizes
$$\lVert Y - XA \rVert^2_F = \sum_{j=1}^{m} \lVert Y_j - X A_{j\cdot} \rVert^2$$
Here $A$ is the matrix of coefficients of the PCA decomposition and $\lVert \cdot \rVert_F$ is the Frobenius norm of the matrix.
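
To make this objective concrete, here is a minimal NumPy sketch (my own, not part of the original answer; it uses the transposed convention with data points as rows, and all variable names are illustrative). It checks that the rank-$k$ PCA basis attains a Frobenius residual no larger than an arbitrary rank-$k$ orthonormal basis, as the Eckart–Young theorem guarantees:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 5))   # m = 100 points in n = 5 dimensions (rows)
Y -= Y.mean(axis=0)             # PCA assumes centered data

k = 2
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
X = Vt[:k].T                    # PCA basis: n x k
A = Y @ X                       # coefficients: m x k
pca_residual = np.linalg.norm(Y - A @ X.T, "fro") ** 2

# Any other rank-k orthonormal basis gives a residual at least as large:
Q, _ = np.linalg.qr(rng.normal(size=(5, k)))
other_residual = np.linalg.norm(Y - (Y @ Q) @ Q.T, "fro") ** 2
assert pca_residual <= other_residual
```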



Because PCA minimizes $L_2$ (i.e. quadratic) norms, it has the same issue as least squares or fitting a Gaussian: sensitivity to outliers. Because the deviations of the outliers are squared, they dominate the total norm and therefore drive the PCA components.
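
As an illustration of that sensitivity (again my own sketch, not from the answer): adding a single extreme point to a tightly correlated cloud is enough to redirect the first principal component.

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.normal(size=200)
clean = np.column_stack([t, t + 0.1 * rng.normal(size=200)])  # points near y = x

def first_pc(data):
    """First principal component via SVD of the centered data."""
    data = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(data, full_matrices=False)
    return Vt[0]

with_outlier = np.vstack([clean, [[20.0, -20.0]]])  # one extreme point off the trend
print(first_pc(clean))         # roughly (0.71, 0.71), along y = x (up to sign)
print(first_pc(with_outlier))  # roughly (0.71, -0.71): one point flips the direction
```

The outlier contributes about $40^2/2 = 800$ to the squared norm along the $(1,-1)$ direction, swamping the roughly $2 \cdot 200 = 400$ contributed by all 200 clean points along $(1,1)$, so the leading component rotates to chase it.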






answered 2 days ago, edited yesterday
sega_sai





















