Why is PCA sensitive to outliers?
There are many posts on this SE that discuss robust approaches to Principal Component Analysis (PCA), but I cannot find a single good explanation of why PCA is sensitive to outliers in the first place.
machine-learning pca outliers
asked 2 days ago by Psi (edited 2 days ago)
Because the L2-norm contribution from outliers is very high. When minimizing the L2 norm (which is what PCA effectively does), those points pull the fit harder than points closer to the middle do.
– mathreadler, yesterday
This answer tells you everything you need. Just picture an outlier and read attentively.
– Stephan Kolassa, yesterday
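To put numbers on the comments above: under a squared objective, a single residual of size 10 contributes as much as a hundred residuals of size 1. A minimal sketch (mine, not from the thread):

```python
residuals = [1.0] * 100 + [10.0]       # 100 typical points plus 1 outlier
squared = [r ** 2 for r in residuals]
print(sum(squared[:100]))              # 100.0 from the hundred typical points
print(squared[-1])                     # 100.0 from the single outlier alone
```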
1 Answer
One of the reasons is that PCA can be thought of as a low-rank decomposition of the data that minimizes the sum of squared $L_2$ norms of the residuals of the decomposition. That is, if $Y$ is your data ($m$ vectors of $n$ dimensions) and $X$ is the PCA basis ($k$ vectors of $n$ dimensions), then the decomposition exactly minimizes
$$\lVert Y - XA \rVert_F^2 = \sum_{j=1}^{m} \lVert Y_j - X A_{j.} \rVert^2$$
Here $A$ is the matrix of coefficients of the PCA decomposition and $\lVert \cdot \rVert_F$ is the Frobenius norm of the matrix.
Because PCA minimizes squared $L_2$ norms (i.e. quadratic norms), it has the same sensitivity to outliers as least squares or fitting a Gaussian: since the deviations are squared, the outliers dominate the total norm and therefore drive the PCA components.
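To see this effect directly, here is a minimal sketch (mine, not part of the answer) using NumPy: the leading principal component of an elongated point cloud flips toward a single extreme point, because that point's squared residual dominates the objective above.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 points stretched along the x-axis, so the first PC should be ~(1, 0).
data = rng.normal(size=(200, 2)) * np.array([5.0, 1.0])

def first_pc(Y):
    """Leading eigenvector of the covariance matrix of the centered data."""
    Yc = Y - Y.mean(axis=0)
    cov = Yc.T @ Yc / (len(Yc) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    return eigvecs[:, -1]                   # eigenvector of the largest one

print("first PC, clean data:  ", first_pc(data))

# One extreme point far off the main axis: its squared residual dominates
# the Frobenius-norm objective and drags the first PC toward it.
with_outlier = np.vstack([data, [0.0, 100.0]])
print("first PC, with outlier:", first_pc(with_outlier))
```

The first print shows a vector close to $(\pm 1, 0)$; after adding the single point $(0, 100)$, the leading component rotates toward $(0, \pm 1)$, even though 200 of the 201 points still lie along the x-axis.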
answered 2 days ago by sega_sai, edited yesterday