Different number of outliers with ggplot2











Can somebody explain why I get a different number of outliers with base R's boxplot() than with ggplot2's geom_boxplot()?
Here is an example:



x <- c(280.9, 135.9, 321.4, 333.7, 0.2, 71.3, 33.0, 102.6, 126.8, 194.8, 35.5, 
107.3, 45.1, 107.2, 55.2, 28.1, 36.9, 24.3, 68.7, 163.5, 0.8, 31.8, 121.4,
84.7, 34.3, 25.2, 101.4, 203.2, 194.1, 27.9, 42.5, 47.0, 85.1, 90.4, 103.8,
45.1, 94.0, 36.0, 60.9, 97.1, 42.5, 96.4, 58.4, 174.0, 173.2, 164.1, 92.1,
41.9, 130.2, 94.7, 121.5, 261.4, 46.7, 16.3, 50.7, 112.9, 112.2, 242.5, 140.6,
112.6, 31.2, 36.7, 97.4, 140.5, 123.5, 42.9, 59.4, 94.5, 37.4, 232.2, 114.6,
60.7, 27.8, 115.5, 111.9, 60.1)
library(ggplot2)

data <- data.frame(x)
boxplot(data$x)
ggplot(data, aes(y = x)) + geom_boxplot()


With the boxplot command I get a plot with 4 outliers.



And with ggplot2 I get a plot with 5 outliers.










  • Look at the ylimits. You're essentially zooming in.
    – NelsonGon
    5 hours ago






  • Given that both plots show data from 200-300, and that's where the extra outlier is, this isn't a zoom issue.
    – IceCreamToucan
    5 hours ago










  • ggplot2 and base boxplot use the same range (1.5), but do they calculate quantiles the same way?
    – PoGibas
    5 hours ago












  • (boxplot(data$x)) shows that its upper hinge is at 122.5, not 122.0 as suggested by quantile(data$x). This would put the end of the whisker at 242.5, which is above the 241.25 point. @dww's excellent answer demonstrates a way to mitigate this.
    – r2evans
    5 hours ago















r ggplot2 data-visualization boxplot






edited 2 hours ago









massisenergy











asked 5 hours ago









Alfredo Sánchez

1 Answer



accepted










ggplot2 and base R's boxplot() use slightly different methods to calculate the box statistics. From ?geom_boxplot:




The lower and upper hinges correspond to the first and third quartiles
(the 25th and 75th percentiles). This differs slightly from the method
used by the boxplot() function, and may be apparent with small
samples. See boxplot.stats() for more information on how hinge
positions are calculated for boxplot().
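To make the difference concrete on the question's data, here is a small base R sketch comparing the hinges from fivenum() (which boxplot.stats() calls internally) with the quartiles from quantile() at its default settings, as described in ?geom_boxplot. The variable names (hinges, quartiles, fence_base, fence_ggplot) are illustrative only.

```r
x <- c(280.9, 135.9, 321.4, 333.7, 0.2, 71.3, 33.0, 102.6, 126.8, 194.8, 35.5,
       107.3, 45.1, 107.2, 55.2, 28.1, 36.9, 24.3, 68.7, 163.5, 0.8, 31.8, 121.4,
       84.7, 34.3, 25.2, 101.4, 203.2, 194.1, 27.9, 42.5, 47.0, 85.1, 90.4, 103.8,
       45.1, 94.0, 36.0, 60.9, 97.1, 42.5, 96.4, 58.4, 174.0, 173.2, 164.1, 92.1,
       41.9, 130.2, 94.7, 121.5, 261.4, 46.7, 16.3, 50.7, 112.9, 112.2, 242.5, 140.6,
       112.6, 31.2, 36.7, 97.4, 140.5, 123.5, 42.9, 59.4, 94.5, 37.4, 232.2, 114.6,
       60.7, 27.8, 115.5, 111.9, 60.1)

# Hinges as used by boxplot() / boxplot.stats(): Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]

# Quartiles as described in ?geom_boxplot: quantile() with default settings
quartiles <- unname(quantile(x, c(0.25, 0.75)))

# Upper whisker limit (upper box edge + 1.5 * IQR) under each method
fence_base   <- hinges[2]    + 1.5 * diff(hinges)
fence_ggplot <- quartiles[2] + 1.5 * diff(quartiles)

print(rbind(base    = c(lower = hinges[1],    upper = hinges[2],    limit = fence_base),
            ggplot2 = c(lower = quartiles[1], upper = quartiles[2], limit = fence_ggplot)))
```

On this data the upper hinge is 122.5 while the 75% quantile is 122.0, so the upper limits are 242.5 and 241.25. The observation 242.5 exceeds the second limit but not the first, which is consistent with r2evans's comment on the question.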




If you want the same results as boxplot(), you can make ggplot use boxplot.stats():



# Function to use boxplot.stats to set the box-and-whisker locations
f.bxp = function(x) {
  bxp = boxplot.stats(x)[["stats"]]
  names(bxp) = c("ymin", "lower", "middle", "upper", "ymax")
  bxp
}

# Function to use boxplot.stats for the outliers
f.out = function(x) {
  data.frame(y = boxplot.stats(x)[["out"]])
}


To use those functions in ggplot:



ggplot(data, aes(0, y = x)) +
  stat_summary(fun.data = f.bxp, geom = "boxplot") +
  stat_summary(fun.data = f.out, geom = "point")





If you want to replicate the statistics that ggplot uses natively, these are explained in ?geom_boxplot as follows:




  • ymin: lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR
  • lower: lower hinge, 25% quantile
  • notchlower: lower edge of notch = median - 1.58 * IQR / sqrt(n)
  • middle: median, 50% quantile
  • notchupper: upper edge of notch = median + 1.58 * IQR / sqrt(n)
  • upper: upper hinge, 75% quantile
  • ymax: upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR




We can calculate these accordingly:



y = sort(x)
lower  = quantile(y, 0.25)   # lower hinge (25% quantile)
middle = quantile(y, 0.5)    # median
upper  = quantile(y, 0.75)   # upper hinge (75% quantile)
iqr    = upper - lower
ymin = y[which(y >= lower - 1.5*iqr)][1]         # end of lower whisker
ymax = tail(y[which(y <= upper + 1.5*iqr)], 1)   # end of upper whisker

# Overlay the computed statistics on the native boxplot
ggplot(data, aes(y = x)) +
  geom_boxplot() +
  geom_hline(yintercept = c(ymin, lower, middle, upper, ymax),
             color = 'red', linetype = 'dashed')


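As a rough check of how this small difference in box edges changes the outlier count, the following base R sketch flags points beyond 1.5 * IQR under each definition. The helper flag_outliers is hypothetical, introduced only for illustration; it is not part of base R or ggplot2.

```r
x <- c(280.9, 135.9, 321.4, 333.7, 0.2, 71.3, 33.0, 102.6, 126.8, 194.8, 35.5,
       107.3, 45.1, 107.2, 55.2, 28.1, 36.9, 24.3, 68.7, 163.5, 0.8, 31.8, 121.4,
       84.7, 34.3, 25.2, 101.4, 203.2, 194.1, 27.9, 42.5, 47.0, 85.1, 90.4, 103.8,
       45.1, 94.0, 36.0, 60.9, 97.1, 42.5, 96.4, 58.4, 174.0, 173.2, 164.1, 92.1,
       41.9, 130.2, 94.7, 121.5, 261.4, 46.7, 16.3, 50.7, 112.9, 112.2, 242.5, 140.6,
       112.6, 31.2, 36.7, 97.4, 140.5, 123.5, 42.9, 59.4, 94.5, 37.4, 232.2, 114.6,
       60.7, 27.8, 115.5, 111.9, 60.1)

# Hypothetical helper: values lying more than 1.5 * IQR outside the box edges
flag_outliers <- function(values, lower, upper) {
  iqr <- upper - lower
  sort(values[values < lower - 1.5 * iqr | values > upper + 1.5 * iqr])
}

# ggplot2-style box edges: 25% and 75% quantiles
q <- unname(quantile(x, c(0.25, 0.75)))
out_ggplot <- flag_outliers(x, q[1], q[2])

# base-style box edges: hinges from fivenum()
h <- fivenum(x)[c(2, 4)]
out_base <- flag_outliers(x, h[1], h[2])

print(out_ggplot)
print(out_base)
print(setdiff(out_ggplot, out_base))  # points flagged only by the quantile rule
```

On the question's data this flags five points with the quantile-based edges and four with the hinges. The extra point is 242.5, which matches the counts reported in the question.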



We can also extract these statistics directly from a ggplot object using ggplot_build:



p <- ggplot(data, aes(y=x)) + geom_boxplot() 
ggplot_build(p)$data

# ymin lower middle upper ymax outliers notchupper notchlower x PANEL group ymin_final
# 1 0.2 42.5 93.05 122 232.2 280.9, 321.4, 333.7, 261.4, 242.5 107.4585 78.64154 0 1 -1 0.2
# ymax_final xmin xmax xid newx new_width weight colour fill size alpha shape linetype
# 1 333.7 -0.375 0.375 1 0 0.75 1 grey20 white 0.5 NA 19 solid
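To pull out just the box positions and the outliers in a form closer to what boxplot.stats() returns, a minimal sketch (assuming ggplot2 is installed) could use layer_data(), which returns the computed data frame for a single layer. The variable names stats, box and outliers are illustrative only.

```r
library(ggplot2)

x <- c(280.9, 135.9, 321.4, 333.7, 0.2, 71.3, 33.0, 102.6, 126.8, 194.8, 35.5,
       107.3, 45.1, 107.2, 55.2, 28.1, 36.9, 24.3, 68.7, 163.5, 0.8, 31.8, 121.4,
       84.7, 34.3, 25.2, 101.4, 203.2, 194.1, 27.9, 42.5, 47.0, 85.1, 90.4, 103.8,
       45.1, 94.0, 36.0, 60.9, 97.1, 42.5, 96.4, 58.4, 174.0, 173.2, 164.1, 92.1,
       41.9, 130.2, 94.7, 121.5, 261.4, 46.7, 16.3, 50.7, 112.9, 112.2, 242.5, 140.6,
       112.6, 31.2, 36.7, 97.4, 140.5, 123.5, 42.9, 59.4, 94.5, 37.4, 232.2, 114.6,
       60.7, 27.8, 115.5, 111.9, 60.1)
data <- data.frame(x)

p <- ggplot(data, aes(y = x)) + geom_boxplot()

# layer_data() returns the computed data for one layer (the first by default)
stats <- layer_data(p)

# Whisker ends, hinges and median as computed by ggplot2
box <- unlist(stats[, c("ymin", "lower", "middle", "upper", "ymax")])

# The outliers column is a list column: one numeric vector per box
outliers <- stats$outliers[[1]]

print(box)
print(outliers)
```

With a single box, the first element of the outliers list column holds all flagged points. With grouped boxplots there would be one row, and one outlier vector, per group.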





  • Is it possible to get the stats from the geom_boxplot like in boxplot.stats()?
    – Alfredo Sánchez
    3 hours ago










  • sure - see edits in answer to show how
    – dww
    2 hours ago










  • Thanks @dww for answering so quickly. Just one thing: in the computation of ymin you must use >=, and in the computation of ymax <=, mustn't you?
    – Alfredo Sánchez
    2 hours ago










  • that's right - ty - corrected
    – dww
    2 hours ago











edited 1 hour ago

























answered 5 hours ago









dww
