Kruskal-Wallis - Effect size











I am analysing 4 algorithms, with 3 sets of metrics for each algorithm, and I apply the non-parametric Kruskal-Wallis test to each metric to detect any differences in performance between these algorithms.



I would like to know whether there is a way to calculate the effect size when applying the Kruskal-Wallis test.



As mentioned in other posts on CV, a post-hoc analysis for Kruskal-Wallis should use Dunn's test and not the Mann-Whitney test for pairwise comparisons between groups (algorithms).



By applying the "inaccurate" MW test, I can calculate the effect size, but what can I do if I apply Dunn's test?
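For reference, one measure that is sometimes suggested for the Kruskal-Wallis statistic itself is epsilon-squared, computed directly from the $H$ statistic. A minimal sketch in Python (the data are purely illustrative, and `scipy` is assumed available):

```python
from scipy import stats

# Purely illustrative samples for three groups
a = [1.2, 2.3, 1.8, 2.9, 1.5]
b = [2.8, 3.1, 2.6, 3.4, 2.9]
c = [1.9, 2.2, 2.5, 2.0, 2.4]

H, p = stats.kruskal(a, b, c)
n = len(a) + len(b) + len(c)

# Epsilon-squared effect size for Kruskal-Wallis:
# eps_sq = H * (n + 1) / (n^2 - 1), which lies between 0 and 1
eps_sq = H * (n + 1) / (n ** 2 - 1)
print(f"H = {H:.3f}, p = {p:.4f}, epsilon-squared = {eps_sq:.3f}")
```

This only addresses the omnibus test; it does not by itself give a per-pair effect size for the Dunn comparisons.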



Thanks in advance for any comment/advice.



PS: I posted this question on CV some time ago, but I haven't received any reply yet, so I am posting it in this forum too.










  • Please provide a link to the question you posted at Cross Validated. – Joel Reyes Noche, Aug 10 '16 at 8:16















probability statistics statistical-inference






asked Apr 15 '15 at 8:47 by STiGMa








1 Answer
First, Dunn-Bonferroni tests should be done only if the Kruskal-Wallis test indicates there are some differences among the groups. Let's suppose you are testing at the 5% level. Once we know there is some pattern of differences, we can try to discover what that pattern is by making pairwise comparisons among the $g$ treatment groups.



There are $c = C(g, 2) = g(g-1)/2$ possible paired comparisons.
If we make each of these at the 5% level, there is a possibility
that the 'grand' or 'family' error probability for a pattern of differences emerging from the paired comparisons will be substantially
more than 5%.



A Bonferroni procedure is based on the Bonferroni inequality
of probability theory. In your application, the idea is that
if we test all $c$ comparisons at the level $.05/c$, then the
family error probability of the pattern cannot exceed 5%.



So if you have $g = 4$ groups, then $c = 6$ and you should make each multiple comparison at level $.05/6 \approx 0.0083.$ You could use six 2-sample Wilcoxon tests at that level, or you could look at six Wilcoxon confidence intervals for differences in medians at confidence level $1 - 0.0083 = 0.9917,$ or about 99.2%.
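That procedure can be sketched in Python; the data here are hypothetical, and scipy's `mannwhitneyu` stands in for the 2-sample Wilcoxon rank-sum test:

```python
from itertools import combinations
from scipy import stats

# Hypothetical metric values for g = 4 algorithms
groups = {
    "A": [0.71, 0.68, 0.74, 0.70, 0.69],
    "B": [0.80, 0.83, 0.79, 0.81, 0.82],
    "C": [0.72, 0.75, 0.70, 0.73, 0.71],
    "D": [0.60, 0.62, 0.59, 0.61, 0.63],
}

g = len(groups)
c = g * (g - 1) // 2      # 6 pairwise comparisons
alpha = 0.05 / c          # Bonferroni-adjusted level, about 0.0083

for (name1, x), (name2, y) in combinations(groups.items(), 2):
    # Mann-Whitney U is equivalent to the 2-sample Wilcoxon rank-sum test
    u, p = stats.mannwhitneyu(x, y, alternative="two-sided")
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name1} vs {name2}: p = {p:.4f} ({verdict} at {alpha:.4f})")
```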



There is no guarantee that the pattern will be absolutely
clear. For example, if we have three levels 1, 2, and 3 in
increasing order of sample medians, it is possible you might
find a clear difference between extremes 1 and 3, but not
be able to resolve whether 2 is significantly different
from either 1 or 3.



Caveats: I wonder what you mean when you say you have 4 algorithms and 3 sets of metrics. From that description, I have no idea what your experimental design looks like. Are you doing three separate Kruskal-Wallis tests, one for each 'set of metrics'? If so, what I have said above is OK, with $g = 4.$



Or do you have a two factor design, in which one factor is 'algorithm' and the other is 'metric'? In that case I don't
see how a Kruskal-Wallis test can give an appropriate analysis.



If you want to say what the factors in your design are and how many replications there are for each factor (or combination thereof), maybe my advice would be different.



Also, I'm wondering what kind of data you have that causes you
to think in terms of nonparametric tests.






answered Apr 16 '15 at 2:28 by BruceET
  • Bruce, thanks a lot for your reply. This is indeed what I am doing: pairwise comparisons using Dunn-Bonferroni tests, only if the Kruskal-Wallis test shows a significant difference. However, I think that running a 2-sample Wilcoxon test (with the Bonferroni correction) is not the same as running a Dunn-Bonferroni test, because the latter takes into account the pooled variance implied by the null hypothesis and retains the ranking obtained from the Kruskal-Wallis test. Please correct me if this is wrong. – STiGMa, Apr 16 '15 at 13:34










  • I will try to explain the experimental design more clearly: I have 4 machine learning algorithms which I test on a set of 3 problem instances of increasing difficulty. For each problem instance I calculate a metric (4 algorithms x 3 problems) and run the Kruskal-Wallis test on the performance of the 4 algorithms per problem instance. If a significant difference is found, I run pairwise comparisons using the Dunn-Bonferroni test to identify the best algorithms. – STiGMa, Apr 16 '15 at 13:41










  • Regarding your last question on selecting non-parametric tests, this is common in my area. To confirm the choice, however, I first run a Shapiro-Wilk test of whether the data follow the normal distribution and Levene's test for homogeneity of variance. Since neither of them holds true, I opted for non-parametric tests. – STiGMa, Apr 16 '15 at 13:43










  • (1) I'm still unclear whether 'problem instance' is an effect. If so, KW is the wrong test. (2) KW requires groups to differ only by a shift, but population distributions for groups must be of the same shape, and that implies equal variances. (3) If your number of replications per treatment is large, you may be better off with a Welch version of standard ANOVA to adjust for heteroscedasticity. (4) What you're calling Dunn-Bonferroni may somehow use 'pooled variance', but if your variances are unequal, that information may be worse than useless. (5) I fear all of this is beyond your original question. – BruceET, Apr 16 '15 at 18:22






  • I think you need to close this question out and start a new one in which you show an example of actual data (or data similar to the actual data). Then state your doubts about the assumptions and ask for advice on analysis. This is like going to a doctor with a stomach ache and your own diagnosis, then quibbling about the medication. – BruceET, Apr 16 '15 at 19:24
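For what it's worth, the Dunn statistic discussed in these comments is built from the ranks pooled across all groups in the Kruskal-Wallis step, which is exactly how it differs from separate pairwise Wilcoxon tests. A rough sketch, omitting the tie correction (data purely illustrative):

```python
import math
from itertools import combinations
from scipy import stats

# Purely illustrative samples for three groups
samples = {
    "A": [1.2, 2.3, 1.8, 2.9, 1.5],
    "B": [2.8, 3.1, 2.6, 3.4, 2.9],
    "C": [1.9, 2.2, 2.5, 2.0, 2.4],
}

# Pool all observations and rank them once, as Kruskal-Wallis does
pooled = [v for vals in samples.values() for v in vals]
ranks = stats.rankdata(pooled)

# Mean rank per group, reading the pooled ranks back in group order
mean_rank, sizes, i = {}, {}, 0
for name, vals in samples.items():
    sizes[name] = len(vals)
    mean_rank[name] = ranks[i:i + len(vals)].mean()
    i += len(vals)

N = len(pooled)
for a, b in combinations(samples, 2):
    # Dunn's z: difference in mean ranks over its null standard error,
    # which uses the variance of the pooled ranking (no tie correction here)
    se = math.sqrt(N * (N + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = (mean_rank[a] - mean_rank[b]) / se
    p = 2 * stats.norm.sf(abs(z))   # two-sided p, before any Bonferroni adjustment
    print(f"{a} vs {b}: z = {z:.3f}, unadjusted p = {p:.4f}")
```

Each $z$ above could then be compared against the Bonferroni-adjusted level, or converted to a per-pair effect size such as $z/\sqrt{N}$.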













Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f1235594%2fkruskal-wallis-effect-size%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
0
down vote













First, Dunn-Bonferroni tests are to be done only if the Kruskal-Wallis test indicates there are some differences among the groups.
Let's suppose you are testing at the 5% level.
Once we know there is some pattern of differences discuss, we
can try to discover what that pattern is by making pairwise
comparisons among the $g$ treatment groups.



There are $c = C(g, 2) = g(g-1)/2$ possible paired comparisons.
If we make each of these at the 5% level, there is a possibility
that the 'grand' or 'family' error probability for a pattern of differences emerging from the paired comparisons will be substantially
more than 5%.



A Bonferroni procedure is based on the Bonferroni inequality
of probability theory. In your application, the idea is that
if we test all $c$ comparisons at the level $.05/c$, then the
family error probability of the pattern cannot exceed 5%.



So if you have $g = 4$ groups, then $c = 6$ and you should
make each multiple comparison at level $.05/6 = 0.0083.$
You could use six 2-sample Wilcoxon tests at that level.
or you could look at six Wilcoxon confidence intervals for differences
in medians at confidence level $1 - 0.0083$ or 99.2%.



There is no guarantee that the pattern will be absolutely
clear. For example, if we have three levels 1, 2, and 3 in
increasing order of sample medians, it is possible you might
find a clear difference between extremes 1 and 3, but not
be able to resolve whether 2 is significantly different
from either 1 or 3.



Caveats: I wonder what you mean that you have 4 algorithms
and 3 sets of metrics. From that description, I have no
idea what your experimental design looks like. Are you
doing three separate Kruskal-Wallis tests, one for each
'set of metrics'? If so, what I have said above is OK,
with $g = 4.$



Or do you have a two factor design, in which one factor is 'algorithm' and the other is 'metric'? In that case I don't
see how a Kruskal-Wallis test can give an appropriate analysis.



If you want to say what the the factors in your design are
and how many replications for each factor (or combination thereof),
maybe my advice would be different.



Also, I'm wondering what kind of data you have that causes you
to think in terms of nonparametric tests.






share|cite|improve this answer





















  • Bruce, thanks a lot for your reply. This is indeed what I am doing: Pairwise comparisons using Dunn-Bonferroni tests, only if the Kruskal-Wallis test shows significant difference. However, I think that running a 2-sample Wilcoxon test (using the Bonferronin correction) is not the same as running Dunn-Bonferroni test because the latter takes into account the pooled variance implied by the null hypothesis and retains the ranking obtained from the Kruskal-Wallis test. Please correct me if this is wrong.
    – STiGMa
    Apr 16 '15 at 13:34










  • I will try to explain the experimental design more clearly: I have 4 machine learning algorithms which I test in a set of 3 problem instances of increasing difficulty. For each problem instance I calculate a metric (4 algorithms x 3 problems) and run the Kruskal-Wallis test to test the performance of the 4 algorithms per problem instance. If significant difference is found, I run pairwise comparisons using Dunn-Bonferroni test to identify the best algorithms.
    – STiGMa
    Apr 16 '15 at 13:41










  • Regarding your last question on selecting non-parametric tests, this is common in my area. To confirm this, however, I run before a Shapiro-Wilk's test to test whether the data follow the normal distribution and Levene's test to test for homogeneity of variance. Since, non of them holds True, I opted for using non-parametric tests.
    – STiGMa
    Apr 16 '15 at 13:43










  • (1) I'm still unclear whether 'problem instance' is an effect. If so KW is the wrong test. (2) KW requires groups to differ only by a shift, but pop dist'ns for groups must be of the same shape, and that implies equal variances. (3) If your number of replications per treatment is large, you may be better off with a Welsh version of standard ANOVA to adjust for heteroscedasticity. (4) What you're calling Dunn-Bonferroni may somehow use 'pooled variance', but if your variances are unequal, that information may be worse than useless. (5) I fear all of this is beyond your origl question.
    – BruceET
    Apr 16 '15 at 18:22






  • 1




    I think you need to close this question out and start a new one in which you show an example of actual data (or similar to actual data). Then state your doubts about assumptions and ask for advice on analysis. This is like going to a doctor with a stomach ache and your own diagnosis. Then quibbling about the medication.
    – BruceET
    Apr 16 '15 at 19:24

















up vote
0
down vote













First, Dunn-Bonferroni tests are to be done only if the Kruskal-Wallis test indicates there are some differences among the groups.
Let's suppose you are testing at the 5% level.
Once we know there is some pattern of differences discuss, we
can try to discover what that pattern is by making pairwise
comparisons among the $g$ treatment groups.



There are $c = C(g, 2) = g(g-1)/2$ possible paired comparisons.
If we make each of these at the 5% level, there is a possibility
that the 'grand' or 'family' error probability for a pattern of differences emerging from the paired comparisons will be substantially
more than 5%.



A Bonferroni procedure is based on the Bonferroni inequality
of probability theory. In your application, the idea is that
if we test all $c$ comparisons at the level $.05/c$, then the
family error probability of the pattern cannot exceed 5%.



So if you have $g = 4$ groups, then $c = 6$ and you should
make each multiple comparison at level $.05/6 = 0.0083.$
You could use six 2-sample Wilcoxon tests at that level.
or you could look at six Wilcoxon confidence intervals for differences
in medians at confidence level $1 - 0.0083$ or 99.2%.



There is no guarantee that the pattern will be absolutely
clear. For example, if we have three levels 1, 2, and 3 in
increasing order of sample medians, it is possible you might
find a clear difference between extremes 1 and 3, but not
be able to resolve whether 2 is significantly different
from either 1 or 3.



Caveats: I wonder what you mean that you have 4 algorithms
and 3 sets of metrics. From that description, I have no
idea what your experimental design looks like. Are you
doing three separate Kruskal-Wallis tests, one for each
'set of metrics'? If so, what I have said above is OK,
with $g = 4.$



Or do you have a two factor design, in which one factor is 'algorithm' and the other is 'metric'? In that case I don't
see how a Kruskal-Wallis test can give an appropriate analysis.



If you want to say what the the factors in your design are
and how many replications for each factor (or combination thereof),
maybe my advice would be different.



Also, I'm wondering what kind of data you have that causes you
to think in terms of nonparametric tests.






share|cite|improve this answer





















  • Bruce, thanks a lot for your reply. This is indeed what I am doing: Pairwise comparisons using Dunn-Bonferroni tests, only if the Kruskal-Wallis test shows significant difference. However, I think that running a 2-sample Wilcoxon test (using the Bonferronin correction) is not the same as running Dunn-Bonferroni test because the latter takes into account the pooled variance implied by the null hypothesis and retains the ranking obtained from the Kruskal-Wallis test. Please correct me if this is wrong.
    – STiGMa
    Apr 16 '15 at 13:34










  • I will try to explain the experimental design more clearly: I have 4 machine learning algorithms which I test in a set of 3 problem instances of increasing difficulty. For each problem instance I calculate a metric (4 algorithms x 3 problems) and run the Kruskal-Wallis test to test the performance of the 4 algorithms per problem instance. If significant difference is found, I run pairwise comparisons using Dunn-Bonferroni test to identify the best algorithms.
    – STiGMa
    Apr 16 '15 at 13:41










  • Regarding your last question on selecting non-parametric tests, this is common in my area. To confirm this, however, I run before a Shapiro-Wilk's test to test whether the data follow the normal distribution and Levene's test to test for homogeneity of variance. Since, non of them holds True, I opted for using non-parametric tests.
    – STiGMa
    Apr 16 '15 at 13:43










  • (1) I'm still unclear whether 'problem instance' is an effect. If so KW is the wrong test. (2) KW requires groups to differ only by a shift, but pop dist'ns for groups must be of the same shape, and that implies equal variances. (3) If your number of replications per treatment is large, you may be better off with a Welsh version of standard ANOVA to adjust for heteroscedasticity. (4) What you're calling Dunn-Bonferroni may somehow use 'pooled variance', but if your variances are unequal, that information may be worse than useless. (5) I fear all of this is beyond your origl question.
    – BruceET
    Apr 16 '15 at 18:22






  • 1




    I think you need to close this question out and start a new one in which you show an example of actual data (or similar to actual data). Then state your doubts about assumptions and ask for advice on analysis. This is like going to a doctor with a stomach ache and your own diagnosis. Then quibbling about the medication.
    – BruceET
    Apr 16 '15 at 19:24















up vote
0
down vote










up vote
0
down vote









First, Dunn-Bonferroni tests are to be done only if the Kruskal-Wallis test indicates there are some differences among the groups.
Let's suppose you are testing at the 5% level.
Once we know there is some pattern of differences discuss, we
can try to discover what that pattern is by making pairwise
comparisons among the $g$ treatment groups.



There are $c = C(g, 2) = g(g-1)/2$ possible paired comparisons.
If we make each of these at the 5% level, there is a possibility
that the 'grand' or 'family' error probability for a pattern of differences emerging from the paired comparisons will be substantially
more than 5%.



A Bonferroni procedure is based on the Bonferroni inequality
of probability theory. In your application, the idea is that
if we test all $c$ comparisons at the level $.05/c$, then the
family error probability of the pattern cannot exceed 5%.



So if you have $g = 4$ groups, then $c = 6$ and you should
make each multiple comparison at level $.05/6 = 0.0083.$
You could use six 2-sample Wilcoxon tests at that level.
or you could look at six Wilcoxon confidence intervals for differences
in medians at confidence level $1 - 0.0083$ or 99.2%.



There is no guarantee that the pattern will be absolutely
clear. For example, if we have three levels 1, 2, and 3 in
increasing order of sample medians, it is possible you might
find a clear difference between extremes 1 and 3, but not
be able to resolve whether 2 is significantly different
from either 1 or 3.



Caveats: I wonder what you mean that you have 4 algorithms
and 3 sets of metrics. From that description, I have no
idea what your experimental design looks like. Are you
doing three separate Kruskal-Wallis tests, one for each
'set of metrics'? If so, what I have said above is OK,
with $g = 4.$



Or do you have a two factor design, in which one factor is 'algorithm' and the other is 'metric'? In that case I don't
see how a Kruskal-Wallis test can give an appropriate analysis.



If you want to say what the the factors in your design are
and how many replications for each factor (or combination thereof),
maybe my advice would be different.



Also, I'm wondering what kind of data you have that causes you
to think in terms of nonparametric tests.






share|cite|improve this answer












First, Dunn-Bonferroni tests are to be done only if the Kruskal-Wallis test indicates there are some differences among the groups.
Let's suppose you are testing at the 5% level.
Once we know there is some pattern of differences discuss, we
can try to discover what that pattern is by making pairwise
comparisons among the $g$ treatment groups.



There are $c = C(g, 2) = g(g-1)/2$ possible paired comparisons.
If we make each of these at the 5% level, there is a possibility
that the 'grand' or 'family' error probability for a pattern of differences emerging from the paired comparisons will be substantially
more than 5%.



A Bonferroni procedure is based on the Bonferroni inequality
of probability theory. In your application, the idea is that
if we test all $c$ comparisons at the level $.05/c$, then the
family error probability of the pattern cannot exceed 5%.



So if you have $g = 4$ groups, then $c = 6$ and you should
make each multiple comparison at level $.05/6 = 0.0083.$
You could use six 2-sample Wilcoxon tests at that level.
or you could look at six Wilcoxon confidence intervals for differences
in medians at confidence level $1 - 0.0083$ or 99.2%.



There is no guarantee that the pattern will be absolutely
clear. For example, if we have three levels 1, 2, and 3 in
increasing order of sample medians, it is possible you might
find a clear difference between extremes 1 and 3, but not
be able to resolve whether 2 is significantly different
from either 1 or 3.



Caveats: I wonder what you mean that you have 4 algorithms
and 3 sets of metrics. From that description, I have no
idea what your experimental design looks like. Are you
doing three separate Kruskal-Wallis tests, one for each
'set of metrics'? If so, what I have said above is OK,
with $g = 4.$



Or do you have a two factor design, in which one factor is 'algorithm' and the other is 'metric'? In that case I don't
see how a Kruskal-Wallis test can give an appropriate analysis.



If you want to say what the the factors in your design are
and how many replications for each factor (or combination thereof),
maybe my advice would be different.



Also, I'm wondering what kind of data you have that causes you
to think in terms of nonparametric tests.







share|cite|improve this answer












share|cite|improve this answer



share|cite|improve this answer










answered Apr 16 '15 at 2:28









BruceET

35.1k71440




35.1k71440












  • Bruce, thanks a lot for your reply. This is indeed what I am doing: Pairwise comparisons using Dunn-Bonferroni tests, only if the Kruskal-Wallis test shows significant difference. However, I think that running a 2-sample Wilcoxon test (using the Bonferronin correction) is not the same as running Dunn-Bonferroni test because the latter takes into account the pooled variance implied by the null hypothesis and retains the ranking obtained from the Kruskal-Wallis test. Please correct me if this is wrong.
    – STiGMa
    Apr 16 '15 at 13:34










  • I will try to explain the experimental design more clearly: I have 4 machine learning algorithms which I test in a set of 3 problem instances of increasing difficulty. For each problem instance I calculate a metric (4 algorithms x 3 problems) and run the Kruskal-Wallis test to test the performance of the 4 algorithms per problem instance. If significant difference is found, I run pairwise comparisons using Dunn-Bonferroni test to identify the best algorithms.
    – STiGMa
    Apr 16 '15 at 13:41










  • Regarding your last question on selecting non-parametric tests, this is common in my area. To confirm this, however, I run before a Shapiro-Wilk's test to test whether the data follow the normal distribution and Levene's test to test for homogeneity of variance. Since, non of them holds True, I opted for using non-parametric tests.
    – STiGMa
    Apr 16 '15 at 13:43










  • (1) I'm still unclear whether 'problem instance' is an effect. If so KW is the wrong test. (2) KW requires groups to differ only by a shift, but pop dist'ns for groups must be of the same shape, and that implies equal variances. (3) If your number of replications per treatment is large, you may be better off with a Welsh version of standard ANOVA to adjust for heteroscedasticity. (4) What you're calling Dunn-Bonferroni may somehow use 'pooled variance', but if your variances are unequal, that information may be worse than useless. (5) I fear all of this is beyond your origl question.
    – BruceET
    Apr 16 '15 at 18:22






  • 1




    I think you need to close this question out and start a new one in which you show an example of actual data (or similar to actual data). Then state your doubts about assumptions and ask for advice on analysis. This is like going to a doctor with a stomach ache and your own diagnosis. Then quibbling about the medication.
    – BruceET
    Apr 16 '15 at 19:24




















  • Bruce, thanks a lot for your reply. This is indeed what I am doing: Pairwise comparisons using Dunn-Bonferroni tests, only if the Kruskal-Wallis test shows significant difference. However, I think that running a 2-sample Wilcoxon test (using the Bonferronin correction) is not the same as running Dunn-Bonferroni test because the latter takes into account the pooled variance implied by the null hypothesis and retains the ranking obtained from the Kruskal-Wallis test. Please correct me if this is wrong.
    – STiGMa
    Apr 16 '15 at 13:34










  • I will try to explain the experimental design more clearly: I have 4 machine learning algorithms which I test in a set of 3 problem instances of increasing difficulty. For each problem instance I calculate a metric (4 algorithms x 3 problems) and run the Kruskal-Wallis test to test the performance of the 4 algorithms per problem instance. If significant difference is found, I run pairwise comparisons using Dunn-Bonferroni test to identify the best algorithms.
    – STiGMa
    Apr 16 '15 at 13:41










  • Regarding your last question on selecting non-parametric tests, this is common in my area. To confirm this, however, I run before a Shapiro-Wilk's test to test whether the data follow the normal distribution and Levene's test to test for homogeneity of variance. Since, non of them holds True, I opted for using non-parametric tests.
    – STiGMa
    Apr 16 '15 at 13:43










  • (1) I'm still unclear whether 'problem instance' is an effect. If so KW is the wrong test. (2) KW requires groups to differ only by a shift, but pop dist'ns for groups must be of the same shape, and that implies equal variances. (3) If your number of replications per treatment is large, you may be better off with a Welsh version of standard ANOVA to adjust for heteroscedasticity. (4) What you're calling Dunn-Bonferroni may somehow use 'pooled variance', but if your variances are unequal, that information may be worse than useless. (5) I fear all of this is beyond your origl question.
    – BruceET
    Apr 16 '15 at 18:22






  • 1




    I think you need to close this question out and start a new one in which you show an example of actual data (or similar to actual data). Then state your doubts about assumptions and ask for advice on analysis. This is like going to a doctor with a stomach ache and your own diagnosis. Then quibbling about the medication.
    – BruceET
    Apr 16 '15 at 19:24


















Bruce, thanks a lot for your reply. This is indeed what I am doing: Pairwise comparisons using Dunn-Bonferroni tests, only if the Kruskal-Wallis test shows significant difference. However, I think that running a 2-sample Wilcoxon test (using the Bonferronin correction) is not the same as running Dunn-Bonferroni test because the latter takes into account the pooled variance implied by the null hypothesis and retains the ranking obtained from the Kruskal-Wallis test. Please correct me if this is wrong.
– STiGMa
Apr 16 '15 at 13:34
















I will try to explain the experimental design more clearly: I have 4 machine learning algorithms, which I test on a set of 3 problem instances of increasing difficulty. For each problem instance I calculate a metric (4 algorithms x 3 problems) and run the Kruskal-Wallis test to compare the performance of the 4 algorithms per problem instance. If a significant difference is found, I run pairwise comparisons using the Dunn-Bonferroni test to identify the best algorithms.
– STiGMa
Apr 16 '15 at 13:41
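That design (one KW test per problem instance) can be sketched as below, which also addresses the original question: epsilon-squared, computed directly from the KW H statistic, is one commonly used effect size for Kruskal-Wallis. The metric values here are synthetic stand-ins generated for illustration.

```python
# One Kruskal-Wallis test per problem instance (4 algorithms each),
# plus epsilon-squared as an effect size. Synthetic data.
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
problems = {
    f"problem_{i}": [rng.normal(loc=mu, scale=1.0, size=30)
                     for mu in (0.0, 0.2 * i, 0.5 * i, 1.0 * i)]
    for i in range(1, 4)          # 3 problem instances, 4 algorithms each
}

for name, samples in problems.items():
    h, p = kruskal(*samples)
    n = sum(len(s) for s in samples)
    # epsilon-squared effect size: H * (n + 1) / (n^2 - 1), in [0, 1]
    eps_sq = h * (n + 1) / (n**2 - 1)
    print(f"{name}: H={h:.2f}, p={p:.4f}, eps^2={eps_sq:.3f}")
```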
















Regarding your last question on selecting non-parametric tests, this is common in my area. To confirm the choice, however, I first run a Shapiro-Wilk test to check whether the data follow a normal distribution, and Levene's test to check for homogeneity of variance. Since neither assumption holds, I opted for non-parametric tests.
– STiGMa
Apr 16 '15 at 13:43
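Both assumption checks mentioned in this comment are available in scipy: Shapiro-Wilk per group for normality, Levene across groups for equal variances. The skewed, unequal-variance data below are illustrative only.

```python
# Assumption checks before choosing parametric vs non-parametric tests.
# Illustrative data: skewed (exponential) with unequal spread.
import numpy as np
from scipy.stats import shapiro, levene

rng = np.random.default_rng(1)
groups = [rng.exponential(scale=s, size=40) for s in (1.0, 1.5, 2.0, 3.0)]

for i, g in enumerate(groups):
    w, p = shapiro(g)             # small p suggests non-normality
    print(f"algorithm {i}: Shapiro-Wilk p = {p:.4f}")

stat, p = levene(*groups)         # small p suggests unequal variances
print(f"Levene p = {p:.4f}")
```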
















(1) I'm still unclear whether 'problem instance' is an effect. If so, KW is the wrong test. (2) KW requires groups to differ only by a shift, but the population distributions for the groups must have the same shape, and that implies equal variances. (3) If your number of replications per treatment is large, you may be better off with a Welch version of standard ANOVA to adjust for heteroscedasticity. (4) What you're calling Dunn-Bonferroni may somehow use 'pooled variance', but if your variances are unequal, that information may be worse than useless. (5) I fear all of this is beyond your original question.
– BruceET
Apr 16 '15 at 18:22
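The Welch ANOVA suggested in point (3) has no dedicated scipy function, so here is a minimal sketch implementing the standard Welch one-way F formula directly; the three groups are made-up data with deliberately unequal variances.

```python
# Minimal sketch of Welch's one-way ANOVA (heteroscedasticity-robust).
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(*samples):
    k = len(samples)
    n = np.array([len(s) for s in samples], dtype=float)
    m = np.array([np.mean(s) for s in samples])
    v = np.array([np.var(s, ddof=1) for s in samples])
    w = n / v                                   # precision weights n_j / s_j^2
    grand = np.sum(w * m) / np.sum(w)
    # numerator: weighted between-group variability
    num = np.sum(w * (m - grand) ** 2) / (k - 1)
    # denominator correction for unequal variances
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    den = 1 + 2 * (k - 2) * lam / (k**2 - 1)
    f_stat = num / den
    df2 = (k**2 - 1) / (3 * lam)                # Welch's approximate df
    p = f_dist.sf(f_stat, k - 1, df2)
    return f_stat, p

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, 20)
b = rng.normal(0.5, 3.0, 25)    # much larger variance
c = rng.normal(1.0, 0.5, 15)
print(welch_anova(a, b, c))
```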








1








I think you need to close this question out and start a new one in which you show an example of actual data (or similar to actual data). Then state your doubts about assumptions and ask for advice on analysis. This is like going to a doctor with a stomach ache and your own diagnosis. Then quibbling about the medication.
– BruceET
Apr 16 '15 at 19:24


























Thanks for contributing an answer to Mathematics Stack Exchange!

