How to test if data follows a distribution?

I have been given some data, and the question asks me to determine whether the data follow any particular distribution. It says to compare the observed and expected frequencies graphically and then to test further. The distributions we have studied so far are the normal, binomial and Poisson, so I assume it will be one of these. The techniques we have used so far include one- and two-sample t-tests, chi-squared tests of association and goodness of fit, one-way and two-way ANOVA, confidence intervals, techniques for estimating abundance, and linear regression, among a few others.



The data are as follows: the hooves of 100 cows are swabbed and checked for a bacterium. The bacterium survives on the hooves for weeks, and on the grass for days. The results are:



Positive hooves per animal    0    1    2    3    4
Frequency (no. of animals)   25    5   15   30   25



What I have done so far: I've calculated the normal, Poisson and binomial probabilities and the expected frequencies from them. I believe the binomial distribution is the right one, because the count is bounded (a maximum of 4 hooves). I plotted the observed and expected frequencies in a bar chart to compare them graphically, and they do not appear to match.
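A minimal Python sketch of the fitting step described above. It assumes (our choice; the question does not say how the parameters were estimated) that the binomial $p$ is estimated as the sample mean divided by 4 and the Poisson mean as the sample mean. The Poisson's upper tail is lumped into the last cell so the expected counts total 100, and the normal fit is left out because the counts are discrete.

```python
import math

# Observed data from the question: positive hooves per cow (0-4) and number of cows
counts = [0, 1, 2, 3, 4]
freq = [25, 5, 15, 30, 25]
n_cows = sum(freq)

# Sample mean number of positive hooves per cow
mean = sum(k * f for k, f in zip(counts, freq)) / n_cows

# Binomial(4, p): estimate p by matching the mean, 4p = mean
p_hat = mean / 4
binom_exp = [n_cows * math.comb(4, k) * p_hat**k * (1 - p_hat) ** (4 - k) for k in counts]

# Poisson(lambda = mean): the last cell holds P(X >= 4) so the expected counts total 100
pois_pmf = [math.exp(-mean) * mean**k / math.factorial(k) for k in counts[:-1]]
pois_pmf.append(1 - sum(pois_pmf))
pois_exp = [n_cows * q for q in pois_pmf]

print(f"mean = {mean}, p_hat = {p_hat}")
for k, o, b, ps in zip(counts, freq, binom_exp, pois_exp):
    print(f"{k}: observed {o:3d}   binomial {b:6.2f}   Poisson {ps:6.2f}")
```

With these estimates the binomial expects only about 3.7 cows with no positive hooves, against 25 observed, which is in line with the mismatch seen in the bar chart.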



I think my next step should be a chi-squared goodness-of-fit test, but after that I don't know what more to do, or whether I need to do anything more. Do I need to calculate a confidence interval, and if so, what would I use it for? Is there anything else you would recommend?
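For the chi-squared goodness-of-fit step, here is a minimal Python sketch for the binomial fit. It assumes $p$ is estimated from the data as mean/4, which is why one extra degree of freedom is subtracted (df = 5 - 1 - 1 = 3). The helper `chi2_sf` is a name of our own; it uses the standard closed forms for the chi-squared upper tail with integer degrees of freedom, so no SciPy is needed. The expected count for 0 hooves is under 5, so the approximation is rough here, but with a statistic this large the conclusion is not in doubt.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared with integer df >= 1 (closed forms, no SciPy needed)."""
    h = x / 2
    if df % 2 == 0:
        # even df = 2m: exp(-h) * sum_{i<m} h^i / i!
        return math.exp(-h) * sum(h**i / math.factorial(i) for i in range(df // 2))
    # odd df = 2m+1: erfc(sqrt(h)) + exp(-h) * sum_{i=1..m} h^(i-1/2) / Gamma(i+1/2)
    tail = sum(h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * tail

counts = [0, 1, 2, 3, 4]
freq = [25, 5, 15, 30, 25]
n = sum(freq)
p_hat = sum(k * f for k, f in zip(counts, freq)) / (4 * n)  # binomial p matched to the mean

expected = [n * math.comb(4, k) * p_hat**k * (1 - p_hat) ** (4 - k) for k in counts]
x2 = sum((o - e) ** 2 / e for o, e in zip(freq, expected))

# 5 cells, minus 1 for the fixed total, minus 1 for the estimated p
df = len(counts) - 1 - 1
p_value = chi2_sf(x2, df)
print(f"X-squared = {x2:.2f}, df = {df}, p-value = {p_value:.3g}")
```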



Thanks in advance!

  • This problem has gone unanswered for a while, and it is possible that the original proposer has moved on to other interests. But I found it to be an interesting and simple version of a kind of model that occurs frequently in practice.
    – BruceET
    May 3 '15 at 7:04

Tags: hypothesis-testing, poisson-distribution, biology, binomial-distribution

asked Apr 9 '15 at 14:29
ConfusedByStats

1 Answer

A Mixture Model for Exposed and Unexposed Animals



Suppose 1/4 of the cows are unexposed to the bacteria
(so none of their hooves test positive), and that among the
3/4 of cows that are exposed, the number of hooves with
bacteria is binomial with n = 4 and p = 3/4.
This model gives the probabilities for the number of hooves
with bacteria shown in the table below.



This binomial distribution was deduced from the data.
Treating the 25 cows with no positive hooves as the unexposed
ones leaves 75 exposed animals, which together have
$1(5) + 2(15) + 3(30) + 4(25) = 225$ hooves with bacteria,
an average of 3 hooves per animal. So the binomial mean
must be $\mu = 4p = 3$, whence $p = 3/4$.
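The arithmetic behind this estimate, as a short Python sketch (variable names are ours). It makes the same simplifying assumption as the text: all 25 cows with no positive hooves are treated as unexposed.

```python
# Data from the question: positive hooves per cow and number of cows
counts = [0, 1, 2, 3, 4]
freq = [25, 5, 15, 30, 25]

# Total positive hooves across all 100 cows
total_positive = sum(k * f for k, f in zip(counts, freq))

# Model assumption: every cow with 0 positive hooves is an unexposed cow
exposed = sum(freq) - freq[0]

mean_exposed = total_positive / exposed
p = mean_exposed / 4  # binomial mean is 4p
print(total_positive, exposed, mean_exposed, p)  # 225 75 3.0 0.75
```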



Each of the probabilities for 1 through 4 hooves is
3/4 of the corresponding probability assigned by $\mathsf{Bin}(4, 3/4)$.
The probability for 0 is .25 plus 3/4 of the binomial
probability for 0. That is,
$P(X = 0) = \frac14 + \frac34\left(\frac14\right)^4$ and
$P(X = k) = \frac34\binom{4}{k}\left(\frac34\right)^k\left(\frac14\right)^{4-k}$ for $k = 1, \dots, 4$.
(Probabilities are rounded to four places
and slightly 'fudged' in the fourth place so that they
add to 1. This method works without complication only because the binomial part of the
model contributes very little probability to 0 hooves.)
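A minimal Python sketch of the mixture probabilities before any rounding adjustment (the names `w0` and `mixture` are ours). Rounded to four places these come out as .2529, .0352, .1582, .3164, .2373; the table below adjusts the fourth decimal place, so some of its entries differ slightly from these.

```python
import math

w0 = 0.25  # proportion of unexposed cows (all have 0 positive hooves)
p = 0.75   # per-hoof probability of a positive swab for an exposed cow

binom = [math.comb(4, k) * p**k * (1 - p) ** (4 - k) for k in range(5)]

# Mixture: weight 1/4 on a point mass at 0, weight 3/4 on Bin(4, 3/4)
mixture = [(1 - w0) * b for b in binom]
mixture[0] += w0

print([round(m, 4) for m in mixture])  # [0.2529, 0.0352, 0.1582, 0.3164, 0.2373]
```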



Expected counts are probabilities multiplied by 100 cows.
Observed counts are the data reported in the problem.



Hooves      0      1      2      3      4
-----------------------------------------
Prob    .2528  .0351  .1586  .3163  .2372
Exp     25.28   3.51  15.86  31.63  23.72
Obs        25      5     15     30     25


The standard chi-squared goodness-of-fit test (as implemented
in R) gives the output shown below.



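prob <- c(.2528, .0351, .1586, .3163, .2372)  # model probabilities for 0-4 hooves
obs  <- c(25, 5, 15, 30, 25)                  # observed numbers of cows
chisq.test(obs, p = prob)                     # Pearson goodness-of-fit test

## Chi-squared test for given probabilities
##
## data: obs
## X-squared = 0.8353, df = 4, p-value = 0.9337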
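For readers not using R, a minimal Python sketch of the same calculation using only the standard library. Like `chisq.test(obs, p = prob)`, it treats the probabilities as fully specified, so df = 5 - 1 = 4; for 4 degrees of freedom the chi-squared upper tail has the closed form $e^{-x/2}(1 + x/2)$. It should reproduce the X-squared and p-value in the R output above.

```python
import math

prob = [.2528, .0351, .1586, .3163, .2372]  # model probabilities for 0-4 hooves
obs = [25, 5, 15, 30, 25]                    # observed numbers of cows

n = sum(obs)
expected = [n * q for q in prob]
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))

# Upper tail of chi-squared with df = 4: exp(-x/2) * (1 + x/2)
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)
print(f"X-squared = {x2:.4f}, df = 4, p-value = {p_value:.4f}")
```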


R also prints a warning message, because the expected count
for 1 hoof (3.51) is less than 5, which puts the chi-squared
approximation to the distribution of the test statistic in some doubt.
However, a simulated (Monte Carlo) version of the test, which does
not rely on that approximation, gave a P-value of 0.9371.
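One way to obtain such a simulated P-value is a Monte Carlo test: draw many samples of 100 cows from the model probabilities and record how often the chi-squared statistic is at least as large as the observed one. R's `chisq.test(..., simulate.p.value = TRUE)` is based on the same idea. The Python sketch below is a minimal version of it; the number of replicates `B` and the seed are arbitrary choices of ours, so its result will differ slightly from 0.9371.

```python
import random

prob = [.2528, .0351, .1586, .3163, .2372]
obs = [25, 5, 15, 30, 25]
n = sum(obs)
expected = [n * q for q in prob]

def chi2_stat(counts):
    """Pearson chi-squared statistic against the model's expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(counts, expected))

def simulated_p_value(B=5000, seed=2015):
    """Monte Carlo P-value: share of B simulated samples (plus the observed one)
    whose statistic is at least as large as the observed statistic."""
    rng = random.Random(seed)
    cells = range(len(prob))
    x2_obs = chi2_stat(obs)
    hits = 0
    for _ in range(B):
        draws = rng.choices(cells, weights=prob, k=n)  # one simulated sample of n cows
        sim = [draws.count(j) for j in cells]
        if chi2_stat(sim) >= x2_obs:
            hits += 1
    return (hits + 1) / (B + 1)

p_sim = simulated_p_value()
print(round(p_sim, 3))
```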



So the observed counts are clearly consistent with the
proposed probability model. (Other distributions
might fit as well, but the question implied that we should look
for an answer based on a binomial or Poisson distribution.
The data fit the model almost 'too well', which suggests that
they may have been contrived to make the solution to the problem easier to find.)
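One further caveat, offered as a sketch rather than a definitive correction: `chisq.test(obs, p = prob)` treats the probabilities as fully specified, so it uses df = 5 - 1 = 4. Here the unexposed fraction (1/4) and $p$ (3/4) were both estimated from these same counts, and a common rule is to subtract one degree of freedom per estimated parameter, giving df = 5 - 1 - 2 = 2. For 2 degrees of freedom the upper tail is simply $e^{-x/2}$, so the adjusted P-value is about 0.66, which leads to the same conclusion.

```python
import math

x2 = 0.8353    # statistic reported by chisq.test above
cells = 5      # 0, 1, 2, 3, 4 positive hooves
estimated = 2  # unexposed fraction and binomial p, both taken from the data
df = cells - 1 - estimated

# For df = 2 the chi-squared upper-tail probability is exp(-x/2)
p_adjusted = math.exp(-x2 / 2)
print(df, round(p_adjusted, 4))
```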

        edited May 3 '15 at 7:31
        answered May 3 '15 at 7:01
        BruceET