calculation of median of grouped data
























$begingroup$


When calculating the median of grouped data with total frequency $N$, which value should be matched against the cumulative frequencies to find the median class: $\frac{N}{2}$ or $\frac{N+1}{2}$? Both seem to be in use. I think $\frac{N+1}{2}$ should be taken, because for a list of values (i.e. ungrouped data) a fractional value of $\frac{N+1}{2}$ indicates that the median is the average of the $\frac{N}{2}$-th and $\left(\frac{N}{2}+1\right)$-th values.
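For reference, the ungrouped rule mentioned here can be sketched in Python. The function name `ungrouped_median` is made up for illustration; it only restates the usual odd/even convention and is not taken from any particular textbook.

```python
def ungrouped_median(values):
    """Median of a plain list of values (ungrouped data)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # (n + 1) / 2 is a whole number: take that value (1-based position).
        return ordered[mid]
    # (n + 1) / 2 is fractional: average the (n/2)-th and (n/2 + 1)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(ungrouped_median([3, 1, 2]))     # 2
print(ungrouped_median([4, 1, 3, 2]))  # 2.5
```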



Now the second part of my question: when calculating the median of grouped data, if the value of $\frac{N+1}{2}$ (or $\frac{N}{2}$) is a fraction, say 50.5, and there is a class with cumulative frequency 50, what should we do? Should we take two median classes, the one with cumulative frequency 50 and the one after it, calculate a median from each using the formula $L + \frac{\frac{N}{2} - C}{f} \times w$, and take their average as the final median? Or something else? What is the correct procedure in this situation?






EDIT:



So, here is a specific problem regarding the second part of my question:



We have to find out the median score from the following frequency distribution table:



Score                :  0-10   10-20   20-30   30-40   40-50
Number of students   :    4       3       5       6       7
Cumulative frequency :    4       7      12      18      25


Here the intervals are of type $(a, b]$, i.e. open on the left and closed on the right.



Now, $N = 25 \implies \frac{N}{2} = 12.5$, which means that we have to look for the interval that covers the 12th and 13th items. Looking at the cumulative frequencies, the 3rd interval (i.e. 20-30) covers the 12th item, while the 4th interval (i.e. 30-40) covers the 13th item. If we are supposed to take both intervals as median classes when using the formula
$\text{median} = L + \frac{\frac{N}{2} - C}{f} \times w$, then we end up with two medians. We could take the average of these as the required median, though. I want to know the correct procedure here.



Note 1:



I am only concerned with using the above formula, not with any other method of finding the median of grouped data. There is a variation of the formula in which $\frac{N+1}{2}$ is used instead of $\frac{N}{2}$; the first part of my question refers to this confusion as well.



Note 2:



In the formula,



L = lower boundary of the median class
N = total frequency
C = cumulative frequency of the class preceding the median class
f = frequency of the median class
w = width of the median class i.e. upper boundary - lower boundary
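A minimal Python sketch of the formula with these symbols may help make the roles of $L$, $C$, $f$ and $w$ concrete. The function name `grouped_median` and its `target` parameter are illustrative, and the median class is assumed to be the first class whose cumulative frequency is at least the target (one common selection rule, not necessarily the only one in use).

```python
from itertools import accumulate


def grouped_median(lowers, uppers, freqs, target=None):
    """Interpolated median L + ((target - C) / f) * w for grouped data.

    Assumption: the median class is the first class whose cumulative
    frequency is >= target. The target defaults to N / 2.
    """
    n = sum(freqs)
    if target is None:
        target = n / 2
    cum = list(accumulate(freqs))
    for i, cf in enumerate(cum):
        if cf >= target and freqs[i] > 0:
            c_prev = cum[i - 1] if i > 0 else 0   # C
            width = uppers[i] - lowers[i]         # w
            return lowers[i] + (target - c_prev) / freqs[i] * width
    raise ValueError("no median class found")


# The frequency table from the question.
lowers = [0, 10, 20, 30, 40]
uppers = [10, 20, 30, 40, 50]
freqs = [4, 3, 5, 6, 7]
print(round(grouped_median(lowers, uppers, freqs), 2))  # 30.83
```

Passing `target=(sum(freqs) + 1) / 2` evaluates the $\frac{N+1}{2}$ variant with the same selection rule.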


Note 3:



If we consider the interval 20-30 as the median class and use the above formula, then the median will be



$20 + \frac{\frac{25}{2} - 7}{5} \times 10 = 31$



Considering the interval 30-40 as the median class instead, the same formula gives $30 + \frac{\frac{25}{2} - 12}{6} \times 10 \approx 30.83$, which is close to 31 but not equal to it. So the two intervals are not interchangeable as the median class, and I am not sure which choice is correct for problems of this type.
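Evaluating the formula numerically for each of the two candidate classes is a useful sanity check. The snippet below is only a sketch; the dictionary layout and variable names are chosen for illustration.

```python
n, width = 25, 10
target = n / 2  # 12.5

# name -> (lower boundary L, preceding cumulative frequency C, class frequency f)
candidates = {"20-30": (20, 7, 5), "30-40": (30, 12, 6)}

results = {
    name: lower + (target - c_prev) / f * width
    for name, (lower, c_prev, f) in candidates.items()
}
for name, value in results.items():
    print(f"{name}: {value:.2f}")
# 20-30: 31.00
# 30-40: 30.83
```

Only the value computed from the class 30-40 lies inside the class it was computed from.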



Note 4:



I don't know whether there is a rule for this kind of situation saying that we should select the cumulative frequency (and hence the corresponding interval as the median class) that is nearer to the value of $\frac{N}{2}$. Under such a rule we would take the interval 20-30 as the median class in this example. It would be great, and enough, if anyone could confirm such a rule.





















$endgroup$
























      statistics data-analysis median














      asked Jan 18 '16 at 18:56 by Snehasish Karmakar
      edited Nov 10 '17 at 20:54 by Snehasish Karmakar






















          1 Answer












          $begingroup$

          Because this is essentially a duplicate, I address a few issues
          that do not explicitly overlap the related question or answer:



          If a class has cumulative relative frequency 0.5 (that is, cumulative frequency exactly $N/2$), then the median is at the boundary between that class and the next larger one.
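A short check with the question's formula shows why the boundary is the natural answer in this case. The subscripts are added notation, and contiguous classes with a nonzero frequency in the next class are assumed. Suppose class $k$ ends exactly at cumulative frequency $C_k = C_{k-1} + f_k = N/2$:

```latex
\begin{align*}
\text{class } k:\quad & L_k + \frac{\frac{N}{2} - C_{k-1}}{f_k}\, w_k
  = L_k + \frac{f_k}{f_k}\, w_k = L_k + w_k, \\
\text{class } k+1:\quad & L_{k+1} + \frac{\frac{N}{2} - C_k}{f_{k+1}}\, w_{k+1}
  = L_{k+1} + 0 = L_k + w_k .
\end{align*}
```

Both candidate classes give the shared boundary, so no averaging is needed in that situation.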



          If $N$ is large (really the only case where this method is
          generally successful), there is little difference between $N/2$
          and $(N+1)/2$ in the formula. All references I checked use $N/2$.
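The size of the gap can be made explicit. Assuming both targets select the same median class (this can fail when some cumulative frequency lies in $[N/2, (N+1)/2)$), subtracting the two variants of the formula gives:

```latex
\[
\left(L + \frac{\frac{N+1}{2} - C}{f}\, w\right)
- \left(L + \frac{\frac{N}{2} - C}{f}\, w\right)
= \frac{w}{2f}
\]
```

For a fixed set of classes, $f$ typically grows in proportion to $N$, so the gap shrinks as $N$ grows. With $w = 10$ and $f = 6$, for example, it is $10/12 \approx 0.83$.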



          Before computers were widely available, large datasets were
          customarily reduced to categories (classes) and plotted as histograms.
          Then the histograms were used to approximate the mean, variance,
          median, and other descriptive measures. Nowadays, it is best
          just to use a statistical computer package to find exact values
          of all measures.



          One remaining application is to try to re-claim the descriptive
          measures from grouped data or from a histogram published in a
          journal. These are cases in which the original data are no longer
          available.



          This procedure to approximate the sample median from grouped
          data assumes that the data are distributed roughly uniformly
          throughout the median interval. Then it uses interpolation
          to approximate the median. (By contrast, methods to approximate
          the sample mean and sample variance from grouped data assume
          that all observations are concentrated at their class midpoints.)
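The interpolation idea can be sketched as inverting a piecewise-linear cumulative frequency curve, which is what the uniform-within-class assumption amounts to. The function name `interpolate_quantile` is hypothetical, and the sketch assumes `0 <= target <= N` with cumulative counts listed at every class boundary, starting from 0.

```python
from bisect import bisect_left


def interpolate_quantile(boundaries, cum_freqs, target):
    """Invert the piecewise-linear cumulative frequency curve at `target`.

    boundaries: class boundaries b0 < b1 < ... < bk
    cum_freqs:  cumulative counts at those boundaries, starting at 0
    Assumes 0 <= target <= cum_freqs[-1].
    """
    # First index whose cumulative count is >= target.
    i = bisect_left(cum_freqs, target)
    if i == 0:
        return boundaries[0]
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = cum_freqs[i - 1], cum_freqs[i]
    # Linear interpolation inside the class: counts spread uniformly over [x0, x1].
    return x0 + (target - y0) / (y1 - y0) * (x1 - x0)


boundaries = [0, 10, 20, 30, 40, 50]
cum_freqs = [0, 4, 7, 12, 18, 25]
median = interpolate_quantile(boundaries, cum_freqs, 25 / 2)
print(round(median, 2))  # 30.83
```

With a target that coincides with a cumulative count, such as 12 here, the same function returns the class boundary 30.0.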

















          $endgroup$









          • 1




            $begingroup$
            I didn't understand the part : "If a class has cumulative frequency .5, then the median is at the boundary of that class and the next larger one". Also, I was not talking about the histogram method for finding the median class. The method known to me is like this : 1. calculate (N+1)/2 (or N/2), 2. look for the cumulative frequency which is just greater than or equal to that value, and the corresponding class is the median class.
            $endgroup$
            – Snehasish Karmakar
            Jan 19 '16 at 17:32












          • $begingroup$
            The problem arises when the value of N/2 is a fraction like 50.5: then we are supposed to look for a cumulative frequency (c.f.) that covers both the 50th and the 51st term. If there is a c.f. of 60 (and the preceding one is 40), there is no problem. The class corresponding to the c.f. 60 is the median class, as it covers both the 50th and the 51st term. However, there is a problem when the two c.f. values are 50 and 60, where the c.f. 50 covers the 50th term but the c.f. 60 covers the 51st term. In that case, we are essentially left with two median classes!
            $endgroup$
            – Snehasish Karmakar
            Jan 19 '16 at 17:48










          • $begingroup$
            Then, as I said, the median is the boundary between classes. It falls in the upper or lower class, depending on whether intervals are of style $[a,b)$ or $(a,b]$. You are dealing with an approximate rule. Adapt as necessary to get a reasonable result. If still unclear, please edit a specific troublesome example (not a description) into your question, with intervals and counts.
            $endgroup$
            – BruceET
            Jan 19 '16 at 19:00








          • 1




            $begingroup$
            Yes, it is an approximation rule, and I want to know the standard version of that rule since I am to teach this to others (so I want to be as accurate as possible :), from theoretical point of view), though I know that in practice the different variations of the rule may be accepted. I am going to include a specific problem in the question, as requested.
            $endgroup$
            – Snehasish Karmakar
            Jan 20 '16 at 12:09










          • $begingroup$
            Rule with example in Ott/Longnecker (recent editions) Ch3 Sec 4.
            $endgroup$
            – BruceET
            Jan 20 '16 at 17:02












































          edited Apr 13 '17 at 12:21
          Community

          answered Jan 18 '16 at 21:22
          BruceET








          • 1




            $begingroup$
            I didn't understand the part: "If a class has cumulative frequency .5, then the median is at the boundary of that class and the next larger one". Also, I was not talking about the histogram method for finding the median class. The method known to me is this: (1) calculate (N+1)/2 (or N/2); (2) look for the first cumulative frequency that is greater than or equal to that value. The corresponding class is the median class.
            $endgroup$
            – Snehasish Karmakar
            Jan 19 '16 at 17:32












          • $begingroup$
            The problem arises when the value of N/2 is a fraction like 50.5. Then we are supposed to look for a cumulative frequency (c.f.) that covers both the 50th term and the 51st term. If there is a c.f. of 60 (and the preceding one is 40), there is no problem: the class corresponding to the c.f. 60 is the median class, as it covers both the 50th and the 51st term. However, there is a problem when the two c.f. values are 50 and 60, where the c.f. 50 covers the 50th term but the c.f. 60 covers the 51st term. In that case, we are essentially left with two median classes!
            $endgroup$
            – Snehasish Karmakar
            Jan 19 '16 at 17:48










          • $begingroup$
            Then, as I said, the median is the boundary between classes. It falls in the upper or lower class, depending on whether intervals are of style $[,)$ or $(,]$. You are dealing with an $approximate$ rule. Adapt as necessary to get a reasonable result. If still unclear, please edit a $specific$ troublesome example into your question (not a description): intervals and counts.
            $endgroup$
            – BruceET
            Jan 19 '16 at 19:00








          • 1




            $begingroup$
            Yes, it is an approximation rule, and I want to know the standard version of that rule since I am to teach this to others (so I want to be as accurate as possible :), from a theoretical point of view), though I know that in practice different variations of the rule may be accepted. I am going to include a specific problem in the question, as requested.
            $endgroup$
            – Snehasish Karmakar
            Jan 20 '16 at 12:09










          • $begingroup$
            Rule with example in Ott/Longnecker (recent editions) Ch3 Sec 4.
            $endgroup$
            – BruceET
            Jan 20 '16 at 17:02































