Confusion about the relationship between regression line slope and covariance











In the simple linear regression model between RVs $(X,Y)$, the slope $\hat\beta_1$ is given as

$$
\hat\beta_1 = \dfrac{\sum_i^N(x_i-\overline{x})(y_i - \overline{y})}{\sum_i^N(x_i - \overline{x})^2} \tag{1}
$$

Many textbooks then quickly interpret this in terms of covariance and variance, as

$$
\hat\beta_1 = \dfrac{\operatorname{Cov}(x,y)}{\operatorname{Var}(x)} \tag{2}
$$

Question:

But couldn't this be true only if we assume a uniform distribution for both the joint pmf in the covariance and the pmf in the variance? That is, it is like assuming the following and cancelling out the $\dfrac{1}{N}$:

$$
\hat\beta_1 = \dfrac{\dfrac{1}{N}\sum_i^N(x_i-\overline{x})(y_i - \overline{y})}{\dfrac{1}{N}\sum_i^N(x_i - \overline{x})^2} \tag{3}
$$

In case both pmfs are not uniform,

$$
\dfrac{\operatorname{Cov}(x,y)}{\operatorname{Var}(x)} = \dfrac{\sum\limits_{x}\sum\limits_{y}(x-\overline{x})(y - \overline{y})\,p(x,y)}{\sum\limits_{x}(x - \overline{x})^2\,p(x)} \tag{4}
$$

which is not the same as (1), so (2) can't be true, right?
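
To make the setup concrete, here is a small numerical sketch with synthetic data (assuming NumPy; the data and variable names are only illustrative) showing that the two $\dfrac{1}{N}$ factors in (3) indeed cancel, so (3) agrees with (1):

    import numpy as np

    # Synthetic data standing in for the N observed pairs (hypothetical example).
    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    y = 2.0 * x + rng.normal(size=50)

    dx, dy = x - x.mean(), y - y.mean()

    slope_raw_sums  = np.sum(dx * dy) / np.sum(dx**2)    # Eq. (1): plain sums
    slope_empirical = np.mean(dx * dy) / np.mean(dx**2)  # Eq. (3): 1/N in numerator and denominator

    print(np.isclose(slope_raw_sums, slope_empirical))   # True: the 1/N cancels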










Tags: regression, covariance, variance, correlation, linear-regression






asked Nov 9 at 17:14 by Paari Vendhan
2 Answers

















The OP's Eq. $(1)$ is the slope of the regression line if we have $N$ pairs $(x_i, y_i)$ of real numbers and ask what is the "best" straight line that fits these $N$ data points. In general, it is not the slope of the regression line when we have a pair of random variables $(X, Y)$ and ask what is the random variable $\hat{Y} = \alpha + \beta X$ such that $E[(Y-\hat{Y})^2]$ is as small as possible. The answer to the latter question is indeed that $\beta$ must have value $\frac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)}$ as the OP states in $(2)$, but this result applies to all random variables with finite variances, not just discrete random variables. Indeed, if $X$ and $Y$ are discrete random variables taking on values $x_1, x_2, \ldots, x_M$ and $y_1, y_2, \ldots, y_N$ respectively, then the covariance $\operatorname{cov}(X,Y)$ is given by
\begin{align}
\operatorname{cov}(X,Y) &= \sum_{m=1}^M \sum_{n=1}^N P(X=x_m, Y = y_n)(x_m-\bar{x})(y_n-\bar{y})\\
&= \sum_{m=1}^M \sum_{n=1}^N p_{X,Y}(x_m, y_n)(x_m-\bar{x})(y_n-\bar{y})
\end{align}
where $\bar{x}$ and $\bar{y}$ are the means $E[X]$ and $E[Y]$ respectively, and $p_{X,Y}(x_m, y_n)$ is the joint probability mass function (joint pmf) of $(X,Y)$. This is a slightly more general version of the numerator of $(4)$ in the OP's question. As the OP correctly asserts, if $M=N$ and the joint pmf has value $\frac 1N$ for exactly $N$ points $(x_i,y_i)$, then it is indeed the case that $\operatorname{cov}(X,Y)$ is (proportional to) the numerator of $(1)$ in the OP's question.
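
For completeness, a standard sketch of why this minimization yields $(2)$: setting $\partial E[(Y-\alpha-\beta X)^2]/\partial\alpha = 0$ gives $\alpha = E[Y] - \beta E[X]$, so that

$$
E[(Y-\hat{Y})^2] = \operatorname{var}(Y) - 2\beta\operatorname{cov}(X,Y) + \beta^2\operatorname{var}(X),
$$

a convex quadratic in $\beta$; setting its derivative with respect to $\beta$ to zero gives $\beta = \dfrac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)}$.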






answered Nov 9 at 19:57 by Dilip Sarwate
• thank you sir :) the key answer lies in "random variable $\hat{Y}=\alpha+\beta X$ such that $E[(Y-\hat{Y})^2]$ is as small as possible", which I am unable to understand. Can you kindly elaborate on this with a derivation of the slope as in (2), or direct me to a good source?
  – Paari Vendhan
  Nov 10 at 3:10












• gentle reminder, can you kindly clarify: for equation (2) to be true, given the samples, are we assuming the uniform distribution $p(x,y)=\dfrac{1}{N}$? Else (4) cannot reduce to (2), right?
  – Paari Vendhan
  Nov 13 at 13:47










• We are assuming that the uniform distribution is on the $N$ points $(x_i, y_i)$. This is different from assuming that the uniform distribution is on the individual $x_i$ and the individual $y_i$, which gives rise to many more points, etc. If we have three points $(0,0), (0,1), (1,0)$, then those three points are assumed to have equal probability $\frac 13$. $X$ and $Y$ take on values in $\{0,1\}$ but neither is uniformly distributed on $\{0,1\}$.
  – Dilip Sarwate
  Nov 13 at 19:55
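
To make that three-point example concrete, the marginals computed from that joint pmf are

$$
p_X(0) = p_{X,Y}(0,0) + p_{X,Y}(0,1) = \tfrac{2}{3}, \quad p_X(1) = \tfrac{1}{3}, \qquad
p_Y(0) = \tfrac{2}{3}, \quad p_Y(1) = \tfrac{1}{3},
$$

so the joint pmf is uniform on the three observed points while neither marginal is uniform on $\{0,1\}$.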












• I will try to rephrase your last line to understand better, sir. So you are saying $p(X,Y)$ of the given sample points $(0,0),(0,1),(1,0)$ is assumed to be $\dfrac{1}{N}=\dfrac{1}{3}$, but neither $p(X)$ nor $p(Y)$ is uniformly distributed on $\{0,1\}$?
  – Paari Vendhan
  Nov 14 at 5:24












• I have elaborated my current understanding as a separate answer. Can you kindly check that directly and correct me?
  – Paari Vendhan
  Nov 14 at 6:56


















I think my confusion stems from failing to differentiate the sample correlation coefficient from the population correlation coefficient. So I will try to summarize my improved understanding here, instead of in individual comments, and request viewers to correct me.

In case of the Sample Correlation Coefficient:

Suppose $(X,Y)$ is a given sample set of size $N$. Then the sample correlation coefficient is given by

$$
r = \dfrac{\sum_i(x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_i(x_i - \overline{x})^2 \sum_i(y_i - \overline{y})^2}} \tag{1}
$$

where $\operatorname{cov}(X,Y)$ below is the unbiased sample covariance and $(s_X,s_Y)$ are the unbiased sample standard deviations. For a given sample set (also as per MLE), the assumption is that the samples are uniformly distributed. That is,

$$
\operatorname{cov}(X,Y) = \dfrac{1}{N-1}\sum_i(x_i - \overline{x})(y_i - \overline{y}) \tag{2}
$$

$$
s_X^2 = \dfrac{1}{N-1}\sum_i(x_i - \overline{x})^2 \\
s_Y^2 = \dfrac{1}{N-1}\sum_i(y_i - \overline{y})^2 \tag{3}
$$

Applying equations (2) and (3) in equation (1), we get

$$
r = \dfrac{\sum_i(x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_i(x_i - \overline{x})^2 \sum_i(y_i - \overline{y})^2}} = \dfrac{\operatorname{cov}(X,Y)}{s_X s_Y} \tag{4}
$$

Applying the same substitutions to the simple regression line slope,

$$
\beta_1 = \dfrac{\sum_i(x_i - \overline{x})(y_i - \overline{y})}{\sum_i (x_i - \overline{x})^2} = \dfrac{\operatorname{cov}(X,Y)}{s_X^2} \tag{5}
$$
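
As a sanity check of (4) and (5), a small numerical sketch (synthetic data, assuming NumPy) confirming that the $\dfrac{1}{N-1}$ factors cancel:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=40)
    y = -1.5 * x + rng.normal(size=40)

    dx, dy = x - x.mean(), y - y.mean()
    n = len(x)

    cov_xy = np.sum(dx * dy) / (n - 1)    # Eq. (2): unbiased sample covariance
    s2_x   = np.sum(dx**2)   / (n - 1)    # Eq. (3): unbiased sample variances
    s2_y   = np.sum(dy**2)   / (n - 1)

    r_raw    = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))  # Eq. (1)
    r_ratio  = cov_xy / np.sqrt(s2_x * s2_y)                             # Eq. (4)
    b1_raw   = np.sum(dx * dy) / np.sum(dx**2)                           # Eq. (5), raw sums
    b1_ratio = cov_xy / s2_x                                             # Eq. (5), cov/var form

    print(np.isclose(r_raw, r_ratio), np.isclose(b1_raw, b1_ratio))     # True True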



In case of the Population Correlation Coefficient:

Suppose $(X,Y)$ are two RVs (discrete or continuous; for simplicity, here we take discrete) with joint pmf $p(x,y)$ and marginal pmfs $p(x), p(y)$. Then

$$
\rho = \dfrac{\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x,y)}{\sqrt{\sum_x (x - \mu_X)^2 p(x) \sum_y (y - \mu_Y)^2 p(y)}} \tag{6}
$$

where $\operatorname{Cov}(X,Y)$ is the population covariance (there is no bias correction here, as it is the population itself), and $(\sigma_X, \sigma_Y)$ are the respective population standard deviations of $X$ and $Y$. For a given population, with its joint and marginal pmfs,

$$
\operatorname{Cov}(X,Y) = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x,y) \tag{7}
$$

$$
\sigma_X^2 = \sum_x (x - \mu_X)^2 p(x) \\
\sigma_Y^2 = \sum_y (y - \mu_Y)^2 p(y) \tag{8}
$$

Applying equations (7) and (8) in (6), we get the simplified form

$$
\rho = \dfrac{\sum_x \sum_y (x - \mu_X)(y - \mu_Y)\,p(x,y)}{\sqrt{\sum_x (x - \mu_X)^2 p(x) \sum_y (y - \mu_Y)^2 p(y)}} = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y} \tag{9}
$$

Applying the same substitution to the linear regression line slope for the population,

$$
\beta_1 = \dfrac{\operatorname{Cov}(X,Y)}{\sigma_X^2}
$$

Pending gaps:

If my above approach is correct, then I have another question: how can equations (7) and (6) be proved directly and individually, without just saying they are analogous to the sample case?
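
One standard route for this, sketched using only the usual definitions: covariance is defined as $\operatorname{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$, and for any function $g$ of discrete RVs, $E[g(X,Y)] = \sum_x \sum_y g(x,y)\,p(x,y)$ (the law of the unconscious statistician). Taking $g(x,y) = (x-\mu_X)(y-\mu_Y)$ gives (7) directly; (8) is the single-variable case with $g(x) = (x-\mu_X)^2$; and (6) then holds by the definition $\rho = \operatorname{Cov}(X,Y)/(\sigma_X\sigma_Y)$.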






answered Nov 14 at 6:54, edited Nov 17 at 8:37 by Paari Vendhan