Apply doubly stochastic matrix M to a probability vector, then entropy increases?

up vote
1
down vote

favorite

Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.

Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.

$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$

Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).

Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?

Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

add a comment |

up vote
1
down vote

favorite

Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.

Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.

$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$

Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).

Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

add a comment |

up vote
1
down vote

favorite

Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.

Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.

$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$

Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).

Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.

Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.

$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$

Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).

Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?

co.combinatorics pr.probability it.information-theory entropy stochastic-matrices

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

asked Dec 1 at 17:44

Alexander Chervov

11k1260139

add a comment |

4 Answers
4

active

oldest

votes

up vote
3
down vote

accepted

This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.

Now, use the fact that $f=-H$ is convex.

For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).

Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
$$frac1{1-alpha}logsum_jp_j^alpha.$$
The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.

edited Dec 2 at 7:29

answered Dec 1 at 19:10

Denis Serre

29k791195

Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32

@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23

Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04

@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27

Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04

add a comment |

up vote
1
down vote

This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)

Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.

answered Dec 1 at 18:39

Puck Rombach

1036

add a comment |

up vote
1
down vote

This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(alpha|beta) ge D(alpha P|beta P) ;.
$$
In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(alpha | m_X) = log text{card} X - H(alpha) ;.
$$

answered Dec 1 at 18:55

R W

10.1k21946

add a comment |

up vote
1
down vote

The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.

The proof (Question 2) is quite simple:

Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
begin{align}
mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
end{align}
Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.

Now,
begin{align}
H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
end{align}
which implies
begin{align}
H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
&= H(mathbf{x}) + I(mathbf{y};pmb{pi})
end{align}
where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.

For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.

answered Dec 1 at 19:25

Algernon

9591712

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "504"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f316661%2fapply-doubly-stochastic-matrix-m-to-a-probability-vector-then-entropy-increases%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

up vote
3
down vote

accepted

Now, use the fact that $f=-H$ is convex.

For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).

edited Dec 2 at 7:29

answered Dec 1 at 19:10

Denis Serre

29k791195

Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32

@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23

Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04

@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27

Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04

add a comment |

up vote
3
down vote

accepted

Now, use the fact that $f=-H$ is convex.

For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).

edited Dec 2 at 7:29

answered Dec 1 at 19:10

Denis Serre

29k791195

Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32

@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23

Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04

@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27

Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04

add a comment |

up vote
3
down vote

accepted

Now, use the fact that $f=-H$ is convex.

For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).

edited Dec 2 at 7:29

answered Dec 1 at 19:10

Denis Serre

29k791195

Now, use the fact that $f=-H$ is convex.

For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).

edited Dec 2 at 7:29

answered Dec 1 at 19:10

Denis Serre

29k791195

edited Dec 2 at 7:29

answered Dec 1 at 19:10

Denis Serre

29k791195

answered Dec 1 at 19:10

Denis Serre

29k791195

answered Dec 1 at 19:10

Denis Serre

29k791195

Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32

@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23

Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04

@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27

Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04

add a comment |

Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32

@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23

Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04

@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27

Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04

Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32

@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23

Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04

@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27

Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04

add a comment |

up vote
1
down vote

This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)

answered Dec 1 at 18:39

Puck Rombach

1036

add a comment |

up vote
1
down vote

This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)

answered Dec 1 at 18:39

Puck Rombach

1036

add a comment |

up vote
1
down vote

This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)

answered Dec 1 at 18:39

Puck Rombach

1036

This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)

answered Dec 1 at 18:39

Puck Rombach

1036

answered Dec 1 at 18:39

Puck Rombach

1036

answered Dec 1 at 18:39

Puck Rombach

1036

answered Dec 1 at 18:39

Puck Rombach

1036

add a comment |

up vote
1
down vote

answered Dec 1 at 18:55

R W

10.1k21946

add a comment |

up vote
1
down vote

answered Dec 1 at 18:55

R W

10.1k21946

add a comment |

up vote
1
down vote

answered Dec 1 at 18:55

R W

10.1k21946

answered Dec 1 at 18:55

R W

10.1k21946

answered Dec 1 at 18:55

R W

10.1k21946

answered Dec 1 at 18:55

R W

10.1k21946

answered Dec 1 at 18:55

R W

10.1k21946

add a comment |

up vote
1
down vote

The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.

The proof (Question 2) is quite simple:

answered Dec 1 at 19:25

Algernon

9591712

add a comment |

up vote
1
down vote

The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.

The proof (Question 2) is quite simple:

answered Dec 1 at 19:25

Algernon

9591712

add a comment |

up vote
1
down vote

The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.

The proof (Question 2) is quite simple:

answered Dec 1 at 19:25

Algernon

9591712

The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.

The proof (Question 2) is quite simple:

answered Dec 1 at 19:25

Algernon

9591712

answered Dec 1 at 19:25

Algernon

9591712

answered Dec 1 at 19:25

Algernon

9591712

answered Dec 1 at 19:25

Algernon

9591712

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to MathOverflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

3CiPmzJMQtjBeBHVJrzjafrnhO,A7 LaVuPpj RUIY2FOTuiMHA3fY8gC

搜尋此網誌

Krdytkyu