Gradient of Kronecker Product Function
Suppose I have a matrix $A$ and vectors $c,b$. How can I compute the expression
$$\nabla_c\, b^T(A\otimes c)\,b,$$
assuming the multiplication is compatible, of course?

I've found this article, but I'm not sure how reliable it is.

Tags: calculus, real-analysis, linear-algebra, matrix-calculus, kronecker-product
asked Aug 10 '16 at 17:42 by AIM_BLB, edited Aug 11 '16 at 0:56
3 Answers
Write the function using the Frobenius (:) inner product:
$$\eqalign{
f &= b^T(A\otimes c)\,b \cr
  &= (A\otimes c):bb^T \cr
}$$
At this point, we need to factor the $bb^T$ matrix as
$$bb^T = \sum_{k=1}^r Z_k\otimes Y_k$$
where the $Z_k$ matrices have the same shape as $A$, and the $Y_k$ the same shape as $c$.
Look for the classic paper "Approximation with Kronecker Products" by Van Loan and Pitsianis, or Pitsianis' 1997 dissertation (which contains Matlab code).
Substitute the factorization, then calculate the differential and gradient:
$$\eqalign{
f &= (A\otimes c) : \sum_{k=1}^r Z_k\otimes Y_k \cr
  &= \sum_{k=1}^r (Z_k:A)\,(Y_k:c) \cr\cr
df &= \sum_{k=1}^r (Z_k:A)\,Y_k : dc \cr\cr
\frac{\partial f}{\partial c} &= \sum_{k=1}^r (A:Z_k)\,Y_k \cr
}$$
answered Aug 10 '16 at 21:59 by lynn, edited Aug 10 '16 at 22:19
– AIM_BLB (Aug 11 '16 at 0:53): I'm looking through these papers and I can't seem to find an explicit description of the $Z_k$ and $Y_k$s; what are they?
– hans (Aug 11 '16 at 2:04): See p. 34 of Pitsianis' thesis: cs.drexel.edu/~jjohnson/2007-08/fall/cs680/papers/…
– AIM_BLB (Aug 11 '16 at 10:47): Thanks, but this only works if the dimensions are not prime...
– hans (Aug 11 '16 at 13:37): The restriction on the dimensions ensures that a factorization of the form $M=B\otimes C$ can be found. But in this problem we already know that the dimensions of the Kronecker factors must correspond to $A\otimes c$.
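For readers who want to sanity-check this approach, here is a minimal NumPy sketch (not part of the original answer): it factors $bb^T$ into Kronecker terms via the SVD of a rearranged matrix, in the spirit of Van Loan and Pitsianis, and compares the resulting gradient with a finite-difference estimate. The dimension choices ($m=2$, $p=3$, $n=mp$) and the `rearrange` helper are illustrative assumptions, not something taken from the papers.

```python
import numpy as np

# Dimensions: A is m x n, c is a p-vector, b is an n-vector.
# b^T (A kron c) b only makes sense when m*p == n.
m, p = 2, 3
n = m * p
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
c = rng.standard_normal(p)
b = rng.standard_normal(n)

def f(x):
    """f(x) = b^T (A kron x) b, with x playing the role of c."""
    return b @ np.kron(A, x.reshape(p, 1)) @ b

# Rearrangement R with R(Z kron Y) = vec(Z) vec(Y)^T for Z of shape (m, n)
# and Y of shape (p, 1), using column-major vec.
def rearrange(M):
    R = np.zeros((m * n, p))
    for i in range(m):
        for j in range(n):
            R[j * m + i, :] = M[i * p:(i + 1) * p, j]
    return R

# Kronecker factorization of b b^T from the SVD of the rearranged matrix
U, s, Vt = np.linalg.svd(rearrange(np.outer(b, b)))
r = int(np.sum(s > 1e-12))
Zs = [s[k] * U[:, k].reshape(n, m).T for k in range(r)]  # inverse of column-major vec
Ys = [Vt[k, :] for k in range(r)]

# sanity check: the factorization reproduces b b^T exactly
assert np.allclose(sum(np.kron(Z, Y.reshape(p, 1)) for Z, Y in zip(Zs, Ys)),
                   np.outer(b, b))

# gradient from the answer vs. a central finite-difference estimate
grad = sum(np.sum(A * Z) * Y for Z, Y in zip(Zs, Ys))
eps = 1e-6
fd = np.array([(f(c + eps * e) - f(c - eps * e)) / (2 * eps) for e in np.eye(p)])
print(np.allclose(grad, fd))  # True
```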
@CSA, why do you want to calculate the gradient (a simple object, but a tensor that is complicated to write down) when calculating the derivative is so easy and just as effective? Knowing the derivative is equivalent to knowing the gradient.

Consider Theorem 3.1 in your reference paper: let $f:A\in M_{m,n}\rightarrow A^TA$. The derivative is the simple linear map $Df_A:H\in M_{m,n}\rightarrow H^TA+A^TH$; from this result, one can derive the gradient of $f$: $\nabla(f)(A)=I\otimes A^T+(A^T\otimes I)T$, where $T$ is the permutation $H\rightarrow H^T$. In other words, why make it simple when you can make it complicated?

In the same way, consider Theorem 4.1 in the same reference: let $g:A\in M_{m,n}\rightarrow A\otimes B$; since $g$ is linear, its derivative is $Dg_A:H\in M_{m,n}\rightarrow H\otimes B$. After two pages of calculation, the gradient is presented in a very complicated form; where is the interest?

Here $p:c\in\mathbb{R}^n\rightarrow b^T(A\otimes c)b$ is linear, so its derivative is simply $h\in\mathbb{R}^n\rightarrow b^T(A\otimes h)b$, a formula of biblical simplicity.

answered Aug 12 '16 at 11:07 by loup blanc, edited Aug 12 '16 at 11:13
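To connect this answer with the gradient the question asks for: since $p$ is linear, the $i$-th component of its gradient is just the derivative evaluated at the basis vector $e_i$. A short NumPy sketch of that observation follows; the shapes ($m=2$, $p=3$, $n=mp$) are illustrative assumptions, not something stated in the answer.

```python
import numpy as np

# p(c) = b^T (A kron c) b is linear in c, so its gradient has entries
# grad_i = b^T (A kron e_i) b, i.e. the derivative applied to e_i.
m, p_dim = 2, 3
n = m * p_dim
rng = np.random.default_rng(1)
A = rng.standard_normal((m, n))
b = rng.standard_normal(n)
c = rng.standard_normal(p_dim)

Dp = lambda h: b @ np.kron(A, h.reshape(p_dim, 1)) @ b  # derivative applied to h
grad = np.array([Dp(e) for e in np.eye(p_dim)])

# consistency check: linearity means p(c) = grad . c
assert np.isclose(Dp(c), grad @ c)
```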
The given formula only makes dimensional sense if the "matrix" is actually a row vector, i.e.
$$A=a^T$$
in which case the function of interest is the scalar
$$\phi = b^T(a^T\otimes c)\,b = b^T(ca^T)\,b = (ba^Tb)^Tc$$
whose gradient is simply
$$\frac{\partial\phi}{\partial c} = ba^Tb$$

answered Dec 2 '18 at 21:37 by greg, edited Dec 2 '18 at 21:43
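A quick numerical confirmation of this special case (a minimal sketch; the dimension $n=4$ is an arbitrary choice, not from the answer):

```python
import numpy as np

# Row-vector case A = a^T: here a^T kron c = c a^T, so
# phi = b^T (c a^T) b = (b^T c)(a^T b) and grad phi = (a^T b) b.
n = 4
rng = np.random.default_rng(2)
a, b, c = (rng.standard_normal(n) for _ in range(3))

phi = b @ np.kron(a[None, :], c[:, None]) @ b  # b^T (a^T kron c) b
assert np.isclose(phi, (b @ c) * (a @ b))

grad = (a @ b) * b                 # the answer's  b a^T b
assert np.isclose(grad @ c, phi)   # phi is linear in c, so phi = grad . c
```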
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f1888410%2fgradient-of-kronecker-product-function%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
Write the function using the Frobenius (:) Inner Product
$$eqalign{
f &= b^T(Aotimes c)b cr
&= (Aotimes c):bb^T cr
}$$
At this point, we need to factor the $bb^T$ matrix
$$eqalign{
bb^T &= sum_{k=1}^r Z_kotimes Y_k cr
}$$
where the $Z_k$ matrices have the same shape as $A$, and $Y_k$ the same shape as $c$.
Look for the classic paper "Approximation with Kronecker Products" by van Loan and Pitsianis, or Pitsianis' 1997 dissertation (which contains Matlab code).
Substitute the factorization, then calculate the differential and gradient
$$eqalign{
f &= (Aotimes c) : sum_{k=1}^r Z_kotimes Y_k cr
&= sum_{k=1}^r (Z_k:A) (Y_k:c) crcr
df &= sum_{k=1}^r (Z_k:A),Y_k :dc crcr
frac{partial f}{partial c} &= sum_{k=1}^r (A:Z_k),Y_k cr
}$$
$endgroup$
$begingroup$
I'm looking through these papers and I can't seem to find an explicit description of the $Z_k$ and $Y_k$s, what are that?
$endgroup$
– AIM_BLB
Aug 11 '16 at 0:53
$begingroup$
See p.34 of Pitsianis' thesis cs.drexel.edu/~jjohnson/2007-08/fall/cs680/papers/…
$endgroup$
– hans
Aug 11 '16 at 2:04
$begingroup$
thanks, but this only works if the dimensions are not prime...
$endgroup$
– AIM_BLB
Aug 11 '16 at 10:47
$begingroup$
The restriction on the dimensions ensures that a factorization of the form $M=Botimes C$ can be found. But in this problem we already know that the dimensions of the Kronecker factors must correspond to $Aotimes c$.
$endgroup$
– hans
Aug 11 '16 at 13:37
add a comment |
$begingroup$
Write the function using the Frobenius (:) Inner Product
$$eqalign{
f &= b^T(Aotimes c)b cr
&= (Aotimes c):bb^T cr
}$$
At this point, we need to factor the $bb^T$ matrix
$$eqalign{
bb^T &= sum_{k=1}^r Z_kotimes Y_k cr
}$$
where the $Z_k$ matrices have the same shape as $A$, and $Y_k$ the same shape as $c$.
Look for the classic paper "Approximation with Kronecker Products" by van Loan and Pitsianis, or Pitsianis' 1997 dissertation (which contains Matlab code).
Substitute the factorization, then calculate the differential and gradient
$$eqalign{
f &= (Aotimes c) : sum_{k=1}^r Z_kotimes Y_k cr
&= sum_{k=1}^r (Z_k:A) (Y_k:c) crcr
df &= sum_{k=1}^r (Z_k:A),Y_k :dc crcr
frac{partial f}{partial c} &= sum_{k=1}^r (A:Z_k),Y_k cr
}$$
$endgroup$
$begingroup$
I'm looking through these papers and I can't seem to find an explicit description of the $Z_k$ and $Y_k$s, what are that?
$endgroup$
– AIM_BLB
Aug 11 '16 at 0:53
$begingroup$
See p.34 of Pitsianis' thesis cs.drexel.edu/~jjohnson/2007-08/fall/cs680/papers/…
$endgroup$
– hans
Aug 11 '16 at 2:04
$begingroup$
thanks, but this only works if the dimensions are not prime...
$endgroup$
– AIM_BLB
Aug 11 '16 at 10:47
$begingroup$
The restriction on the dimensions ensures that a factorization of the form $M=Botimes C$ can be found. But in this problem we already know that the dimensions of the Kronecker factors must correspond to $Aotimes c$.
$endgroup$
– hans
Aug 11 '16 at 13:37
add a comment |
$begingroup$
Write the function using the Frobenius (:) Inner Product
$$eqalign{
f &= b^T(Aotimes c)b cr
&= (Aotimes c):bb^T cr
}$$
At this point, we need to factor the $bb^T$ matrix
$$eqalign{
bb^T &= sum_{k=1}^r Z_kotimes Y_k cr
}$$
where the $Z_k$ matrices have the same shape as $A$, and $Y_k$ the same shape as $c$.
Look for the classic paper "Approximation with Kronecker Products" by van Loan and Pitsianis, or Pitsianis' 1997 dissertation (which contains Matlab code).
Substitute the factorization, then calculate the differential and gradient
$$eqalign{
f &= (Aotimes c) : sum_{k=1}^r Z_kotimes Y_k cr
&= sum_{k=1}^r (Z_k:A) (Y_k:c) crcr
df &= sum_{k=1}^r (Z_k:A),Y_k :dc crcr
frac{partial f}{partial c} &= sum_{k=1}^r (A:Z_k),Y_k cr
}$$
$endgroup$
Write the function using the Frobenius (:) Inner Product
$$eqalign{
f &= b^T(Aotimes c)b cr
&= (Aotimes c):bb^T cr
}$$
At this point, we need to factor the $bb^T$ matrix
$$eqalign{
bb^T &= sum_{k=1}^r Z_kotimes Y_k cr
}$$
where the $Z_k$ matrices have the same shape as $A$, and $Y_k$ the same shape as $c$.
Look for the classic paper "Approximation with Kronecker Products" by van Loan and Pitsianis, or Pitsianis' 1997 dissertation (which contains Matlab code).
Substitute the factorization, then calculate the differential and gradient
$$eqalign{
f &= (Aotimes c) : sum_{k=1}^r Z_kotimes Y_k cr
&= sum_{k=1}^r (Z_k:A) (Y_k:c) crcr
df &= sum_{k=1}^r (Z_k:A),Y_k :dc crcr
frac{partial f}{partial c} &= sum_{k=1}^r (A:Z_k),Y_k cr
}$$
edited Aug 10 '16 at 22:19
answered Aug 10 '16 at 21:59
lynnlynn
1,766177
1,766177
$begingroup$
I'm looking through these papers and I can't seem to find an explicit description of the $Z_k$ and $Y_k$s, what are that?
$endgroup$
– AIM_BLB
Aug 11 '16 at 0:53
$begingroup$
See p.34 of Pitsianis' thesis cs.drexel.edu/~jjohnson/2007-08/fall/cs680/papers/…
$endgroup$
– hans
Aug 11 '16 at 2:04
$begingroup$
thanks, but this only works if the dimensions are not prime...
$endgroup$
– AIM_BLB
Aug 11 '16 at 10:47
$begingroup$
The restriction on the dimensions ensures that a factorization of the form $M=Botimes C$ can be found. But in this problem we already know that the dimensions of the Kronecker factors must correspond to $Aotimes c$.
$endgroup$
– hans
Aug 11 '16 at 13:37
add a comment |
$begingroup$
I'm looking through these papers and I can't seem to find an explicit description of the $Z_k$ and $Y_k$s, what are that?
$endgroup$
– AIM_BLB
Aug 11 '16 at 0:53
$begingroup$
See p.34 of Pitsianis' thesis cs.drexel.edu/~jjohnson/2007-08/fall/cs680/papers/…
$endgroup$
– hans
Aug 11 '16 at 2:04
$begingroup$
thanks, but this only works if the dimensions are not prime...
$endgroup$
– AIM_BLB
Aug 11 '16 at 10:47
$begingroup$
The restriction on the dimensions ensures that a factorization of the form $M=Botimes C$ can be found. But in this problem we already know that the dimensions of the Kronecker factors must correspond to $Aotimes c$.
$endgroup$
– hans
Aug 11 '16 at 13:37
$begingroup$
I'm looking through these papers and I can't seem to find an explicit description of the $Z_k$ and $Y_k$s, what are that?
$endgroup$
– AIM_BLB
Aug 11 '16 at 0:53
$begingroup$
I'm looking through these papers and I can't seem to find an explicit description of the $Z_k$ and $Y_k$s, what are that?
$endgroup$
– AIM_BLB
Aug 11 '16 at 0:53
$begingroup$
See p.34 of Pitsianis' thesis cs.drexel.edu/~jjohnson/2007-08/fall/cs680/papers/…
$endgroup$
– hans
Aug 11 '16 at 2:04
$begingroup$
See p.34 of Pitsianis' thesis cs.drexel.edu/~jjohnson/2007-08/fall/cs680/papers/…
$endgroup$
– hans
Aug 11 '16 at 2:04
$begingroup$
thanks, but this only works if the dimensions are not prime...
$endgroup$
– AIM_BLB
Aug 11 '16 at 10:47
$begingroup$
thanks, but this only works if the dimensions are not prime...
$endgroup$
– AIM_BLB
Aug 11 '16 at 10:47
$begingroup$
The restriction on the dimensions ensures that a factorization of the form $M=Botimes C$ can be found. But in this problem we already know that the dimensions of the Kronecker factors must correspond to $Aotimes c$.
$endgroup$
– hans
Aug 11 '16 at 13:37
$begingroup$
The restriction on the dimensions ensures that a factorization of the form $M=Botimes C$ can be found. But in this problem we already know that the dimensions of the Kronecker factors must correspond to $Aotimes c$.
$endgroup$
– hans
Aug 11 '16 at 13:37
add a comment |
$begingroup$
@ CSA , why do you want to calculate the gradient, a simple, but complicated to write, tensor, while the calculation of the derivative is so easy and is equally effective? (the knowledge of the derivative is equivalent to the knowledge of the gradient).
Consider theorem 3.1 in your reference paper: let $f:Ain M_{m,n}rightarrow A^TA$. The derivative is the simple linear application $Df_A:Hin M_{m,n}rightarrow H^TA+A^TH$; from the previous result, we can derive the gradient of $f$: $nabla(f)(A)=Ibigotimes A^T+(A^Tbigotimes I)T$ where $T$ is the permutation $Hrightarrow H^T$, that is, why make it simple when you can make it complicated.
In the same way, consider theorem 4.1 in same reference: let $g:Ain M_{m,n}rightarrow Abigotimes B$; since $g$ is linear, its derivative is $Dg_A:Hin M_{m,n}rightarrow Hbigotimes B$. After $2$ pages of calculation, the gradient is presented in a very complicated form; where is the interest ?
Here $p:cin mathbb{R}^nrightarrow b^T(Abigotimes c)b$ is linear and its derivative is $hin mathbb{R}^nrightarrow b^T(Abigotimes h)b$, formula of a biblical simplicity.
$endgroup$
add a comment |
$begingroup$
@ CSA , why do you want to calculate the gradient, a simple, but complicated to write, tensor, while the calculation of the derivative is so easy and is equally effective? (the knowledge of the derivative is equivalent to the knowledge of the gradient).
Consider theorem 3.1 in your reference paper: let $f:Ain M_{m,n}rightarrow A^TA$. The derivative is the simple linear application $Df_A:Hin M_{m,n}rightarrow H^TA+A^TH$; from the previous result, we can derive the gradient of $f$: $nabla(f)(A)=Ibigotimes A^T+(A^Tbigotimes I)T$ where $T$ is the permutation $Hrightarrow H^T$, that is, why make it simple when you can make it complicated.
In the same way, consider theorem 4.1 in same reference: let $g:Ain M_{m,n}rightarrow Abigotimes B$; since $g$ is linear, its derivative is $Dg_A:Hin M_{m,n}rightarrow Hbigotimes B$. After $2$ pages of calculation, the gradient is presented in a very complicated form; where is the interest ?
Here $p:cin mathbb{R}^nrightarrow b^T(Abigotimes c)b$ is linear and its derivative is $hin mathbb{R}^nrightarrow b^T(Abigotimes h)b$, formula of a biblical simplicity.
$endgroup$
add a comment |
$begingroup$
@ CSA , why do you want to calculate the gradient, a simple, but complicated to write, tensor, while the calculation of the derivative is so easy and is equally effective? (the knowledge of the derivative is equivalent to the knowledge of the gradient).
Consider theorem 3.1 in your reference paper: let $f:Ain M_{m,n}rightarrow A^TA$. The derivative is the simple linear application $Df_A:Hin M_{m,n}rightarrow H^TA+A^TH$; from the previous result, we can derive the gradient of $f$: $nabla(f)(A)=Ibigotimes A^T+(A^Tbigotimes I)T$ where $T$ is the permutation $Hrightarrow H^T$, that is, why make it simple when you can make it complicated.
In the same way, consider theorem 4.1 in same reference: let $g:Ain M_{m,n}rightarrow Abigotimes B$; since $g$ is linear, its derivative is $Dg_A:Hin M_{m,n}rightarrow Hbigotimes B$. After $2$ pages of calculation, the gradient is presented in a very complicated form; where is the interest ?
Here $p:cin mathbb{R}^nrightarrow b^T(Abigotimes c)b$ is linear and its derivative is $hin mathbb{R}^nrightarrow b^T(Abigotimes h)b$, formula of a biblical simplicity.
$endgroup$
@ CSA , why do you want to calculate the gradient, a simple, but complicated to write, tensor, while the calculation of the derivative is so easy and is equally effective? (the knowledge of the derivative is equivalent to the knowledge of the gradient).
Consider theorem 3.1 in your reference paper: let $f:Ain M_{m,n}rightarrow A^TA$. The derivative is the simple linear application $Df_A:Hin M_{m,n}rightarrow H^TA+A^TH$; from the previous result, we can derive the gradient of $f$: $nabla(f)(A)=Ibigotimes A^T+(A^Tbigotimes I)T$ where $T$ is the permutation $Hrightarrow H^T$, that is, why make it simple when you can make it complicated.
In the same way, consider theorem 4.1 in same reference: let $g:Ain M_{m,n}rightarrow Abigotimes B$; since $g$ is linear, its derivative is $Dg_A:Hin M_{m,n}rightarrow Hbigotimes B$. After $2$ pages of calculation, the gradient is presented in a very complicated form; where is the interest ?
Here $p:cin mathbb{R}^nrightarrow b^T(Abigotimes c)b$ is linear and its derivative is $hin mathbb{R}^nrightarrow b^T(Abigotimes h)b$, formula of a biblical simplicity.
edited Aug 12 '16 at 11:13
answered Aug 12 '16 at 11:07
loup blancloup blanc
22.6k21850
22.6k21850
add a comment |
add a comment |
$begingroup$
The given formula only makes dimensional sense if the "matrix" is actually a row vector,
i.e. $$A=a^T$$ in which case the function of interest is the scalar
$$phi = b^T(a^Totimes c)b = b^T(ca^T)b = (ba^Tb)^Tc$$
whose gradient is simply
$$frac{partialphi}{partial c} = ba^Tb$$
$endgroup$
add a comment |
$begingroup$
The given formula only makes dimensional sense if the "matrix" is actually a row vector,
i.e. $$A=a^T$$ in which case the function of interest is the scalar
$$phi = b^T(a^Totimes c)b = b^T(ca^T)b = (ba^Tb)^Tc$$
whose gradient is simply
$$frac{partialphi}{partial c} = ba^Tb$$
$endgroup$
add a comment |
$begingroup$
The given formula only makes dimensional sense if the "matrix" is actually a row vector,
i.e. $$A=a^T$$ in which case the function of interest is the scalar
$$phi = b^T(a^Totimes c)b = b^T(ca^T)b = (ba^Tb)^Tc$$
whose gradient is simply
$$frac{partialphi}{partial c} = ba^Tb$$
$endgroup$
The given formula only makes dimensional sense if the "matrix" is actually a row vector,
i.e. $$A=a^T$$ in which case the function of interest is the scalar
$$phi = b^T(a^Totimes c)b = b^T(ca^T)b = (ba^Tb)^Tc$$
whose gradient is simply
$$frac{partialphi}{partial c} = ba^Tb$$
edited Dec 2 '18 at 21:43
answered Dec 2 '18 at 21:37
greggreg
7,7001821
7,7001821
add a comment |
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f1888410%2fgradient-of-kronecker-product-function%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown