Values in Softmax Derivative
I am trying to correctly understand the derivative of the softmax function so that I can implement it correctly. I already know that the derived formula looks like this:
$\frac{\partial p_i}{\partial a_j} = p_i (1 - p_j)$ if $i = j$, and $-p_j p_i$ otherwise.
What I don't get is: what exactly are $i$ and $j$, and how do I get them from my input vector?
Could anyone please explain this?
linear-algebra derivatives neural-networks
asked Jan 1 at 15:26 by Yama994
1 Answer
The Softmax function maps an $n$-dimensional ($n \ge 2$) vector of reals, $\mathbf{z}$,
$$z_i \in \mathbb{R}, \quad i = 1 \dots n$$
to another $n$-dimensional real vector $\mathbf{p}$ with all components between $0$ and $1$,
$$0 \le p_i \le 1, \quad i = 1 \dots n$$
and the sum of the components equal to $1$,
$$\sum_{k=1}^n p_k = 1 .$$
The Softmax function itself is defined component-wise,
$$p_i = \frac{e^{z_i}}{\sum_{k=1}^n e^{z_k}}, \quad i = 1 \dots n .$$
Its derivative turns out to be simple. The partial derivative of the $i$'th component of $\mathbf{p}$ with respect to the $j$'th dimension is
$$\frac{\partial p_i}{\partial z_j} = \begin{cases}
p_i ( 1 - p_i ), & i = j \\
- p_i p_j, & i \ne j
\end{cases}$$
In other words, $i$ and $j$ specify dimensions: they identify components of the vectors. In numerical computation, they are simply indices into the vectors.
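For concreteness, here is a small numerical sketch in Python/NumPy (the input values are made up, and array indices are 0-based while the formulas above count from 1). It shows that $i$ and $j$ are simply positions in the vectors:

import numpy as np

z = np.array([1.0, 2.0, 0.5])        # example input vector, n = 3
p = np.exp(z) / np.sum(np.exp(z))    # Softmax output: p[0], p[1], p[2]

# Partial derivative of output component i with respect to input component j:
i, j = 0, 1
d = p[i] * (1.0 - p[i]) if i == j else -p[i] * p[j]
print(d)                             # d(p[0]) / d(z[1])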
It may be useful to look at the Jacobian matrix of the Softmax function at $\mathbf{z}$. Each row corresponds to a component of $\mathbf{p}$, and each column corresponds to the dimension of $\mathbf{z}$ the partial derivative is taken with respect to. In other words,
$$\mathbf{J}_{i j} = \frac{\partial p_i}{\partial z_j}, \quad i, j = 1 \dots n$$
$$\mathbf{J} = \left[ \begin{matrix}
p_1 ( 1 - p_1 ) & - p_1 p_2 & \dots & - p_1 p_{n-1} & -p_1 p_n \\
-p_2 p_1 & p_2 ( 1 - p_2 ) & \dots & -p_2 p_{n-1} & -p_2 p_n \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
-p_{n-1} p_1 & -p_{n-1} p_2 & \dots & p_{n-1} ( 1 - p_{n-1} ) & -p_{n-1} p_n \\
-p_n p_1 & -p_n p_2 & \dots & -p_n p_{n-1} & p_n ( 1 - p_n ) \\
\end{matrix} \right]$$
Because the $p_i$ are real numbers (so $p_i p_j = p_j p_i$), $\mathbf{J}$ is symmetric,
$$\mathbf{J}_{i j} = \mathbf{J}_{j i}, \quad i , j = 1 \dots n .$$
Programmers are often more comfortable with pseudocode examples:
Function Softmax(z, n):
    Let p be an array of n reals
    Let d = 0.0
    # Calculate exponents; unscaled components
    For i = 1 to n:
        p[i] = Exp(z[i])
        d = d + p[i]
    End For
    # Normalize components
    For i = 1 to n:
        p[i] = p[i] / d
    End For
    Return p
End Function
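A direct Python/NumPy translation of the pseudocode above might look like the following sketch. Subtracting max(z) before exponentiating is not in the pseudocode, but it is a common refinement that avoids overflow and leaves the result unchanged, since the shift cancels in the normalization:

import numpy as np

def softmax(z):
    # Shift by the maximum for numerical stability; softmax(z - c) == softmax(z).
    e = np.exp(z - np.max(z))
    return e / np.sum(e)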
We can calculate the entire Jacobian matrix at $\mathbf{z}$ using
Function Jacobian_matrix_of_Softmax(z, n):
    Let p = Softmax(z, n)
    Let J = n by n matrix of reals
    For i = 1 to n:
        J[i][i] = p[i] * (1.0 - p[i])
        For k = 1 to i-1:
            J[k][i] = -p[i] * p[k]
            J[i][k] = J[k][i]
        End For
    End For
    Discard p
    Return J
End Function
or, using $\mathbf{p} = \text{Softmax}(\mathbf{z})$ if that is already calculated,
Function Jacobian_matrix_of_Softmaxed(p, n):
    Let J = n by n matrix of reals
    For i = 1 to n:
        J[i][i] = p[i] * (1.0 - p[i])
        For k = 1 to i-1:
            J[k][i] = -p[i] * p[k]
            J[i][k] = J[k][i]
        End For
    End For
    Return J
End Function
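In a vectorized language the two Jacobian functions above collapse into a one-liner; a NumPy sketch, assuming p is already the Softmax output:

import numpy as np

def softmax_jacobian(p):
    # diag(p) - p p^T: diagonal entries p[i]*(1 - p[i]),
    # off-diagonal entries -p[i]*p[j].
    return np.diag(p) - np.outer(p, p)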
Individual partial derivatives are calculated using
Function Partial_of_Softmaxed(p, i, j):
    If i == j:
        Return p[i] * (1.0 - p[i])
    Else:
        Return -p[i] * p[j]
    End If
End Function
but note that by using the matrix symmetry, as the Jacobian functions above do, we can do the calculations much more efficiently: the diagonal values are calculated separately, and each off-diagonal value (to the left of, and directly above, a diagonal value) is calculated once in a subloop and mirrored.
Whether calculating the Jacobian matrix is useful depends on what you need the partial derivatives for. If you only need the gradient, that is, the element-wise derivatives $\partial p_i / \partial z_i$ (the diagonal of $\mathbf{J}$), use
Function Gradient_of_Softmaxed(p, n):
    Let g be an array of n reals
    For i = 1 to n:
        g[i] = p[i] * (1.0 - p[i])
    End For
    Return g
End Function

Function Gradient_of_Softmax(z, n):
    Let g = Softmax(z, n)
    For i = 1 to n:
        g[i] = g[i] * (1.0 - g[i])
    End For
    Return g
End Function
instead of Jacobian_matrix_of_Softmaxed(). Note how the latter version, if you don't need $\text{Softmax}(\mathbf{z})$ itself, can reuse the storage for the gradient.
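To gain confidence that an implementation is correct, you can compare the analytic Jacobian with a central finite-difference approximation. A sketch using the softmax and softmax_jacobian helpers from the earlier snippets (the test vector and the step size 1e-6 are arbitrary choices):

import numpy as np

z = np.array([0.3, -1.2, 2.0, 0.7])          # arbitrary test input
J = softmax_jacobian(softmax(z))

h = 1e-6
J_num = np.empty_like(J)
for j in range(len(z)):
    dz = np.zeros_like(z)
    dz[j] = h
    J_num[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2.0 * h)

print(np.max(np.abs(J - J_num)))             # should be tiny, around 1e-10 or smaller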
answered Jan 1 at 18:24 by Nominal Animal