Values in Softmax Derivative
I am trying to correctly understand the derivative of the softmax function so that I can implement it correctly. I already know that the derived formula looks like this:

$\frac{\partial p_i}{\partial a_j} = p_i (1 - p_j)$ if $i = j$,

and $-p_j p_i$ otherwise.

What I don't get is: what exactly are $i$ and $j$, and how do I get them from my input vector?

Could anyone please explain this?
linear-algebra derivatives neural-networks
asked Jan 1 at 15:26 by Yama994
1 Answer
The Softmax function maps an $n$-dimensional ($n \ge 2$) vector of reals, $\mathbf{z}$,
$$z_i \in \mathbb{R}, \quad i = 1 \dots n,$$
to another $n$-dimensional real vector $\mathbf{p}$ with all components between $0$ and $1$,
$$0 \le p_i \le 1, \quad i = 1 \dots n,$$
and the sum of the components equal to $1$,
$$\sum_{k=1}^n p_k = 1.$$
The Softmax function itself is defined component-wise,
$$p_i = \frac{e^{z_i}}{\sum_{k=1}^n e^{z_k}}, \quad i = 1 \dots n.$$
Its derivative turns out to be simple. The partial derivative of the $i$'th component of $\mathbf{p}$ with respect to the $j$'th component of $\mathbf{z}$ is
$$\frac{\partial p_i}{\partial z_j} = \begin{cases}
p_i (1 - p_i), & i = j \\
-p_i p_j, & i \ne j
\end{cases}$$

In other words, $i$ and $j$ index the components of the vectors. In a numerical implementation they are simply indices into the arrays holding $\mathbf{p}$ and $\mathbf{z}$.
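For a concrete example (my own numbers, rounded to three decimals): with the input $\mathbf{z} = (1, 2, 3)$, Softmax gives $\mathbf{p} \approx (0.090, 0.245, 0.665)$, and
$$\frac{\partial p_2}{\partial z_2} = p_2 (1 - p_2) \approx 0.185, \qquad \frac{\partial p_2}{\partial z_3} = -p_2\, p_3 \approx -0.163.$$
Here $i = 2$ picks the output component being differentiated, and $j$ picks the input component it is differentiated with respect to.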



It may be useful to look at the Jacobian matrix of the Softmax function at $\mathbf{z}$. Each row corresponds to a component of $\mathbf{p}$, and each column corresponds to the component of $\mathbf{z}$ with respect to which the partial derivative is taken. In other words,
$$\mathbf{J}_{i j} = \frac{\partial p_i}{\partial z_j}, \quad i, j = 1 \dots n,$$
$$\mathbf{J} = \left[ \begin{matrix}
p_1 (1 - p_1) & -p_1 p_2 & \dots & -p_1 p_{n-1} & -p_1 p_n \\
-p_2 p_1 & p_2 (1 - p_2) & \dots & -p_2 p_{n-1} & -p_2 p_n \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
-p_{n-1} p_1 & -p_{n-1} p_2 & \dots & p_{n-1} (1 - p_{n-1}) & -p_{n-1} p_n \\
-p_n p_1 & -p_n p_2 & \dots & -p_n p_{n-1} & p_n (1 - p_n) \\
\end{matrix} \right]$$

Because the $p_i$ are real numbers, $-p_i p_j = -p_j p_i$, so $\mathbf{J}$ is symmetric,
$$\mathbf{J}_{i j} = \mathbf{J}_{j i}, \quad i, j = 1 \dots n.$$





Programmers are often more comfortable with pseudocode examples:



Function Softmax(z, n):
    Let p be an array of n reals
    Let d = 0.0

    # Calculate exponents; unscaled components
    For i = 1 to n:
        p[i] = Exp(z[i])
        d = d + p[i]
    End For

    # Normalize components
    For i = 1 to n:
        p[i] = p[i] / d
    End For

    Return p
End Function
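If you happen to implement this in Python with NumPy (my assumption here; the question does not name a language), the same function is only a few lines. A minimal sketch:

import numpy as np

def softmax(z):
    # Exponentiate each component, then normalize so the result sums to 1.
    # Subtracting max(z) first is a standard guard against overflow in exp();
    # the shift cancels in the ratio, so the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([1.0, 2.0, 3.0]))   # approx. [0.090, 0.245, 0.665]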


We can calculate the entire Jacobian matrix of the Softmax function at $\mathbf{z}$ using



Function Jacobian_matrix_of_Softmax(z, n):
    Let p = Softmax(z, n)
    Let J = n by n matrix of reals
    For i = 1 to n:
        J[i][i] = p[i] * (1.0 - p[i])
        For k = 1 to i-1:
            J[k][i] = -p[i]*p[k]
            J[i][k] = J[k][i]
        End For
    End For
    Discard p
    Return J
End Function


or, using $\mathbf{p} = \text{Softmax}(\mathbf{z})$ if that is already calculated,



Function Jacobian_matrix_of_Softmaxed(p, n):
    Let J = n by n matrix of reals
    For i = 1 to n:
        J[i][i] = p[i] * (1.0 - p[i])
        For k = 1 to i-1:
            J[k][i] = -p[i]*p[k]
            J[i][k] = J[k][i]
        End For
    End For
    Return J
End Function


Individual partial derivatives are calculated using



Function Partial_of_Softmaxed(p, i, j):
    If i == j:
        Return p[i] * (1.0 - p[i])
    Else:
        Return -p[i] * p[j]
    End If
End Function


but note that the Jacobian functions above exploit the matrix symmetry: they compute each diagonal value once in the outer loop, and each off-diagonal value (to the left of, and directly above, each diagonal value) once in the inner loop, filling in its mirror entry directly. That avoids computing every off-diagonal product twice, so it is noticeably more efficient than calling Partial_of_Softmaxed() for every pair $(i, j)$.
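As a further sketch in Python with NumPy (again an assumption, not a requirement), the whole Jacobian can also be built in one vectorized expression, because the formula above is exactly $\mathbf{J} = \operatorname{diag}(\mathbf{p}) - \mathbf{p}\,\mathbf{p}^T$:

import numpy as np

def softmax_jacobian(p):
    # p is the already-computed Softmax output, a 1-D array that sums to 1.
    # np.diag(p) puts p_i on the diagonal; np.outer(p, p) has entries p_i * p_j,
    # so the difference is p_i*(1 - p_i) on the diagonal and -p_i*p_j elsewhere.
    p = np.asarray(p, dtype=float)
    return np.diag(p) - np.outer(p, p)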



Whether calculating the full Jacobian matrix is useful or not depends on what you need the partial derivatives for. If you only need the diagonal elements $\partial p_i / \partial z_i$ (often loosely called the gradient of $\mathbf{p}$, $\nabla \mathbf{p}$), use



Function Gradient_of_Softmaxed(p, n):
    Let g be an array of n reals
    For i = 1 to n:
        g[i] = p[i] * (1.0 - p[i])
    End For
    Return g
End Function

Function Gradient_of_Softmax(z, n):
    Let g = Softmax(z, n)
    For i = 1 to n:
        g[i] = g[i] * (1.0 - g[i])
    End For
    Return g
End Function


instead of Jacobian_matrix_of_Softmaxed(). Note how the latter version, if you don't need $\text{Softmax}(\mathbf{z})$ itself, can reuse its storage for the gradient.
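In the NumPy sketches above (still assuming Python), this diagonal needs no loop at all:

g = p * (1.0 - p)   # element-wise, g[i] = p[i] * (1 - p[i])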
answered Jan 1 at 18:24 by Nominal Animal