Calculating Probabilities for Substitution Ciphers using Frequency Analysis
$begingroup$
I have been trying to put together a tool that can take in cipher text encrypted via a simple substitution cipher and calculate the most likely "key" (that is, how the plain text letters were mapped to the cipher text letters).
I think I've had some success with an approach I've been using, but I suspect it could be a lot better.
What I've been doing to this point:
For a given mystery letter (that is, that which appeared in the cipher text), I calculated that it actually decrypts to 'a', given the frequency that this mystery letter appears in the cipher text. I did that by using Bayes formula to render it as the likelihood that 'a' appears in the cipher text the number of times that the mystery letter does (knowing usual frequency of 'a' in English). In other words, if we denote $X$ as the mystery letter and $F_X$ denotes the frequency that $X$ appears, then:
$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$
$P(F_X mid X = a)$ can be easily calculated by using the binomial distribution mass function knowing the general frequency that 'a' appears in the English language. Without any other information, I assume that $P(X = a)$ equals $1 over 26$. I neglected to divide by the bottom for reasons which I'll explain in a moment.
Then I did the same for every other letter; that is $P(X = b mid F_X), ldots, P(X = z mid F_X)$. I summed up all of what I had calculated for this mystery letter, and then divided each one by the sum, to complete the calculation for each possibility.
Then I did the same for every other mystery letter. So, what I ended up with was a 26-by-26 matrix of probabilities that a mystery letter corresponded to a plaintext letter. It then becomes an assignment problem of selecting 26 elements, exactly one from each column and one from each row and trying to maximize probability for a particular key.
What I don't like about this (beyond that I'm generally inexperienced and uneasy on this topic), is that I feel that I could definitely do better. Each probability calculated for a mystery letter only takes into account the frequency of that particular mystery letter, and not the frequency of other mystery letters. I've thought about trying to do this as a Bayesian Network, but I feel unsure if that is valid, and I'm having a hard time getting my mind around it.
Thanks for any help and insight!
probability cryptography bayesian
$endgroup$
add a comment |
$begingroup$
I have been trying to put together a tool that can take in cipher text encrypted via a simple substitution cipher and calculate the most likely "key" (that is, how the plain text letters were mapped to the cipher text letters).
I think I've had some success with an approach I've been using, but I suspect it could be a lot better.
What I've been doing to this point:
For a given mystery letter (that is, that which appeared in the cipher text), I calculated that it actually decrypts to 'a', given the frequency that this mystery letter appears in the cipher text. I did that by using Bayes formula to render it as the likelihood that 'a' appears in the cipher text the number of times that the mystery letter does (knowing usual frequency of 'a' in English). In other words, if we denote $X$ as the mystery letter and $F_X$ denotes the frequency that $X$ appears, then:
$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$
$P(F_X mid X = a)$ can be easily calculated by using the binomial distribution mass function knowing the general frequency that 'a' appears in the English language. Without any other information, I assume that $P(X = a)$ equals $1 over 26$. I neglected to divide by the bottom for reasons which I'll explain in a moment.
Then I did the same for every other letter; that is $P(X = b mid F_X), ldots, P(X = z mid F_X)$. I summed up all of what I had calculated for this mystery letter, and then divided each one by the sum, to complete the calculation for each possibility.
Then I did the same for every other mystery letter. So, what I ended up with was a 26-by-26 matrix of probabilities that a mystery letter corresponded to a plaintext letter. It then becomes an assignment problem of selecting 26 elements, exactly one from each column and one from each row and trying to maximize probability for a particular key.
What I don't like about this (beyond that I'm generally inexperienced and uneasy on this topic), is that I feel that I could definitely do better. Each probability calculated for a mystery letter only takes into account the frequency of that particular mystery letter, and not the frequency of other mystery letters. I've thought about trying to do this as a Bayesian Network, but I feel unsure if that is valid, and I'm having a hard time getting my mind around it.
Thanks for any help and insight!
probability cryptography bayesian
$endgroup$
add a comment |
$begingroup$
I have been trying to put together a tool that can take in cipher text encrypted via a simple substitution cipher and calculate the most likely "key" (that is, how the plain text letters were mapped to the cipher text letters).
I think I've had some success with an approach I've been using, but I suspect it could be a lot better.
What I've been doing to this point:
For a given mystery letter (that is, that which appeared in the cipher text), I calculated that it actually decrypts to 'a', given the frequency that this mystery letter appears in the cipher text. I did that by using Bayes formula to render it as the likelihood that 'a' appears in the cipher text the number of times that the mystery letter does (knowing usual frequency of 'a' in English). In other words, if we denote $X$ as the mystery letter and $F_X$ denotes the frequency that $X$ appears, then:
$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$
$P(F_X mid X = a)$ can be easily calculated by using the binomial distribution mass function knowing the general frequency that 'a' appears in the English language. Without any other information, I assume that $P(X = a)$ equals $1 over 26$. I neglected to divide by the bottom for reasons which I'll explain in a moment.
Then I did the same for every other letter; that is $P(X = b mid F_X), ldots, P(X = z mid F_X)$. I summed up all of what I had calculated for this mystery letter, and then divided each one by the sum, to complete the calculation for each possibility.
Then I did the same for every other mystery letter. So, what I ended up with was a 26-by-26 matrix of probabilities that a mystery letter corresponded to a plaintext letter. It then becomes an assignment problem of selecting 26 elements, exactly one from each column and one from each row and trying to maximize probability for a particular key.
What I don't like about this (beyond that I'm generally inexperienced and uneasy on this topic), is that I feel that I could definitely do better. Each probability calculated for a mystery letter only takes into account the frequency of that particular mystery letter, and not the frequency of other mystery letters. I've thought about trying to do this as a Bayesian Network, but I feel unsure if that is valid, and I'm having a hard time getting my mind around it.
Thanks for any help and insight!
probability cryptography bayesian
$endgroup$
I have been trying to put together a tool that can take in cipher text encrypted via a simple substitution cipher and calculate the most likely "key" (that is, how the plain text letters were mapped to the cipher text letters).
I think I've had some success with an approach I've been using, but I suspect it could be a lot better.
What I've been doing to this point:
For a given mystery letter (that is, that which appeared in the cipher text), I calculated that it actually decrypts to 'a', given the frequency that this mystery letter appears in the cipher text. I did that by using Bayes formula to render it as the likelihood that 'a' appears in the cipher text the number of times that the mystery letter does (knowing usual frequency of 'a' in English). In other words, if we denote $X$ as the mystery letter and $F_X$ denotes the frequency that $X$ appears, then:
$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$
$P(F_X mid X = a)$ can be easily calculated by using the binomial distribution mass function knowing the general frequency that 'a' appears in the English language. Without any other information, I assume that $P(X = a)$ equals $1 over 26$. I neglected to divide by the bottom for reasons which I'll explain in a moment.
Then I did the same for every other letter; that is $P(X = b mid F_X), ldots, P(X = z mid F_X)$. I summed up all of what I had calculated for this mystery letter, and then divided each one by the sum, to complete the calculation for each possibility.
Then I did the same for every other mystery letter. So, what I ended up with was a 26-by-26 matrix of probabilities that a mystery letter corresponded to a plaintext letter. It then becomes an assignment problem of selecting 26 elements, exactly one from each column and one from each row and trying to maximize probability for a particular key.
What I don't like about this (beyond that I'm generally inexperienced and uneasy on this topic), is that I feel that I could definitely do better. Each probability calculated for a mystery letter only takes into account the frequency of that particular mystery letter, and not the frequency of other mystery letters. I've thought about trying to do this as a Bayesian Network, but I feel unsure if that is valid, and I'm having a hard time getting my mind around it.
Thanks for any help and insight!
probability cryptography bayesian
probability cryptography bayesian
edited May 5 '13 at 16:56
James Sydow
asked May 5 '13 at 6:44
James SydowJames Sydow
304
304
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
$begingroup$
You might be interested in the paper "The Markov Chain Monte Carlo Revolution" by Diaconis, which you can find online. In this paper, Diaconis shows how Stanford students were able to break a simple substitution cipher by using the Metropolis algorithm, using digraph frequencies. Diaconis says an attempt based on the frequencies of single letters failed.
$endgroup$
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f381889%2fcalculating-probabilities-for-substitution-ciphers-using-frequency-analysis%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
$begingroup$
You might be interested in the paper "The Markov Chain Monte Carlo Revolution" by Diaconis, which you can find online. In this paper, Diaconis shows how Stanford students were able to break a simple substitution cipher by using the Metropolis algorithm, using digraph frequencies. Diaconis says an attempt based on the frequencies of single letters failed.
$endgroup$
add a comment |
$begingroup$
You might be interested in the paper "The Markov Chain Monte Carlo Revolution" by Diaconis, which you can find online. In this paper, Diaconis shows how Stanford students were able to break a simple substitution cipher by using the Metropolis algorithm, using digraph frequencies. Diaconis says an attempt based on the frequencies of single letters failed.
$endgroup$
add a comment |
$begingroup$
You might be interested in the paper "The Markov Chain Monte Carlo Revolution" by Diaconis, which you can find online. In this paper, Diaconis shows how Stanford students were able to break a simple substitution cipher by using the Metropolis algorithm, using digraph frequencies. Diaconis says an attempt based on the frequencies of single letters failed.
$endgroup$
You might be interested in the paper "The Markov Chain Monte Carlo Revolution" by Diaconis, which you can find online. In this paper, Diaconis shows how Stanford students were able to break a simple substitution cipher by using the Metropolis algorithm, using digraph frequencies. Diaconis says an attempt based on the frequencies of single letters failed.
edited Nov 2 '16 at 12:24
Andreas Caranti
56.4k34395
56.4k34395
answered May 5 '13 at 14:02
user75108
add a comment |
add a comment |
Thanks for contributing an answer to Mathematics Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f381889%2fcalculating-probabilities-for-substitution-ciphers-using-frequency-analysis%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown