Calculating Probabilities for Substitution Ciphers using Frequency Analysis

I have been trying to put together a tool that can take in cipher text encrypted via a simple substitution cipher and calculate the most likely "key" (that is, how the plain text letters were mapped to the cipher text letters).

I think I've had some success with an approach I've been using, but I suspect it could be a lot better.

What I've been doing to this point:

For a given mystery letter (that is, that which appeared in the cipher text), I calculated that it actually decrypts to 'a', given the frequency that this mystery letter appears in the cipher text. I did that by using Bayes formula to render it as the likelihood that 'a' appears in the cipher text the number of times that the mystery letter does (knowing usual frequency of 'a' in English). In other words, if we denote $X$ as the mystery letter and $F_X$ denotes the frequency that $X$ appears, then:

$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$

$P(F_X mid X = a)$ can be easily calculated by using the binomial distribution mass function knowing the general frequency that 'a' appears in the English language. Without any other information, I assume that $P(X = a)$ equals $1 over 26$. I neglected to divide by the bottom for reasons which I'll explain in a moment.

Then I did the same for every other letter; that is $P(X = b mid F_X), ldots, P(X = z mid F_X)$. I summed up all of what I had calculated for this mystery letter, and then divided each one by the sum, to complete the calculation for each possibility.

Then I did the same for every other mystery letter. So, what I ended up with was a 26-by-26 matrix of probabilities that a mystery letter corresponded to a plaintext letter. It then becomes an assignment problem of selecting 26 elements, exactly one from each column and one from each row and trying to maximize probability for a particular key.

What I don't like about this (beyond that I'm generally inexperienced and uneasy on this topic), is that I feel that I could definitely do better. Each probability calculated for a mystery letter only takes into account the frequency of that particular mystery letter, and not the frequency of other mystery letters. I've thought about trying to do this as a Bayesian Network, but I feel unsure if that is valid, and I'm having a hard time getting my mind around it.

Thanks for any help and insight!

edited May 5 '13 at 16:56

asked May 5 '13 at 6:44

James Sydow

304

add a comment |

I think I've had some success with an approach I've been using, but I suspect it could be a lot better.

What I've been doing to this point:

$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$

Thanks for any help and insight!

edited May 5 '13 at 16:56

asked May 5 '13 at 6:44

James Sydow

304

add a comment |

I think I've had some success with an approach I've been using, but I suspect it could be a lot better.

What I've been doing to this point:

$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$

Thanks for any help and insight!

edited May 5 '13 at 16:56

asked May 5 '13 at 6:44

James Sydow

304

I think I've had some success with an approach I've been using, but I suspect it could be a lot better.

What I've been doing to this point:

$$P(X = a mid F_{X} ) = frac{ P(F_X mid X = a) P(X = a) }{ P(F_X) }$$

Thanks for any help and insight!

probability cryptography bayesian

edited May 5 '13 at 16:56

asked May 5 '13 at 6:44

James Sydow

304

edited May 5 '13 at 16:56

asked May 5 '13 at 6:44

James Sydow

304

edited May 5 '13 at 16:56

asked May 5 '13 at 6:44

James Sydow

304

asked May 5 '13 at 6:44

James Sydow

304

asked May 5 '13 at 6:44

James Sydow

304

add a comment |

1 Answer
1

active

oldest

votes

You might be interested in the paper "The Markov Chain Monte Carlo Revolution" by Diaconis, which you can find online. In this paper, Diaconis shows how Stanford students were able to break a simple substitution cipher by using the Metropolis algorithm, using digraph frequencies. Diaconis says an attempt based on the frequencies of single letters failed.

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

answered May 5 '13 at 14:02

user75108

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "69"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f381889%2fcalculating-probabilities-for-substitution-ciphers-using-frequency-analysis%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

answered May 5 '13 at 14:02

user75108

add a comment |

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

answered May 5 '13 at 14:02

user75108

add a comment |

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

answered May 5 '13 at 14:02

user75108

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

answered May 5 '13 at 14:02

user75108

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

edited Nov 2 '16 at 12:24

Andreas Caranti

56.4k34395

answered May 5 '13 at 14:02

user75108

answered May 5 '13 at 14:02

user75108

answered May 5 '13 at 14:02

user75108

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Mathematics Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

flc4vOLUJj2HHRW X4gs 43

搜尋此網誌

Krdytkyu