Needleman-Wunsch Grid Generation in Python

The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.

needlemanwunsch.py

import random

from tabulate import tabulate





class NeedlemanWunsch:

    """

    Class used for generating Needleman-Wunsch matrices.

    """



    def _compute_block(self, result, i, j):

        """

        Given a block (corresponding to a 2 x 2 matrix), calculate the value o-

        f the bottom right corner.

        (Based on the equation:

                               M_{i,j} = max(M_{i-1,j-1} + S_{i,j},

                                             M_{i,j-1} + W,

                                             M_{i-1,j} + W)

         Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)



        Args:

            result : The current matrix that is being computed.

            i : The right most part of the block being computed.

            j : The bottom most part of the block being computed.



        Returns:

            The value for the right bottom corner of a particular block.

        """

        return max(result[i-1][j-1] +

                   self._calc_weight(self._second_seq[i-1],

                                     self._first_seq[j-1]),

                   result[i-1][j] + self.gap,

                   result[i][j-1] + self.gap)



    def _calc_weight(self, first_char, second_char):

        """

        Helper function, given two characters determines (based on the sc-

        oring scheme) what the score for the particular characters can be.



        Args:

            first_char : A character to compare.

            second_char : A character to compare.



        Returns:

            Either self.match or self.mismatch.

        """

        if first_char == second_char:

            return self.match

        else:

            return self.mismatch



    def generate(self, first_seq, second_seq):

        """

        Generates a matrix corresponding to the scores to the Needleman-Wu-

        nsch algorithm.



        Args:

            first_seq : One of the sequences to be compared for similarity.

            second_seq : One of the sequences to be compared for

            similarity.



        Returns:

            A 2D list corresponding to the resulting matrix of the Needlem-

            an-Wunsch algorithm.

        """

        # Internally requies that the first sequence is longer.

        if len(second_seq) > len(first_seq):

            first_seq, second_seq = second_seq, first_seq

        self._first_seq = first_seq

        self._second_seq = second_seq

        # Adjust sequence with "intial space"

        # Initialize the resulting matrix with the initial row.

        result = [list(range(0, -len(first_seq) - 1, -1))]

        # Create initial columns.

        for i in range(-1, -len(second_seq) - 1, -1):

            row = [i]

            row.extend([0]*len(first_seq))

            result.append(row)

        # Sweep through and compute each new cell row-wise.

        for i in range(1, len(result)):

            for j in range(1, len(result[0])):

                result[i][j] = self._compute_block(result, i, j)

        # Format for prettier printing.

        for index, letter in enumerate(second_seq):

            result[index + 1].insert(0, letter)

        result[0].insert(0, ' ')

        result.insert(0, list("  " + first_seq))

        return result



    def __init__(self, match=1, mismatch=-1, gap=-1):

        """

        Initialize the Needleman-Wunsch class so that it provides weights for

        match (default 1), mismatch (default -1), and gap (default -1).

        """

        self.match = match

        self.mismatch = mismatch

        self.gap = gap

        self._first_seq = ""

        self._second_seq = ""





def deletion(seq, pos):

    """

    Deletes a random base pair from a sequence at a specified position.



    Args:

        seq : Sequence to perform deletion on.

        pos : Location of deletion.



    Returns:

        seq with character removed at pos.

    """

    return seq[:pos] + seq[pos:]





def base_change(seq, pos):

    """

    Changes a random base pair to another base pair at a specified position.



    Args:

        seq : Sequence to perform base change on.

        pos : Locaion of base change.



    Returns:

        seq with character changed at pos.

    """

    new_base = random.choice("ACTG".replace(seq[pos], ""))

    return seq[:pos] + new_base + seq[pos:]





def mutate(seq, rounds=3):

    """

    Mutates a piece of DNA by randomly applying a deletion or base change



    Args:

        seq : The sequence to be mutated.

        rounds : Defaults to 3, the number of mutations to be made.



    Returns:

        A mutated sequence.

    """

    mutations = (deletion, base_change)

    for _ in range(rounds):

        pos = random.randrange(len(seq))

        seq = random.choice(mutations)(seq, pos)

    return seq





def main():

    """

    Creates a random couple of strings and creates the corresponding Needleman

    -Wunsch matrix associated with them.

    """

    needleman_wunsch = NeedlemanWunsch()

    first_seq = ''.join(random.choices("ACTG", k=5))

    second_seq = mutate(first_seq)

    data = needleman_wunsch.generate(first_seq, second_seq)

    print(tabulate(data, headers="firstrow"))





if __name__ == '__main__':

    main()

I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.

I am not particularly fond of the code. I haven't used numpy or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.

asked 3 mins ago

Dair

4,417729

add a comment |

needlemanwunsch.py

import random

from tabulate import tabulate





class NeedlemanWunsch:

    """

    Class used for generating Needleman-Wunsch matrices.

    """



    def _compute_block(self, result, i, j):

        """

        Given a block (corresponding to a 2 x 2 matrix), calculate the value o-

        f the bottom right corner.

        (Based on the equation:

                               M_{i,j} = max(M_{i-1,j-1} + S_{i,j},

                                             M_{i,j-1} + W,

                                             M_{i-1,j} + W)

         Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)



        Args:

            result : The current matrix that is being computed.

            i : The right most part of the block being computed.

            j : The bottom most part of the block being computed.



        Returns:

            The value for the right bottom corner of a particular block.

        """

        return max(result[i-1][j-1] +

                   self._calc_weight(self._second_seq[i-1],

                                     self._first_seq[j-1]),

                   result[i-1][j] + self.gap,

                   result[i][j-1] + self.gap)



    def _calc_weight(self, first_char, second_char):

        """

        Helper function, given two characters determines (based on the sc-

        oring scheme) what the score for the particular characters can be.



        Args:

            first_char : A character to compare.

            second_char : A character to compare.



        Returns:

            Either self.match or self.mismatch.

        """

        if first_char == second_char:

            return self.match

        else:

            return self.mismatch



    def generate(self, first_seq, second_seq):

        """

        Generates a matrix corresponding to the scores to the Needleman-Wu-

        nsch algorithm.



        Args:

            first_seq : One of the sequences to be compared for similarity.

            second_seq : One of the sequences to be compared for

            similarity.



        Returns:

            A 2D list corresponding to the resulting matrix of the Needlem-

            an-Wunsch algorithm.

        """

        # Internally requies that the first sequence is longer.

        if len(second_seq) > len(first_seq):

            first_seq, second_seq = second_seq, first_seq

        self._first_seq = first_seq

        self._second_seq = second_seq

        # Adjust sequence with "intial space"

        # Initialize the resulting matrix with the initial row.

        result = [list(range(0, -len(first_seq) - 1, -1))]

        # Create initial columns.

        for i in range(-1, -len(second_seq) - 1, -1):

            row = [i]

            row.extend([0]*len(first_seq))

            result.append(row)

        # Sweep through and compute each new cell row-wise.

        for i in range(1, len(result)):

            for j in range(1, len(result[0])):

                result[i][j] = self._compute_block(result, i, j)

        # Format for prettier printing.

        for index, letter in enumerate(second_seq):

            result[index + 1].insert(0, letter)

        result[0].insert(0, ' ')

        result.insert(0, list("  " + first_seq))

        return result



    def __init__(self, match=1, mismatch=-1, gap=-1):

        """

        Initialize the Needleman-Wunsch class so that it provides weights for

        match (default 1), mismatch (default -1), and gap (default -1).

        """

        self.match = match

        self.mismatch = mismatch

        self.gap = gap

        self._first_seq = ""

        self._second_seq = ""





def deletion(seq, pos):

    """

    Deletes a random base pair from a sequence at a specified position.



    Args:

        seq : Sequence to perform deletion on.

        pos : Location of deletion.



    Returns:

        seq with character removed at pos.

    """

    return seq[:pos] + seq[pos:]





def base_change(seq, pos):

    """

    Changes a random base pair to another base pair at a specified position.



    Args:

        seq : Sequence to perform base change on.

        pos : Locaion of base change.



    Returns:

        seq with character changed at pos.

    """

    new_base = random.choice("ACTG".replace(seq[pos], ""))

    return seq[:pos] + new_base + seq[pos:]





def mutate(seq, rounds=3):

    """

    Mutates a piece of DNA by randomly applying a deletion or base change



    Args:

        seq : The sequence to be mutated.

        rounds : Defaults to 3, the number of mutations to be made.



    Returns:

        A mutated sequence.

    """

    mutations = (deletion, base_change)

    for _ in range(rounds):

        pos = random.randrange(len(seq))

        seq = random.choice(mutations)(seq, pos)

    return seq





def main():

    """

    Creates a random couple of strings and creates the corresponding Needleman

    -Wunsch matrix associated with them.

    """

    needleman_wunsch = NeedlemanWunsch()

    first_seq = ''.join(random.choices("ACTG", k=5))

    second_seq = mutate(first_seq)

    data = needleman_wunsch.generate(first_seq, second_seq)

    print(tabulate(data, headers="firstrow"))





if __name__ == '__main__':

    main()

I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.

asked 3 mins ago

Dair

4,417729

add a comment |

needlemanwunsch.py

import random

from tabulate import tabulate





class NeedlemanWunsch:

    """

    Class used for generating Needleman-Wunsch matrices.

    """



    def _compute_block(self, result, i, j):

        """

        Given a block (corresponding to a 2 x 2 matrix), calculate the value o-

        f the bottom right corner.

        (Based on the equation:

                               M_{i,j} = max(M_{i-1,j-1} + S_{i,j},

                                             M_{i,j-1} + W,

                                             M_{i-1,j} + W)

         Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)



        Args:

            result : The current matrix that is being computed.

            i : The right most part of the block being computed.

            j : The bottom most part of the block being computed.



        Returns:

            The value for the right bottom corner of a particular block.

        """

        return max(result[i-1][j-1] +

                   self._calc_weight(self._second_seq[i-1],

                                     self._first_seq[j-1]),

                   result[i-1][j] + self.gap,

                   result[i][j-1] + self.gap)



    def _calc_weight(self, first_char, second_char):

        """

        Helper function, given two characters determines (based on the sc-

        oring scheme) what the score for the particular characters can be.



        Args:

            first_char : A character to compare.

            second_char : A character to compare.



        Returns:

            Either self.match or self.mismatch.

        """

        if first_char == second_char:

            return self.match

        else:

            return self.mismatch



    def generate(self, first_seq, second_seq):

        """

        Generates a matrix corresponding to the scores to the Needleman-Wu-

        nsch algorithm.



        Args:

            first_seq : One of the sequences to be compared for similarity.

            second_seq : One of the sequences to be compared for

            similarity.



        Returns:

            A 2D list corresponding to the resulting matrix of the Needlem-

            an-Wunsch algorithm.

        """

        # Internally requies that the first sequence is longer.

        if len(second_seq) > len(first_seq):

            first_seq, second_seq = second_seq, first_seq

        self._first_seq = first_seq

        self._second_seq = second_seq

        # Adjust sequence with "intial space"

        # Initialize the resulting matrix with the initial row.

        result = [list(range(0, -len(first_seq) - 1, -1))]

        # Create initial columns.

        for i in range(-1, -len(second_seq) - 1, -1):

            row = [i]

            row.extend([0]*len(first_seq))

            result.append(row)

        # Sweep through and compute each new cell row-wise.

        for i in range(1, len(result)):

            for j in range(1, len(result[0])):

                result[i][j] = self._compute_block(result, i, j)

        # Format for prettier printing.

        for index, letter in enumerate(second_seq):

            result[index + 1].insert(0, letter)

        result[0].insert(0, ' ')

        result.insert(0, list("  " + first_seq))

        return result



    def __init__(self, match=1, mismatch=-1, gap=-1):

        """

        Initialize the Needleman-Wunsch class so that it provides weights for

        match (default 1), mismatch (default -1), and gap (default -1).

        """

        self.match = match

        self.mismatch = mismatch

        self.gap = gap

        self._first_seq = ""

        self._second_seq = ""





def deletion(seq, pos):

    """

    Deletes a random base pair from a sequence at a specified position.



    Args:

        seq : Sequence to perform deletion on.

        pos : Location of deletion.



    Returns:

        seq with character removed at pos.

    """

    return seq[:pos] + seq[pos:]





def base_change(seq, pos):

    """

    Changes a random base pair to another base pair at a specified position.



    Args:

        seq : Sequence to perform base change on.

        pos : Locaion of base change.



    Returns:

        seq with character changed at pos.

    """

    new_base = random.choice("ACTG".replace(seq[pos], ""))

    return seq[:pos] + new_base + seq[pos:]





def mutate(seq, rounds=3):

    """

    Mutates a piece of DNA by randomly applying a deletion or base change



    Args:

        seq : The sequence to be mutated.

        rounds : Defaults to 3, the number of mutations to be made.



    Returns:

        A mutated sequence.

    """

    mutations = (deletion, base_change)

    for _ in range(rounds):

        pos = random.randrange(len(seq))

        seq = random.choice(mutations)(seq, pos)

    return seq





def main():

    """

    Creates a random couple of strings and creates the corresponding Needleman

    -Wunsch matrix associated with them.

    """

    needleman_wunsch = NeedlemanWunsch()

    first_seq = ''.join(random.choices("ACTG", k=5))

    second_seq = mutate(first_seq)

    data = needleman_wunsch.generate(first_seq, second_seq)

    print(tabulate(data, headers="firstrow"))





if __name__ == '__main__':

    main()

I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.

asked 3 mins ago

Dair

4,417729

needlemanwunsch.py

import random

from tabulate import tabulate





class NeedlemanWunsch:

    """

    Class used for generating Needleman-Wunsch matrices.

    """



    def _compute_block(self, result, i, j):

        """

        Given a block (corresponding to a 2 x 2 matrix), calculate the value o-

        f the bottom right corner.

        (Based on the equation:

                               M_{i,j} = max(M_{i-1,j-1} + S_{i,j},

                                             M_{i,j-1} + W,

                                             M_{i-1,j} + W)

         Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)



        Args:

            result : The current matrix that is being computed.

            i : The right most part of the block being computed.

            j : The bottom most part of the block being computed.



        Returns:

            The value for the right bottom corner of a particular block.

        """

        return max(result[i-1][j-1] +

                   self._calc_weight(self._second_seq[i-1],

                                     self._first_seq[j-1]),

                   result[i-1][j] + self.gap,

                   result[i][j-1] + self.gap)



    def _calc_weight(self, first_char, second_char):

        """

        Helper function, given two characters determines (based on the sc-

        oring scheme) what the score for the particular characters can be.



        Args:

            first_char : A character to compare.

            second_char : A character to compare.



        Returns:

            Either self.match or self.mismatch.

        """

        if first_char == second_char:

            return self.match

        else:

            return self.mismatch



    def generate(self, first_seq, second_seq):

        """

        Generates a matrix corresponding to the scores to the Needleman-Wu-

        nsch algorithm.



        Args:

            first_seq : One of the sequences to be compared for similarity.

            second_seq : One of the sequences to be compared for

            similarity.



        Returns:

            A 2D list corresponding to the resulting matrix of the Needlem-

            an-Wunsch algorithm.

        """

        # Internally requies that the first sequence is longer.

        if len(second_seq) > len(first_seq):

            first_seq, second_seq = second_seq, first_seq

        self._first_seq = first_seq

        self._second_seq = second_seq

        # Adjust sequence with "intial space"

        # Initialize the resulting matrix with the initial row.

        result = [list(range(0, -len(first_seq) - 1, -1))]

        # Create initial columns.

        for i in range(-1, -len(second_seq) - 1, -1):

            row = [i]

            row.extend([0]*len(first_seq))

            result.append(row)

        # Sweep through and compute each new cell row-wise.

        for i in range(1, len(result)):

            for j in range(1, len(result[0])):

                result[i][j] = self._compute_block(result, i, j)

        # Format for prettier printing.

        for index, letter in enumerate(second_seq):

            result[index + 1].insert(0, letter)

        result[0].insert(0, ' ')

        result.insert(0, list("  " + first_seq))

        return result



    def __init__(self, match=1, mismatch=-1, gap=-1):

        """

        Initialize the Needleman-Wunsch class so that it provides weights for

        match (default 1), mismatch (default -1), and gap (default -1).

        """

        self.match = match

        self.mismatch = mismatch

        self.gap = gap

        self._first_seq = ""

        self._second_seq = ""





def deletion(seq, pos):

    """

    Deletes a random base pair from a sequence at a specified position.



    Args:

        seq : Sequence to perform deletion on.

        pos : Location of deletion.



    Returns:

        seq with character removed at pos.

    """

    return seq[:pos] + seq[pos:]





def base_change(seq, pos):

    """

    Changes a random base pair to another base pair at a specified position.



    Args:

        seq : Sequence to perform base change on.

        pos : Locaion of base change.



    Returns:

        seq with character changed at pos.

    """

    new_base = random.choice("ACTG".replace(seq[pos], ""))

    return seq[:pos] + new_base + seq[pos:]





def mutate(seq, rounds=3):

    """

    Mutates a piece of DNA by randomly applying a deletion or base change



    Args:

        seq : The sequence to be mutated.

        rounds : Defaults to 3, the number of mutations to be made.



    Returns:

        A mutated sequence.

    """

    mutations = (deletion, base_change)

    for _ in range(rounds):

        pos = random.randrange(len(seq))

        seq = random.choice(mutations)(seq, pos)

    return seq





def main():

    """

    Creates a random couple of strings and creates the corresponding Needleman

    -Wunsch matrix associated with them.

    """

    needleman_wunsch = NeedlemanWunsch()

    first_seq = ''.join(random.choices("ACTG", k=5))

    second_seq = mutate(first_seq)

    data = needleman_wunsch.generate(first_seq, second_seq)

    print(tabulate(data, headers="firstrow"))





if __name__ == '__main__':

    main()

I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.

python algorithm python-3.x bioinformatics

asked 3 mins ago

Dair

4,417729

asked 3 mins ago

Dair

4,417729

asked 3 mins ago

Dair

4,417729

asked 3 mins ago

Dair

4,417729

asked 3 mins ago

Dair

4,417729

add a comment |

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210099%2fneedleman-wunsch-grid-generation-in-python%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

active

oldest

votes

draft saved

draft discarded

Thanks for contributing an answer to Code Review Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Krdytkyu

Needleman-Wunsch Grid Generation in Python

needlemanwunsch.py

needlemanwunsch.py

needlemanwunsch.py

needlemanwunsch.py

Your Answer

Post as a guest

Post as a guest

Popular posts from this blog

Quarter-circle Tiles

build a pushdown automaton that recognizes the reverse language of a given pushdown automaton?

Mont Emei