Needleman-Wunsch Grid Generation in Python












0














The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.



needlemanwunsch.py



import random
from tabulate import tabulate


class NeedlemanWunsch:
"""
Class used for generating Needleman-Wunsch matrices.
"""

def _compute_block(self, result, i, j):
"""
Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
f the bottom right corner.
(Based on the equation:
M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
M_{i,j-1} + W,
M_{i-1,j} + W)
Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)

Args:
result : The current matrix that is being computed.
i : The right most part of the block being computed.
j : The bottom most part of the block being computed.

Returns:
The value for the right bottom corner of a particular block.
"""
return max(result[i-1][j-1] +
self._calc_weight(self._second_seq[i-1],
self._first_seq[j-1]),
result[i-1][j] + self.gap,
result[i][j-1] + self.gap)

def _calc_weight(self, first_char, second_char):
"""
Helper function, given two characters determines (based on the sc-
oring scheme) what the score for the particular characters can be.

Args:
first_char : A character to compare.
second_char : A character to compare.

Returns:
Either self.match or self.mismatch.
"""
if first_char == second_char:
return self.match
else:
return self.mismatch

def generate(self, first_seq, second_seq):
"""
Generates a matrix corresponding to the scores to the Needleman-Wu-
nsch algorithm.

Args:
first_seq : One of the sequences to be compared for similarity.
second_seq : One of the sequences to be compared for
similarity.

Returns:
A 2D list corresponding to the resulting matrix of the Needlem-
an-Wunsch algorithm.
"""
# Internally requies that the first sequence is longer.
if len(second_seq) > len(first_seq):
first_seq, second_seq = second_seq, first_seq
self._first_seq = first_seq
self._second_seq = second_seq
# Adjust sequence with "intial space"
# Initialize the resulting matrix with the initial row.
result = [list(range(0, -len(first_seq) - 1, -1))]
# Create initial columns.
for i in range(-1, -len(second_seq) - 1, -1):
row = [i]
row.extend([0]*len(first_seq))
result.append(row)
# Sweep through and compute each new cell row-wise.
for i in range(1, len(result)):
for j in range(1, len(result[0])):
result[i][j] = self._compute_block(result, i, j)
# Format for prettier printing.
for index, letter in enumerate(second_seq):
result[index + 1].insert(0, letter)
result[0].insert(0, ' ')
result.insert(0, list(" " + first_seq))
return result

def __init__(self, match=1, mismatch=-1, gap=-1):
"""
Initialize the Needleman-Wunsch class so that it provides weights for
match (default 1), mismatch (default -1), and gap (default -1).
"""
self.match = match
self.mismatch = mismatch
self.gap = gap
self._first_seq = ""
self._second_seq = ""


def deletion(seq, pos):
"""
Deletes a random base pair from a sequence at a specified position.

Args:
seq : Sequence to perform deletion on.
pos : Location of deletion.

Returns:
seq with character removed at pos.
"""
return seq[:pos] + seq[pos:]


def base_change(seq, pos):
"""
Changes a random base pair to another base pair at a specified position.

Args:
seq : Sequence to perform base change on.
pos : Locaion of base change.

Returns:
seq with character changed at pos.
"""
new_base = random.choice("ACTG".replace(seq[pos], ""))
return seq[:pos] + new_base + seq[pos:]


def mutate(seq, rounds=3):
"""
Mutates a piece of DNA by randomly applying a deletion or base change

Args:
seq : The sequence to be mutated.
rounds : Defaults to 3, the number of mutations to be made.

Returns:
A mutated sequence.
"""
mutations = (deletion, base_change)
for _ in range(rounds):
pos = random.randrange(len(seq))
seq = random.choice(mutations)(seq, pos)
return seq


def main():
"""
Creates a random couple of strings and creates the corresponding Needleman
-Wunsch matrix associated with them.
"""
needleman_wunsch = NeedlemanWunsch()
first_seq = ''.join(random.choices("ACTG", k=5))
second_seq = mutate(first_seq)
data = needleman_wunsch.generate(first_seq, second_seq)
print(tabulate(data, headers="firstrow"))


if __name__ == '__main__':
main()


I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.



I am not particularly fond of the code. I haven't used numpy or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.









share



























    0














    The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.



    needlemanwunsch.py



    import random
    from tabulate import tabulate


    class NeedlemanWunsch:
    """
    Class used for generating Needleman-Wunsch matrices.
    """

    def _compute_block(self, result, i, j):
    """
    Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
    f the bottom right corner.
    (Based on the equation:
    M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
    M_{i,j-1} + W,
    M_{i-1,j} + W)
    Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)

    Args:
    result : The current matrix that is being computed.
    i : The right most part of the block being computed.
    j : The bottom most part of the block being computed.

    Returns:
    The value for the right bottom corner of a particular block.
    """
    return max(result[i-1][j-1] +
    self._calc_weight(self._second_seq[i-1],
    self._first_seq[j-1]),
    result[i-1][j] + self.gap,
    result[i][j-1] + self.gap)

    def _calc_weight(self, first_char, second_char):
    """
    Helper function, given two characters determines (based on the sc-
    oring scheme) what the score for the particular characters can be.

    Args:
    first_char : A character to compare.
    second_char : A character to compare.

    Returns:
    Either self.match or self.mismatch.
    """
    if first_char == second_char:
    return self.match
    else:
    return self.mismatch

    def generate(self, first_seq, second_seq):
    """
    Generates a matrix corresponding to the scores to the Needleman-Wu-
    nsch algorithm.

    Args:
    first_seq : One of the sequences to be compared for similarity.
    second_seq : One of the sequences to be compared for
    similarity.

    Returns:
    A 2D list corresponding to the resulting matrix of the Needlem-
    an-Wunsch algorithm.
    """
    # Internally requies that the first sequence is longer.
    if len(second_seq) > len(first_seq):
    first_seq, second_seq = second_seq, first_seq
    self._first_seq = first_seq
    self._second_seq = second_seq
    # Adjust sequence with "intial space"
    # Initialize the resulting matrix with the initial row.
    result = [list(range(0, -len(first_seq) - 1, -1))]
    # Create initial columns.
    for i in range(-1, -len(second_seq) - 1, -1):
    row = [i]
    row.extend([0]*len(first_seq))
    result.append(row)
    # Sweep through and compute each new cell row-wise.
    for i in range(1, len(result)):
    for j in range(1, len(result[0])):
    result[i][j] = self._compute_block(result, i, j)
    # Format for prettier printing.
    for index, letter in enumerate(second_seq):
    result[index + 1].insert(0, letter)
    result[0].insert(0, ' ')
    result.insert(0, list(" " + first_seq))
    return result

    def __init__(self, match=1, mismatch=-1, gap=-1):
    """
    Initialize the Needleman-Wunsch class so that it provides weights for
    match (default 1), mismatch (default -1), and gap (default -1).
    """
    self.match = match
    self.mismatch = mismatch
    self.gap = gap
    self._first_seq = ""
    self._second_seq = ""


    def deletion(seq, pos):
    """
    Deletes a random base pair from a sequence at a specified position.

    Args:
    seq : Sequence to perform deletion on.
    pos : Location of deletion.

    Returns:
    seq with character removed at pos.
    """
    return seq[:pos] + seq[pos:]


    def base_change(seq, pos):
    """
    Changes a random base pair to another base pair at a specified position.

    Args:
    seq : Sequence to perform base change on.
    pos : Locaion of base change.

    Returns:
    seq with character changed at pos.
    """
    new_base = random.choice("ACTG".replace(seq[pos], ""))
    return seq[:pos] + new_base + seq[pos:]


    def mutate(seq, rounds=3):
    """
    Mutates a piece of DNA by randomly applying a deletion or base change

    Args:
    seq : The sequence to be mutated.
    rounds : Defaults to 3, the number of mutations to be made.

    Returns:
    A mutated sequence.
    """
    mutations = (deletion, base_change)
    for _ in range(rounds):
    pos = random.randrange(len(seq))
    seq = random.choice(mutations)(seq, pos)
    return seq


    def main():
    """
    Creates a random couple of strings and creates the corresponding Needleman
    -Wunsch matrix associated with them.
    """
    needleman_wunsch = NeedlemanWunsch()
    first_seq = ''.join(random.choices("ACTG", k=5))
    second_seq = mutate(first_seq)
    data = needleman_wunsch.generate(first_seq, second_seq)
    print(tabulate(data, headers="firstrow"))


    if __name__ == '__main__':
    main()


    I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.



    I am not particularly fond of the code. I haven't used numpy or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.









    share

























      0












      0








      0







      The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.



      needlemanwunsch.py



      import random
      from tabulate import tabulate


      class NeedlemanWunsch:
      """
      Class used for generating Needleman-Wunsch matrices.
      """

      def _compute_block(self, result, i, j):
      """
      Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
      f the bottom right corner.
      (Based on the equation:
      M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
      M_{i,j-1} + W,
      M_{i-1,j} + W)
      Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)

      Args:
      result : The current matrix that is being computed.
      i : The right most part of the block being computed.
      j : The bottom most part of the block being computed.

      Returns:
      The value for the right bottom corner of a particular block.
      """
      return max(result[i-1][j-1] +
      self._calc_weight(self._second_seq[i-1],
      self._first_seq[j-1]),
      result[i-1][j] + self.gap,
      result[i][j-1] + self.gap)

      def _calc_weight(self, first_char, second_char):
      """
      Helper function, given two characters determines (based on the sc-
      oring scheme) what the score for the particular characters can be.

      Args:
      first_char : A character to compare.
      second_char : A character to compare.

      Returns:
      Either self.match or self.mismatch.
      """
      if first_char == second_char:
      return self.match
      else:
      return self.mismatch

      def generate(self, first_seq, second_seq):
      """
      Generates a matrix corresponding to the scores to the Needleman-Wu-
      nsch algorithm.

      Args:
      first_seq : One of the sequences to be compared for similarity.
      second_seq : One of the sequences to be compared for
      similarity.

      Returns:
      A 2D list corresponding to the resulting matrix of the Needlem-
      an-Wunsch algorithm.
      """
      # Internally requies that the first sequence is longer.
      if len(second_seq) > len(first_seq):
      first_seq, second_seq = second_seq, first_seq
      self._first_seq = first_seq
      self._second_seq = second_seq
      # Adjust sequence with "intial space"
      # Initialize the resulting matrix with the initial row.
      result = [list(range(0, -len(first_seq) - 1, -1))]
      # Create initial columns.
      for i in range(-1, -len(second_seq) - 1, -1):
      row = [i]
      row.extend([0]*len(first_seq))
      result.append(row)
      # Sweep through and compute each new cell row-wise.
      for i in range(1, len(result)):
      for j in range(1, len(result[0])):
      result[i][j] = self._compute_block(result, i, j)
      # Format for prettier printing.
      for index, letter in enumerate(second_seq):
      result[index + 1].insert(0, letter)
      result[0].insert(0, ' ')
      result.insert(0, list(" " + first_seq))
      return result

      def __init__(self, match=1, mismatch=-1, gap=-1):
      """
      Initialize the Needleman-Wunsch class so that it provides weights for
      match (default 1), mismatch (default -1), and gap (default -1).
      """
      self.match = match
      self.mismatch = mismatch
      self.gap = gap
      self._first_seq = ""
      self._second_seq = ""


      def deletion(seq, pos):
      """
      Deletes a random base pair from a sequence at a specified position.

      Args:
      seq : Sequence to perform deletion on.
      pos : Location of deletion.

      Returns:
      seq with character removed at pos.
      """
      return seq[:pos] + seq[pos:]


      def base_change(seq, pos):
      """
      Changes a random base pair to another base pair at a specified position.

      Args:
      seq : Sequence to perform base change on.
      pos : Locaion of base change.

      Returns:
      seq with character changed at pos.
      """
      new_base = random.choice("ACTG".replace(seq[pos], ""))
      return seq[:pos] + new_base + seq[pos:]


      def mutate(seq, rounds=3):
      """
      Mutates a piece of DNA by randomly applying a deletion or base change

      Args:
      seq : The sequence to be mutated.
      rounds : Defaults to 3, the number of mutations to be made.

      Returns:
      A mutated sequence.
      """
      mutations = (deletion, base_change)
      for _ in range(rounds):
      pos = random.randrange(len(seq))
      seq = random.choice(mutations)(seq, pos)
      return seq


      def main():
      """
      Creates a random couple of strings and creates the corresponding Needleman
      -Wunsch matrix associated with them.
      """
      needleman_wunsch = NeedlemanWunsch()
      first_seq = ''.join(random.choices("ACTG", k=5))
      second_seq = mutate(first_seq)
      data = needleman_wunsch.generate(first_seq, second_seq)
      print(tabulate(data, headers="firstrow"))


      if __name__ == '__main__':
      main()


      I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.



      I am not particularly fond of the code. I haven't used numpy or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.









      share













      The Needleman-Wunsch algorithm is a way to align sequences in a way that optimizes "similarity". Usually, a grid is generated and then you follow a path down the grid (based off the largest value) to compute the optimal alignment between two sequences. I have created a Python program, that given two strings, will create the resulting matrix for the Needleman-Wunsch algorithm. Currently, the program when ran will generate two random sequences of DNA and then print out the resulting Needleman-Wunsch matrix.



      needlemanwunsch.py



      import random
      from tabulate import tabulate


      class NeedlemanWunsch:
      """
      Class used for generating Needleman-Wunsch matrices.
      """

      def _compute_block(self, result, i, j):
      """
      Given a block (corresponding to a 2 x 2 matrix), calculate the value o-
      f the bottom right corner.
      (Based on the equation:
      M_{i,j} = max(M_{i-1,j-1} + S_{i,j},
      M_{i,j-1} + W,
      M_{i-1,j} + W)
      Found here: https://vlab.amrita.edu/?sub=3&brch=274&sim=1431&cnt=1)

      Args:
      result : The current matrix that is being computed.
      i : The right most part of the block being computed.
      j : The bottom most part of the block being computed.

      Returns:
      The value for the right bottom corner of a particular block.
      """
      return max(result[i-1][j-1] +
      self._calc_weight(self._second_seq[i-1],
      self._first_seq[j-1]),
      result[i-1][j] + self.gap,
      result[i][j-1] + self.gap)

      def _calc_weight(self, first_char, second_char):
      """
      Helper function, given two characters determines (based on the sc-
      oring scheme) what the score for the particular characters can be.

      Args:
      first_char : A character to compare.
      second_char : A character to compare.

      Returns:
      Either self.match or self.mismatch.
      """
      if first_char == second_char:
      return self.match
      else:
      return self.mismatch

      def generate(self, first_seq, second_seq):
      """
      Generates a matrix corresponding to the scores to the Needleman-Wu-
      nsch algorithm.

      Args:
      first_seq : One of the sequences to be compared for similarity.
      second_seq : One of the sequences to be compared for
      similarity.

      Returns:
      A 2D list corresponding to the resulting matrix of the Needlem-
      an-Wunsch algorithm.
      """
      # Internally requies that the first sequence is longer.
      if len(second_seq) > len(first_seq):
      first_seq, second_seq = second_seq, first_seq
      self._first_seq = first_seq
      self._second_seq = second_seq
      # Adjust sequence with "intial space"
      # Initialize the resulting matrix with the initial row.
      result = [list(range(0, -len(first_seq) - 1, -1))]
      # Create initial columns.
      for i in range(-1, -len(second_seq) - 1, -1):
      row = [i]
      row.extend([0]*len(first_seq))
      result.append(row)
      # Sweep through and compute each new cell row-wise.
      for i in range(1, len(result)):
      for j in range(1, len(result[0])):
      result[i][j] = self._compute_block(result, i, j)
      # Format for prettier printing.
      for index, letter in enumerate(second_seq):
      result[index + 1].insert(0, letter)
      result[0].insert(0, ' ')
      result.insert(0, list(" " + first_seq))
      return result

      def __init__(self, match=1, mismatch=-1, gap=-1):
      """
      Initialize the Needleman-Wunsch class so that it provides weights for
      match (default 1), mismatch (default -1), and gap (default -1).
      """
      self.match = match
      self.mismatch = mismatch
      self.gap = gap
      self._first_seq = ""
      self._second_seq = ""


      def deletion(seq, pos):
      """
      Deletes a random base pair from a sequence at a specified position.

      Args:
      seq : Sequence to perform deletion on.
      pos : Location of deletion.

      Returns:
      seq with character removed at pos.
      """
      return seq[:pos] + seq[pos:]


      def base_change(seq, pos):
      """
      Changes a random base pair to another base pair at a specified position.

      Args:
      seq : Sequence to perform base change on.
      pos : Locaion of base change.

      Returns:
      seq with character changed at pos.
      """
      new_base = random.choice("ACTG".replace(seq[pos], ""))
      return seq[:pos] + new_base + seq[pos:]


      def mutate(seq, rounds=3):
      """
      Mutates a piece of DNA by randomly applying a deletion or base change

      Args:
      seq : The sequence to be mutated.
      rounds : Defaults to 3, the number of mutations to be made.

      Returns:
      A mutated sequence.
      """
      mutations = (deletion, base_change)
      for _ in range(rounds):
      pos = random.randrange(len(seq))
      seq = random.choice(mutations)(seq, pos)
      return seq


      def main():
      """
      Creates a random couple of strings and creates the corresponding Needleman
      -Wunsch matrix associated with them.
      """
      needleman_wunsch = NeedlemanWunsch()
      first_seq = ''.join(random.choices("ACTG", k=5))
      second_seq = mutate(first_seq)
      data = needleman_wunsch.generate(first_seq, second_seq)
      print(tabulate(data, headers="firstrow"))


      if __name__ == '__main__':
      main()


      I ended up using a NeedlemanWunsch class, because using only function resulted in a lot DRY for the parameters match, mismatch, and gap.



      I am not particularly fond of the code. I haven't used numpy or any related libraries because I couldn't see any way that it would significantly shorten the code, however, I would be willing to use numpy if there is a significantly shorter way of expressing the matrix generation. However, the code seems frail, and very prone to off by one errors.







      python algorithm python-3.x bioinformatics





      share












      share










      share



      share










      asked 3 mins ago









      Dair

      4,417729




      4,417729



























          active

          oldest

          votes











          Your Answer





          StackExchange.ifUsing("editor", function () {
          return StackExchange.using("mathjaxEditing", function () {
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          });
          });
          }, "mathjax-editing");

          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "196"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210099%2fneedleman-wunsch-grid-generation-in-python%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown






























          active

          oldest

          votes













          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes
















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Code Review Stack Exchange!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          Use MathJax to format equations. MathJax reference.


          To learn more, see our tips on writing great answers.





          Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


          Please pay close attention to the following guidance:


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210099%2fneedleman-wunsch-grid-generation-in-python%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Ellipse (mathématiques)

          Quarter-circle Tiles

          Mont Emei