Counting SQL GUIDs from a server log and printing the stats, improved


























This is a continuation of my original question, incorporating the improvements suggested by @Reinderien along with some of my own.

The approach I've taken is fairly straightforward, and it doesn't use any parallel processing. I think there's room for improvement, because I know of a Rust crate, Rayon, that could run the steps I'm currently running sequentially in parallel. I explain below why I think this is possible.



"""
Find the number of 'exception' and 'added' events in the exception log
with respect to the device ID.

author: clmno
date: 2018-12-23
updated: 2018-12-27
"""

import re
import sys
from time import time


def timer(fn):
    """ Used to time a function's execution"""
    def f(*args, **kwargs):
        before = time()
        rv = fn(*args, **kwargs)
        after = time()
        print("elapsed", after - before)
        return rv
    return f

# compile the regex globally
re_prefix = '.*?'
re_guid = '([A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12})'
rg = re.compile(re_prefix + re_guid, re.IGNORECASE | re.DOTALL)


def find_sql_guid(txt):
    """ From the passed in txt, find the SQL guid using re"""
    m = rg.search(txt)
    if m:
        guid1 = m.group(1)
    else:
        print("ERROR: No SQL guid in line. Check the code")
        sys.exit(-1)
    return guid1


def find_device_IDs(file_obj, element):
    """ Find the element (type: str) within the open file object, then
    find the SQL guid on each matching line.
    (Each line has a SQL guid)
    Return a set of SQL guids.
    """
    # rewind, since the previous call left the file pointer at EOF
    file_obj.seek(0)
    lines = set()
    for line in file_obj:
        if element in line:
            # find the sql-guid from the line-str & add it
            lines.add(find_sql_guid(line))
    return lines


def find_num_occurences(file_obj, key, search_val, unique_values):
    """ Find the SQL guids in unique_values that appear on a line that also
    contains one of the strings in search_val, and add them to
    'exception' or 'added'.
    Return a dict of {'exception': set(<set of SQL guids>),
                      'added': set(<set of SQL guids>)}
    """
    # rewind, since the previous call left the file pointer at EOF
    file_obj.seek(0)
    lines = {'exception': set(), 'added': set()}

    for line in file_obj:
        for value in unique_values:
            if value in line:
                if search_val[0] in line:
                    lines['exception'].add(value)
                elif search_val[1] in line:
                    lines['added'].add(value)
    return lines


def print_stats(num_exceptions_dict):
    for key in num_exceptions_dict.keys():
        print("{} added ".format(key) +
              str(len(num_exceptions_dict[key]["added"])))
        print("{} exceptions ".format(key) +
              str(len(num_exceptions_dict[key]["exception"])))

if __name__ == "__main__":
    path = 'log/server.log'
    search_list = ('3BAA5C42', '3BAA5B84', '3BAA5C57', '3BAA5B67')

    with open(path) as file_obj:
        # find every occurrence of each device ID and its corresponding SQL
        # guids (unique IDs)
        unique_ids_dict = {
            element: find_device_IDs(file_obj, element)
            for element in search_list
        }

        # Now, for each unique ID, check whether "Exception occurred" or
        # "Packet record has been added" appears on a line with its SQL guid.
        search_with_in_deviceID = ("Exception occurred",
                                   "Packet record has been added")
        num_exceptions_dict = {
            elem: find_num_occurences(file_obj, elem, search_with_in_deviceID,
                                      unique_ids_dict[elem])
            for elem in search_list
        }

    print_stats(num_exceptions_dict)


And here's a small server log for you to experiment on.



Improvements




  • More Pythonic, with some help from Reinderien.

  • Opening the file only once. This reduced the execution time from 47 s to 10 s (for a 55K-line file).

  • Using a better data structure model. I was using dicts everywhere, where sets made more sense.


My current approach is to:

  1. Find each device ID (e.g. 3BAA5C42) and its corresponding SQL GUIDs.

  2. For each SQL GUID, find whether it resulted in an exception or an 'added' event, and store the results in a dict.

  3. Print the stats.
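For comparison, here is a minimal sketch of the same three steps restructured so the log is scanned twice in total, rather than twice per device ID, and step 2 uses a dict lookup instead of testing every known GUID against every line. It assumes, as the question's docstring says, that each relevant line carries one SQL GUID and that the first regex match is the one that matters; unlike `find_sql_guid`, lines without a GUID are skipped rather than aborting the program. `collect_stats` is a hypothetical name.

```python
import re
from collections import defaultdict

# GUID pattern from the question; search() makes the '.*?' prefix unnecessary.
GUID_RE = re.compile(
    r'[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}',
    re.IGNORECASE)


def collect_stats(lines, device_ids,
                  exception_mark="Exception occurred",
                  added_mark="Packet record has been added"):
    """Steps 1 and 2 with two passes over `lines` in total."""
    lines = list(lines)  # accept any iterable, e.g. an open file object
    # Step 1: map each SQL GUID to the device IDs whose lines mention it.
    guid_to_devices = defaultdict(set)
    for line in lines:
        m = GUID_RE.search(line)
        if not m:
            continue  # assumption: lines without a GUID are skipped
        for device_id in device_ids:
            if device_id in line:
                guid_to_devices[m.group()].add(device_id)

    # Step 2: classify each known GUID's events with one dict lookup per line.
    stats = {d: {'exception': set(), 'added': set()} for d in device_ids}
    for line in lines:
        m = GUID_RE.search(line)
        if not m or m.group() not in guid_to_devices:
            continue
        if exception_mark in line:
            kind = 'exception'
        elif added_mark in line:
            kind = 'added'
        else:
            continue
        for device_id in guid_to_devices[m.group()]:
            stats[device_id][kind].add(m.group())
    return stats


def print_stats(stats):
    """Step 3: same output format as the question's print_stats."""
    for device_id, events in stats.items():
        print("{} added {}".format(device_id, len(events['added'])))
        print("{} exceptions {}".format(device_id, len(events['exception'])))
```

Inside the question's `with open(path) as file_obj:` block this would be called as `print_stats(collect_stats(file_obj, search_list))`. It reads the whole file into memory, which should be fine for a log of around 55K lines.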


Parallelize

Steps one and two just scan the file for a particular string and run a set of instructions once the string is found. The searches for different device IDs are independent of each other, both within each step and across steps one and two taken together, so running them in parallel seems to make sense.
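A minimal sketch of that per-device parallelism in Python, using the standard library's `concurrent.futures.ProcessPoolExecutor` as a rough counterpart to Rayon (a swapped-in technique, not something the question's code uses). Each worker runs steps one and two for one device ID; step two needs that device's GUIDs from step one, so the two steps stay in sequence inside each worker. `stats_for_device` and `parallel_stats` are hypothetical names, and lines without a GUID are skipped instead of calling `exit`.

```python
import concurrent.futures
import re

# GUID pattern from the question; search() makes the '.*?' prefix unnecessary.
GUID_RE = re.compile(
    r'[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}',
    re.IGNORECASE)

EXCEPTION_MARK = "Exception occurred"
ADDED_MARK = "Packet record has been added"


def stats_for_device(device_id, lines):
    """Steps one and two for a single device ID, independent of other IDs."""
    # Step 1: SQL GUIDs found on lines that mention this device ID.
    guids = set()
    for line in lines:
        if device_id in line:
            m = GUID_RE.search(line)
            if m:  # assumption: lines without a GUID are skipped
                guids.add(m.group())
    # Step 2: which of those GUIDs appear on exception / added lines.
    result = {'exception': set(), 'added': set()}
    for line in lines:
        for guid in guids:
            if guid in line:
                if EXCEPTION_MARK in line:
                    result['exception'].add(guid)
                elif ADDED_MARK in line:
                    result['added'].add(guid)
    return device_id, result


def parallel_stats(lines, device_ids,
                   executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Run stats_for_device for every device ID in a pool of workers."""
    lines = list(lines)  # a list can be pickled and sent to worker processes
    with executor_cls() as pool:
        futures = [pool.submit(stats_for_device, d, lines)
                   for d in device_ids]
        # Each future yields (device_id, result); collect them into a dict.
        return dict(f.result() for f in futures)
```

Because worker processes may re-import the main module, `parallel_stats(file_obj.readlines(), search_list)` should be called from under the existing `if __name__ == "__main__":` guard. Each task receives its own pickled copy of `lines`, so with only four device IDs the serialization overhead may cancel out much of the gain; wrapping the call with the question's `timer` decorator would show whether it pays off. Passing `executor_cls=concurrent.futures.ThreadPoolExecutor` runs the same code in threads, which is handy for testing but gives little speedup for CPU-bound work like this because of the GIL.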



How should I go about improving this code?








