Counting SQL GUIDs from a server log and printing the stats, improved

This is a continuation of my original question, updated with the improvements suggested by @Reinderien and some of my own.

The approach I've taken is fairly obvious, and I'm not using any parallel processing. I think there's scope for improvement: I know of a Rust crate, Rayon, that could run the steps I'm currently running one after another in parallel. I'll explain below why I think this is possible.

"""
Find the number of 'exception' and 'added' events in the exception log
with respect to the device ID.

author: clmno
date: 2018-12-23
updated: 2018-12-27
"""



from time import time
import re



def timer(fn):
    """ Used to time a function's execution"""
    def f(*args, **kwargs):
        before = time()
        rv = fn(*args, **kwargs)
        after = time()
        print("elapsed", after - before)
        return rv
    return f



# compile the regex once, at module level
re_prefix = '.*?'
re_guid = '([A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12})'
rg = re.compile(re_prefix + re_guid, re.IGNORECASE | re.DOTALL)



def find_sql_guid(txt):
    """ From the passed in txt, find the SQL guid using re"""
    m = rg.search(txt)
    if m is None:
        # abort with a non-zero exit status; the message goes to stderr
        raise SystemExit("ERROR: No SQL guid in line. Check the code")
    return m.group(1)



def find_device_IDs(file_obj, element):
    """ Find the element (type: str) within the file (an open file object
        is provided as arg). Then find the SQL guid from the line at hand.
        (Each line has a SQL guid)
        Return the set of SQL guids found on lines containing element
    """
    # rewind first: an earlier call may have left the file at EOF
    file_obj.seek(0)
    lines = set()
    for line in file_obj:
        if element in line:
            # find the sql-guid from the line-str & add it to the set
            lines.add(find_sql_guid(line))
    return lines



def find_num_occurences(file_obj, key, search_val, unique_values):
    """ Sort the SQL guids in unique_values into 'exception' and 'added',
        depending on which string of search_val appears on the same line
        Return a dict of {'exception': <set of SQL guids>,
                          'added': <set of SQL guids>}
        (key is currently unused)
    """
    # rewind first: an earlier call may have left the file at EOF
    file_obj.seek(0)
    lines = {'exception': set(), 'added': set()}

    for line in file_obj:
        for value in unique_values:
            if value in line:
                if search_val[0] in line:
                    lines['exception'].add(value)
                elif search_val[1] in line:
                    lines['added'].add(value)
    return lines



def print_stats(num_exceptions_dict):
    for key, events in num_exceptions_dict.items():
        print("{} added {}".format(key, len(events["added"])))
        print("{} exceptions {}".format(key, len(events["exception"])))



if __name__ == "__main__":
    path = 'log/server.log'
    search_list = ('3BAA5C42', '3BAA5B84', '3BAA5C57', '3BAA5B67')

    with open(path) as file_obj:
        # find every occurrence of each device ID and collect the
        # corresponding SQL guids (unique IDs)
        unique_ids_dict = {
            element: find_device_IDs(file_obj, element)
            for element in search_list
        }

        # Now, for each device ID, check whether "Exception occurred" or
        # "Packet record has been added" appears on a line with one of
        # its SQL guids.
        search_with_in_deviceID = ("Exception occurred",
                                   "Packet record has been added")
        # reset the file pointer
        file_obj.seek(0)
        num_exceptions_dict = {
            elem: find_num_occurences(file_obj, elem, search_with_in_deviceID,
                                      unique_ids_dict[elem])
            for elem in search_list
        }

    print_stats(num_exceptions_dict)

and here's a small server log for you to experiment on

Improvements

More Pythonic, with some help from Reinderien.

Opening the file only once. This reduced the execution time from 47 s to 10 s (for a file of 55K lines).

Using a better data structure model. I was using dicts everywhere, where sets made more sense.
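To illustrate the last point, here is a minimal sketch of why a set fits: the same SQL GUID can show up on several log lines, and a set keeps only one copy. The GUID values below are made up for illustration and are not taken from the real log.

```python
# Hypothetical GUIDs, made up for illustration only.
guids_seen = [
    "0F8FAD5B-D9CB-469F-A165-70867728950E",
    "7C9E6679-7425-40DE-944B-E07FC1F90AE7",
    "0F8FAD5B-D9CB-469F-A165-70867728950E",  # repeated on a later log line
]

unique_guids = set()
for guid in guids_seen:
    unique_guids.add(guid)  # adding a duplicate leaves the set unchanged

unique_count = len(unique_guids)
print(unique_count)
```

With a list I would have had to check for duplicates myself before appending.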

My current approach is to:

Find each device ID (e.g. 3BAA5C42) and its corresponding SQL GUIDs.

For each SQL GUID, find whether it resulted in an exception or an added event. Store them in a dict.

Print the stats.
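The three steps above can be sketched on an in-memory list of lines. This is only a sketch under assumptions: the sample lines and their layout are hypothetical (the real server.log may look different), `summarise` and `GUID_RE` are names I made up for this example, and it assumes one SQL GUID per line.

```python
import re

GUID_RE = re.compile(
    r"[0-9A-Z]{8}-[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{4}-[0-9A-Z]{12}",
    re.IGNORECASE,
)

# Hypothetical log lines; the real server.log layout may differ.
SAMPLE_LINES = [
    "0F8FAD5B-D9CB-469F-A165-70867728950E device 3BAA5C42 connected",
    "0F8FAD5B-D9CB-469F-A165-70867728950E Packet record has been added",
    "7C9E6679-7425-40DE-944B-E07FC1F90AE7 device 3BAA5C42 connected",
    "7C9E6679-7425-40DE-944B-E07FC1F90AE7 Exception occurred",
    "1B4E28BA-2FA1-11D2-883F-0016D3CCA427 device 3BAA5B84 connected",
    "1B4E28BA-2FA1-11D2-883F-0016D3CCA427 Packet record has been added",
]


def summarise(lines, device_ids):
    """Return {device ID: {'added': set of GUIDs, 'exception': set of GUIDs}}."""
    # Step 1: device ID -> SQL GUIDs found on lines mentioning that device.
    guids_by_device = {device: set() for device in device_ids}
    for line in lines:
        match = GUID_RE.search(line)
        if match is None:
            continue
        for device in device_ids:
            if device in line:
                guids_by_device[device].add(match.group(0))

    # Step 2: classify each GUID by the event text on its line.
    stats = {d: {"added": set(), "exception": set()} for d in device_ids}
    for line in lines:
        match = GUID_RE.search(line)
        if match is None:
            continue
        if "Exception occurred" in line:
            kind = "exception"
        elif "Packet record has been added" in line:
            kind = "added"
        else:
            continue
        for device, guids in guids_by_device.items():
            if match.group(0) in guids:
                stats[device][kind].add(match.group(0))
    return stats


# Step 3: print the stats.
stats = summarise(SAMPLE_LINES, ("3BAA5C42", "3BAA5B84"))
for device, events in stats.items():
    print(device, "added", len(events["added"]))
    print(device, "exceptions", len(events["exception"]))
```

Unlike my actual script, this sketch looks each GUID up in a set instead of testing every known GUID against every line, which is a swapped-in variation and not what the code above does.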

Parallelize

Steps one and two each just go through the file, searching for a particular string and performing a set of instructions once the string is found. So each process (both within steps one and two, and steps one and two as a whole) is independent of the others, and running them in parallel makes sense.
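A rough sketch of what I have in mind, using `concurrent.futures` from the standard library: split the lines into chunks, scan each chunk in a worker, and merge the partial results. The names `scan_chunk` and `parallel_scan` and the chunking scheme are my own assumptions, not tested on the real log. The executor is passed in, so the same code can run on a `ThreadPoolExecutor` or a `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor

# Event name -> text that marks the event on a log line.
EVENTS = {
    "exception": "Exception occurred",
    "added": "Packet record has been added",
}


def scan_chunk(lines, guids):
    """Classify the known GUIDs found in one chunk of lines (runs in a worker)."""
    found = {kind: set() for kind in EVENTS}
    for line in lines:
        for kind, marker in EVENTS.items():
            if marker in line:
                found[kind].update(g for g in guids if g in line)
                break  # a line counts for at most one event kind
    return found


def parallel_scan(lines, guids, executor, n_chunks=4):
    """Scan the lines in up to n_chunks pieces on the executor and merge."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    merged = {kind: set() for kind in EVENTS}
    for partial in executor.map(scan_chunk, chunks, [guids] * len(chunks)):
        for kind, values in partial.items():
            merged[kind] |= values
    return merged
```

For example, `with ThreadPoolExecutor(max_workers=4) as pool: parallel_scan(lines, guids, pool)`. Since the work is plain string searching, I would expect threads to be limited by the GIL in CPython, so a `ProcessPoolExecutor` (created under an `if __name__ == "__main__":` guard) is probably the closer analogue to Rayon; I have not measured either.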

How should I get going to improve this code?

asked 6 mins ago

clmno

python
