Counting SQL GUIDs from a server log and printing the stats, improved
This is a continuation of my original question, after the improvements suggested by @Reinderien and some of my own.
The approach I've taken is fairly obvious, and I'm not using any parallel processing. I think there's scope for improvement because I know of a Rust crate, Rayon, which could have run in parallel the steps I'm currently running one after another. I'll explain why I think this is possible below.
"""
Find the number of 'exception' and 'added' events in the exception log
with respect to the device ID.
author: clmno
date: 2018-12-23
updated: 2018-12-27
"""
import re
import sys
from time import time


def timer(fn):
    """Used to time a function's execution."""
    def f(*args, **kwargs):
        before = time()
        rv = fn(*args, **kwargs)
        after = time()
        print("elapsed", after - before)
        return rv
    return f


# compile the regex globally
re_prefix = '.*?'
re_guid = '([A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12})'
rg = re.compile(re_prefix + re_guid, re.IGNORECASE | re.DOTALL)


def find_sql_guid(txt):
    """From the passed in txt, find the SQL guid using re."""
    m = rg.search(txt)
    if not m:
        print("ERROR: No SQL guid in line. Check the code")
        sys.exit(-1)
    return m.group(1)


def find_device_IDs(file_obj, element):
    """Find the element (type: str) within the open file (the file object
    is provided as arg). Then find the SQL guid from the line at hand.
    (Each line has a SQL guid)
    Return a set of the SQL guids found for that element.
    """
    # rewind, because this function is called once per device ID on the
    # same file object and each call reads the file to the end
    file_obj.seek(0)
    lines = set()
    for line in file_obj:
        if element in line:
            # find the sql-guid from the line-str & add it to the set
            lines.add(find_sql_guid(line))
    return lines


def find_num_occurences(file_obj, key, search_val, unique_values):
    """Find and add SQL guids that are in a line that contains a string
    that's in search_val into 'exception' and 'added'.
    Return a dict of {'exception': set(<set of SQL guids>),
                      'added': set(<set of SQL guids>)}
    """
    # rewind for the same reason as in find_device_IDs
    file_obj.seek(0)
    lines = {'exception': set(), 'added': set()}
    for line in file_obj:
        for value in unique_values:
            if value in line:
                if search_val[0] in line:
                    lines['exception'].add(value)
                elif search_val[1] in line:
                    lines['added'].add(value)
    return lines


def print_stats(num_exceptions_dict):
    for key in num_exceptions_dict.keys():
        print("{} added ".format(key) +
              str(len(num_exceptions_dict[key]["added"])))
        print("{} exceptions ".format(key) +
              str(len(num_exceptions_dict[key]["exception"])))


if __name__ == "__main__":
    path = 'log/server.log'
    search_list = ('3BAA5C42', '3BAA5B84', '3BAA5C57', '3BAA5B67')
    with open(path) as file_obj:
        # find every occurrence of a device ID and find the corresponding
        # SQL guids (unique ID)
        unique_ids_dict = {
            element: find_device_IDs(file_obj, element)
            for element in search_list
        }
        # Now for each unique ID find if string ["Exception occurred",
        # "Packet record has been added"] is found in its SQL guid list.
        search_with_in_deviceID = ("Exception occurred",
                                   "Packet record has been added")
        # each call rewinds the file itself, so no explicit seek is needed
        num_exceptions_dict = {
            elem: find_num_occurences(file_obj, elem, search_with_in_deviceID,
                                      unique_ids_dict[elem])
            for elem in search_list
        }
    print_stats(num_exceptions_dict)
and here's a small server log for you to experiment on
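As a quick check of the GUID pattern on its own, here is a minimal sketch. It assumes the lazy `.*?` prefix and `re.DOTALL` can be dropped, since `search` already scans forward to the leftmost match. The helper name `extract_guid` and the sample line are made up for illustration; the line is not taken from the log mentioned above, and the real format may differ.

```python
import re

# Same GUID shape as in the script, without the '.*?' prefix:
# search() already scans forward to the first match.
GUID_RE = re.compile(
    r'[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}',
    re.IGNORECASE,
)


def extract_guid(line):
    """Return the first GUID-shaped token in line, or None if there is none."""
    match = GUID_RE.search(line)
    return match.group(0) if match else None


# Hypothetical log line, invented for this example.
sample = ("2018-12-23 10:15:02 [3F2504E0-4F89-11D3-9A0C-0305E82C3301] "
          "Device 3BAA5C42 connected")
print(extract_guid(sample))
print(extract_guid("no guid here"))
```

Returning `None` instead of exiting leaves it to the caller to decide whether a line without a GUID is an error.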
Improvements
- More Pythonic, with some help from Reinderien.
- Opening the file only once. This reduced the execution time from 47s to 10s (for a file of 55K lines).
- Using a better data structure model. I was using dicts everywhere; sets made more sense.
My current approach is to
- Find the device IDs (e.g. 3BAA5C42) and their corresponding SQL GUIDs.
- For each SQL GUID, find whether it resulted in an exception or an added event. Store them in a dict.
- Print the stats.
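The three steps above can be sketched on an in-memory list of lines. This is only an illustration under stated assumptions: the function name `collect_stats` and the sample lines are hypothetical, every line is assumed to carry at most one GUID, and step two looks up the extracted GUID in a set rather than testing every known GUID against every line as the script does.

```python
import re
from collections import defaultdict

GUID_RE = re.compile(
    r'[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}',
    re.IGNORECASE,
)


def collect_stats(lines, device_ids, exception_text, added_text):
    """Return {device_id: {'exception': set of GUIDs, 'added': set of GUIDs}}."""
    lines = list(lines)

    # Step 1: device ID -> GUIDs seen on lines that mention that device.
    guids_by_device = defaultdict(set)
    for line in lines:
        match = GUID_RE.search(line)
        if match is None:
            continue
        for device_id in device_ids:
            if device_id in line:
                guids_by_device[device_id].add(match.group(0))

    # Step 2: classify each GUID by the event text found on its lines.
    stats = {d: {'exception': set(), 'added': set()} for d in device_ids}
    for line in lines:
        match = GUID_RE.search(line)
        if match is None:
            continue
        if exception_text in line:
            kind = 'exception'
        elif added_text in line:
            kind = 'added'
        else:
            continue
        guid = match.group(0)
        for device_id in device_ids:
            if guid in guids_by_device[device_id]:
                stats[device_id][kind].add(guid)
    return stats


# Hypothetical lines, invented for this example.
sample_log = [
    "[AAAAAAAA-0000-0000-0000-000000000001] Device 3BAA5C42 connected",
    "[AAAAAAAA-0000-0000-0000-000000000001] Packet record has been added",
    "[AAAAAAAA-0000-0000-0000-000000000002] Device 3BAA5C42 connected",
    "[AAAAAAAA-0000-0000-0000-000000000002] Exception occurred",
    "[AAAAAAAA-0000-0000-0000-000000000003] Device 3BAA5B84 connected",
    "[AAAAAAAA-0000-0000-0000-000000000003] Packet record has been added",
]

stats = collect_stats(sample_log, ('3BAA5C42', '3BAA5B84'),
                      "Exception occurred", "Packet record has been added")
# Step 3: print the stats.
for device_id, result in stats.items():
    print(device_id, "added", len(result['added']),
          "exceptions", len(result['exception']))
```

The set lookup in step two keeps the work per line roughly constant instead of growing with the number of GUIDs, but it only gives the same answer as the script if the one-GUID-per-line assumption holds.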
Parallelize
Steps one and two just go through the file searching for a particular string and performing a set of instructions once the string is found. So each piece of work (both within steps one and two, and steps one and two as a whole) is independent of the others. That is why I think running them in parallel makes more sense.
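To make the idea concrete, here is a minimal sketch of fanning step one out per device ID with the standard library's `concurrent.futures`. The function names, the `max_workers` value and the sample lines are assumptions for illustration, not part of my script. It uses threads so that it runs anywhere without extra setup.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from functools import partial

GUID_RE = re.compile(
    r'[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}',
    re.IGNORECASE,
)


def guids_for_device(lines, device_id):
    """Step one for a single device: GUIDs on lines that mention device_id."""
    found = set()
    for line in lines:
        if device_id in line:
            match = GUID_RE.search(line)
            if match:
                found.add(match.group(0))
    return found


def guids_for_all_devices(lines, device_ids, max_workers=4):
    """Run guids_for_device once per device ID on a pool of workers."""
    lines = list(lines)  # read once, shared by every worker
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as device_ids
        results = pool.map(partial(guids_for_device, lines), device_ids)
        return dict(zip(device_ids, results))


# Hypothetical lines, invented for this example.
sample_log = [
    "[AAAAAAAA-0000-0000-0000-000000000001] Device 3BAA5C42 connected",
    "[AAAAAAAA-0000-0000-0000-000000000002] Device 3BAA5C42 connected",
    "[AAAAAAAA-0000-0000-0000-000000000003] Device 3BAA5B84 connected",
]
unique_ids = guids_for_all_devices(sample_log, ('3BAA5C42', '3BAA5B84'))
print(unique_ids)
```

In CPython the global interpreter lock means threads will not speed up this kind of string searching much. `ProcessPoolExecutor` offers the same `map` interface and can use several cores, at the cost of sending the lines to each worker process and needing the `if __name__ == "__main__":` guard on some platforms.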
How should I get going to improve this code?