Comparing columns from two CSV files - follow-up











up vote
2
down vote

favorite












This is the second part of a question, you can find the first part here:



Comparing columns from two CSV files



I've made some changes to the script and here is what it looks like now:



import csv, sys

def get_column(columns, name):
count = 0
for column in columns:
if column != name:
count += 1
else:
return(count)

def set_up_file(file, variable):
columns = next(file)
siren_pos = get_column(columns, 'SIREN')
nic_pos = get_column(columns, 'NIC')
variable_pos = get_column(columns, variable)
return(siren_pos, nic_pos, variable_pos)

def test_variable(variable):
with open('source.csv', 'r') as source:
source_r = csv.reader(source, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(source_r, variable)
line_s = next(source_r)
with open('tested.csv', 'r') as tested:
tested_r = csv.reader(tested, delimiter=';')
sir_t, nic_t, comp_t = set_up_file(tested_r, variable)
size = sum(1 for line in tested_r)
tested.seek(0, 0)
line_t = next(tested_r)
line_t = next(tested_r)
correct = 0
try:
while True:
if(line_s[sir_s] == line_t[sir_t]
and line_s[nic_s] == line_t[nic_t]):
if(line_s[comp_s] == line_t[comp_t]):
correct += 1
line_t = next(tested_r)
line_s = next(source_r)
elif(int(line_s[sir_s]) > int(line_t[sir_t])):
line_t = next(tested_r)
elif(int(line_s[sir_s]) < int(line_t[sir_t])):
line_s = next(source_r)
else:
if(int(line_s[nic_s]) > int(line_t[nic_t])):
line_t = next(tested_r)
else:
line_s = next(source_r)
except StopIteration:
return(correct / size * 100)

def main():
with open('tested.csv', 'r') as file:
file_r = csv.reader(file, delimiter=';')
columns = next(file_r)
found = test_variable('SIREN')
for column in columns:
print(column, '%.2f' % (test_variable(column) / found * 100))

if(__name__ == "__main__"):
main()


There are no real performance issue in the new version but I feel like it could still be improved greatly.



Also, would it be possible to reduce the size of the test_variable function? I thought about cutting it right before the try statement but I'll end up passing about 7 parameters which is not really a clean solution in my opinion.










share|improve this question




















  • 1




    You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
    – Reinderien
    2 days ago















up vote
2
down vote

favorite












This is the second part of a question, you can find the first part here:



Comparing columns from two CSV files



I've made some changes to the script and here is what it looks like now:



import csv, sys

def get_column(columns, name):
count = 0
for column in columns:
if column != name:
count += 1
else:
return(count)

def set_up_file(file, variable):
columns = next(file)
siren_pos = get_column(columns, 'SIREN')
nic_pos = get_column(columns, 'NIC')
variable_pos = get_column(columns, variable)
return(siren_pos, nic_pos, variable_pos)

def test_variable(variable):
with open('source.csv', 'r') as source:
source_r = csv.reader(source, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(source_r, variable)
line_s = next(source_r)
with open('tested.csv', 'r') as tested:
tested_r = csv.reader(tested, delimiter=';')
sir_t, nic_t, comp_t = set_up_file(tested_r, variable)
size = sum(1 for line in tested_r)
tested.seek(0, 0)
line_t = next(tested_r)
line_t = next(tested_r)
correct = 0
try:
while True:
if(line_s[sir_s] == line_t[sir_t]
and line_s[nic_s] == line_t[nic_t]):
if(line_s[comp_s] == line_t[comp_t]):
correct += 1
line_t = next(tested_r)
line_s = next(source_r)
elif(int(line_s[sir_s]) > int(line_t[sir_t])):
line_t = next(tested_r)
elif(int(line_s[sir_s]) < int(line_t[sir_t])):
line_s = next(source_r)
else:
if(int(line_s[nic_s]) > int(line_t[nic_t])):
line_t = next(tested_r)
else:
line_s = next(source_r)
except StopIteration:
return(correct / size * 100)

def main():
with open('tested.csv', 'r') as file:
file_r = csv.reader(file, delimiter=';')
columns = next(file_r)
found = test_variable('SIREN')
for column in columns:
print(column, '%.2f' % (test_variable(column) / found * 100))

if(__name__ == "__main__"):
main()


There are no real performance issue in the new version but I feel like it could still be improved greatly.



Also, would it be possible to reduce the size of the test_variable function? I thought about cutting it right before the try statement but I'll end up passing about 7 parameters which is not really a clean solution in my opinion.










share|improve this question




















  • 1




    You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
    – Reinderien
    2 days ago













up vote
2
down vote

favorite









up vote
2
down vote

favorite











This is the second part of a question, you can find the first part here:



Comparing columns from two CSV files



I've made some changes to the script and here is what it looks like now:



import csv, sys

def get_column(columns, name):
count = 0
for column in columns:
if column != name:
count += 1
else:
return(count)

def set_up_file(file, variable):
columns = next(file)
siren_pos = get_column(columns, 'SIREN')
nic_pos = get_column(columns, 'NIC')
variable_pos = get_column(columns, variable)
return(siren_pos, nic_pos, variable_pos)

def test_variable(variable):
with open('source.csv', 'r') as source:
source_r = csv.reader(source, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(source_r, variable)
line_s = next(source_r)
with open('tested.csv', 'r') as tested:
tested_r = csv.reader(tested, delimiter=';')
sir_t, nic_t, comp_t = set_up_file(tested_r, variable)
size = sum(1 for line in tested_r)
tested.seek(0, 0)
line_t = next(tested_r)
line_t = next(tested_r)
correct = 0
try:
while True:
if(line_s[sir_s] == line_t[sir_t]
and line_s[nic_s] == line_t[nic_t]):
if(line_s[comp_s] == line_t[comp_t]):
correct += 1
line_t = next(tested_r)
line_s = next(source_r)
elif(int(line_s[sir_s]) > int(line_t[sir_t])):
line_t = next(tested_r)
elif(int(line_s[sir_s]) < int(line_t[sir_t])):
line_s = next(source_r)
else:
if(int(line_s[nic_s]) > int(line_t[nic_t])):
line_t = next(tested_r)
else:
line_s = next(source_r)
except StopIteration:
return(correct / size * 100)

def main():
with open('tested.csv', 'r') as file:
file_r = csv.reader(file, delimiter=';')
columns = next(file_r)
found = test_variable('SIREN')
for column in columns:
print(column, '%.2f' % (test_variable(column) / found * 100))

if(__name__ == "__main__"):
main()


There are no real performance issue in the new version but I feel like it could still be improved greatly.



Also, would it be possible to reduce the size of the test_variable function? I thought about cutting it right before the try statement but I'll end up passing about 7 parameters which is not really a clean solution in my opinion.










share|improve this question















This is the second part of a question, you can find the first part here:



Comparing columns from two CSV files



I've made some changes to the script and here is what it looks like now:



import csv, sys

def get_column(columns, name):
count = 0
for column in columns:
if column != name:
count += 1
else:
return(count)

def set_up_file(file, variable):
columns = next(file)
siren_pos = get_column(columns, 'SIREN')
nic_pos = get_column(columns, 'NIC')
variable_pos = get_column(columns, variable)
return(siren_pos, nic_pos, variable_pos)

def test_variable(variable):
with open('source.csv', 'r') as source:
source_r = csv.reader(source, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(source_r, variable)
line_s = next(source_r)
with open('tested.csv', 'r') as tested:
tested_r = csv.reader(tested, delimiter=';')
sir_t, nic_t, comp_t = set_up_file(tested_r, variable)
size = sum(1 for line in tested_r)
tested.seek(0, 0)
line_t = next(tested_r)
line_t = next(tested_r)
correct = 0
try:
while True:
if(line_s[sir_s] == line_t[sir_t]
and line_s[nic_s] == line_t[nic_t]):
if(line_s[comp_s] == line_t[comp_t]):
correct += 1
line_t = next(tested_r)
line_s = next(source_r)
elif(int(line_s[sir_s]) > int(line_t[sir_t])):
line_t = next(tested_r)
elif(int(line_s[sir_s]) < int(line_t[sir_t])):
line_s = next(source_r)
else:
if(int(line_s[nic_s]) > int(line_t[nic_t])):
line_t = next(tested_r)
else:
line_s = next(source_r)
except StopIteration:
return(correct / size * 100)

def main():
with open('tested.csv', 'r') as file:
file_r = csv.reader(file, delimiter=';')
columns = next(file_r)
found = test_variable('SIREN')
for column in columns:
print(column, '%.2f' % (test_variable(column) / found * 100))

if(__name__ == "__main__"):
main()


There are no real performance issue in the new version but I feel like it could still be improved greatly.



Also, would it be possible to reduce the size of the test_variable function? I thought about cutting it right before the try statement but I'll end up passing about 7 parameters which is not really a clean solution in my opinion.







python csv






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 2 days ago

























asked 2 days ago









Comte_Zero

257




257








  • 1




    You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
    – Reinderien
    2 days ago














  • 1




    You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
    – Reinderien
    2 days ago








1




1




You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
– Reinderien
2 days ago




You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
– Reinderien
2 days ago










2 Answers
2






active

oldest

votes

















up vote
2
down vote



accepted











  • There is some duplication of code because you are handling two files.

  • When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

  • The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.


To address the above, I propose to organize the code like this:



def parse_file(file, variable):
reader = csv.reader(file, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(reader, variable)
for line in reader:
key = int(line[sir_s]), int(line[nic_s])
yield key, line[comp_s]

def test_variable(variable):
with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:
source_r = parse_file(source, variable)
tested_r = parse_file(tested, variable)

correct = 0
try:
line_s = next(source_r)
line_t = next(tested_r)
while True:
key_s, comp_s = line_s
key_t, comp_t = line_t
if line_s == line_t:
correct += 1
if key_s >= key_t:
line_t = next(tested_r)
if key_s <= key_t:
line_s = next(source_r)

except StopIteration:
return correct


Note however that I've omitted the computation of size. This could be done by incrementing a variable after reading each line, but since lines are read at multiple places, and some may be left in the end if the other file ends first, it may be best to count the lines separately like you have done.






share|improve this answer























  • I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
    – Comte_Zero
    yesterday










  • @Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
    – Janne Karila
    yesterday












  • The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
    – Comte_Zero
    yesterday




















up vote
1
down vote













On this line:



return(siren_pos, nic_pos, variable_pos)


you do not need to surround the return values in parens. They will be returned as a tuple anyway.



It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.



You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().






share|improve this answer





















  • I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
    – Comte_Zero
    yesterday













Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208220%2fcomparing-columns-from-two-csv-files-follow-up%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
2
down vote



accepted











  • There is some duplication of code because you are handling two files.

  • When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

  • The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.


To address the above, I propose to organize the code like this:



def parse_file(file, variable):
reader = csv.reader(file, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(reader, variable)
for line in reader:
key = int(line[sir_s]), int(line[nic_s])
yield key, line[comp_s]

def test_variable(variable):
with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:
source_r = parse_file(source, variable)
tested_r = parse_file(tested, variable)

correct = 0
try:
line_s = next(source_r)
line_t = next(tested_r)
while True:
key_s, comp_s = line_s
key_t, comp_t = line_t
if line_s == line_t:
correct += 1
if key_s >= key_t:
line_t = next(tested_r)
if key_s <= key_t:
line_s = next(source_r)

except StopIteration:
return correct


Note however that I've omitted the computation of size. This could be done by incrementing a variable after reading each line, but since lines are read at multiple places, and some may be left in the end if the other file ends first, it may be best to count the lines separately like you have done.






share|improve this answer























  • I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
    – Comte_Zero
    yesterday










  • @Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
    – Janne Karila
    yesterday












  • The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
    – Comte_Zero
    yesterday

















up vote
2
down vote



accepted











  • There is some duplication of code because you are handling two files.

  • When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

  • The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.


To address the above, I propose to organize the code like this:



def parse_file(file, variable):
reader = csv.reader(file, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(reader, variable)
for line in reader:
key = int(line[sir_s]), int(line[nic_s])
yield key, line[comp_s]

def test_variable(variable):
with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:
source_r = parse_file(source, variable)
tested_r = parse_file(tested, variable)

correct = 0
try:
line_s = next(source_r)
line_t = next(tested_r)
while True:
key_s, comp_s = line_s
key_t, comp_t = line_t
if line_s == line_t:
correct += 1
if key_s >= key_t:
line_t = next(tested_r)
if key_s <= key_t:
line_s = next(source_r)

except StopIteration:
return correct


Note however that I've omitted the computation of size. This could be done by incrementing a variable after reading each line, but since lines are read at multiple places, and some may be left in the end if the other file ends first, it may be best to count the lines separately like you have done.






share|improve this answer























  • I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
    – Comte_Zero
    yesterday










  • @Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
    – Janne Karila
    yesterday












  • The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
    – Comte_Zero
    yesterday















up vote
2
down vote



accepted







up vote
2
down vote



accepted







  • There is some duplication of code because you are handling two files.

  • When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

  • The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.


To address the above, I propose to organize the code like this:



def parse_file(file, variable):
reader = csv.reader(file, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(reader, variable)
for line in reader:
key = int(line[sir_s]), int(line[nic_s])
yield key, line[comp_s]

def test_variable(variable):
with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:
source_r = parse_file(source, variable)
tested_r = parse_file(tested, variable)

correct = 0
try:
line_s = next(source_r)
line_t = next(tested_r)
while True:
key_s, comp_s = line_s
key_t, comp_t = line_t
if line_s == line_t:
correct += 1
if key_s >= key_t:
line_t = next(tested_r)
if key_s <= key_t:
line_s = next(source_r)

except StopIteration:
return correct


Note however that I've omitted the computation of size. This could be done by incrementing a variable after reading each line, but since lines are read at multiple places, and some may be left in the end if the other file ends first, it may be best to count the lines separately like you have done.






share|improve this answer















  • There is some duplication of code because you are handling two files.

  • When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

  • The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.


To address the above, I propose to organize the code like this:



def parse_file(file, variable):
reader = csv.reader(file, delimiter=';')
sir_s, nic_s, comp_s = set_up_file(reader, variable)
for line in reader:
key = int(line[sir_s]), int(line[nic_s])
yield key, line[comp_s]

def test_variable(variable):
with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:
source_r = parse_file(source, variable)
tested_r = parse_file(tested, variable)

correct = 0
try:
line_s = next(source_r)
line_t = next(tested_r)
while True:
key_s, comp_s = line_s
key_t, comp_t = line_t
if line_s == line_t:
correct += 1
if key_s >= key_t:
line_t = next(tested_r)
if key_s <= key_t:
line_s = next(source_r)

except StopIteration:
return correct


Note however that I've omitted the computation of size. This could be done by incrementing a variable after reading each line, but since lines are read at multiple places, and some may be left in the end if the other file ends first, it may be best to count the lines separately like you have done.







share|improve this answer














share|improve this answer



share|improve this answer








edited 2 days ago

























answered 2 days ago









Janne Karila

9,6351430




9,6351430












  • I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
    – Comte_Zero
    yesterday










  • @Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
    – Janne Karila
    yesterday












  • The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
    – Comte_Zero
    yesterday




















  • I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
    – Comte_Zero
    yesterday










  • @Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
    – Janne Karila
    yesterday












  • The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
    – Comte_Zero
    yesterday


















I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday




I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday












@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday






@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday














The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday






The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday














up vote
1
down vote













On this line:



return(siren_pos, nic_pos, variable_pos)


you do not need to surround the return values in parens. They will be returned as a tuple anyway.



It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.



You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().






share|improve this answer





















  • I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
    – Comte_Zero
    yesterday

















up vote
1
down vote













On this line:



return(siren_pos, nic_pos, variable_pos)


you do not need to surround the return values in parens. They will be returned as a tuple anyway.



It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.



You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().






share|improve this answer





















  • I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
    – Comte_Zero
    yesterday















up vote
1
down vote










up vote
1
down vote









On this line:



return(siren_pos, nic_pos, variable_pos)


you do not need to surround the return values in parens. They will be returned as a tuple anyway.



It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.



You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().






share|improve this answer












On this line:



return(siren_pos, nic_pos, variable_pos)


you do not need to surround the return values in parens. They will be returned as a tuple anyway.



It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.



You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().







share|improve this answer












share|improve this answer



share|improve this answer










answered 2 days ago









Reinderien

1,372516




1,372516












  • I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
    – Comte_Zero
    yesterday




















  • I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
    – Comte_Zero
    yesterday


















I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday






I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday




















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208220%2fcomparing-columns-from-two-csv-files-follow-up%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Quarter-circle Tiles

build a pushdown automaton that recognizes the reverse language of a given pushdown automaton?

Mont Emei