Comparing columns from two CSV files

Comparing columns from two CSV files - follow-up

up vote
2
down vote

favorite

This is the second part of a question, you can find the first part here:

I've made some changes to the script and here is what it looks like now:

import csv, sys



def get_column(columns, name):

    count = 0

    for column in columns:

        if column != name:

            count += 1

        else:

            return(count)



def set_up_file(file, variable):

    columns = next(file)

    siren_pos = get_column(columns, 'SIREN')

    nic_pos = get_column(columns, 'NIC')

    variable_pos = get_column(columns, variable)

    return(siren_pos, nic_pos, variable_pos)



def test_variable(variable):

    with open('source.csv', 'r') as source:

        source_r = csv.reader(source, delimiter=';')

        sir_s, nic_s, comp_s = set_up_file(source_r, variable)

        line_s = next(source_r)

        with open('tested.csv', 'r') as tested:

            tested_r = csv.reader(tested, delimiter=';')

            sir_t, nic_t, comp_t = set_up_file(tested_r, variable)

            size = sum(1 for line in tested_r)

            tested.seek(0, 0)

            line_t = next(tested_r)

            line_t = next(tested_r)

            correct = 0

            try:

                while True:

                    if(line_s[sir_s] == line_t[sir_t] 

                    and line_s[nic_s] == line_t[nic_t]):

                        if(line_s[comp_s] == line_t[comp_t]):

                            correct += 1

                        line_t = next(tested_r)

                        line_s = next(source_r)

                    elif(int(line_s[sir_s]) > int(line_t[sir_t])):

                        line_t = next(tested_r)

                    elif(int(line_s[sir_s]) < int(line_t[sir_t])):

                        line_s = next(source_r)

                    else:

                        if(int(line_s[nic_s]) > int(line_t[nic_t])):

                            line_t = next(tested_r)

                        else:

                            line_s = next(source_r)

            except StopIteration:

                return(correct / size * 100)



def main():

    with open('tested.csv', 'r') as file:

        file_r = csv.reader(file, delimiter=';')

        columns = next(file_r)

        found = test_variable('SIREN')

        for column in columns:

            print(column, '%.2f' % (test_variable(column) / found * 100))



if(__name__ == "__main__"):

    main()

There are no real performance issue in the new version but I feel like it could still be improved greatly.

Also, would it be possible to reduce the size of the test_variable function? I thought about cutting it right before the try statement but I'll end up passing about 7 parameters which is not really a clean solution in my opinion.

edited 2 days ago

asked 2 days ago

Comte_Zero

257

1

You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
– Reinderien
2 days ago

add a comment |

up vote
2
down vote

favorite

This is the second part of a question, you can find the first part here:

Comparing columns from two CSV files

I've made some changes to the script and here is what it looks like now:

import csv, sys



def get_column(columns, name):

    count = 0

    for column in columns:

        if column != name:

            count += 1

        else:

            return(count)



def set_up_file(file, variable):

    columns = next(file)

    siren_pos = get_column(columns, 'SIREN')

    nic_pos = get_column(columns, 'NIC')

    variable_pos = get_column(columns, variable)

    return(siren_pos, nic_pos, variable_pos)



def test_variable(variable):

    with open('source.csv', 'r') as source:

        source_r = csv.reader(source, delimiter=';')

        sir_s, nic_s, comp_s = set_up_file(source_r, variable)

        line_s = next(source_r)

        with open('tested.csv', 'r') as tested:

            tested_r = csv.reader(tested, delimiter=';')

            sir_t, nic_t, comp_t = set_up_file(tested_r, variable)

            size = sum(1 for line in tested_r)

            tested.seek(0, 0)

            line_t = next(tested_r)

            line_t = next(tested_r)

            correct = 0

            try:

                while True:

                    if(line_s[sir_s] == line_t[sir_t] 

                    and line_s[nic_s] == line_t[nic_t]):

                        if(line_s[comp_s] == line_t[comp_t]):

                            correct += 1

                        line_t = next(tested_r)

                        line_s = next(source_r)

                    elif(int(line_s[sir_s]) > int(line_t[sir_t])):

                        line_t = next(tested_r)

                    elif(int(line_s[sir_s]) < int(line_t[sir_t])):

                        line_s = next(source_r)

                    else:

                        if(int(line_s[nic_s]) > int(line_t[nic_t])):

                            line_t = next(tested_r)

                        else:

                            line_s = next(source_r)

            except StopIteration:

                return(correct / size * 100)



def main():

    with open('tested.csv', 'r') as file:

        file_r = csv.reader(file, delimiter=';')

        columns = next(file_r)

        found = test_variable('SIREN')

        for column in columns:

            print(column, '%.2f' % (test_variable(column) / found * 100))



if(__name__ == "__main__"):

    main()

There are no real performance issue in the new version but I feel like it could still be improved greatly.

edited 2 days ago

asked 2 days ago

Comte_Zero

257

1

You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
– Reinderien
2 days ago

add a comment |

up vote
2
down vote

favorite

This is the second part of a question, you can find the first part here:

Comparing columns from two CSV files

I've made some changes to the script and here is what it looks like now:

import csv, sys



def get_column(columns, name):

    count = 0

    for column in columns:

        if column != name:

            count += 1

        else:

            return(count)



def set_up_file(file, variable):

    columns = next(file)

    siren_pos = get_column(columns, 'SIREN')

    nic_pos = get_column(columns, 'NIC')

    variable_pos = get_column(columns, variable)

    return(siren_pos, nic_pos, variable_pos)



def test_variable(variable):

    with open('source.csv', 'r') as source:

        source_r = csv.reader(source, delimiter=';')

        sir_s, nic_s, comp_s = set_up_file(source_r, variable)

        line_s = next(source_r)

        with open('tested.csv', 'r') as tested:

            tested_r = csv.reader(tested, delimiter=';')

            sir_t, nic_t, comp_t = set_up_file(tested_r, variable)

            size = sum(1 for line in tested_r)

            tested.seek(0, 0)

            line_t = next(tested_r)

            line_t = next(tested_r)

            correct = 0

            try:

                while True:

                    if(line_s[sir_s] == line_t[sir_t] 

                    and line_s[nic_s] == line_t[nic_t]):

                        if(line_s[comp_s] == line_t[comp_t]):

                            correct += 1

                        line_t = next(tested_r)

                        line_s = next(source_r)

                    elif(int(line_s[sir_s]) > int(line_t[sir_t])):

                        line_t = next(tested_r)

                    elif(int(line_s[sir_s]) < int(line_t[sir_t])):

                        line_s = next(source_r)

                    else:

                        if(int(line_s[nic_s]) > int(line_t[nic_t])):

                            line_t = next(tested_r)

                        else:

                            line_s = next(source_r)

            except StopIteration:

                return(correct / size * 100)



def main():

    with open('tested.csv', 'r') as file:

        file_r = csv.reader(file, delimiter=';')

        columns = next(file_r)

        found = test_variable('SIREN')

        for column in columns:

            print(column, '%.2f' % (test_variable(column) / found * 100))



if(__name__ == "__main__"):

    main()

There are no real performance issue in the new version but I feel like it could still be improved greatly.

edited 2 days ago

asked 2 days ago

Comte_Zero

257

This is the second part of a question, you can find the first part here:

Comparing columns from two CSV files

I've made some changes to the script and here is what it looks like now:

import csv, sys



def get_column(columns, name):

    count = 0

    for column in columns:

        if column != name:

            count += 1

        else:

            return(count)



def set_up_file(file, variable):

    columns = next(file)

    siren_pos = get_column(columns, 'SIREN')

    nic_pos = get_column(columns, 'NIC')

    variable_pos = get_column(columns, variable)

    return(siren_pos, nic_pos, variable_pos)



def test_variable(variable):

    with open('source.csv', 'r') as source:

        source_r = csv.reader(source, delimiter=';')

        sir_s, nic_s, comp_s = set_up_file(source_r, variable)

        line_s = next(source_r)

        with open('tested.csv', 'r') as tested:

            tested_r = csv.reader(tested, delimiter=';')

            sir_t, nic_t, comp_t = set_up_file(tested_r, variable)

            size = sum(1 for line in tested_r)

            tested.seek(0, 0)

            line_t = next(tested_r)

            line_t = next(tested_r)

            correct = 0

            try:

                while True:

                    if(line_s[sir_s] == line_t[sir_t] 

                    and line_s[nic_s] == line_t[nic_t]):

                        if(line_s[comp_s] == line_t[comp_t]):

                            correct += 1

                        line_t = next(tested_r)

                        line_s = next(source_r)

                    elif(int(line_s[sir_s]) > int(line_t[sir_t])):

                        line_t = next(tested_r)

                    elif(int(line_s[sir_s]) < int(line_t[sir_t])):

                        line_s = next(source_r)

                    else:

                        if(int(line_s[nic_s]) > int(line_t[nic_t])):

                            line_t = next(tested_r)

                        else:

                            line_s = next(source_r)

            except StopIteration:

                return(correct / size * 100)



def main():

    with open('tested.csv', 'r') as file:

        file_r = csv.reader(file, delimiter=';')

        columns = next(file_r)

        found = test_variable('SIREN')

        for column in columns:

            print(column, '%.2f' % (test_variable(column) / found * 100))



if(__name__ == "__main__"):

    main()

There are no real performance issue in the new version but I feel like it could still be improved greatly.

python csv

edited 2 days ago

asked 2 days ago

Comte_Zero

257

edited 2 days ago

asked 2 days ago

Comte_Zero

257

edited 2 days ago

asked 2 days ago

Comte_Zero

257

asked 2 days ago

Comte_Zero

257

asked 2 days ago

Comte_Zero

257

1

You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
– Reinderien
2 days ago

add a comment |

1

You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
– Reinderien
2 days ago

You're missing the get_column function. This cannot be properly reviewed until you provide all of your code.
– Reinderien
2 days ago

add a comment |

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

There is some duplication of code because you are handling two files.

When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.

To address the above, I propose to organize the code like this:

def parse_file(file, variable):

    reader = csv.reader(file, delimiter=';')

    sir_s, nic_s, comp_s = set_up_file(reader, variable)

    for line in reader:

        key = int(line[sir_s]), int(line[nic_s])

        yield key, line[comp_s]



def test_variable(variable):

    with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:

        source_r = parse_file(source, variable)

        tested_r = parse_file(tested, variable)



        correct = 0

        try:

            line_s = next(source_r)

            line_t = next(tested_r)

            while True:

                key_s, comp_s = line_s

                key_t, comp_t = line_t

                if line_s == line_t:

                    correct += 1

                if key_s >= key_t:

                    line_t = next(tested_r)

                if key_s <= key_t:

                    line_s = next(source_r)



        except StopIteration:

            return correct

Note however that I've omitted the computation of size. This could be done by incrementing a variable after reading each line, but since lines are read at multiple places, and some may be left in the end if the other file ends first, it may be best to count the lines separately like you have done.

edited 2 days ago

answered 2 days ago

Janne Karila

9,6351430

I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday

@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday

The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday

add a comment |

up vote
1
down vote

On this line:

return(siren_pos, nic_pos, variable_pos)

you do not need to surround the return values in parens. They will be returned as a tuple anyway.

It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.

You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().

answered 2 days ago

Reinderien

1,372516

I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208220%2fcomparing-columns-from-two-csv-files-follow-up%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
2
down vote

accepted

There is some duplication of code because you are handling two files.

When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.

To address the above, I propose to organize the code like this:

def parse_file(file, variable):

    reader = csv.reader(file, delimiter=';')

    sir_s, nic_s, comp_s = set_up_file(reader, variable)

    for line in reader:

        key = int(line[sir_s]), int(line[nic_s])

        yield key, line[comp_s]



def test_variable(variable):

    with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:

        source_r = parse_file(source, variable)

        tested_r = parse_file(tested, variable)



        correct = 0

        try:

            line_s = next(source_r)

            line_t = next(tested_r)

            while True:

                key_s, comp_s = line_s

                key_t, comp_t = line_t

                if line_s == line_t:

                    correct += 1

                if key_s >= key_t:

                    line_t = next(tested_r)

                if key_s <= key_t:

                    line_s = next(source_r)



        except StopIteration:

            return correct

edited 2 days ago

answered 2 days ago

Janne Karila

9,6351430

I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday

@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday

The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday

add a comment |

up vote
2
down vote

accepted

There is some duplication of code because you are handling two files.

When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.

To address the above, I propose to organize the code like this:

def parse_file(file, variable):

    reader = csv.reader(file, delimiter=';')

    sir_s, nic_s, comp_s = set_up_file(reader, variable)

    for line in reader:

        key = int(line[sir_s]), int(line[nic_s])

        yield key, line[comp_s]



def test_variable(variable):

    with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:

        source_r = parse_file(source, variable)

        tested_r = parse_file(tested, variable)



        correct = 0

        try:

            line_s = next(source_r)

            line_t = next(tested_r)

            while True:

                key_s, comp_s = line_s

                key_t, comp_t = line_t

                if line_s == line_t:

                    correct += 1

                if key_s >= key_t:

                    line_t = next(tested_r)

                if key_s <= key_t:

                    line_s = next(source_r)



        except StopIteration:

            return correct

edited 2 days ago

answered 2 days ago

Janne Karila

9,6351430

I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday

@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday

The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday

add a comment |

up vote
2
down vote

accepted

There is some duplication of code because you are handling two files.

When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.

To address the above, I propose to organize the code like this:

def parse_file(file, variable):

    reader = csv.reader(file, delimiter=';')

    sir_s, nic_s, comp_s = set_up_file(reader, variable)

    for line in reader:

        key = int(line[sir_s]), int(line[nic_s])

        yield key, line[comp_s]



def test_variable(variable):

    with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:

        source_r = parse_file(source, variable)

        tested_r = parse_file(tested, variable)



        correct = 0

        try:

            line_s = next(source_r)

            line_t = next(tested_r)

            while True:

                key_s, comp_s = line_s

                key_t, comp_t = line_t

                if line_s == line_t:

                    correct += 1

                if key_s >= key_t:

                    line_t = next(tested_r)

                if key_s <= key_t:

                    line_s = next(source_r)



        except StopIteration:

            return correct

edited 2 days ago

answered 2 days ago

Janne Karila

9,6351430

There is some duplication of code because you are handling two files.

When you need some data converted to int it would be best to do soon as you've read it to avoid sprinkling other logic with int() calls.

The two columns SIREN and NIC combined seem to form a sorting key to the file. You could simplify the if elif... part by performing the comparisons on (SIREN, NIC) tuples.

To address the above, I propose to organize the code like this:

def parse_file(file, variable):

    reader = csv.reader(file, delimiter=';')

    sir_s, nic_s, comp_s = set_up_file(reader, variable)

    for line in reader:

        key = int(line[sir_s]), int(line[nic_s])

        yield key, line[comp_s]



def test_variable(variable):

    with open('source.csv', 'r') as source, open('tested.csv', 'r') as tested:

        source_r = parse_file(source, variable)

        tested_r = parse_file(tested, variable)



        correct = 0

        try:

            line_s = next(source_r)

            line_t = next(tested_r)

            while True:

                key_s, comp_s = line_s

                key_t, comp_t = line_t

                if line_s == line_t:

                    correct += 1

                if key_s >= key_t:

                    line_t = next(tested_r)

                if key_s <= key_t:

                    line_s = next(source_r)



        except StopIteration:

            return correct

edited 2 days ago

answered 2 days ago

Janne Karila

9,6351430

edited 2 days ago

answered 2 days ago

Janne Karila

9,6351430

answered 2 days ago

Janne Karila

9,6351430

answered 2 days ago

Janne Karila

9,6351430

I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday

@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday

The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday

add a comment |

I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday

@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday

The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday

I would need some details on how key_s, comp_s = line_s works because comp_s is not re-used in the code after that yet it seems to manage the calculations
– Comte_Zero
yesterday

@Comte_Zero Well actually comp_s is a redundant variable, you could also use key_s = line_s[0]. Because line_s == line_t compares all three columns at once, a separate variable is not needed.
– Janne Karila
yesterday

The problem had to do with my non-understanding of yield, I've done some research and it is fine now, thanks for your answer it is really helpful but how about removing key_s, comp_s = line_s and using line_s[0] instead of key_s in the rest of the file, wouldn't it save some memory space / performances ?
– Comte_Zero
yesterday

add a comment |

up vote
1
down vote

On this line:

return(siren_pos, nic_pos, variable_pos)

you do not need to surround the return values in parens. They will be returned as a tuple anyway.

It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.

You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().

answered 2 days ago

Reinderien

1,372516

I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday

add a comment |

up vote
1
down vote

On this line:

return(siren_pos, nic_pos, variable_pos)

you do not need to surround the return values in parens. They will be returned as a tuple anyway.

It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.

You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().

answered 2 days ago

Reinderien

1,372516

I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday

add a comment |

up vote
1
down vote

On this line:

return(siren_pos, nic_pos, variable_pos)

you do not need to surround the return values in parens. They will be returned as a tuple anyway.

It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.

You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().

answered 2 days ago

Reinderien

1,372516

On this line:

return(siren_pos, nic_pos, variable_pos)

you do not need to surround the return values in parens. They will be returned as a tuple anyway.

It seems ill-advised to iterate through the entire file just to count the number of lines. I advise incrementing size as you go along, so that you only have to iterate once.

You have a bunch of common code that applies to both files. I recommend writing function that does the open(), the csv.reader(), and the set_up_file().

answered 2 days ago

Reinderien

1,372516

answered 2 days ago

Reinderien

1,372516

answered 2 days ago

Reinderien

1,372516

answered 2 days ago

Reinderien

1,372516

I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday

add a comment |

I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday

I'd prefer keeping the parens around the return value for clarity, size has to be counted outside of the loop because said loop depend on two iterator and is not consistant but what you said on duplicate code is true, that has been changed, thanks
– Comte_Zero
yesterday

add a comment |

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Krdytkyu