Read list of dictionaries with nested dictionaries
I have a file containing a list of dictionaries (around 75,000). For instance, this is an example of the first item I get when reading the file (the value of v):
{
    "id": 1,
    "name": "Explosives",
    "category_id": 1,
    "average_price": 294,
    "is_rare": 0,
    "max_buy_price": 755,
    "max_sell_price": 1774,
    "min_buy_price": 99,
    "min_sell_price": 18,
    "buy_price_lower_average": 176,
    "sell_price_upper_average": 924,
    "is_non_marketable": 0,
    "ed_id": 128049204,
    "category": {
        "id": 1,
        "name": "Chemicals"
    }
}
My current working code is:
for v in d:
    commodities_reference = []
    for k, g in v.items():
        if isinstance(g, dict):
            dict1 = g
            my_value1 = dict1.get("id")
            my_value2 = dict1.get("name")
    for s, i in v.items():
        if not isinstance(i, dict):
            commodities_reference.append(i)
    commodities_reference.append(my_value1)
    commodities_reference.append(my_value2)
Desired output: all the values in a single list, in the same order, for a SQL INSERT statement afterwards (meaning the values from the nested dict must also come at the end):
[1, 'Explosives', 1, 294, 0, 755, 1774, 99, 18, 176, 924, 0, 128049204, 1, 'Chemicals']
From a performance perspective, with sqlite3/Python 3.7, it is a catastrophe. I am looking for advice on how to make it more efficient. I am thinking about using an executemany statement, but it seems to take tuples instead of lists.
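As a quick check (table and column names below are made up for illustration), sqlite3's executemany in fact accepts any iterable of sequences, so lists work just as well as tuples:

```python
import sqlite3

# Minimal sketch: executemany takes any iterable of row sequences.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commodities (id INTEGER, name TEXT)")
rows = [[1, "Explosives"], [2, "Hydrogen Fuel"]]  # lists, not tuples
conn.executemany("INSERT INTO commodities VALUES (?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM commodities").fetchone()[0])  # 2
```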
python performance python-3.x
I'm not sure I fully understand the requirement (same order as what?) - could you show the actual expected output from the given input? That would make it clearer.
– Toby Speight
Nov 14 at 14:12
the data example I gave is the content of v, not d. In my case, d has 333 dictionaries in it.
– Deareim
Nov 14 at 14:21
Is the problem the code you posted or sqlite insert? Seems more likely to be the insert
– juvian
Nov 14 at 20:21
I solved the insert issue by using executemany with tuples instead. But that doesn't solve the code issue, as it seems to me the code could be more optimized.
– Deareim
Nov 14 at 20:43
asked Nov 14 at 14:01 by Deareim (new contributor), edited Nov 14 at 16:48 by Mathias Ettinger
1 Answer (accepted)
Currently you iterate over each dictionary twice. You can do it in one pass:
for v in d:
    commodities_reference = []
    for k, g in v.items():
        if isinstance(g, dict):
            commodities_reference.append(g["id"])
            commodities_reference.append(g["name"])
        else:
            commodities_reference.append(g)
Note that this appends the values as it encounters them. This means that in Python < 3.7 (CPython < 3.6) it is not guaranteed that the nested dictionary's values end up last, since dictionaries were not guaranteed to preserve insertion order.
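If insertion order cannot be relied on, a small variant (shown here with a shortened stand-in for the example dict, and assuming only the nested dict's id and name are wanted) collects the nested values separately and appends them at the end:

```python
def get_values_nested_last(v):
    # Scalar values first; values from nested dicts go at the end,
    # independent of dict iteration order.
    flat, nested = [], []
    for g in v.values():
        if isinstance(g, dict):
            nested.extend([g["id"], g["name"]])
        else:
            flat.append(g)
    return flat + nested

v = {"id": 1, "name": "Explosives",
     "category": {"id": 1, "name": "Chemicals"}}
print(get_values_nested_last(v))  # [1, 'Explosives', 1, 'Chemicals']
```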
You could even make this a generator and slightly more general:
def get_values_recursive(x):
    for value in x.values():
        if isinstance(value, dict):
            yield from get_values_recursive(value)
        else:
            yield value

for v in d:
    commodities_reference = list(get_values_recursive(v))
    # do something with it...
    print(commodities_reference)
When using the given example, this is the result:
>>> list(get_values_recursive(v))
[1, 'Explosives', 1, 294, 0, 755, 1774, 99, 18, 176, 924, 0, 128049204, 1, 'Chemicals']
When putting your code into a function, this generator is almost twice as fast with the given v:
In [13]: %timeit op(v)
5.32 µs ± 43.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit list(get_values_recursive(v))
3.64 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Note that both take on the order of microseconds, so unless you need to process more than 100,000 items per second, your bottleneck is probably in those SQL statements and how you execute them.
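As a sketch of how such a generator and executemany can fit together (the table and column names here are made up, and the whole batch is wrapped in one transaction to avoid a commit per row):

```python
import sqlite3

def get_values_recursive(x):
    # Yield all values, descending into nested dicts.
    for value in x.values():
        if isinstance(value, dict):
            yield from get_values_recursive(value)
        else:
            yield value

# Shortened stand-in for the real data.
d = [{"id": 1, "name": "Explosives",
      "category": {"id": 1, "name": "Chemicals"}}]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commodities "
             "(id INTEGER, name TEXT, category_id INTEGER, category_name TEXT)")
# Using the connection as a context manager commits once for the batch.
with conn:
    conn.executemany(
        "INSERT INTO commodities VALUES (?, ?, ?, ?)",
        (tuple(get_values_recursive(v)) for v in d),
    )
print(conn.execute("SELECT * FROM commodities").fetchall())
# [(1, 'Explosives', 1, 'Chemicals')]
```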
Thank you for the idea. I'll try to change it.
– Deareim
yesterday
So I have tested your "general" function and I see a lot of performance improvement, so thank you again. The only thing missing is that I need the data without dictionaries inside, like in my example. That is why I had a get on the dictionary.
– Deareim
yesterday
@Deareim: I don't understand. When I call the function with the given example there are no dictionaries left in the output (see edited answer).
– Graipher
yesterday
answered 2 days ago by Graipher, edited yesterday