Read list of dictionaries with nested dictionaries











I have a file whose content is a list of dictionaries (around 75000 of them). For instance, this is the first entry I get when reading the file (the value of v):




{
    "id": 1,
    "name": "Explosives",
    "category_id": 1,
    "average_price": 294,
    "is_rare": 0,
    "max_buy_price": 755,
    "max_sell_price": 1774,
    "min_buy_price": 99,
    "min_sell_price": 18,
    "buy_price_lower_average": 176,
    "sell_price_upper_average": 924,
    "is_non_marketable": 0,
    "ed_id": 128049204,
    "category": {
        "id": 1,
        "name": "Chemicals"
    }
}



My current working code is:



for v in d:
    commodities_reference = []
    # First pass: pick the values out of the nested dict
    for k, g in v.items():
        if isinstance(g, dict):
            dict1 = g
            my_value1 = dict1.get("id")
            my_value2 = dict1.get("name")

    # Second pass: collect every non-dict value, then the nested ones
    for s, i in v.items():
        if not isinstance(i, dict):
            commodities_reference.append(i)
    commodities_reference.append(my_value1)
    commodities_reference.append(my_value2)


Wanted output: all the values in a single list, in the same order, for use in a SQL INSERT statement afterwards (meaning the values from the nested dict must also come at the end):



[1, 'Explosives', 1, 294, 0, 755, 1774, 99, 18, 176, 924, 0, 128049204, 1, 'Chemicals']


From a performance perspective, with SQLite3 and Python 3.7, it is a catastrophe. I am looking for advice on how to make it more efficient. I am thinking about using executemany, but it seems to take tuples instead of lists.


























  • 1




    I'm not sure I fully understand the requirement (same order as what?) - could you show the actual expected output from the given input? That would make it clearer.
    – Toby Speight
    Nov 14 at 14:12








  • 1




    The data example I gave is the content of v, not d. In my case, d has 333 dictionaries in it.
    – Deareim
    Nov 14 at 14:21










  • Is the problem the code you posted or sqlite insert? Seems more likely to be the insert
    – juvian
    Nov 14 at 20:21










  • I solved the insert issue by using executemany with tuples instead. But that doesn't solve the code issue, as it seems to me it could be more optimized.
    – Deareim
    Nov 14 at 20:43















python performance python-3.x






edited Nov 14 at 16:48 by Mathias Ettinger; asked Nov 14 at 14:01 by Deareim




1 Answer (accepted)










Currently you iterate over each dictionary twice. You can do it in one pass:



for v in d:
    commodities_reference = []
    for k, g in v.items():
        if isinstance(g, dict):
            # Nested dict: append its two values directly
            commodities_reference.append(g["id"])
            commodities_reference.append(g["name"])
        else:
            commodities_reference.append(g)


Note that this appends each value as it is encountered. This means that in Python < 3.7 (CPython < 3.6) it is not guaranteed that the nested dictionary is actually the last item to be looked at, since dictionaries were not guaranteed to preserve insertion order.
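If older interpreters have to be supported, one option is to spell out the column order explicitly instead of relying on iteration order. This is only a minimal sketch: the key lists are copied from the example record in the question and are an assumption about the real data, and the names TOP_LEVEL_KEYS, CATEGORY_KEYS and flatten_commodity are made up for illustration.

```python
# Hypothetical helper: the key lists below are copied from the example
# record in the question and are an assumption about the real data.
TOP_LEVEL_KEYS = (
    "id", "name", "category_id", "average_price", "is_rare",
    "max_buy_price", "max_sell_price", "min_buy_price", "min_sell_price",
    "buy_price_lower_average", "sell_price_upper_average",
    "is_non_marketable", "ed_id",
)
CATEGORY_KEYS = ("id", "name")


def flatten_commodity(v):
    """Return the values of one record in a fixed, explicit order."""
    row = [v[key] for key in TOP_LEVEL_KEYS]
    # The nested "category" values always come last, whatever the dict order.
    row.extend(v["category"][key] for key in CATEGORY_KEYS)
    return row


v = {
    "category": {"name": "Chemicals", "id": 1},  # deliberately listed first
    "id": 1, "name": "Explosives", "category_id": 1, "average_price": 294,
    "is_rare": 0, "max_buy_price": 755, "max_sell_price": 1774,
    "min_buy_price": 99, "min_sell_price": 18,
    "buy_price_lower_average": 176, "sell_price_upper_average": 924,
    "is_non_marketable": 0, "ed_id": 128049204,
}
print(flatten_commodity(v))
```

The trade-off is that a missing key now raises a KeyError instead of silently producing a shorter row, which is usually what you want before an INSERT with a fixed number of columns.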



You could even make this a generator and slightly more general:



def get_values_recursive(x):
    for value in x.values():
        if isinstance(value, dict):
            # Descend into nested dicts and yield their values in place
            yield from get_values_recursive(value)
        else:
            yield value

for v in d:
    commodities_reference = list(get_values_recursive(v))
    # do something with it...
    print(commodities_reference)


When using the given example, this is the result:



>>> list(get_values_recursive(v))
[1, 'Explosives', 1, 294, 0, 755, 1774, 99, 18, 176, 924, 0, 128049204, 1, 'Chemicals']


When your code is put into a function (op below), this generator is roughly 1.5 times as fast with the given v:



In [13]: %timeit op(v)
5.32 µs ± 43.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [14]: %timeit list(get_values_recursive(v))
3.64 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
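To repeat this comparison outside IPython, something like the following sketch with the standard timeit module should work. The function op is my reconstruction of the question's loop wrapped in a function (an assumption, since the original wrapper is not shown), and the absolute numbers will differ from machine to machine.

```python
import timeit


def op(v):
    """The two-pass approach from the question, wrapped in a function."""
    row = []
    for k, g in v.items():
        if isinstance(g, dict):
            my_value1 = g.get("id")
            my_value2 = g.get("name")
    for s, i in v.items():
        if not isinstance(i, dict):
            row.append(i)
    row.append(my_value1)
    row.append(my_value2)
    return row


def get_values_recursive(x):
    for value in x.values():
        if isinstance(value, dict):
            yield from get_values_recursive(value)
        else:
            yield value


v = {
    "id": 1, "name": "Explosives", "category_id": 1, "average_price": 294,
    "is_rare": 0, "max_buy_price": 755, "max_sell_price": 1774,
    "min_buy_price": 99, "min_sell_price": 18,
    "buy_price_lower_average": 176, "sell_price_upper_average": 924,
    "is_non_marketable": 0, "ed_id": 128049204,
    "category": {"id": 1, "name": "Chemicals"},
}

# Both approaches must agree before the timings mean anything.
assert op(v) == list(get_values_recursive(v))

n = 10_000  # number of calls per measurement
t_op = timeit.timeit(lambda: op(v), number=n)
t_gen = timeit.timeit(lambda: list(get_values_recursive(v)), number=n)
print(f"op:        {t_op / n * 1e6:.2f} us per call")
print(f"generator: {t_gen / n * 1e6:.2f} us per call")
```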


Note that both take on the order of microseconds, so unless you need to process more than 100000 items per second, your bottleneck is probably in those SQL statements and how you execute them.
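On the SQL side, a common pattern is to build all rows first and pass them to executemany inside a single transaction, which helps if the inserts are currently executed and committed one at a time. The following is only a sketch under assumptions: the in-memory database, the three-column commodity table and the shortened records are made up, because the real schema is not shown in the question. The sqlite3 module accepts any sequence as a parameter row, so lists work as well as tuples.

```python
import sqlite3


def get_values_recursive(x):
    for value in x.values():
        if isinstance(value, dict):
            yield from get_values_recursive(value)
        else:
            yield value


# Hypothetical, shortened records; the real ones have more columns.
d = [
    {"id": 1, "name": "Explosives", "category": {"name": "Chemicals"}},
    {"id": 2, "name": "Example item", "category": {"name": "Chemicals"}},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE commodity (id INTEGER PRIMARY KEY, name TEXT, category_name TEXT)"
)

# Relies on dicts keeping insertion order (guaranteed from Python 3.7 on).
rows = [list(get_values_recursive(v)) for v in d]

# Using the connection as a context manager wraps the inserts in one
# transaction: commit on success, rollback if an exception is raised.
with conn:
    conn.executemany("INSERT INTO commodity VALUES (?, ?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM commodity").fetchone()[0]
print(count)
conn.close()
```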





























  • Thank you for the idea. I'll try to change it.
    – Deareim
    yesterday










  • So I have tested your "general" function and I see a big performance improvement, so thank you again. It just misses the fact that I need the data without dictionaries inside, as in my example. That is why I used get on the dictionary.
    – Deareim
    yesterday










  • @Deareim: I don't understand. When I call the function with the given example there are no dictionaries left in the output (see edited answer).
    – Graipher
    yesterday











edited yesterday; answered 2 days ago by Graipher











