Read list of dictionaries with nested dictionaries
I have a file containing a list of dictionaries (around 75,000). For instance, this is an example of the first item I get when reading the file (the value of v):
{
    "id": 1,
    "name": "Explosives",
    "category_id": 1,
    "average_price": 294,
    "is_rare": 0,
    "max_buy_price": 755,
    "max_sell_price": 1774,
    "min_buy_price": 99,
    "min_sell_price": 18,
    "buy_price_lower_average": 176,
    "sell_price_upper_average": 924,
    "is_non_marketable": 0,
    "ed_id": 128049204,
    "category": {
        "id": 1,
        "name": "Chemicals"
    }
}
My current working code is:
for v in d:
    commodities_reference = []
    for k, g in v.items():
        if isinstance(g, dict):
            dict1 = g
            my_value1 = dict1.get("id")
            my_value2 = dict1.get("name")
    for s, i in v.items():
        if not isinstance(i, dict):
            commodities_reference.append(i)
    commodities_reference.append(my_value1)
    commodities_reference.append(my_value2)
Desired output: all the values in a single list, in the same order, for a SQL INSERT statement afterwards (meaning the values from the nested dict must also come at the end):
[1, 'Explosives', 1, 294, 0, 755, 1774, 99, 18, 176, 924, 0, 128049204, 1, 'Chemicals']
From a performance perspective, with sqlite3/Python 3.7, it is a catastrophe. I am looking for advice on how to make it more efficient. I am thinking about using an executemany statement, but it seems to take tuples instead of lists.
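As a quick check (table and column names below are made up for illustration), sqlite3's executemany in fact accepts any iterable of sequences, so lists work just as well as tuples:

```python
import sqlite3

# Minimal sketch: executemany takes any iterable of row sequences.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commodities (id INTEGER, name TEXT)")
rows = [[1, "Explosives"], [2, "Hydrogen Fuel"]]  # lists, not tuples
conn.executemany("INSERT INTO commodities VALUES (?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM commodities").fetchone()[0])  # 2
```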
python performance python-3.x
I'm not sure I fully understand the requirement (same order as what?) - could you show the actual expected output from the given input? That would make it clearer.
– Toby Speight
Nov 14 at 14:12
the data example I gave is the content of v, not d. In my case, d has 333 dictionaries in it.
– Deareim
Nov 14 at 14:21
Is the problem the code you posted or sqlite insert? Seems more likely to be the insert
– juvian
Nov 14 at 20:21
I solved the insert issue by using executemany with tuples instead. But that doesn't solve the code issue, as it seems to me the code could be more optimized.
– Deareim
Nov 14 at 20:43
asked Nov 14 at 14:01 by Deareim (new contributor), edited Nov 14 at 16:48 by Mathias Ettinger
1 Answer (accepted)
Currently you iterate over each dictionary twice. You can do it in one pass:
for v in d:
    commodities_reference = []
    for k, g in v.items():
        if isinstance(g, dict):
            commodities_reference.append(g["id"])
            commodities_reference.append(g["name"])
        else:
            commodities_reference.append(g)
Note that this appends the values as it encounters them. This means that in Python < 3.7 (CPython < 3.6) it is not guaranteed that the nested dictionary's values end up last, since dictionaries were not guaranteed to preserve insertion order.
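If insertion order cannot be relied on, a small variant (shown here with a shortened stand-in for the example dict, and assuming only the nested dict's id and name are wanted) collects the nested values separately and appends them at the end:

```python
def get_values_nested_last(v):
    # Scalar values first; values from nested dicts go at the end,
    # independent of dict iteration order.
    flat, nested = [], []
    for g in v.values():
        if isinstance(g, dict):
            nested.extend([g["id"], g["name"]])
        else:
            flat.append(g)
    return flat + nested

v = {"id": 1, "name": "Explosives",
     "category": {"id": 1, "name": "Chemicals"}}
print(get_values_nested_last(v))  # [1, 'Explosives', 1, 'Chemicals']
```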
You could even make this a generator and slightly more general:
def get_values_recursive(x):
    for value in x.values():
        if isinstance(value, dict):
            yield from get_values_recursive(value)
        else:
            yield value

for v in d:
    commodities_reference = list(get_values_recursive(v))
    # do something with it...
    print(commodities_reference)
When using the given example, this is the result:
>>> list(get_values_recursive(v))
[1, 'Explosives', 1, 294, 0, 755, 1774, 99, 18, 176, 924, 0, 128049204, 1, 'Chemicals']
When putting your code into a function, this generator is almost twice as fast with the given v:
In [13]: %timeit op(v)
5.32 µs ± 43.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit list(get_values_recursive(v))
3.64 µs ± 10.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Note that both take on the order of microseconds, so unless you need to process more than 100,000 items per second, your bottleneck is probably in those SQL statements and how you execute them.
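As a sketch of how such a generator and executemany can fit together (the table and column names here are made up, and the whole batch is wrapped in one transaction to avoid a commit per row):

```python
import sqlite3

def get_values_recursive(x):
    # Yield all values, descending into nested dicts.
    for value in x.values():
        if isinstance(value, dict):
            yield from get_values_recursive(value)
        else:
            yield value

# Shortened stand-in for the real data.
d = [{"id": 1, "name": "Explosives",
      "category": {"id": 1, "name": "Chemicals"}}]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE commodities "
             "(id INTEGER, name TEXT, category_id INTEGER, category_name TEXT)")
# Using the connection as a context manager commits once for the batch.
with conn:
    conn.executemany(
        "INSERT INTO commodities VALUES (?, ?, ?, ?)",
        (tuple(get_values_recursive(v)) for v in d),
    )
print(conn.execute("SELECT * FROM commodities").fetchall())
# [(1, 'Explosives', 1, 'Chemicals')]
```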
Thank you for the idea. I'll try to change it.
– Deareim
yesterday
So I have tested your "general" function and I see a lot of performance improvement, so thank you again. The only thing missing is that I need the data without dictionaries inside, like in my example. That is why I had a get on the dictionary.
– Deareim
yesterday
@Deareim: I don't understand. When I call the function with the given example there are no dictionaries left in the output (see edited answer).
– Graipher
yesterday
answered 2 days ago by Graipher, edited yesterday