Python 3 multi-threaded pinger
My goal: I want to ping every single IPv4 address and record whether or not it responded.
The way I have it set up, every IP address corresponds to an index: for example, 0.0.0.0 is index 0 and 0.0.1.0 is index 256. So if 0.0.0.0 responded, then the 0th element of the bitarray is true. At the end I write the bitarray to a file.
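(The mapping described above is just base-256 positional notation on the four octets; as a quick illustration, not part of the original program, the claimed index values check out:)

```python
def index_of(f1, f2, f3, f4):
    # Treat the four octets as the digits of a base-256 number.
    return ((f1 * 256 + f2) * 256 + f3) * 256 + f4

print(index_of(0, 0, 0, 0))  # 0
print(index_of(0, 0, 1, 0))  # 256
```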
Here is the code:
    import subprocess
    from bitarray import bitarray
    import threading
    import time

    response_array = bitarray(256 * 256 * 256 * 256)
    response_array.setall(False)

    def send_all_pings():
        index = 0
        for f1 in range(256):
            for f2 in range(256):
                for f3 in range(256):
                    for f4 in range(256):
                        thread = PingerThread(".".join(map(str, [f1, f2, f3, f4])), index)
                        thread.start()
                        index += 1
        time.sleep(30)
        print("Writing response array to file")
        with open('responses.bin', 'wb') as out:
            response_array.tofile(out)

    class PingerThread(threading.Thread):
        def __init__(self, address, index):
            threading.Thread.__init__(self)
            self.address = address
            self.index = index

        def run(self):
            if subprocess.call(["ping", "-c", "1", "-w", "1", self.address]) == 0:
                response_array[self.index] = True
            else:
                response_array[self.index] = False
What can I do to make this run faster? Any optimisations at all, even if very small, are welcome!
python performance python-3.x multithreading networking
edited yesterday by 200_success
asked yesterday by Kos
2 Answers
Opening four billion network connections, potentially all at once, doesn't sound like a good idea. I can't tell whether you'll hit the OS handle limit, or whether that would be handled gracefully (say, by blocking until a handle is free), but I'd rather set up a sane limit up front.
answered yesterday by millimoose
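One simple way to impose such a limit is a fixed-size worker pool, so no more than a set number of pings is ever in flight. A minimal sketch (the helper names and the pool size are illustrative, not from the question):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def ping(address):
    # One ping, one-second deadline, output discarded; True means a reply came back.
    return subprocess.call(
        ["ping", "-c", "1", "-w", "1", address],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ) == 0

def scan(addresses, max_workers=256):
    # The executor never runs more than max_workers pings concurrently,
    # so the process stays under the OS handle limit regardless of input size.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ping, addresses))
```

The pool size is the knob: raise it until the network, not the handle count, is the bottleneck.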
Some suggestions:
- IP addresses are only formatted as octets (0-255) for human readability; they really just represent 32-bit integers. Instead of, for example, 127.0.0.1 you can use 2130706433 (127 * 2**24 + 1). In other words, range(2**32) covers the entire range of IPv4 addresses.
- Using a Python library to ping hosts is very likely going to be much faster than starting a shell command for every address.
- Use multiprocessing rather than Python threads to avoid running into the global interpreter lock.
- response_array will end up taking half a gigabyte of memory (2**32 bits). If you really need the kind of detail you're logging, you should write each entry to disk as soon as possible (keeping the file open all the while). You could also simplify your reporting, such as saving only the IP addresses which don't respond, or saving to two files, one with responding IPs and the other with non-responding ones. You'll have to keep a file (or a pair of files) per process, to avoid them clobbering each other.
edited yesterday; answered yesterday by l0b0
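On the first point, the standard library's ipaddress module already converts both ways between the dotted form and the integer, so the bitarray index and the address can be the same number. A sketch:

```python
import ipaddress

# Dotted quad -> 32-bit integer, and back again.
print(int(ipaddress.IPv4Address("127.0.0.1")))   # 2130706433, i.e. 127 * 2**24 + 1
print(str(ipaddress.IPv4Address(2130706433)))    # "127.0.0.1"

# Iterating the whole IPv4 space is then just iterating the integers:
# for index in range(2**32):
#     address = str(ipaddress.IPv4Address(index))
```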
Since this is mostly waiting on the network, async I/O might be even better than multiprocessing. You probably don't want 4 billion open connections, but that's an issue with the original code as well. – millimoose, yesterday
Also, saving to two files from umpteen concurrent processes sounds like either a bottleneck or a way to end up with a file full of garbage. – millimoose, yesterday
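The comment's two ideas, async I/O plus a cap on how much is open at once, can be combined with a semaphore. This is an illustrative sketch using asyncio's subprocess support, not code from either answer:

```python
import asyncio

async def ping(address, limit):
    # The semaphore caps how many pings are in flight at any moment.
    async with limit:
        proc = await asyncio.create_subprocess_exec(
            "ping", "-c", "1", "-w", "1", address,
            stdout=asyncio.subprocess.DEVNULL,
            stderr=asyncio.subprocess.DEVNULL,
        )
        return await proc.wait() == 0

async def scan(addresses, max_in_flight=512):
    limit = asyncio.Semaphore(max_in_flight)
    return await asyncio.gather(*(ping(a, limit) for a in addresses))

# e.g. asyncio.run(scan("192.0.2.%d" % i for i in range(1, 5)))
```

Unlike one thread per address, the event loop handles all the waiting in a single process, and the semaphore keeps the open-handle count bounded.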