Questions regarding the use of Requests Futures for accessing REST URLs











This is a much simplified version of the real code, focusing just on the handling of Futures from Requests Futures.



I have a few questions:




  1. I had to implement my own version of as_completed because the data handlers may add more Futures to _pending. Is this a decent way to handle the problem, or is there another approach? (A sketch of the behaviour that confused as_completed follows this list.)

  2. Is stop sufficient to handle KeyboardInterrupt in all cases? It has worked well in my limited testing, but I found it hard to find a solution via Google. (See the note after the code below.)

  3. Is my rate-limiting solution OK, or is there a better approach? The concern is not the number of concurrent connections but the number of connections per second.
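
For context on question 1: as_completed() iterates over a snapshot of the futures it was given at call time, so futures submitted while the loop runs are never yielded. A minimal illustration of just that behaviour, separate from the code under review:

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

with ThreadPoolExecutor(max_workers=2) as pool:
    pending = {pool.submit(time.sleep, 0.1): "first"}
    for future in as_completed(list(pending)):
        print(pending[future], "done")
        # Submitted after as_completed() took its snapshot, so this
        # future completes but is never yielded here.
        pending[pool.submit(time.sleep, 0.1)] = "second"

print(sum(1 for f in pending if f.done()), "futures actually finished")

Only "first done" is printed, although both futures finish. The simplified code: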




import argparse
from concurrent.futures import ThreadPoolExecutor
import sys
import time

import requests
from requests_futures.sessions import FuturesSession

def background_callback(sess, resp):
    # Parse the JSON, storing the result on the response object.
    if resp.status_code == requests.codes.ok:
        resp.data = resp.json()
    else:
        resp.data = None

class JSONRetriever(object):
    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=10)
        self._session = FuturesSession(executor=self._executor)
        self._pending = {}

    def fetch(self, url):
        future = self._session.get(url,
                                   background_callback=background_callback)
        self._pending[future] = url

    def drain(self):
        # Look for completed requests by hand because in the real code
        # the responses may trigger further URLs to be retrieved, so
        # self._pending is modified. New requests being added really
        # confused as_completed().
        for future in [f for f in self._pending if f.done()]:
            url = self._pending[future]
            del self._pending[future]

            response = future.result()
            response.raise_for_status()
            if response.status_code == requests.codes.ok:
                print(response.data)
                # The real code would handle the data, possibly adding more requests.
            else:
                # The real code is smarter; this is just for code review.
                raise Exception("FIXME: unhandled response")

    def finish(self):
        while self._pending:
            self.drain()
            if self._pending:
                time.sleep(1)

    def stop(self):
        for i in self._pending:
            try:
                i.cancel()
            except Exception as e:
                sys.stderr.write("Caught: " + str(e) + "\n")

        self._executor.shutdown()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Perform all REST calls")
    parser.add_argument("--delay", type=int, default=0)
    parser.add_argument("urls", nargs="+")
    args = parser.parse_args()

    retriever = JSONRetriever()

    try:
        for url in args.urls:
            retriever.fetch(url)
            if args.delay > 0:  # may need a delay to rate limit requests
                time.sleep(args.delay)
                retriever.drain()  # clear any requests that completed while asleep

        retriever.finish()
    except KeyboardInterrupt:
        retriever.stop()
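
A note relevant to question 2: Future.cancel() returns a bool rather than raising; it returns False when the call is already running or finished. So the try/except in stop() above is purely defensive, and the loop could instead report what it failed to cancel. A sketch of that variant, as a drop-in replacement for the stop() method (untested against the real code):

    def stop(self):
        for future, url in self._pending.items():
            if not future.cancel():  # False: already running or finished
                sys.stderr.write("could not cancel: %s\n" % url)
        self._pending.clear()
        self._executor.shutdown()  # waits for still-running requests by default

Note that cancel() cannot interrupt a request that a worker thread has already started; shutdown() will still wait for those to finish.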









Tags: python, asynchronous, http, signal-handling






asked Jul 17 '14 at 22:40 by Sean Perry
edited Jan 20 '15 at 0:15 by Hosch250














  • Sean, do you not find requests-futures a bit slow? For example, I can't get an improvement over multiprocessing.Pool(), and some people have suggested tornado instead. – mptevsion, Mar 2 '16 at 20:43

  • I have never needed blazing speed. – Sean Perry, Mar 2 '16 at 20:46

  • The current question title, which states your concerns about the code, is too general to be useful here. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles. – Toby Speight, Sep 6 at 15:26
1 Answer






Nice code, clearly written.

I understand the rate-limiting requirement. Having the drain() call within the loop doesn't seem like the caller's responsibility; better to let the background callback handle it, or defer until finish() as written, which does make sense. Each URL fetch could take more or less than the delay time, so this seems to be a bug / wart still lurking in the code.






answered Sep 11 '17 at 0:55 by J_H
  • This was extracted from something I wrote while working on a REST client where the server sent back "not ready yet" responses. There was behavior that led me to the drain() call, but three years later I no longer remember what it was. There was definitely a wart, but the API I was dealing with has since been radically overhauled, so I cannot dredge up the issue any longer. Thanks for looking at my post. – Sean Perry, Sep 11 '17 at 5:25
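
To make the answer's suggestion concrete, here is a minimal sketch that moves the spacing between requests into fetch() itself, so the caller's loop no longer sleeps. It assumes the JSONRetriever class from the question; min_interval is a hypothetical parameter, not part of the original code:

import time

class ThrottledRetriever(JSONRetriever):
    def __init__(self, min_interval=0.0):
        super().__init__()
        self._min_interval = min_interval  # hypothetical: seconds between submissions
        self._last_submit = 0.0

    def fetch(self, url):
        # Sleep only for the remainder of the interval, so a slow iteration
        # is not delayed twice and a fast one still gets spaced out.
        wait = self._min_interval - (time.monotonic() - self._last_submit)
        if wait > 0:
            time.sleep(wait)
        self._last_submit = time.monotonic()
        super().fetch(url)

This throttles submissions to the executor rather than the connections themselves, which is a reasonable approximation as long as worker threads are free.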










