Parallelizing some model fitting code using multiprocessing

I'm trying to refactor some code for fitting parametric models that are defined symbolically using Theano. My goal is for all models to expose a common interface so that, as far as possible, they can be drop-in replacements for one another. To that end I've tried to encapsulate each model within a separate class. A key requirement is that I'm able to parallelize fitting the same model to multiple datasets using multiprocessing (I'm using the joblib wrapper for this).



Here's a runnable example of what I'm doing at the moment:



import numpy as np
import theano
from theano import tensor as te
from theano.gradient import jacobian, hessian
from scipy.optimize import minimize
from joblib import Parallel, delayed


class Rosenbrock(object):
    """The Rosenbrock function: f(x, y) = (a - x)^2 + b(y - x^2)^2 """

    # symbolic variables - only used internally
    _P = te.dvector('P')
    _a, _b = _P[0], _P[1]
    _xy = te.dmatrix('xy')
    _x, _y = _xy[0], _xy[1]
    _z = te.dvector('z')
    _z_hat = (_a - _x) ** 2 + _b * (_y - _x ** 2) ** 2
    _diff = _z - _z_hat
    _loss = 0.5 * te.dot(_diff, _diff)
    _jac = jacobian(_loss, _P)
    _hess = hessian(_loss, _P)

    # theano functions - part of the interface
    forward = theano.function([_P, _xy], _z_hat)
    loss = theano.function([_P, _xy, _z], _loss)
    jacobian = theano.function([_P, _xy, _z], _jac)
    hessian = theano.function([_P, _xy, _z], _hess)

    @staticmethod
    def initialize(xy, z):
        """
        make some sensible estimate of what the initial parameters should be,
        based on xy and z
        """
        P0 = xy[:, np.argmin(z)]
        return P0

    @staticmethod
    def _postfit(P):
        """
        sometimes I want to make some adjustments to the parameters post-
        fitting, e.g. wrapping angles between 0 and 2pi
        """
        return P


def do_fit(model, *args):
    """
    wrapper function that performs the fitting
    """
    # initialize the model
    P0 = model.initialize(*args)
    # do the fit
    res = minimize(model.loss, P0, args=args, method='Newton-CG',
                   jac=model.jacobian, hess=model.hessian)
    P = res.x
    # tweak the parameters
    P = model._postfit(P)
    # return the tweaked parameters
    return P


def run(niter=2000):
    # I don't actually need to instantiate this, since everything is
    # effectively a class method...
    model = Rosenbrock()

    # some example data
    xy = np.mgrid[-3:3:100j, -3:3:100j].reshape(2, -1)
    P = np.r_[1., 100.]
    z = model.forward(P, xy)

    # run multiple fits in parallel
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(do_fit)(model, xy, z) for _ in xrange(niter))


if __name__ == '__main__':
    run()


The core functions forward(), loss(), jacobian() and hessian() behave like static methods of the class. I've found that in order to parallelize the fitting, the Theano functions must be attributes of the class rather than of an instance. Otherwise (i.e. if I compile them within the class's __init__() method), the workers block one another when I call them in parallel, so effectively only a single core is ever used. Presumably the GIL is no longer being circumvented, although I don't really understand why that should be the case.
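My best current hypothesis is that the limiting factor is pickling rather than the GIL per se: joblib pickles each job's arguments before shipping them to the workers, and pickling an instance serializes its __dict__ but only a named reference to its class, so class attributes never pass through the pickle machinery at all. Here's a minimal demonstration of that difference (plain pickle, no Theano; the Demo class is purely illustrative):

import pickle

class Demo(object):
    # class attribute: resolved by importing Demo in the worker,
    # never serialized with the instance
    big = list(range(1000000))

    def __init__(self):
        # instance attribute: serialized with every pickled instance
        self.small = 42

payload = pickle.dumps(Demo())
# tiny payload: 'big' travels as a reference to the class, not by value
print(len(payload))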



Declaring the Theano functions as class attributes has two rather undesirable consequences:




  1. The class namespace gets cluttered up with all of the intermediate
     Theano symbolic variables (_a, _b, etc.), which aren't really needed
     after theano.function() has been called. To hide them from the user I
     have to prepend the variable names with underscores, which makes the
     code harder to read.

  2. theano.function() triggers the auto-generation and compilation of C
     code, which is a slow process. It would be most convenient to define
     all of my models in the same source file, but that means that whenever
     I import or reload that file I have to wait for all of my models to
     re-compile (see the sketch below).
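To mitigate both problems I've been toying with building the symbolic graph inside a classmethod and compiling lazily, on first call. This is only a sketch under my own naming (LazyRosenbrock, _build() and _fns are all hypothetical, and only forward() is shown), but the symbolic variables stay local to the builder and importing the module triggers no compilation:

import theano
from theano import tensor as te

class LazyRosenbrock(object):
    """Sketch: compile the Theano functions on first use, once per process."""

    _fns = None  # stays None until forward() is first called

    @classmethod
    def _build(cls):
        # the symbolic graph lives in this local scope, so the class
        # namespace stays clean and nothing compiles at import time
        P = te.dvector('P')
        a, b = P[0], P[1]
        xy = te.dmatrix('xy')
        x, y = xy[0], xy[1]
        z_hat = (a - x) ** 2 + b * (y - x ** 2) ** 2
        return {'forward': theano.function([P, xy], z_hat)}

    @classmethod
    def forward(cls, P, xy):
        if cls._fns is None:
            cls._fns = cls._build()
        return cls._fns['forward'](P, xy)

Each process would then pay the compilation cost at most once, on its first call, although I haven't verified whether these lazily compiled functions parallelize any better than the ones compiled in the class body.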



Can anyone (especially someone with experience in Theano) suggest a better way
to structure this code?

      python object-oriented multithreading numpy

          1 Answer

          Here's a detail that might help you a little bit.





          1. The class namespace gets cluttered...




          Consider this variant:



@staticmethod
def initialize(xy, z):
    def helper(xy, z):
        P0 = xy[:, np.argmin(z)]
        return P0
    return helper(xy, z)


          The nested function keeps P0 out of the class namespace.