Estimating a parameter: What is the qualitative difference between MLE fitting and least-squares CDF fitting?
Given a parametric pdf $f(x;\lambda)$ and a set of data $\{ x_k \}_{k=1}^n$, here are two ways of formulating a problem of selecting an optimal parameter vector $\lambda^*$ to fit to the data. The first is maximum likelihood estimation (MLE):
$$\lambda^* = \arg\max_\lambda \prod_{k=1}^n f(x_k;\lambda)$$
where this product is called the likelihood function.
The second is least-squares CDF fitting:
$$\lambda^* = \arg\min_\lambda \| E(x) - F(x;\lambda) \|_{L^2(dx)}$$
where $F(x;\lambda)$ is the CDF corresponding to $f(x;\lambda)$ and $E(x)$ is the empirical CDF: $E(x) = \frac{1}{n} \sum_{k=1}^n 1_{x_k \leq x}$. (One could also consider more general $L^p$ CDF fitting, but let's not go there for now.)
In the experiments I have done, these two methods give similar but still significantly different results. For example, in a bimodal normal mixture fit, one gave one of the standard deviations as about $12.6$ while the other gave it as about $11.6$. This isn't a huge difference, but it is large enough to see easily in a graph.
What is the intuition for the difference in these two "goodness of fit" metrics? An example answer would be something along the lines of "MLE cares more about data points in the tail of the distribution than least-squares CDF fitting" (I make no claims on the validity of this statement). An answer discussing other metrics of fitting parametric distributions to data would also be of some use.
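For concreteness, here is a minimal sketch of how both criteria could be set up for a two-component normal mixture. It assumes NumPy/SciPy; the data, starting values, and all function and variable names are purely illustrative, and a serious fit would need more careful optimization and constraint handling.

```python
import numpy as np
from scipy import stats, optimize

# Illustrative data: a two-component normal mixture (parameters made up).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 12.0, 700), rng.normal(50.0, 8.0, 300)])

def mix_pdf(t, w, m1, s1, m2, s2):
    return w * stats.norm.pdf(t, m1, s1) + (1 - w) * stats.norm.pdf(t, m2, s2)

def mix_cdf(t, w, m1, s1, m2, s2):
    return w * stats.norm.cdf(t, m1, s1) + (1 - w) * stats.norm.cdf(t, m2, s2)

def neg_log_lik(theta):
    # MLE objective: minimize -sum_k log f(x_k; lambda)
    w, m1, s1, m2, s2 = theta
    if not (0 < w < 1 and s1 > 0 and s2 > 0):
        return np.inf
    return -np.sum(np.log(mix_pdf(x, w, m1, s1, m2, s2)))

# Empirical CDF evaluated on a fixed grid, for the L^2(dx) objective.
grid = np.linspace(x.min() - 20, x.max() + 20, 2000)
ecdf = np.searchsorted(np.sort(x), grid, side="right") / x.size

def cdf_l2(theta):
    # Squared L^2(dx) distance between E and F(.; lambda),
    # approximated by a Riemann sum on the grid.
    w, m1, s1, m2, s2 = theta
    if not (0 < w < 1 and s1 > 0 and s2 > 0):
        return np.inf
    resid = ecdf - mix_cdf(grid, w, m1, s1, m2, s2)
    return np.sum(resid ** 2) * (grid[1] - grid[0])

theta0 = [0.5, 0.0, 10.0, 40.0, 10.0]  # rough starting guess
opts = {"maxiter": 10000, "maxfev": 10000}
mle = optimize.minimize(neg_log_lik, theta0, method="Nelder-Mead", options=opts)
cdf = optimize.minimize(cdf_l2, theta0, method="Nelder-Mead", options=opts)
print("MLE fit:   ", mle.x)
print("L2 CDF fit:", cdf.x)
```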
statistics numerical-methods least-squares parameter-estimation maximum-likelihood
edited Aug 28 '17 at 14:01 by Royi
asked Oct 2 '16 at 12:47 by Ian
I had this thought for a long time as well. What I can tell you is that the MLE asymptotically attains the Cramér–Rao bound determined by the Fisher information, which guarantees properties that I don't think the other method can match.
– Royi, Aug 28 '17 at 13:59
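For reference, the property alluded to in this comment is asymptotic efficiency: under standard regularity conditions the MLE satisfies
$$\sqrt{n}\,\bigl(\hat\lambda_{\text{MLE}} - \lambda_0\bigr) \xrightarrow{d} N\bigl(0,\; I(\lambda_0)^{-1}\bigr),$$
where $I(\lambda)$ is the Fisher information matrix, so its asymptotic variance attains the Cramér–Rao bound. Least-squares CDF fitting is a minimum-distance estimator; it is typically consistent and asymptotically normal, but it does not in general attain this bound.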
1 Answer
In my eyes, the intuitive explanation is that ML estimates the conditional mode (the maximum of the distribution), while least squares estimates the conditional mean. In the case where the errors are perfectly Gaussian, these two estimates are equal.
answered Dec 15 '18 at 12:36 by Rafael
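To spell out the Gaussian remark (this is the classical regression setting with additive errors, not the CDF-fitting criterion from the question): if $y_k = \mu(\lambda) + \varepsilon_k$ with $\varepsilon_k \sim N(0,\sigma^2)$ i.i.d., then the negative log-likelihood is
$$-\sum_{k=1}^n \log f(y_k;\lambda) = \frac{n}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{k=1}^n \bigl(y_k - \mu(\lambda)\bigr)^2,$$
so for fixed $\sigma$, maximizing the likelihood over $\lambda$ is exactly minimizing the sum of squared residuals. Since the mode of a Gaussian equals its mean, the mode-based and mean-based readings of the two criteria coincide in this case.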