What are the advantages of linear regression over quantile regression?


15

The linear regression model makes a set of assumptions that quantile regression does not and, if the assumptions of linear regression are met, my intuition (and some very limited experience) is that median regression would give almost identical results to linear regression.

So, what advantages does linear regression have? It is certainly more familiar, but other than that?


3
To "more familiar" I would add "interpretability" and "stability", but for me one of the advantages of linear regression is what it tells you about the mean and how well the mean represents the sampled population (the residuals are very informative). Linear regression has great value when its assumptions are met and good value when they are not.
JustGettinStarted

5
I would argue that an important issue is discussed in these two threads: stats.stackexchange.com/questions/153348/… and stats.stackexchange.com/questions/146077/… : efficiency and, possibly, even optimality under certain assumptions.
Christoph Hanck

1
As a further, but minor, point, one might add the availability of explicit, closed-form solutions, which are not available for LAD; this may make such techniques less attractive to practitioners.
Christoph Hanck

1
An answer might like to start by comparing the simple case of estimating a single population parameter, showing that least squared errors performs better with Gaussian errors and least absolute residuals performs better for other types of errors (given the assumptions). But then, this question is about more complex linear models, and the problem becomes more complex and broad. The intuition from the simple problem (estimating a single mean/median) carries over to a bigger model, but how far should it carry? And how do the methods compare with respect to outliers, distributions, computation?
Sextus Empiricus

2
In my case, I have found quantile regression very good for explaining to non-technical audiences when the response variable is skewed (e.g. customer spend) and introducing a transformation/link-function step obscures the whole analysis. In that sense I would dispute the claim that "median regression would give almost identical results to linear regression"; it does not hold, especially when working with potentially skewed response variables.
usεr11852

Answers:


10

It is very often said that minimizing squared residuals is preferred over minimizing absolute residuals because it is computationally simpler. But it may also be better for other reasons. Namely, if the assumptions are true (and this is not so uncommon), it provides a solution that is (on average) more accurate.

Maximum likelihood

Least squares regression and quantile regression (when performed by minimizing the absolute residuals) can be seen as maximizing the likelihood function for Gaussian and Laplace distributed errors, respectively, and in this sense the two are very much related.

  • Gaussian distribution:

    $$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

    Its log-likelihood is maximized when minimizing the sum of squared residuals:

    $$\log \mathcal{L}(x) = -\frac{n}{2}\log(2\pi) - n\log(\sigma) - \frac{1}{2\sigma^2}\underbrace{\sum_{i=1}^{n}(x_i-\mu)^2}_{\text{sum of squared residuals}}$$

  • Laplace distribution:

    $$f(x) = \frac{1}{2b} \, e^{-\frac{|x-\mu|}{b}}$$

    Its log-likelihood is maximized when minimizing the sum of absolute residuals:

    $$\log \mathcal{L}(x) = -n\log(2) - n\log(b) - \frac{1}{b}\underbrace{\sum_{i=1}^{n}|x_i-\mu|}_{\text{sum of absolute residuals}}$$

Note: the Laplace distribution and the sum of absolute residuals relate to the median, but this can be generalized to other quantiles by giving different weights to negative and positive residuals.
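
As a minimal numerical sketch of the two losses above (my own illustration, not part of the original answer; the data and distribution parameters are made up), minimizing the sum of squared residuals over a single location parameter recovers the sample mean, while minimizing the sum of absolute residuals recovers the sample median:

```python
# Minimizing the sum of squared residuals yields the sample mean (Gaussian MLE);
# minimizing the sum of absolute residuals yields the sample median (Laplace MLE).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1001)  # illustrative data, values are arbitrary

sse = lambda mu: np.sum((x - mu) ** 2)   # loss implied by the Gaussian log-likelihood
sae = lambda mu: np.sum(np.abs(x - mu))  # loss implied by the Laplace log-likelihood

print(minimize_scalar(sse).x, x.mean())      # both print the sample mean
print(minimize_scalar(sae).x, np.median(x))  # both print (approximately) the sample median
```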

Known error distribution

When we know the error distribution (when the assumptions are likely true), it makes sense to choose the associated likelihood function. Maximizing that likelihood, i.e. minimizing the corresponding loss, is the more efficient choice.

Very often the errors are (approximately) normally distributed. In that case using least squares is the best way to find the parameter μ (which relates to both the mean and the median). It is the best way because it has the lowest sampling variance among all unbiased estimators. Or, more strongly: it is stochastically dominant (see the illustration in this question comparing the distribution of the sample median and the sample mean).

So, when the errors are normally distributed, the sample mean is a better estimator of the distribution's median than the sample median, and least squares regression is a more efficient estimator of the quantiles than minimizing the sum of absolute residuals.
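
A small Monte Carlo sketch of this efficiency claim (my own illustration; the sample size and number of replications are arbitrary):

```python
# With normally distributed data, the sample mean has a smaller sampling variance
# than the sample median, although both estimate the same location parameter.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 10_000
samples = rng.normal(loc=0.0, scale=1.0, size=(reps, n))

print("variance of sample means:  ", samples.mean(axis=1).var())        # about 1/n = 0.010
print("variance of sample medians:", np.median(samples, axis=1).var())  # about pi/(2n) ~ 0.016
```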

Because so many problems involve normally distributed errors, the least squares method is very popular. To work with other types of distributions one can use generalized linear models. The method of iteratively reweighted least squares, which can be used to fit GLMs, also works for the Laplace distribution (i.e. for absolute deviations), which is equivalent to finding the median (or, in the generalized version, other quantiles).
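
A rough sketch of that last idea (my own, with made-up data, not the answer's own code): iteratively reweighted least squares with weights 1/|residual| converges to the least-absolute-deviations (median-type) fit.

```python
# Least absolute deviations via iteratively reweighted least squares (IRLS):
# each step solves a weighted least-squares problem that down-weights large residuals.
import numpy as np

def lad_irls(X, y, n_iter=50, eps=1e-8):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS solution
    for _ in range(n_iter):
        w = 1.0 / np.maximum(np.abs(y - X @ beta), eps)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.laplace(scale=1.0, size=200)

print(lad_irls(X, y))  # close to the true intercept 1 and slope 2
```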

Unknown error distribution

Robustness

The median and other quantiles have the advantage that they are very robust to the type of distribution. The actual values do not matter much; the quantiles only care about the order. So no matter what the distribution is, minimizing the absolute residuals (which is equivalent to finding the quantiles) works very well.

The question becomes complex and broad here, and it depends on what knowledge we have, or do not have, about the distribution function. For instance, a distribution may be approximately normal but with some additional outliers. This can be dealt with by removing the outer values. This removal of extreme values even works when estimating the location parameter of the Cauchy distribution, where the truncated mean can be a better estimator than the median. So not only in the ideal situation when the assumptions hold, but also in some less ideal applications (e.g. additional outliers), there may be good robust methods that still use some form of a sum of squared residuals instead of a sum of absolute residuals.

I imagine that regression with truncated residuals might be computationally much more complex. So it may actually be quantile regression that ends up being performed because it is computationally simpler (not simpler than ordinary least squares, but simpler than truncated least squares).
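
A quick simulation of the Cauchy example above (my own sketch, not from the answer; the trimming proportion of roughly 38% per tail is a commonly cited near-optimal choice, and the sample sizes are arbitrary):

```python
# For the Cauchy distribution, a heavily trimmed (truncated) mean can estimate the
# location parameter with a smaller sampling variance than the sample median.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 100, 10_000
samples = stats.cauchy.rvs(loc=0.0, size=(reps, n), random_state=rng)

medians = np.median(samples, axis=1)
trimmed = stats.trim_mean(samples, proportiontocut=0.38, axis=1)  # keep the central ~24%

print("variance of sample medians:", medians.var())
print("variance of trimmed means: ", trimmed.var())  # typically somewhat smaller
```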

Biased/unbiased

Another issue is biased versus unbiased estimators. Above I described the maximum likelihood estimate for the mean, i.e. the least squares solution, as a good or preferable estimator because it often has the lowest variance of all unbiased estimators (when the errors are normally distributed). But biased estimators may be even better (lower expected squared error).

This makes the question again broad and complex. There are many different estimators and many different situations in which to apply them. The use of an adapted sum-of-squared-residuals loss function often works well to reduce the error (e.g. all kinds of regularization methods), but it may not work well in all cases. Intuitively it is not strange to imagine that, since the sum-of-squared-residuals loss function often works well for all unbiased estimators, the optimal biased estimator is probably something close to a sum-of-squared-residuals loss function.
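
As a toy sketch of this bias/variance trade-off (my own illustration; the design, true coefficients, noise level, and penalty are invented for the example), a ridge-penalized least-squares estimator, which is biased, can have a lower mean squared error for the coefficients than unbiased OLS when the predictors are strongly correlated and the noise is large:

```python
# Compare coefficient MSE of OLS and ridge regression over repeated simulations.
import numpy as np

rng = np.random.default_rng(4)
beta_true = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]])  # correlated predictors
L = np.linalg.cholesky(Sigma)
n, reps, lam = 30, 2_000, 5.0

mse_ols = mse_ridge = 0.0
for _ in range(reps):
    X = rng.normal(size=(n, 3)) @ L.T
    y = X @ beta_true + rng.normal(scale=3.0, size=n)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    mse_ols += np.sum((b_ols - beta_true) ** 2) / reps
    mse_ridge += np.sum((b_ridge - beta_true) ** 2) / reps

print(mse_ols, mse_ridge)  # the biased ridge estimator typically has the smaller MSE here
```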


"When we know the error-distribution it makes sense to choose the associated likelihood function. Minimizing that function is more optimal." Not to say this is wrong, but it should probably be qualified. Of course, this relates once again to my question (that you answered) on optimal estimators under different loss functions.
Richard Hardy

"It is the best way because it has the lowest sample variance." Variance is generally not a sensible loss function because it neglects bias; a sensible counterpart would be expected squared error (a.k.a. mean squared error), which takes account of both variance and bias. "The least squares regression is a more optimal estimator of the quantiles." Median, yes, but the other ones? And if yes, then why? In any case, yours is a very nice answer!
Richard Hardy

1
@RichardHardy this topic is so broad. Indeed the expected squared error = variance + squared bias. I assumed the bias of the sample mean is the same as that of the sample median (or more generally: least sum of squared residuals and least sum of absolute residuals have the same bias). This is true for various error distributions (e.g. symmetric error distributions), but indeed the question becomes more complex in other cases. (The point was mainly that errors are often normally distributed and this makes least squares regression favourable.)
Sextus Empiricus

1
The same (the complexity of the question) is true when we consider not the median but some other quantile. In the case of normally distributed errors I believe that the MLE gives the best result for whatever quantile, but I agree that this is intuition. Again, the problem is very broad (dependence on the number of samples, the type of error distribution and the certainty about it, etc.).
Sextus Empiricus

1
"A broken clock is exactly right twice a day": I would not call the MLE a broken clock. Sure, when you know the problem well, you can introduce some variance-reducing bias to improve the overall error. That is not necessarily a move to a different (quantile) type of regression; you can also just put some jam or honey on the least-squares bread and butter. If you do wish to compare the MLE to a broken clock, then it is a clock that happens to be standing still around the time we make the most use of it.
Sextus Empiricus

2

Linear regression (LR) boils down to least squares optimization when computing its coefficients. This implies a symmetry in the deviations from the regression model. A good explanation of quantile regression (QR) is given at https://data.library.virginia.edu/getting-started-with-quantile-regression/.

If the LR assumptions (needed for inference: p-values, confidence intervals, etc.) are satisfied, QR and LR predictions will be similar. But if the assumptions are strongly violated, your standard LR inference will be wrong. So a 0.5-quantile (median) regression presents an advantage over LR. It also gives more flexibility by providing regressions for other quantiles. The equivalent for linear models would be a confidence bound computed from an LR fit (although this would be wrong if the i.i.d. assumption is strongly violated).

So what is the advantage of LR? Of course it's easier to compute but if your data set is of reasonable size that may not be very noticeable. But more importantly, the LR inference assumptions provide information that lowers uncertainty. As a result, LR confidence intervals on predictions will typically be narrower. So if there is strong theoretical support for the assumptions, narrower confidence intervals may be an advantage.
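
A brief sketch of the first point (my own illustration using statsmodels, not part of this answer; the simulated data and coefficients are made up): when the errors are i.i.d. normal, the median (0.5-quantile) regression fit and the OLS fit come out nearly identical.

```python
# Compare OLS and median (quantile) regression on data satisfying the LR assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
df = pd.DataFrame({"x": rng.uniform(0, 10, 500)})
df["y"] = 3.0 + 1.5 * df["x"] + rng.normal(scale=2.0, size=500)

print(smf.ols("y ~ x", data=df).fit().params)           # intercept/slope near 3 and 1.5
print(smf.quantreg("y ~ x", data=df).fit(q=0.5).params)  # very similar under these assumptions
```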


2

Linear regression is used to estimate the conditional mean response given the data, i.e. E(Y|X) where Y is the response and X is the data. The regression tells us that E(Y|X)=Xβ. There are certain assumptions (you can find them in any stats text) for inference to be valid. If these are satisfied then generally the standard estimator for β is the BLUE (best linear unbiased estimator -- see Gauss-Markov theorem).

Quantile regression can be used to estimate ANY quantile of the conditional distribution including the median. This provides potentially a lot more information than the mean about the conditional distribution. If the conditional distribution is not symmetric or the tails are possibly thick (e.g. risk analysis), quantile regression is helpful EVEN if all the assumptions of linear regression are satisfied.

Of course, it is numerically more intensive to carry out quantile estimation than linear regression, but it is generally much more robust (e.g. just as the median is more robust than the mean to outliers). In addition, it is appropriate in settings where linear regression is not, e.g. for censored data. Inference may be trickier, as direct estimation of the variance-covariance matrix may be difficult or computationally expensive; in those cases, one can bootstrap.
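
A hypothetical illustration of fitting several conditional quantiles (my own sketch with statsmodels, not part of this answer; the skewed, heteroskedastic data are invented for the example):

```python
# Fit the 0.1, 0.5 and 0.9 conditional quantiles; with a skewed, heteroskedastic
# response the fitted slopes differ across quantiles, which a single mean fit hides.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
df = pd.DataFrame({"x": rng.uniform(0, 10, 1000)})
df["y"] = np.exp(0.2 * df["x"] + rng.normal(scale=0.5, size=1000))  # right-skewed response

for q in (0.1, 0.5, 0.9):
    fit = smf.quantreg("y ~ x", data=df).fit(q=q)
    print(q, fit.params.values)
```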

Licensed under cc by-sa 3.0 with attribution required.