What do confidence intervals say about precision (if anything)?


31

Morey et al. (2015) argue that confidence intervals are misleading and that there are multiple biases related to their interpretation. Among others, they describe the precision fallacy as follows:

The precision fallacy
The width of a confidence interval indicates the precision of our knowledge about the parameter. Narrow confidence intervals show precise knowledge, while wide confidence errors show imprecise knowledge.

There is no necessary connection between the precision of an estimate and the size of a confidence interval. One way to see this is to imagine two researchers (a senior researcher and a PhD student) analyzing the data of 50 participants from an experiment. As an exercise for the PhD student's benefit, the senior researcher decides to randomly divide the participants into two sets of 25 so that they can each separately analyze half the data set. In a subsequent meeting, the two share with one another their Student's $t$ confidence intervals for the mean. The PhD student's 95% CI is $52 \pm 2$, and the senior researcher's 95% CI is $53 \pm 4$.

The senior researcher notes that the two results are broadly consistent, and that they could use the equally weighted mean of their two respective point estimates, 52.5, as an overall estimate of the true mean.

The PhD student, however, argues that their two means should not be evenly weighted: she notes that her CI is half as wide, and argues that her estimate is more precise and should thus be weighted more heavily. Her advisor notes that this cannot be correct, because the estimate from unevenly weighting the two means would be different from the estimate from analyzing the complete data set, which must be 52.5. The PhD student's mistake is assuming that CIs directly indicate post-data precision.

The example above seems misleading. If we randomly split a sample in half, into two samples, then we would expect both sample means and standard errors to be close. In such a case there should be no difference between using a weighted mean (e.g. weighted by the inverse variances) and a simple arithmetic mean. However, if the estimates differ and the errors in one of the samples are noticeably larger, this could suggest "issues" with that sample.

Obviously, in the above example the sample sizes are equal, so "joining back" the data by taking the mean of the two means is the same as taking the mean of the whole sample. The problem is that the whole example follows the ill-defined logic that the sample is first divided into parts, only to be joined back again for the final estimate.
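For equal subsample sizes this is easy to check numerically. Below is a minimal sketch (Python with NumPy; the distribution, seed, and sizes are made up for illustration) comparing the full-sample mean, the mean of the two subsample means, and the inverse-variance-weighted mean on a random half-split:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=52.5, scale=7.5, size=50)  # hypothetical full sample
half1, half2 = x[:25], x[25:]                 # random half-split

m1, m2 = half1.mean(), half2.mean()
se1 = half1.std(ddof=1) / np.sqrt(25)
se2 = half2.std(ddof=1) / np.sqrt(25)

simple = (m1 + m2) / 2                        # equal weights
w1, w2 = 1 / se1**2, 1 / se2**2               # inverse-variance weights
weighted = (w1 * m1 + w2 * m2) / (w1 + w2)

print(f"full-sample mean: {x.mean():.3f}")
print(f"mean of means:    {simple:.3f}")      # identical to the above
print(f"inv-var weighted: {weighted:.3f}")    # typically very close
```

With equal halves the mean of means equals the full-sample mean exactly; the inverse-variance-weighted mean differs only to the extent that the two subsample variances happen to differ.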

The example could be re-phrased to lead to exactly the opposite conclusion:

The researcher and the student decide to split their data set into two halves and to analyze them independently. Afterwards, they compare their estimates, and it turns out that the sample means they computed are very different; moreover, the standard error of the student's estimate is much larger. The student fears that this could suggest issues with the precision of his estimate, but the researcher replies that there is no connection between confidence intervals and precision, so both estimates are equally trustworthy, and they can publish either one of them, chosen at random, as their final estimate.

Stating it more formally, "standard" confidence intervals, such as Student's $t$, are based on errors:

$$\bar{x} \pm c \times \mathrm{SE}(x),$$

where $c$ is some constant. In such a case, they are directly related to precision, aren't they..?
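For concreteness, here is a minimal sketch of this "standard" interval, a Student's $t$ CI in which $c$ is the appropriate $t$ quantile (assuming NumPy/SciPy; the example data are made up):

```python
import numpy as np
from scipy import stats

def t_ci(x, level=0.95):
    """Standard t interval: x-bar +/- c * SE(x)."""
    n = len(x)
    se = np.std(x, ddof=1) / np.sqrt(n)             # SE(x)
    c = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)  # the constant c
    return np.mean(x) - c * se, np.mean(x) + c * se

rng = np.random.default_rng(0)
print(t_ci(rng.normal(52.5, 7.5, size=25)))
```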

So my question is: is the precision fallacy really a fallacy? What do confidence intervals say about precision?


Morey, R., Hoekstra, R., Rouder, J., Lee, M., & Wagenmakers, E.-J. (2015). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 1–21. https://learnbayes.org/papers/confidenceIntervalsFallacy/


2
I think that if precision is defined as the reciprocal of the variance, then the width of these CIs merely reflects an estimate of the precision. The width of a Bayesian credible interval for the mean would reflect the uncertainty about the precision just as much.
Scortchi - Reinstate Monica

@Scortchi So is that another way of saying that frequentist methods are, in general, untrustworthy..?
Tim

7
I'd say this is a red herring. I just simulated 10,000 experiments, each drawing 50 samples from a normal distribution with mean 52.5 and SD 7.5 (so that subsamples of size 25 have a standard error of $7.5/\sqrt{25}=1.5$, yielding CIs of about $\pm 3$). I then split each sample in two and checked how often the two CI widths differed by a factor of 2 or more. This happened in only 6 out of 10,000 cases. Anyone seeing CIs that different would far sooner suspect that something broke in the subsample selection.
S. Kolassa - Reinstate Monica
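A sketch that reproduces the simulation described in the comment above (assuming NumPy/SciPy; the seed and the exact counting rule are my guesses at the unstated details):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
c = stats.t.ppf(0.975, df=24)           # 95% t quantile for n = 25

def half_width(x):
    return c * x.std(ddof=1) / np.sqrt(len(x))

count = 0
for _ in range(10_000):
    x = rng.normal(52.5, 7.5, size=50)  # one simulated experiment
    w1, w2 = half_width(x[:25]), half_width(x[25:])
    if max(w1, w2) >= 2 * min(w1, w2):  # widths differ by a factor of 2+
        count += 1

print(count)                            # a handful of cases out of 10,000
```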

@StephanKolassa I did exactly the same simulation, leading to exactly the same conclusion - that is how this question emerged :)
Tim

2
@Tim: I don't really know what they're trying to get at: if the true precision of the estimate of the mean is conceived of as a function of an unknown parameter value common to both sub-samples, then I don't think anyone would take it into their head that the difference in the widths of these two CIs therefore shows a difference in the precision of the estimates (unless they suspected the subsampling procedure). Considering the coverage properties of CIs conditional on the observed coefficient of variation might be a better strategy.
Scortchi - Reinstate Monica

Answers:


16

In the paper we actually demonstrate the precision fallacy in multiple ways. The one you're asking about, the first in the paper, is an example meant to show that a simplistic "CI width = precision" reading is wrong. This is not to say that any competent analyst (frequentist, Bayesian, or likelihoodist) would be confused by it.

Here's another way to see what's going on: if we were just told the CIs, we would still not be able to combine the information in the two samples; we would need to know $N$, and from that we could decompose each CI into $\bar{x}$ and $s^2$, and thus combine the two samples properly. The reason we need to do this is that the information in the CI is marginal over the nuisance parameter. We must take into account that both samples contain information about the same nuisance parameter. This involves computing both $s^2$ values, combining them into an overall estimate of $\sigma^2$, and then computing a new CI.
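Here is one reading of that recipe as a sketch (assuming NumPy/SciPy; the helper decompose is mine, and the CI endpoints are taken from the $52 \pm 2$ and $53 \pm 4$ example in the question):

```python
import numpy as np
from scipy import stats

def decompose(lo, hi, n, level=0.95):
    """Recover x-bar and s^2 from a t CI, given the sample size n."""
    c = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    xbar = (lo + hi) / 2
    s = (hi - lo) / 2 / c * np.sqrt(n)   # invert half-width = c * s / sqrt(n)
    return xbar, s**2

n = 25
xbar1, s2_1 = decompose(50, 54, n)       # student's CI: 52 +/- 2
xbar2, s2_2 = decompose(49, 57, n)       # senior's  CI: 53 +/- 4

# pool the two s^2 values into one estimate of sigma^2, then a new CI
s2_pool = ((n - 1) * s2_1 + (n - 1) * s2_2) / (2 * n - 2)
xbar = (xbar1 + xbar2) / 2
c = stats.t.ppf(0.975, df=2 * n - 2)
hw = c * np.sqrt(s2_pool / (2 * n))
print(f"combined 95% CI: {xbar:.2f} +/- {hw:.2f}")
```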

For other demonstrations of the precision fallacy, see

  • The multiple CIs in the Welch (1939) section (the submarine), which include the "trivial" CI mentioned by @dsaxton. In that example, the optimal CI does not track the width of the likelihood, and there are several other examples of CIs that do not either.
  • The fact that CIs, even "good" CIs, can be empty, "falsely" indicating infinite precision

The answer to the puzzle is that "precision", at least in the way CI advocates think about it (as a post-experimental assessment of how "close" an estimate is to a parameter), is simply not a characteristic that confidence intervals have in general, and they were not meant to. Particular confidence procedures might... or they might not.

See also the discussion here: http://andrewgelman.com/2011/08/25/why_it_doesnt_m/#comment-61591


7
(+1) Great to hear from the actual author! I agree that CIs have several philosophical issues, as do ALL forms of inference (just different issues)...I like how you pointed out that it's the specific confidence procedure that you need to consider, not just that it is a CI at such and such level.

4
(+1) Thanks for your response! I agree with the arguments you state in your paper that CIs do not have to say anything about precision; however, calling this a fallacy gives the impression that you claim they do not say anything about precision -- and this is not the same... Moreover: in your opinion, is the "precision fallacy" a real-life-analysis issue..? I agree that misinterpreting CIs is, but in this case I'm not so sure...
Tim

2
"Real-life" impact is difficult to quantify, particularly b/c one could talk about impact in a specific analysis scenario or across a field. For just computing a CI on a Gaussian mean the fallacy is not too dangerous. But consider the list of cites on p117 (para. starts "How often will Steiger’s confidence procedure..."). The intervals in those published papers is likely "too" narrow. The fallacy has other impacts: a lack of thoroughness on generators of new CI procedures (check any paper with a new CI), reluctance of analysts to move away from Gaussian assumptions when needed, and others.
richarddmorey

I am tantalized by these parentheses. What is this "submarine"?
Superbest

1
But if the width of the likelihood function for θ conditional on the sample range is supposed to truly reflect precision in the submarine example, then why shouldn't the width of the likelihood function for the mean conditional on the sample variance truly reflect the precision in this example? Suppose four bubbles from the submarine were observed & randomly split into two sets of two ...
Scortchi - Reinstate Monica

13

First of all, let's limit ourselves to CI procedures that only produce intervals with strictly positive, finite widths (to avoid pathological cases).

In this case, the relationship between precision and CI width can be theoretically demonstrated. Take an estimate for the mean (when it exists). If your CI for the mean is very narrow, then you have two interpretations: either you had some bad luck and your sample was too tightly clumped (a priori 5% chance of that happening), or your interval covers the true mean (95% a priori chance). Of course, the observed CI can be either of these two, but we set up our calculation so that the latter is far more likely to have occurred (i.e., 95% chance a priori)...hence, we have a high degree of confidence that our interval covers the mean, because we set things up probabilistically so this is so. Thus, a 95% CI is not a probability interval (like a Bayesian Credible Interval), but more like a "trusted adviser"...someone who, statistically, is right 95% of the time, so we trust their answers even though any particular answer could very well be wrong.

In the 95% of cases where it does cover the actual parameter, then the width tells you something about the range of plausible values given the data (i.e., how well you can bound the true value), hence it acts like a measure of precision. In the 5% of cases where it doesn't, then the CI is misleading (since the sample is misleading).

So, does 95% CI width indicate precision...I'd say there's a 95% chance it does (provided your CI width is positive-finite) ;-)
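A quick simulation illustrates the "trusted adviser" reading: across repeated experiments, the 95% t interval covers the true mean about 95% of the time, whatever its width in any single run. A minimal sketch (assuming NumPy/SciPy; the parameters and seed are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 52.5, 7.5, 25, 10_000
c = stats.t.ppf(0.975, df=n - 1)

hits = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    hw = c * x.std(ddof=1) / np.sqrt(n)   # half-width of this run's CI
    hits += (abs(x.mean() - mu) <= hw)    # did the interval cover mu?

print(hits / reps)                        # close to 0.95
```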

What is a sensible CI?

In response to the original author's post, I've revised my response to (a) take into account that the "split sample" example had a very specific purpose, and (b) provide some more background as requested by the commenter:

In an ideal (frequentist) world, all sampling distributions would admit a pivotal statistic that we could use to get exact confidence intervals. What is so great about pivotal statistics? Their distribution can be derived without knowing the actual value of the parameter being estimated! In these nice cases, we have an exact distribution of our sample statistic relative to the true parameter (although it may not be Gaussian).

Put more succinctly: We know the error distribution (or some transformation thereof).

It is this quality of some estimators that allows us to form sensible confidence intervals. These intervals don't just satisfy their definitions...they do so by virtue of being derived from the actual distribution of estimation error.

The Gaussian distribution and the associated Z statistic are the canonical example of the use of a pivotal quantity to develop an exact CI for the mean. There are more esoteric examples, but this is generally the one that motivates "large sample theory", which is basically an attempt to apply the theory behind Gaussian CIs to distributions that do not admit a true pivotal quantity. In these cases, you'll read about approximately pivotal, or asymptotically pivotal (in the sample size), quantities or "approximate" confidence intervals...these are based on likelihood theory -- specifically, the fact that the error distribution for many MLEs approaches a normal distribution.
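As a sketch of such an "approximate" interval (assuming NumPy/SciPy; the Poisson example and its numbers are mine, not from the thread): the MLE of a Poisson rate is the sample mean, and its asymptotic standard error is sqrt(lambda-hat / n), giving a Wald-type CI.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.poisson(lam=3.0, size=200)   # hypothetical count data

lam_hat = x.mean()                   # MLE of the Poisson rate
se = np.sqrt(lam_hat / len(x))       # 1 / sqrt(n * Fisher information)
z = stats.norm.ppf(0.975)
print(f"approx. 95% CI: [{lam_hat - z * se:.3f}, {lam_hat + z * se:.3f}]")
```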

Another approach for generating sensible CIs is to "invert" a hypothesis test. The idea is that a "good" test (e.g., UMP) will result in a good (read: narrow) CI for a given Type I error rate. These don't tend to give exact coverage, but do provide lower-bound coverage (note: the actual definition of a X%-CI only says it must cover the true parameter at least X% of the time).
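A sketch of test inversion (assuming SciPy's binomtest; the counts and the grid resolution are arbitrary choices of mine): a 95% CI for a binomial proportion is the set of null values p0 that a two-sided exact test fails to reject at alpha = 0.05.

```python
import numpy as np
from scipy import stats

k, n, alpha = 12, 40, 0.05           # hypothetical: 12 successes in 40 trials
grid = np.linspace(1e-6, 1 - 1e-6, 2000)

# keep every null value p0 the two-sided exact test does NOT reject
accepted = [p0 for p0 in grid
            if stats.binomtest(k, n, p0).pvalue > alpha]
print(f"95% CI by test inversion: [{min(accepted):.3f}, {max(accepted):.3f}]")
```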

The use of hypothesis tests does not directly require a pivotal quantity or error distribution -- its sensibility is derived from the sensibility of the underlying test. For example, if we had a test whose rejection region had length 0 95% of the time and infinite length 5% of the time, we'd be back to where we were with the CIs -- but it's obvious that this test is not conditional on the data, and hence will not provide any information about the underlying parameter being tested.

This broader idea, that an estimate of precision should be conditional on the data, goes back to Fisher and the idea of ancillary statistics. You can be sure that if the result of your test or CI procedure is NOT conditioned on the data (i.e., its conditional behavior is the same as its unconditional behavior), then you've got a questionable method on your hands.


2
It would be great if you could elaborate on what you added in a "Note". This is, I think, the crux of the whole discussion: one can devise very weird but valid frequentist procedures for constructing CIs under which the width of the CI has no relationship to any precision whatsoever. Hence one can argue, as Morey et al. do, that CIs are misleading in principle. I do agree with you that commonly used CI procedures are more reasonable than that, but one needs to be clear on what makes them such.
amoeba says Reinstate Monica

@amoeba I added some more explanation on why not all CIs are created equal...the main idea is ancillarity, the second is the role of an error distribution (or an approximation to it)

Thanks for the update. One thing that I still don't find very clear in your answer is that in the first paragraph you don't say anything about CI width; you are just talking about it containing or not containing the true population parameter. Everything there is correct even in "pathological" cases. Then you say that yes, the width indicates precision, but you haven't provided any arguments for that (at that point). In the later discussion you explain it more, though.
amoeba says Reinstate Monica

@amoeba I guess my post could do with a little more formatting. The basic logic is this (assuming we are using a "reasonable" CI procedure as I outline): there is an a priori 95% chance that the interval will contain the true parameter. After we collect data, we have our actual interval (finite, non-zero width). IF it contains the true parameter, then the width expresses the range of plausible values it could be, hence the width bounds the range of the parameter. HOWEVER, in the 5% of cases where the interval does not contain the value, then the interval is misleading.

@amoeba updated post to better emphasize the connection between CI width and precision.

8

I think the precision fallacy is a true fallacy, but not necessarily one we should care about. It isn't even that hard to show it's a fallacy. Take an extreme example like the following: we have a sample $\{x_1, x_2, \ldots, x_n\}$ from a $\mathrm{normal}(\mu, \sigma^2)$ distribution and wish to construct a confidence interval on $\mu$, but instead of using the actual data we take our confidence interval to be either $(-\infty, \infty)$ or $\{0\}$ based on the flip of a biased coin. By using the right bias we can get any level of confidence we like, but obviously our interval "estimate" has no precision at all even if we end up with an interval that has zero width.
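This pathological procedure is easy to verify by simulation. A minimal sketch (assuming NumPy; the true mean and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, reps = 1.0, 100_000              # any nonzero true mean works here

covered = 0
for _ in range(reps):
    if rng.random() < 0.95:
        covered += 1                 # report (-inf, inf): always covers mu
    else:
        covered += (mu == 0.0)       # report {0}: covers only if mu = 0
print(covered / reps)                # ~0.95, a valid 95% "CI" with no precision
```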

The reason why I don't think we should care about this apparent fallacy is that while it is true that there's no necessary connection between the width of a confidence interval and precision, there is an almost universal connection between standard errors and precision, and in most cases the width of a confidence interval is proportional to a standard error.

I also don't believe the author's example is a very good one. Whenever we do data analysis we can only estimate precision, so of course the two individuals will reach different conclusions. But if we have some privileged knowledge, such as knowing that both samples are from the same distribution, then we obviously shouldn't ignore it. Clearly we should pool the data and use the resulting estimate of σ as our best guess. It seems to me this example is like the one above where we only equate confidence interval width with precision if we've allowed ourselves to stop thinking.


Good point about the randomly infinite CIs...it definitely shows that confidence is a different concept than precision. I probably should have caveated my response by saying that I am assuming a likelihood-based CI, where width is related to the curvature of the log likelihood, which is an approximation of the standard error...your post points out that there are CIs that technically achieve coverage but in a very counterintuitive way.

A related issue (albeit a very interesting one) is that of relevant subsets for a CI...for example, if you condition on ancillary statistics, your CI coverage may change (a case in point is that the conditional coverage of a t-interval changes based on the variability of your sample). Here's the link to the paper: jstor.org/stable/2242024?seq=1#page_scan_tab_contents

@Bey There's another less extreme example from this paper involving a submarine: webfiles.uci.edu/mdlee/fundamentalError.pdf. It's an interesting one, but again appears to be a case of an interpretation that no intelligent person would make.
dsaxton

Agreed....can't leave common sense at the door with stats...even in Machine Learning (somewhat of a misnomer)

1
@richarddmorey: Okay, I see. Then it was just an unfortunate formulation! I did not take it out of the context on purpose; I honestly read this sentence as a summary and generalization to any situation (not realizing that "in that example" was assumed in that sentence). Consider leaving a clarification comment in that other thread with my accusation (that already got some upvotes).
amoeba says Reinstate Monica

4

I think the demonstrable distinction between "confidence intervals" and "precision" (see answer from @dsaxton) is important because that distinction points out problems in common usage of both terms.

Quoting from Wikipedia:

The precision of a measurement system, related to reproducibility and repeatability, is the degree to which repeated measurements under unchanged conditions show the same results.

One thus might argue that frequentist confidence intervals do represent a type of precision of a measurement scheme. If one repeats the same scheme, the 95% CI calculated for each repetition will contain the one true value of the parameter in 95% of the repetitions.

This, however, is not what many people want from a practical measure of precision. They want to know how close the measured value is to the true value. Frequentist confidence intervals do not strictly provide that measure of precision. Bayesian credible regions do.

Some of the confusion is that, in practical examples, frequentist confidence intervals and Bayesian credible regions "will more-or-less overlap". Sampling from a normal distribution, as in some comments on the OP, is such an example. That may also be the case in practice for some of the broader types of analyses that @Bey had in mind, based on approximations to standard errors in processes that have normal distributions in the limit.

If you know that you are in such a situation, then there may be no practical danger in interpreting a particular 95% CI, from a single implementation of a measurement scheme, as having a 95% probability of containing the true value. That interpretation of confidence intervals, however, is not from frequentist statistics, for which the true value either is or is not within that particular interval.

If confidence intervals and credible regions differ markedly, that Bayesian-like interpretation of frequentist confidence intervals can be misleading or wrong, as the paper linked above and earlier literature referenced therein demonstrate. Yes, "common sense" might help avoid such misinterpretations, but in my experience "common sense" isn't so common.

Other CrossValidated pages contain much more information on confidence intervals and the differences between confidence intervals and credible regions. Links from those particular pages are also highly informative.


This is a good point....I think the closest thing to the common interpretation of "precision" is more like RMS error. An unbiased but highly variable estimator is seen as no better than a low-variability but highly biased one...neither can be relied upon to give an estimate close to the true value.

+1, but I am not sure I share your pessimistic view on "common sense". There is a great quote from Jeffreys about "common sense" in frequentist statistics: I have in fact been struck repeatedly in my own work, after being led on general principles to the solution of a problem, to find that Fisher had already grasped the essentials by some brilliant piece of common sense.
amoeba says Reinstate Monica

@amoeba consider Laplace's claim that "Probability theory is nothing but common sense reduced to calculation." The efforts devoted since then to probability theory at least show that the implications of common sense aren't always immediately obvious.
EdM

@amoeba: Fisher rejected CIs, and identifying Fisher as a frequentist is misleading. His logic of intervals (fiducial) was similar to objective Bayes, and he identified probability with rational uncertainty. He says this: "It is sometimes asserted that the fiducial method generally leads to the same results as the method of [CIs]. It is difficult to understand how this can be so, since it has been firmly laid down that the method of confidence intervals does not lead to probability statements about the parameters of the real world, whereas the fiducial argument exists for this purpose." (Fisher, 1959)
richarddmorey

@richard, Thanks for the clarification. Fisher is known to have said contradictory things throughout his long career and to have changed his opinion a couple of times. I am not really familiar with his fiducial theory so cannot comment on that. My unconscious assumption was that Jeffreys in that quote was referring to the Fisher's "frequentist period" but I have no evidence for that. In my (limited!) experience, nobody ever uses fiducial inference. Nobody. Ever. Whereas frequentist techniques are used all the time and many go back to Fisher. Hence the association existing in my mind.
amoeba says Reinstate Monica

1

@Bey has it. There is no necessary connection between scores and performance nor price and quality nor smell and taste. Yet the one usually informs about the other.

One can prove by induction that one cannot give a pop quiz. On close examination this means one cannot guarantee the quiz is a surprise. Yet most of the time it will be.

It sounds like Morey et al. show that there exist cases where the width is uninformative. Although that is sufficient to claim "There is no necessary connection between the precision of an estimate and the size of a confidence interval", it is not sufficient to further conclude that CIs generally contain no information about precision. Merely that they are not guaranteed to do so.

(Insufficient points to +1 @Bey's answer.)
