Generate uniformly distributed weights that sum to unity?


14

It is common to use weights in applications such as mixture modeling and linear combinations of basis functions. The weights $w_i$ must often obey $w_i \ge 0$ and $\sum_i w_i = 1$. I would like to randomly choose a weight vector $w = (w_1, w_2, \ldots)$ from a uniform distribution over such vectors.

It may be tempting to use $w_i = \omega_i / \sum_j \omega_j$ where $\omega_i \sim U(0, 1)$; however, as discussed in the comments below, the resulting distribution of $w$ is not uniform.
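A quick numerical check of that non-uniformity, as a minimal Python sketch (the helper names `naive_weights` and `estimate_prob_below` are illustrative assumptions, not part of the original post). For $n = 2$, a uniform distribution on the simplex would make $w_1$ uniform on $[0, 1]$, so $P(w_1 < 1/4)$ would equal $1/4$. Under the normalized-uniform scheme the same event is $\omega_2 > 3\omega_1$, which has probability $1/6$.

```python
import random

def naive_weights(n, rng):
    """Normalize n iid U(0,1) draws by their sum (the scheme questioned above)."""
    omega = [rng.random() for _ in range(n)]
    total = sum(omega)  # zero only if every draw is exactly 0.0, which is ignored here
    return [o / total for o in omega]

def estimate_prob_below(threshold, trials, seed=0):
    """Monte Carlo estimate of P(w_1 < threshold) for n = 2 under the naive scheme."""
    rng = random.Random(seed)
    hits = sum(naive_weights(2, rng)[0] < threshold for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    # A uniform w_1 would give about 0.25; the naive scheme gives about 1/6.
    print(estimate_prob_below(0.25, 200_000))
```

The estimate should settle near $1/6$ rather than $1/4$: the naive scheme puts too little mass near the corners of the simplex.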

However, given the constraint $\sum_i w_i = 1$, it seems that the underlying dimensionality of the problem is $n - 1$, and that it should be possible to choose $w$ by drawing $n - 1$ parameters according to some distribution and then computing the corresponding $w$ from those parameters (because once $n - 1$ of the weights are specified, the remaining weight is fully determined).

The problem appears to be similar to the sphere point picking problem (but rather than picking 3-vectors whose $\ell_2$ norm is unity, I want to pick $n$-vectors whose $\ell_1$ norm is unity).

Thanks!


3
Your method does not generate a uniformly distributed vector on the simplex. To do what you want correctly, the most straightforward way is to generate $n$ iid $\mathrm{Exp}(1)$ random variables and then normalize them by their sum. You could try to find some other method that draws only $n - 1$ variates, but I have my doubts about the efficiency tradeoff, since $\mathrm{Exp}(1)$ variates can be generated very efficiently from $U(0, 1)$ variates.
cardinal

Answers:


22

Choose $x \in [0,1]^{n-1}$ uniformly (by means of $n-1$ uniform reals in the interval $[0,1]$). Sort the coefficients so that $0 \le x_1 \le \cdots \le x_{n-1}$. Set

$$w = (x_1,\; x_2 - x_1,\; x_3 - x_2,\; \ldots,\; x_{n-1} - x_{n-2},\; 1 - x_{n-1}).$$

Because we can recover the sorted $x_i$ by means of the partial sums of the $w_i$, the mapping $x \to w$ is $(n-1)!$ to 1; in particular, its image is the $n-1$ simplex in $\mathbb{R}^n$. Because (a) each swap in a sort is a linear transformation, (b) the preceding formula is linear, and (c) linear transformations preserve uniformity of distributions, the uniformity of $x$ implies the uniformity of $w$ on the $n-1$ simplex. In particular, note that the marginals of $w$ are not necessarily independent.
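The sort-and-difference construction above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `simplex_by_spacings` and the `rng` parameter are assumptions introduced here, not part of the original answer.

```python
import random

def simplex_by_spacings(n, rng=random):
    """Return n weights as the gaps between 0, n-1 sorted uniforms, and 1."""
    cuts = sorted(rng.random() for _ in range(n - 1))  # 0 <= x_1 <= ... <= x_{n-1}
    edges = [0.0] + cuts + [1.0]
    # Consecutive differences: x_1, x_2 - x_1, ..., 1 - x_{n-1}
    return [hi - lo for lo, hi in zip(edges, edges[1:])]

if __name__ == "__main__":
    w = simplex_by_spacings(3, random.Random(42))
    print(w, sum(w))
```

The weights are non-negative by construction, and their sum telescopes to 1 up to floating-point rounding.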

[3D point plot]

This 3D point plot shows the results of 2000 iterations of this algorithm for n=3. The points are confined to the simplex and are approximately uniformly distributed over it.


Because the execution time of this algorithm is $O(n \log(n)) \gg O(n)$, it is inefficient for large $n$. But it does answer the question! A better way (in general) to generate uniformly distributed values on the $n-1$ simplex is to draw $n$ uniform reals $(x_1, \ldots, x_n)$ on the interval $[0,1]$, compute

$$y_i = -\log(x_i)$$

(which makes each $y_i$ positive with probability 1, whence their sum is almost surely nonzero) and set

$$w = (y_1, y_2, \ldots, y_n) / (y_1 + y_2 + \cdots + y_n).$$

This works because each $y_i$ has a $\Gamma(1)$ distribution, which implies $w$ has a $\mathrm{Dirichlet}(1, 1, \ldots, 1)$ distribution, and that is uniform.
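A minimal Python sketch of this log-transform approach, using only the standard library. The function name `simplex_by_exponentials` is an assumption introduced here. The sketch also uses `1.0 - rng.random()` in place of a plain uniform draw, because `random.random()` can return exactly 0.0 and that would break the logarithm.

```python
import math
import random

def simplex_by_exponentials(n, rng=random):
    """Normalize n iid Exp(1) variates, y_i = -log(x_i), by their sum."""
    # 1 - U lies in (0, 1], so the logarithm is always finite.
    y = [-math.log(1.0 - rng.random()) for _ in range(n)]
    total = sum(y)  # almost surely positive, as noted above
    return [v / total for v in y]

if __name__ == "__main__":
    w = simplex_by_exponentials(3, random.Random(7))
    print(w, sum(w))
```

No sort is involved, so the cost is linear in $n$. Each coordinate should average $1/n$ over many draws, which gives a cheap sanity check.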

[3D point plot 2]


1
@Chris If by "Dir(1)" you mean the Dirichlet distribution with parameters $(\alpha_1, \ldots, \alpha_n) = (1, 1, \ldots, 1)$, then the answer is yes.
whuber

1
(+1) One minor comment: The intuition is excellent. Care in interpreting (a) may need to be taken, as it seems that the "linear transformation" in that part is a random one. However, this is easily worked around at the expense of additional formality by using exchangeability of the generating process and a certain invariance property.
cardinal

1
More explicitly: For distributions with a density $f$, the density of the order statistics of an iid sample of size $n$ is $n!\, f(x_1) \cdots f(x_n)\, 1_{(x_1 < x_2 < \cdots < x_n)}$. In the case of $f = 1_{[0,1]}(x)$, the distribution of the order statistics is uniform on a polytope. Taken from this point, the remaining transformations are deterministic and the result follows.
cardinal

1
$I^{n-1} = [0,1]^{n-1}$ is carved into $(n-1)!$ regions, of which one is distinguished from the others, and there's a predetermined affine bijection between each region and the distinguished one. Whence, the only additional fact we need is that a uniform distribution on a region is uniform on any measurable subset of it, which is a complete triviality.
whuber

2
@whuber: Interesting remarks. Thanks for sharing! I always appreciate your insightful thoughts on such things. Regarding my previous comment on "random linear transformation", my point was that, at least through $x$, the transformation used depends on the sample point $\omega$. Another way to think of it is there is a fixed, predetermined function $T: \mathbb{R}^{n-1} \to \mathbb{R}^{n-1}$ such that $w = T(x)$, but I wouldn't call that function linear, though it is linear on subsets that partition the $(n-1)$-cube. :)
cardinal

1
    zz <- c(0, log(-log(runif(n-1))))
    ezz <- exp(zz)
    w <- ezz/sum(ezz)

The first entry is put to zero for identification; you would see that done in multinomial logistic models. Of course, in multinomial models, you would also have covariates under the exponents, rather than just the random zzs. The distribution of the zzs is the extreme value distribution; you'd need this to ensure that the resulting weights are i.i.d. I initially put rnormals there, but then had a gut feeling that this ain't gonna work.


That doesn't work. Did you try looking at a histogram?
cardinal

4
Your answer is now almost correct. If you generate n iid Exp(1) and divide each by the sum, then you will get the correct distribution. See Dirichlet distribution for more details, though it doesn't discuss this explicitly.
cardinal

1
Given the terminology you are using, you sound a little confused.
cardinal

2
Actually, the Wiki link does discuss this (fairly) explicitly. See the second paragraph under the Support heading.
cardinal

1
This characterization is both too restrictive and too general. It is too general in that the resulting distribution of $w$ must be "uniform" on the $n-1$ simplex in $\mathbb{R}^n$. It is too restrictive in that the question is worded generally enough to allow that $w$ be some function of an $(n-1)$-variate distribution, which in turn presumably, but not necessarily, consists of $n-1$ independent (and perhaps iid) variables.
whuber

0

The solution is obvious. The following MATLAB code provides the answer for 3 weights.

function [  ] = TESTGEN( )
SZ  = 1000;
V  = zeros (1, 3);
VS = zeros (SZ, 3);
for NIT=1:SZ   
   V(1) = rand (1,1);     % uniform generation on the range 0..1
   V(2) = rand (1,1) * (1 - V(1));
   V(3) = 1 - V(1) - V(2);  
   PERM = randperm (3);    % random permutation of values 1,2,3
   for NID=1:3
         VS (NIT, NID) = V (PERM(NID));
    end
end 
figure;
scatter3 (VS(:, 1), VS(:,2), VS (:,3));
end

[3D scatter plot of the generated points]


1
Your marginals do not have the correct distribution. Judging from the Wikipedia article on the Dirichlet distribution (random number generation section, which has the algorithm you have coded), you should be using a beta(1,2) distribution for V(1), not a uniform[0,1] distribution.
soakley
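One way to read soakley's suggestion is as a stick-breaking construction, sketched below in Python with the standard library. This is an assumption-laden illustration, not the MATLAB poster's code: the function name `simplex_by_stick_breaking` is hypothetical, and the generalization from 3 weights to $n$ weights (drawing the $k$-th fraction from $\mathrm{Beta}(1, n-k)$) relies on the standard marginal and conditional distributions of $\mathrm{Dirichlet}(1, \ldots, 1)$. For $n = 3$ it reduces to a $\mathrm{Beta}(1, 2)$ draw followed by a uniform draw.

```python
import random

def simplex_by_stick_breaking(n, rng=random):
    """Break a unit stick n-1 times, with Beta(1, n-k) fractions at step k."""
    weights = []
    remaining = 1.0
    for k in range(1, n):
        frac = rng.betavariate(1, n - k)   # Beta(1, 2) then Beta(1, 1) when n = 3
        weights.append(remaining * frac)   # take this fraction of what is left
        remaining -= weights[-1]
    weights.append(remaining)              # the last weight is whatever remains
    return weights

if __name__ == "__main__":
    w = simplex_by_stick_breaking(3, random.Random(5))
    print(w, sum(w))
```

With the beta-distributed fractions, no random permutation of the coordinates should be needed, since every coordinate already has the same marginal distribution.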

It does appear that the density increases in the corners of this tilted triangle. Nonetheless, it provides a nice geometric display of the problem.
DWin
Licensed under cc by-sa 3.0 with attribution required.