Analytic solution of the linear-regression coefficient estimates



I am trying to get my head around matrix notation and working with vectors and matrices.

Right now I want to understand how the vector of coefficient estimates $\hat{\beta}$ is computed in multiple regression.

The basic equation seems to be

$$\frac{d}{d\beta}(y - X\beta)'(y - X\beta) = 0.$$

Now, how do I solve this for the vector $\beta$?

Edit: Wait, I am stuck. I am here and do not know how to continue:

$$\frac{d}{d\beta}\left(\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}\right)'\left(\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}\right)$$

$$\frac{d}{d\beta}\sum_{i=1}^n \left(y_i - \begin{pmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{ip} \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}\right)^2$$

with $x_{i0} = 1$ for all $i$ serving as the intercept:

$$\frac{d}{d\beta}\sum_{i=1}^n \left(y_i - \sum_{k=0}^p x_{ik}\beta_k\right)^2$$

Could you point me in the right direction?


@GaBorgulya, thanks for the edit; I did not know about smallmatrix, so I did not try to edit it myself, since the usual fix of breaking the formula across several lines would not have worked here.
mpiktas

Answers:



We have

$$\frac{d}{d\beta}(y - X\beta)'(y - X\beta) = -2X'(y - X\beta)$$

This can be shown by writing out the equation explicitly in components. For example, write $(\beta_1, \ldots, \beta_p)$ instead of $\beta$. Then take the derivative with respect to $\beta_1$, $\beta_2$, ..., $\beta_p$ and stack everything to get the answer. For a quick and easy illustration, you can start with $p = 2$.

With experience one develops general rules, some of which are given, e.g., in that document.

Edit, to help with the added part of the question:

With p=2, we have

$$(y - X\beta)'(y - X\beta) = (y_1 - x_{11}\beta_1 - x_{12}\beta_2)^2 + (y_2 - x_{21}\beta_1 - x_{22}\beta_2)^2$$

The derivative with respect to $\beta_1$ is

$$-2x_{11}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{21}(y_2 - x_{21}\beta_1 - x_{22}\beta_2)$$

Similarly, the derivative with respect to $\beta_2$ is

$$-2x_{12}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{22}(y_2 - x_{21}\beta_1 - x_{22}\beta_2)$$

Hence, the derivative with respect to $\beta = (\beta_1, \beta_2)'$ is

$$\begin{pmatrix} -2x_{11}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{21}(y_2 - x_{21}\beta_1 - x_{22}\beta_2) \\ -2x_{12}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{22}(y_2 - x_{21}\beta_1 - x_{22}\beta_2) \end{pmatrix}$$

Now, observe you can rewrite the last expression as

$$-2\begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{pmatrix}\begin{pmatrix} y_1 - x_{11}\beta_1 - x_{12}\beta_2 \\ y_2 - x_{21}\beta_1 - x_{22}\beta_2 \end{pmatrix} = -2X'(y - X\beta)$$

Of course, everything is done in the same way for a larger p.
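A quick numerical sanity check of this identity (my own sketch, not part of the original answer; the data and variable names are arbitrary): compute one partial derivative per component by finite differences, stack them, and compare with $-2X'(y - X\beta)$.

```python
# My own sketch (not from the original answer): compare a finite-difference
# gradient of the sum of squares with the closed form -2 X'(y - X beta).
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2                      # small example, in the spirit of the p = 2 illustration
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

def sse(b):
    r = y - X @ b
    return r @ r                 # (y - Xb)'(y - Xb)

eps = 1e-6
# One partial derivative per component, then "stack" them into a vector.
grad_fd = np.array([(sse(beta + eps * e) - sse(beta - eps * e)) / (2 * eps)
                    for e in np.eye(p)])
grad_closed = -2 * X.T @ (y - X @ beta)
print(np.allclose(grad_fd, grad_closed, atol=1e-5))   # True
```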


Awesome, I was looking for exactly that type of pdf. Thanks a ton!
Alexander Engelhardt

Oh, I thought I could do it myself now, but I can't. Can you tell me if my steps are right or if I should take "another way" to solve this?
Alexander Engelhardt

@Alexx Hardt: My first equation in the edit is the same as your very last equation in the particular case where p = 2. So, you can mimic my calculations for components 3, 4, ..., p.
ocram

Thanks again :) I think I'll actually use all three suggestions. I'm building a .pdf which explains and sums up basic stats matrix algebra, because I somehow never wanted to learn it when I learned it in my classes. To solve it with three different ways will help me understand it better, I hope.
Alexander Engelhardt

Oh, but this is for p=2 and n=2, right? I'll write it down with n=3 I think.
Alexander Engelhardt


You can also use formulas from the Matrix Cookbook. We have

$$(y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta$$

Now take the derivative of each term. You might want to notice that $\beta'X'y = y'X\beta$. The derivative of the term $y'y$ with respect to $\beta$ is zero. The remaining term

$$\beta'X'X\beta - 2y'X\beta$$

is of the form of the function

$$f(x) = x'Ax + b'x,$$

given as formula (88) on page 11 of the book, with $x = \beta$, $A = X'X$ and $b = -2X'y$. The derivative is given in formula (89):

$$\frac{\partial f}{\partial x} = (A + A')x + b$$

so

$$\frac{\partial}{\partial\beta}(y - X\beta)'(y - X\beta) = (X'X + (X'X)')\beta - 2X'y$$

Now since $(X'X)' = X'X$, setting the derivative to zero gives the desired solution:

$$X'X\beta = X'y$$
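As a sanity check (my own sketch, not part of the answer), the normal equations can be solved numerically and compared against a standard least-squares routine; the numpy code below assumes nothing beyond the formula just derived.

```python
# My own sketch (not part of the answer): solve X'X b = X'y and compare with
# numpy's least-squares solver.
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # with intercept
y = rng.normal(size=n)

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference solution
print(np.allclose(beta_normal, beta_lstsq))         # True
```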

+1 mpiktas: Your solution is more ingenious than mine and I think it should be used in more complex practical situations.
ocram

@ocram, thanks. I would not call it ingenious, it is a standard application of existing formulas. You just need to know the formulas :)
mpiktas


Here is a technique for minimizing the sum of squares in regression that actually has applications to more general settings and which I find useful.

Let's try to avoid vector-matrix calculus altogether.

Suppose we are interested in minimizing

$$E = (y - X\beta)^T(y - X\beta) = \|y - X\beta\|_2^2,$$
where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ and $\beta \in \mathbb{R}^p$. We assume for simplicity that $p \leq n$ and $\mathrm{rank}(X) = p$.

For any $\hat{\beta} \in \mathbb{R}^p$, we get

$$E = \|y - X\hat{\beta} + X\hat{\beta} - X\beta\|_2^2 = \|y - X\hat{\beta}\|_2^2 + \|X(\beta - \hat{\beta})\|_2^2 - 2(\beta - \hat{\beta})^T X^T (y - X\hat{\beta}).$$

If we can choose (find!) a vector $\hat{\beta}$ such that the last term on the right-hand side is zero for every $\beta$, then we would be done, since that would imply that $\min_\beta E \geq \|y - X\hat{\beta}\|_2^2$.

But, $(\beta - \hat{\beta})^T X^T (y - X\hat{\beta}) = 0$ for all $\beta$ if and only if $X^T(y - X\hat{\beta}) = 0$, and this last equation is true if and only if $X^T X \hat{\beta} = X^T y$. So $E$ is minimized by taking $\hat{\beta} = (X^T X)^{-1} X^T y$.
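A small numerical illustration of this argument (my addition, not the answerer's): at $\hat{\beta} = (X^TX)^{-1}X^Ty$ the residual is orthogonal to the columns of $X$, so the cross term vanishes and the decomposition above holds exactly for any competitor $\beta$.

```python
# My own sketch (not part of the answer): verify the orthogonality that makes
# the cross term vanish, and the resulting exact decomposition of E.
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(X.T @ (y - X @ beta_hat), 0))    # X^T (y - X beta_hat) = 0

E = lambda b: np.sum((y - X @ b) ** 2)
beta = rng.normal(size=p)                          # arbitrary competitor
# E(beta) = E(beta_hat) + ||X(beta - beta_hat)||^2, so E(beta_hat) is minimal.
print(np.allclose(E(beta), E(beta_hat) + np.sum((X @ (beta - beta_hat)) ** 2)))
```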


While this may seem like a "trick" to avoid calculus, it actually has wider application and there is some interesting geometry at play.

One example where this technique makes a derivation much simpler than any matrix-vector calculus approach is when we generalize to the matrix case. Let $Y \in \mathbb{R}^{n \times p}$, $X \in \mathbb{R}^{n \times q}$ and $B \in \mathbb{R}^{q \times p}$. Suppose we wish to minimize

$$E = \mathrm{tr}\big((Y - XB)\,\Sigma^{-1}(Y - XB)^T\big)$$
over the entire matrix B of parameters. Here Σ is a covariance matrix.

An entirely analogous approach to the above quickly establishes that the minimum of E is attained by taking

$$\hat{B} = (X^T X)^{-1} X^T Y.$$
That is, in a regression setting where the response is a vector with covariance $\Sigma$ and the observations are independent, the OLS estimate is obtained by doing $p$ separate linear regressions on the components of the response.
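A hedged sketch of that last claim (my own code, not part of the answer): the matrix estimate computed in one shot coincides with fitting each response column separately.

```python
# My own sketch (not part of the answer): the one-shot matrix estimate equals
# p separate per-column regressions of Y on X.
import numpy as np

rng = np.random.default_rng(3)
n, q, p = 30, 4, 3                                  # observations, predictors, responses
X = rng.normal(size=(n, q))
Y = rng.normal(size=(n, p))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)           # q x p, all columns at once
B_cols = np.column_stack([np.linalg.solve(X.T @ X, X.T @ Y[:, j]) for j in range(p)])
print(np.allclose(B_hat, B_cols))                   # True
```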

Fortunately the forum rules allow adding +1 to every answer. Thanks for the education, guys!
DWin

@DWin, did you mean to post this under the comments to the question?
cardinal

I suppose I could have. I had sequentially gone through the question and then all the answers (after the processing of the MathML stopped jerking around) and found each of the answers informative. I just dropped my comment on yours because it was where I stopped reading.
DWin

@DWin, yes, the rendering is a bit funky. I thought you might have intended the comment for another post since this one has no votes (up or down) and so the comment seemed to be out of place. Cheers.
cardinal

@cardinal +1, useful trick. This question turned out to be a pretty good reference.
mpiktas


One way which may help you understand is to not use matrix algebra at all, differentiate with respect to each component, and then "store" the results in a column vector. So we have:

$$\frac{\partial}{\partial\beta_k}\sum_{i=1}^N\left(Y_i - \sum_{j=1}^p X_{ij}\beta_j\right)^2 = 0$$

Now you have p of these equations, one for each beta. This is a simple application of the chain rule:

$$\sum_{i=1}^N 2\left(Y_i - \sum_{j=1}^p X_{ij}\beta_j\right)^1\left(\frac{\partial}{\partial\beta_k}\left[Y_i - \sum_{j=1}^p X_{ij}\beta_j\right]\right) = 0$$
$$-2\sum_{i=1}^N X_{ik}\left(Y_i - \sum_{j=1}^p X_{ij}\beta_j\right) = 0$$

Now we can re-write the sum inside the bracket as $\sum_{j=1}^p X_{ij}\beta_j = x_i^T\beta$. So you get:

$$\sum_{i=1}^N X_{ik}Y_i - \sum_{i=1}^N X_{ik}\,x_i^T\beta = 0$$

Now we have $p$ of these equations, and we will "stack them" in a column vector. Notice how $X_{ik}$ is the only term which depends on $k$, so we can stack this into the vector $x_i$ and we get:

$$\sum_{i=1}^N x_i Y_i = \sum_{i=1}^N x_i x_i^T\beta$$

Now we can take $\beta$ outside the sum (but it must stay on the right-hand side of the sum), and then take the inverse:

$$\left(\sum_{i=1}^N x_i x_i^T\right)^{-1}\sum_{i=1}^N x_i Y_i = \beta$$
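As an illustration (my addition, not part of the original answer), the sum-of-outer-products form can be checked numerically against the usual matrix form $\hat{\beta} = (X^TX)^{-1}X^Ty$:

```python
# My own sketch (not part of the answer): accumulate the sums row by row and
# compare with the matrix form of the estimator.
import numpy as np

rng = np.random.default_rng(4)
N, p = 25, 3
X = rng.normal(size=(N, p))
Y = rng.normal(size=N)

A = sum(np.outer(x, x) for x in X)                  # sum_i x_i x_i^T
b = sum(x * yi for x, yi in zip(X, Y))              # sum_i x_i Y_i
beta_sum_form = np.linalg.solve(A, b)

beta_matrix_form = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(beta_sum_form, beta_matrix_form)) # True
```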
Licensed under cc by-sa 3.0 with attribution required.