Analytic solution of the linear-regression coefficient estimates



I am trying to get my head around matrix notation and working with vectors and matrices.

Right now I want to understand how the vector of coefficient estimates $\hat{\beta}$ is computed in multiple regression.

The basic equation seems to be

$$\frac{d}{d\beta}(y - X\beta)'(y - X\beta) = 0.$$

Now, how do I solve this for the vector $\beta$?

Edit: Wait, I am stuck. I am here and do not know how to continue:

$$\frac{d}{d\beta}\left(\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}\right)'\left(\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} - \begin{pmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1p} \\ 1 & x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}\right)$$

$$\frac{d}{d\beta}\sum_{i=1}^n \left(y_i - \begin{pmatrix} 1 & x_{i1} & x_{i2} & \cdots & x_{ip} \end{pmatrix}\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}\right)^2$$

with $x_{i0} = 1$ for all $i$ serving as the intercept:

$$\frac{d}{d\beta}\sum_{i=1}^n \left(y_i - \sum_{k=0}^p x_{ik}\beta_k\right)^2$$

Could you point me in the right direction?


@GaBorgulya, thanks for the edit; I did not know about smallmatrix, so I did not try to edit it myself, since the usual fix of breaking the formula across several lines would not have worked here.
mpiktas

Answers:



We have

$$\frac{d}{d\beta}(y - X\beta)'(y - X\beta) = -2X'(y - X\beta)$$

This can be shown by writing out the equation explicitly in components. For example, write $(\beta_1, \ldots, \beta_p)$ instead of $\beta$. Then take the derivative with respect to $\beta_1$, $\beta_2$, ..., $\beta_p$ and stack everything to get the answer. For a quick and easy illustration, you can start with $p = 2$.

With experience one develops general rules, some of which are given, e.g., in that document.

Edit, to help with the added part of the question:

With p=2, we have

$$(y - X\beta)'(y - X\beta) = (y_1 - x_{11}\beta_1 - x_{12}\beta_2)^2 + (y_2 - x_{21}\beta_1 - x_{22}\beta_2)^2$$

The derivative with respect to $\beta_1$ is

$$-2x_{11}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{21}(y_2 - x_{21}\beta_1 - x_{22}\beta_2)$$

Similarly, the derivative with respect to $\beta_2$ is

$$-2x_{12}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{22}(y_2 - x_{21}\beta_1 - x_{22}\beta_2)$$

Hence, the derivative with respect to $\beta = (\beta_1, \beta_2)'$ is

$$\begin{pmatrix} -2x_{11}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{21}(y_2 - x_{21}\beta_1 - x_{22}\beta_2) \\ -2x_{12}(y_1 - x_{11}\beta_1 - x_{12}\beta_2) - 2x_{22}(y_2 - x_{21}\beta_1 - x_{22}\beta_2) \end{pmatrix}$$

Now, observe you can rewrite the last expression as

$$-2\begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \end{pmatrix}\begin{pmatrix} y_1 - x_{11}\beta_1 - x_{12}\beta_2 \\ y_2 - x_{21}\beta_1 - x_{22}\beta_2 \end{pmatrix} = -2X'(y - X\beta)$$

Of course, everything is done in the same way for a larger p.
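A quick numerical sanity check of this identity (my own sketch, not part of the original answer; the data and variable names are arbitrary): compute one partial derivative per component by finite differences, stack them, and compare with $-2X'(y - X\beta)$.

```python
# My own sketch (not from the original answer): compare a finite-difference
# gradient of the sum of squares with the closed form -2 X'(y - X beta).
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 2                      # small example, in the spirit of the p = 2 illustration
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
beta = rng.normal(size=p)

def sse(b):
    r = y - X @ b
    return r @ r                 # (y - Xb)'(y - Xb)

eps = 1e-6
# One partial derivative per component, then "stack" them into a vector.
grad_fd = np.array([(sse(beta + eps * e) - sse(beta - eps * e)) / (2 * eps)
                    for e in np.eye(p)])
grad_closed = -2 * X.T @ (y - X @ beta)
print(np.allclose(grad_fd, grad_closed, atol=1e-5))   # True
```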


Awesome, I was looking for exactly that type of pdf. Thanks a ton!
Alexander Engelhardt

Oh, I thought I could do it myself now, but I can't. Can you tell me if my steps are right or if I should take "another way" to solve this?
Alexander Engelhardt

@Alexx Hardt: My first equation in the edit is the same as your very last equation in the particular case where p = 2. So, you can mimic my calculations for components 3, 4, ..., p.
ocram

Thanks again :) I think I'll actually use all three suggestions. I'm building a .pdf which explains and sums up basic stats matrix algebra, because I somehow never wanted to learn it when I learned it in my classes. To solve it with three different ways will help me understand it better, I hope.
Alexander Engelhardt

Oh, but this is for p=2 and n=2, right? I'll write it down with n=3 I think.
Alexander Engelhardt


You can also use formulas from the Matrix Cookbook. We have

$$(y - X\beta)'(y - X\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta$$

Now take the derivative of each term. You might want to notice that $\beta'X'y = y'X\beta$. The derivative of the term $y'y$ with respect to $\beta$ is zero. The remaining term

$$\beta'X'X\beta - 2y'X\beta$$

is of the form of the function

$$f(x) = x'Ax + b'x,$$

given as formula (88) on page 11 of the book, with $x = \beta$, $A = X'X$ and $b = -2X'y$. The derivative is given in formula (89):

$$\frac{\partial f}{\partial x} = (A + A')x + b$$

so

$$\frac{\partial}{\partial\beta}(y - X\beta)'(y - X\beta) = (X'X + (X'X)')\beta - 2X'y$$

Now since $(X'X)' = X'X$, setting the derivative to zero gives the desired solution:

$$X'X\beta = X'y$$
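As a sanity check (my own sketch, not part of the answer), the normal equations can be solved numerically and compared against a standard least-squares routine; the numpy code below assumes nothing beyond the formula just derived.

```python
# My own sketch (not part of the answer): solve X'X b = X'y and compare with
# numpy's least-squares solver.
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # with intercept
y = rng.normal(size=n)

beta_normal = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference solution
print(np.allclose(beta_normal, beta_lstsq))         # True
```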

+1 mpiktas: Your solution is more ingenious than mine and I think it should be used in more complex practical situations.
ocram

@ocram, thanks. I would not call it ingenious, it is a standard application of existing formulas. You just need to know the formulas :)
mpiktas


Here is a technique for minimizing the sum of squares in regression that actually has applications to more general settings and which I find useful.

Let's try to avoid vector-matrix calculus altogether.

Suppose we are interested in minimizing

$$E = (y - X\beta)^T(y - X\beta) = \|y - X\beta\|_2^2,$$
where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ and $\beta \in \mathbb{R}^p$. We assume for simplicity that $p \leq n$ and $\mathrm{rank}(X) = p$.

For any $\hat{\beta} \in \mathbb{R}^p$, we get

$$E = \|y - X\hat{\beta} + X\hat{\beta} - X\beta\|_2^2 = \|y - X\hat{\beta}\|_2^2 + \|X(\beta - \hat{\beta})\|_2^2 - 2(\beta - \hat{\beta})^T X^T (y - X\hat{\beta}).$$

If we can choose (find!) a vector $\hat{\beta}$ such that the last term on the right-hand side is zero for every $\beta$, then we would be done, since that would imply that $\min_\beta E \geq \|y - X\hat{\beta}\|_2^2$.

But, $(\beta - \hat{\beta})^T X^T (y - X\hat{\beta}) = 0$ for all $\beta$ if and only if $X^T(y - X\hat{\beta}) = 0$, and this last equation is true if and only if $X^T X \hat{\beta} = X^T y$. So $E$ is minimized by taking $\hat{\beta} = (X^T X)^{-1} X^T y$.
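A small numerical illustration of this argument (my addition, not the answerer's): at $\hat{\beta} = (X^TX)^{-1}X^Ty$ the residual is orthogonal to the columns of $X$, so the cross term vanishes and the decomposition above holds exactly for any competitor $\beta$.

```python
# My own sketch (not part of the answer): verify the orthogonality that makes
# the cross term vanish, and the resulting exact decomposition of E.
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(X.T @ (y - X @ beta_hat), 0))    # X^T (y - X beta_hat) = 0

E = lambda b: np.sum((y - X @ b) ** 2)
beta = rng.normal(size=p)                          # arbitrary competitor
# E(beta) = E(beta_hat) + ||X(beta - beta_hat)||^2, so E(beta_hat) is minimal.
print(np.allclose(E(beta), E(beta_hat) + np.sum((X @ (beta - beta_hat)) ** 2)))
```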


While this may seem like a "trick" to avoid calculus, it actually has wider application and there is some interesting geometry at play.

One example where this technique makes a derivation much simpler than any matrix-vector calculus approach is when we generalize to the matrix case. Let $Y \in \mathbb{R}^{n \times p}$, $X \in \mathbb{R}^{n \times q}$ and $B \in \mathbb{R}^{q \times p}$. Suppose we wish to minimize

$$E = \mathrm{tr}\big((Y - XB)\,\Sigma^{-1}(Y - XB)^T\big)$$
over the entire matrix B of parameters. Here Σ is a covariance matrix.

An entirely analogous approach to the above quickly establishes that the minimum of E is attained by taking

$$\hat{B} = (X^T X)^{-1} X^T Y.$$
That is, in a regression setting where the response is a vector with covariance $\Sigma$ and the observations are independent, the OLS estimate is obtained by doing $p$ separate linear regressions on the components of the response.
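A hedged sketch of that last claim (my own code, not part of the answer): the matrix estimate computed in one shot coincides with fitting each response column separately.

```python
# My own sketch (not part of the answer): the one-shot matrix estimate equals
# p separate per-column regressions of Y on X.
import numpy as np

rng = np.random.default_rng(3)
n, q, p = 30, 4, 3                                  # observations, predictors, responses
X = rng.normal(size=(n, q))
Y = rng.normal(size=(n, p))

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)           # q x p, all columns at once
B_cols = np.column_stack([np.linalg.solve(X.T @ X, X.T @ Y[:, j]) for j in range(p)])
print(np.allclose(B_hat, B_cols))                   # True
```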

Fortunately the forum rules allow adding +1 to every answer. Thanks for the education, guys!
DWin

@DWin, did you mean to post this under the comments to the question?
cardinal

I suppose I could have. I had sequentially gone through the question and then all the answers (after the processing of the MathML stopped jerking around) and found each of the answers informative. I just dropped my comment on yours because it was where I stopped reading.
DWin

@DWin, yes, the rendering is a bit funky. I thought you might have intended the comment for another post since this one has no votes (up or down) and so the comment seemed to be out of place. Cheers.
cardinal

@cardinal +1, useful trick. This question turned out to be a pretty good reference.
mpiktas


One way which may help you understand is to not use matrix algebra at all, differentiate with respect to each component, and then "store" the results in a column vector. So we have:

$$\frac{\partial}{\partial\beta_k}\sum_{i=1}^N\left(Y_i - \sum_{j=1}^p X_{ij}\beta_j\right)^2 = 0$$

Now you have p of these equations, one for each beta. This is a simple application of the chain rule:

$$\sum_{i=1}^N 2\left(Y_i - \sum_{j=1}^p X_{ij}\beta_j\right)^1\left(\frac{\partial}{\partial\beta_k}\left[Y_i - \sum_{j=1}^p X_{ij}\beta_j\right]\right) = 0$$
$$-2\sum_{i=1}^N X_{ik}\left(Y_i - \sum_{j=1}^p X_{ij}\beta_j\right) = 0$$

Now we can re-write the sum inside the bracket as $\sum_{j=1}^p X_{ij}\beta_j = x_i^T\beta$. So you get:

$$\sum_{i=1}^N X_{ik}Y_i - \sum_{i=1}^N X_{ik}\,x_i^T\beta = 0$$

Now we have $p$ of these equations, and we will "stack them" in a column vector. Notice how $X_{ik}$ is the only term which depends on $k$, so we can stack this into the vector $x_i$ and we get:

$$\sum_{i=1}^N x_i Y_i = \sum_{i=1}^N x_i x_i^T\beta$$

Now we can take $\beta$ outside the sum (but it must stay on the right-hand side of the sum), and then take the inverse:

$$\left(\sum_{i=1}^N x_i x_i^T\right)^{-1}\sum_{i=1}^N x_i Y_i = \beta$$
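As an illustration (my addition, not part of the original answer), the sum-of-outer-products form can be checked numerically against the usual matrix form $\hat{\beta} = (X^TX)^{-1}X^Ty$:

```python
# My own sketch (not part of the answer): accumulate the sums row by row and
# compare with the matrix form of the estimator.
import numpy as np

rng = np.random.default_rng(4)
N, p = 25, 3
X = rng.normal(size=(N, p))
Y = rng.normal(size=N)

A = sum(np.outer(x, x) for x in X)                  # sum_i x_i x_i^T
b = sum(x * yi for x, yi in zip(X, Y))              # sum_i x_i Y_i
beta_sum_form = np.linalg.solve(A, b)

beta_matrix_form = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.allclose(beta_sum_form, beta_matrix_form)) # True
```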
Licensed under cc by-sa 3.0 with attribution required.