Here is a technique for minimizing the sum of squares in regression that I find useful and that also applies in more general settings.
Let's try to avoid vector-matrix calculus altogether.
Suppose we are interested in minimizing
$$E = (y - X\beta)^T (y - X\beta) = \|y - X\beta\|_2^2,$$
where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$, and $\beta \in \mathbb{R}^p$. We assume for simplicity that $p \le n$ and $\operatorname{rank}(X) = p$.
For any $\hat\beta \in \mathbb{R}^p$, we get
$$E = \|y - X\hat\beta + X\hat\beta - X\beta\|_2^2 = \|y - X\hat\beta\|_2^2 + \|X(\beta - \hat\beta)\|_2^2 - 2(\beta - \hat\beta)^T X^T (y - X\hat\beta).$$
If we can choose (find!) a vector $\hat\beta$ such that the last term on the right-hand side is zero for every $\beta$, then we are done: the middle term is nonnegative, so $E \ge \|y - X\hat\beta\|_2^2$ for every $\beta$, with equality at $\beta = \hat\beta$.
But $(\beta - \hat\beta)^T X^T (y - X\hat\beta) = 0$ for all $\beta$ if and only if $X^T (y - X\hat\beta) = 0$, and this last equation holds if and only if $X^T X \hat\beta = X^T y$. Since $\operatorname{rank}(X) = p$, the matrix $X^T X$ is invertible, so $E$ is minimized by taking $\hat\beta = (X^T X)^{-1} X^T y$.
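If you want to sanity-check this numerically, here is a small NumPy sketch (the simulated data and variable names are purely illustrative): it computes $\hat\beta$ from the normal equations, confirms it agrees with `np.linalg.lstsq`, and checks that random perturbations never achieve a smaller sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Solve the normal equations X^T X beta_hat = X^T y directly
# (solving the linear system rather than forming an explicit inverse).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

def sse(beta):
    """Sum of squared residuals E(beta) = ||y - X beta||^2."""
    r = y - X @ beta
    return r @ r

# Agrees with NumPy's least-squares routine...
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)

# ...and no random perturbation of beta_hat does better.
assert all(sse(beta_hat) <= sse(beta_hat + rng.standard_normal(p))
           for _ in range(1000))
```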
While this may seem like a "trick" to avoid calculus, it actually has wider application and there is some interesting geometry at play.
One example where this technique makes a derivation much simpler than any matrix-vector calculus approach is the generalization to the matrix case. Let $Y \in \mathbb{R}^{n \times p}$, $X \in \mathbb{R}^{n \times q}$, and $B \in \mathbb{R}^{q \times p}$. Suppose we wish to minimize
$$E = \operatorname{tr}\!\big((Y - XB)\,\Sigma^{-1}(Y - XB)^T\big)$$
over the entire matrix $B$ of parameters. Here $\Sigma$ is a (positive-definite) covariance matrix. An entirely analogous approach to the one above quickly establishes that the minimum of $E$ is attained by taking
$$\hat B = (X^T X)^{-1} X^T Y.$$
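Spelling the analogous step out: for any $\hat B$, writing $Y - XB = (Y - X\hat B) - X(B - \hat B)$ and expanding the trace gives
$$E = \operatorname{tr}\!\big((Y - X\hat B)\Sigma^{-1}(Y - X\hat B)^T\big) + \operatorname{tr}\!\big(X(B - \hat B)\Sigma^{-1}(B - \hat B)^T X^T\big) - 2\operatorname{tr}\!\big(X(B - \hat B)\Sigma^{-1}(Y - X\hat B)^T\big).$$
The middle term is nonnegative because $\Sigma^{-1}$ is positive definite, and the cross term vanishes for every $B$ exactly when $X^T (Y - X\hat B) = 0$, i.e. $X^T X \hat B = X^T Y$, which is solved by the $\hat B$ above.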
That is, in a regression setting where the response is a vector with covariance $\Sigma$ and the observations are independent, the OLS estimate is obtained by doing $p$ separate linear regressions, one on each component of the response.
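To see this concretely, here is a small NumPy sketch (again with simulated data and illustrative names only) that checks both points: the joint minimizer coincides with $p$ separate component-wise regressions, and an arbitrary positive-definite $\Sigma$ does not change the minimizer.

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p = 60, 3, 4
X = rng.standard_normal((n, q))
Y = rng.standard_normal((n, p))

# Minimizer of tr((Y - XB) Sigma^{-1} (Y - XB)^T): note Sigma drops out.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Column-by-column OLS: regress each response component on X separately.
B_cols = np.column_stack([
    np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(p)
])
assert np.allclose(B_hat, B_cols)

# With an arbitrary positive-definite Sigma, B_hat still minimizes the trace criterion.
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
Sigma_inv = np.linalg.inv(Sigma)

def crit(B):
    """E(B) = tr((Y - XB) Sigma^{-1} (Y - XB)^T)."""
    R = Y - X @ B
    return np.trace(R @ Sigma_inv @ R.T)

assert all(crit(B_hat) <= crit(B_hat + 0.1 * rng.standard_normal((q, p)))
           for _ in range(1000))
```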