लॉजिस्टिक रिग्रेशन को बंद रूप में हल कब किया जाता है?

लो $x \in \{0,1\}^d$ और $y \in \{0,1\}$ और लगता है हम रसद प्रतिगमन का उपयोग कर भविष्यवाणी y दिया एक्स का कार्य मॉडल। लॉजिस्टिक रिग्रेशन गुणांक को बंद रूप में कब लिखा जा सकता है?

एक उदाहरण है जब हम एक संतृप्त मॉडल का उपयोग करते हैं।

है यही कारण है, को परिभाषित $P(y|x) \propto \exp(\sum_i w_i f_i(x_i))$ है, जहां $i$ की शक्ति-सेट में अनुक्रमित सेट $\{x_1,\ldots,x_d\}$ , और $f_i$ रिटर्न 1 यदि $i$ के वें सेट में सभी चर 1, और 0 अन्यथा हैं। तो फिर तुम प्रत्येक व्यक्त कर सकते हैं $w_i$ डेटा के आंकड़ों का एक तर्कसंगत समारोह के लघुगणक के रूप में इस रसद प्रतिगमन मॉडल में।

क्या बंद फॉर्म मौजूद होने पर अन्य दिलचस्प उदाहरण हैं?

logistic generalized-linear-model

— यारोस्लाव बुलटोव
स्रोत

मुझे लगता है कि आप का मतलब है "बंद फॉर्म में मापदंडों के MLE कब हैं?"

— Glen_b -Reinstate मोनिका

Can you give more detail what you did? Your question reads as if you tried to derive the ordinary least squares estimator for a logistic regression problem?

— Momo

Thanks for the interesting post/question, Yaroslav. Do you have a reference for the example that you show?

— Bitwise

It's been a while, but possibly it was in Lauritzen's "Graphical Models" book. The broader foundations of the answer for this question are there -- you get closed form solution when the (hyper)graph formed by sufficient statistics is chordal

— Yaroslav Bulatov

This might be interesting tandfonline.com/doi/abs/10.1080/… I believe this is a special case of an analytical solution when you only have a 2x2 table

— Austin

जवाबों:

As kjetil b halvorsen pointed out, it is, in its own way, a miracle that the linear regression admits an analytical solution. And this is so only by virtue of linearity of the problem (with respect to the parameters). In OLS, you have

\sum_{i} (y_{i} - x_{i}^{'} β)^{2} \to min_{β},

$\sum_i (y_i - x_i' \beta)^2 \to \min_\beta,$ which has the first order conditions

- 2 \sum_{i} (y_{i} - x_{i}^{'} β) x_{i} = 0

$-2 \sum_i (y_i - x_i'\beta) x_i = 0$ For a problem with

p

$p$ variables (including constant, if needed—there are some regression through the origin problems, too), this is a system with

p

$p$ equations and

p

$p$ unknowns. Most importantly, it is a linear system, so you can find a solution using the standard linear algebra theory and practice. This system will have a solution with probability 1 unless you have perfectly collinear variables.

Now, with logistic regression, things aren't that easy anymore. Writing down the log-likelihood function,

l (y; x, β) = \sum_{i} y_{i} \ln p_{i} + (1 - y_{i}) \ln (1 - p_{i}), p_{i} = (1 + \exp (- θ_{i}))^{- 1}, θ_{i} = x_{i}^{'} β,

$l(y;x,\beta) = \sum_i y_i \ln p_i + (1-y_i) \ln(1-p_i), \quad p_i = (1+\exp(-\theta_i))^{-1}, \quad \theta_i = x_i' \beta,$ and taking its derivative to find the MLE, we get

\frac{\partial l}{\partial β^{'}} = \sum_{i} \frac{d p_{i}}{d θ} (\frac{y_{i}}{p_{i}} - \frac{1 - y_{i}}{1 - p_{i}}) x_{i} = \sum_{i} [y_{i} - \frac{1}{1 + \exp (x_{i}^{'} β)}] x_{i}

$\frac{\partial l}{\partial \beta'} = \sum_i \frac{{\rm d}p_i}{{\rm d}\theta}\Bigl( \frac{y_i}{p_i} - \frac{1-y_i}{1-p_i} \Bigr)x_i = \sum_i \Bigl[y_i-\frac1{1+\exp(x_i'\beta)}\Bigr]x_i$ The parameters

β

$\beta$ enter this in a very nonlinear way: for each

i

$i$ , there's a nonlinear function, and they are added together. There is no analytical solution (except probably in a trivial situation with two observations, or something like that), and you have to use nonlinear optimization methods to find the estimates

\hat{β}

$\hat\beta$ .

A somewhat deeper look into the problem (taking the second derivative) reveals that this is a convex optimization problem of finding a maximum of a concave function (a glorified multivariate parabola), so either one exists, and any reasonable algorithm should be finding it rather quickly, or things blow off to infinity. The latter does happen to logistic regression when ${\rm Prob}[Y_i=1|x_i'\beta > c] = 1$ for some $c$ , i.e., you have a perfect prediction. This is a rather unpleasant artifact: you would think that when you have a perfect prediction, the model works perfectly, but curiously enough, it is the other way round.

— StasK
स्रोत

the question is why your last equation is not solvable. is it due to the logistic function's inverse diverging at 0 and 1, or is this due to the nonlinearity in general?

— eyaler

(+1) Regarding your last paragraph: From a mathematical perspective it does work "perfectly" in the sense that an MLE will yield a perfect separating hyperplane. Whether your numerical algorithm behaves sensibly in that circumstance is a separate matter. Laplace smoothing is often used in such situations.

— cardinal

@eyaler, I would say this is due to nonlinearity in general. My understanding is that there is a limited set of circumstances when this can be solved, although I don't know what these circumstances are.

— StasK

I don't understand, what mathematical condition is present that makes the system not have a closed form solution? Is there a general condition where things in general don't have closed form solutions?

— Charlie Parker

does the fact that logistic regression has no closed form something one can prove by looking at the gradient descent iteration for it?

— Charlie Parker

This post was originally intended as a long comment rather than a complete answer to the question at hand.

From the question, it's a little unclear if the interest lies only in the binary case or, perhaps, in more general cases where they may be continuous or take on other discrete values.

One example that doesn't quite answer the question, but is related, and which I like, deals with item-preference rankings obtained via paired comparisons. The Bradley–Terry model can be expressed as a logistic regression where

l o g i t (Pr (Y_{i j} = 1)) = α_{i} - α_{j},

$\mathrm{logit}( \Pr(Y_{ij} = 1) ) = \alpha_i - \alpha_j ,$ and

α_{i}

$\alpha_i$ is an "affinity", "popularity", or "strength" parameter of item

i

$i$ with

Y_{i j} = 1

$Y_{ij} = 1$ indicating item

i

$i$ was preferred over item

j

$j$ in a paired comparison.

If a full round-robin of comparisons is performed (i.e., a pairwise preference is recorded for each unordered $(i,j)$ pair), then it turns out that the rank order of the MLEs $\hat{\alpha}_i$ correspond to the rank order of $S_i = \sum_{j \neq i} Y_{ij}$ , the sum total of times each object was preferred over another.

To interpret this, imagine a full round-robin tournament in your favorite competitive sport. Then, this result says that the Bradley–Terry model ranks the players/teams according to their winning percentage. Whether this is an encouraging or disappointing result depends on your point of view, I suppose.

NB This rank-ordering result does not hold, in general, when a full round-robin is not played.

— cardinal
स्रोत

I was interested in binary because it was easiest to analyze. I have found a very broad sufficient condition in works of Lauritzen -- you get closed form if a corresponding log-linear model is decomposable

— Yaroslav Bulatov