How does cosine similarity change after a linear transformation?


Is there a mathematical relationship between:

the cosine similarity $\operatorname{sim}(A, B)$ of two vectors $A$ and $B$, and
the cosine similarity $\operatorname{sim}(MA, MB)$ of $A$ and $B$, non-uniformly scaled via a given matrix $M$? Here $M$ is a given diagonal matrix with unequal elements on the diagonal.

I tried to work through the calculations, but could not arrive at a simple or interesting link (expression). I wonder whether there is one.

E.g., angles are not preserved under non-uniform scaling, but what is the relationship between the original angle and the angle after non-uniform scaling? What can be said about the link between one set of vectors S1 and another set of vectors S2, where S2 is obtained by non-uniformly scaling S1?

linear-algebra cosine-similarity

— Turdus-merula

@Shubhankar, thanks! Yes, $M$ is a given matrix (a scaling matrix, and therefore diagonal, with no other restrictions). In a sense, I wanted to know what happens (in terms of cosine similarity for any pair of vectors) to a vector space that undergoes a non-uniform scaling.

— Turdus-merula


It might be worth remarking that if all the scale factors are non-negative (as one would naturally assume), then all symmetric positive-definite matrices could be considered "scaling" matrices. The relationship you seek is used extensively, among other things, in the study and description of distortion in map projections. There, interest centers on the maximum and minimum angles on the Earth's surface, which are associated with two perpendicular directions on the map. There is a direct relationship between these angles and the ratio of the two scale factors.

— whuber


Because $M$ is quite general, and the change in cosine similarity depends on the particular $A$ and $B$ and their relationship to $M$, no definite formula is possible. There are, however, readily computable bounds on how much the cosine similarity can change. They can be found by extremizing the angle between $MA$ and $MB$, given that the cosine similarity between $A$ and $B$ is a specified value, say $\cos(2\phi)$ (where $2\phi$ is the angle between $A$ and $B$). The answer tells us how much any angle $2\phi$ could possibly be bent by the transformation $M$.

The calculation threatens to get messy. Some clever choices of notation, along with some preliminary simplifications, reduce the effort. It turns out that the solution in two dimensions reveals everything we need. That is a tractable problem, depending on only one real variable $\theta$, which is easily solved with calculus. A simple geometric argument extends this solution to any number of dimensions $n$.

Mathematical preliminaries

By definition, the cosine of the angle between any two vectors $A$ and $B$ is obtained by normalizing them to unit length and taking their dot product. Thus,

$\frac{A^\prime B}{\sqrt{(A^\prime A)\, (B^\prime B)}} = \cos(2\phi)$

and, writing $\Sigma = M^\prime M$, the cosine of the angle between the images of $A$ and $B$ under the transformation $M$ is

$\frac{(MA)^\prime (MB)}{\sqrt{((MA)^\prime (MA))\, ((MB)^\prime (MB))}} = \frac{A^\prime \Sigma B}{\sqrt{(A^\prime \Sigma A) (B^\prime \Sigma B)}}.\tag{1}$
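A quick numerical sanity check of $(1)$, sketched in Python with NumPy (an assumption; the thread itself contains no code). The helper names `cosine_similarity` and `cosine_after_transform` are hypothetical. The left-hand side is computed directly from $MA$ and $MB$, the right-hand side from $\Sigma = M^\prime M$ alone.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def cosine_after_transform(M, a, b):
    """Right-hand side of (1): uses M only through Sigma = M'M."""
    sigma = M.T @ M
    return float(a @ sigma @ b / np.sqrt((a @ sigma @ a) * (b @ sigma @ b)))

rng = np.random.default_rng(0)
M = np.diag([3.0, 1.0, 0.2])   # a non-uniform diagonal scaling, as in the question
A = rng.normal(size=3)
B = rng.normal(size=3)

lhs = cosine_similarity(M @ A, M @ B)    # left-hand side of (1)
rhs = cosine_after_transform(M, A, B)    # right-hand side of (1)
print(cosine_similarity(A, B), lhs, rhs)
```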

Note that only $\Sigma$ matters in the analysis, not $M$ itself. Therefore we may exploit the singular value decomposition (SVD) of $M$ to simplify the problem. Recall that it expresses $M$ as a product (from right to left) of an orthogonal matrix $V^\prime$, a diagonal matrix $D$, and another orthogonal matrix $U$:

$M = U\,D\,V^\prime.$

In other words, there is a privileged basis of vectors $e_1, \ldots, e_n$ (the columns of $V$) on which $M$ acts by rescaling each $e_i$ separately by the $i^\text{th}$ diagonal entry of $D$ (which I will call $d_i$) and then applying a rotation (or anti-rotation) $U$ to the result. That final rotation does not change any lengths or angles and therefore should not affect $\Sigma$. You can see this formally with the calculation

$\Sigma = M^\prime M = (U D V^\prime)^\prime (U D V^\prime) = V D (U^\prime U) D V^\prime = V D^2 V^\prime.$
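The same identity checked numerically, as a sketch assuming NumPy. Note that `np.linalg.svd` returns the factors as `U`, the singular values `d` (in decreasing order), and `Vh`, which corresponds to $V^\prime$ in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))

# NumPy returns M = U @ diag(d) @ Vh, so Vh plays the role of V' in the text
U, d, Vh = np.linalg.svd(M)
V = Vh.T

Sigma = M.T @ M
Sigma_from_svd = V @ np.diag(d**2) @ V.T   # V D^2 V'

# Dropping U (the final rotation) leaves Sigma unchanged
DVt = np.diag(d) @ Vh
print(np.allclose(Sigma, Sigma_from_svd), np.allclose(Sigma, DVt.T @ DVt))
```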

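Consequently, to study $\Sigma$ we may freely replace $M$ by any other matrix that produces the same values in $(1)$. (Dropping $U$ leaves $\Sigma$ unchanged, and multiplying $M$ by a nonzero constant multiplies $\Sigma$ by a positive constant, which cancels in $(1)$.) Ordering the $e_i$ so that the $d_i$ decrease in size (and assuming $M$ is not identically zero), a good choice of $M$ is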

$M = \frac{1}{{d_1}} D V^\prime.$

The diagonal elements of $(1/{d_1})D$ are

$1 = d_1/d_1 \ge \lambda_2 = d_2/{d_1} \ge \lambda_3 = d_3/{d_1} \ge \cdots \ge \lambda_n = d_n/{d_1} \ge 0.$

In particular, the effect of $M$ (whether in its original or changed form) on all angles is completely determined by the fact that

$M e_i = \lambda_i e_i.$
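A sketch (NumPy assumed; `cos_sim` is a hypothetical helper) showing that the normalized matrix gives the same cosine similarities as the original $M$. Strictly, $(1/d_1)DV^\prime$ sends $e_i$ to $\lambda_i$ times the $i^\text{th}$ standard basis vector; composing on the left with $V$ (orthogonal, so no angle changes) gives $V(D/d_1)V^\prime$, for which $M e_i = \lambda_i e_i$ holds literally. That reading is my own, offered as an assumption.

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
U, d, Vh = np.linalg.svd(M)      # singular values d come in decreasing order
lam = d / d[0]                   # 1 = lambda_1 >= lambda_2 >= ... >= lambda_n >= 0

M_text = np.diag(lam) @ Vh       # (1/d_1) D V', the choice made in the text
E = Vh.T                         # columns are the privileged basis e_1, ..., e_n
M_sym = E @ np.diag(lam) @ Vh    # V (D/d_1) V': satisfies M e_i = lambda_i e_i literally

A = rng.normal(size=3)
B = rng.normal(size=3)
for T in (M, M_text, M_sym):
    print(cos_sim(T @ A, T @ B))     # the same value, up to rounding
```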

Analysis of a special case

Let $n=2$. Because changing the lengths of vectors does not change the angle between them, we may assume $A$ and $B$ are unit vectors. In the plane all such vectors may be designated by the angle they make with $e_1$. Writing $\theta$ for the direction of the bisector of $A$ and $B$ and $2\phi$ for the angle between them, we may write

$A = \cos(\theta-\phi)e_1 + \sin(\theta-\phi)e_2$

and

$B = \cos(\theta+\phi)e_1 + \sin(\theta+\phi)e_2.$

(See the figure below.)

Applying $M$ is simple: it fixes the first coordinates of $A$ and $B$ and multiplies their second coordinates by $\lambda_2$. Therefore the angle from $MA$ to $MB$ is

$f(\theta) = \arctan(\lambda_2 \tan(\theta+\phi)) - \arctan(\lambda_2 \tan(\theta-\phi)).$
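A sketch in Python/NumPy (assumed tooling; `f` mirrors the formula above, `angle_after_scaling` is a hypothetical helper) comparing $f(\theta)$ with the angle computed directly from $MA$ and $MB$ for $M=\operatorname{diag}(1,\lambda_2)$. The comparison is restricted to $\theta$ with $\theta\pm\phi$ inside $(-\pi/2,\pi/2)$, because elsewhere the principal branch of $\arctan$ can shift the formula by $\pi$.

```python
import numpy as np

def f(theta, phi, lam2):
    """Angle from MA to MB as written above (principal branch of arctan)."""
    return np.arctan(lam2 * np.tan(theta + phi)) - np.arctan(lam2 * np.tan(theta - phi))

def angle_after_scaling(theta, phi, lam2):
    """Angle between MA and MB computed directly, with M = diag(1, lam2)."""
    M = np.diag([1.0, lam2])
    A = np.array([np.cos(theta - phi), np.sin(theta - phi)])
    B = np.array([np.cos(theta + phi), np.sin(theta + phi)])
    MA, MB = M @ A, M @ B
    c = MA @ MB / (np.linalg.norm(MA) * np.linalg.norm(MB))
    return np.arccos(np.clip(c, -1.0, 1.0))

phi, lam2 = np.pi / 6, 0.4
for theta in [0.0, 0.3, 0.7]:   # theta +/- phi stays inside (-pi/2, pi/2)
    print(theta, f(theta, phi, lam2), angle_after_scaling(theta, phi, lam2))
```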

Because $M$ is a continuous function, this difference of angles is a continuous function of $\theta$ . In fact, it is differentiable. This allows us to find the extreme angles by inspecting the zeros of the derivative $f^\prime(\theta)$ . That derivative is straightforward to compute: it is a ratio of trigonometric functions. The zeros can occur only among the zeros of its numerator, so let's not bother to compute the denominator. We obtain

$f^\prime(\theta) = \frac{\lambda_2(1-\lambda_2)(\lambda_2+1)\sin(2\theta)\sin(2\phi)}{*}.$
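The text leaves the denominator of $f^\prime$ as $*$. Since $\frac{d}{dx}\arctan(\lambda_2\tan x) = \lambda_2/(\cos^2 x + \lambda_2^2\sin^2 x)$, one consistent completion (my own derivation, not from the answer) is the product $g(\theta+\phi)\,g(\theta-\phi)$ with $g(x)=\cos^2x+\lambda_2^2\sin^2x$. Because $g>0$ whenever $\lambda_2>0$, the zeros come only from the numerator, as claimed. The sketch below (NumPy assumed) checks this against a central finite difference.

```python
import numpy as np

def f(theta, phi, lam2):
    return np.arctan(lam2 * np.tan(theta + phi)) - np.arctan(lam2 * np.tan(theta - phi))

def f_prime(theta, phi, lam2):
    """Derivative of f, with the denominator elided as * in the text filled in."""
    g = lambda x: np.cos(x) ** 2 + lam2**2 * np.sin(x) ** 2
    numerator = lam2 * (1 - lam2) * (lam2 + 1) * np.sin(2 * theta) * np.sin(2 * phi)
    return numerator / (g(theta + phi) * g(theta - phi))

phi, lam2, h = np.pi / 6, 0.4, 1e-6
for theta in [0.2, 0.5, 0.9]:
    numeric = (f(theta + h, phi, lam2) - f(theta - h, phi, lam2)) / (2 * h)
    print(theta, numeric, f_prime(theta, phi, lam2))
```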

The special cases of $\lambda_2=0$, $\lambda_2=1$, and $\phi=0$ are easily understood: they correspond to the situations where $M$ is of reduced rank (and so squashes all vectors onto a line); where $M$ is a multiple of the identity matrix; and where $A$ and $B$ are parallel (whence the angle between them cannot change, regardless of $\theta$). The case $\lambda_2=-1$ is precluded by the condition $\lambda_2 \ge 0$.

Apart from these special cases, the zeros occur only where $\sin(2\theta)=0$: that is, $\theta=0$ or $\theta=\pi/2$ (modulo $\pi$). This means that the line determined by $e_1$ or the line determined by $e_2$ bisects the angle $AB$. We now know that the extreme values of the angle between $MA$ and $MB$ must lie among the values $f(0)$ and $f(\pi/2)$, so let's compute them:

$\eqalign{ f(0) &= \arctan(\lambda_2 \tan(\phi)) - \arctan(\lambda_2 \tan(-\phi)) = 2\arctan(\lambda_2\tan(\phi)); \\ f(\pi/2) &= \arctan(\lambda_2 \tan(\pi/2+\phi)) - \arctan(\lambda_2 \tan(\pi/2-\phi)) = 2\arctan(\lambda_2\cot(-\phi)). }$

The corresponding cosines are

$\cos(f(0)) = \frac{1 - \lambda_2^2 \tan(\phi)^2}{1 + \lambda_2^2 \tan(\phi)^2}\tag{2}$

and, at $\theta=\pi/2$, the principal branch of $\arctan$ makes $f(\pi/2)$ differ from the actual angle between $MA$ and $MB$ by $\pi$ (that angle is $\pi - 2\arctan(\lambda_2\cot(\phi))$), so the cosine changes sign:

$\cos(\pi + f(\pi/2)) = -\frac{1 - \lambda_2^2 \cot(\phi)^2}{1 + \lambda_2^2 \cot(\phi)^2} = \frac{\lambda_2^2 - \tan(\phi)^2}{\lambda_2^2 + \tan(\phi)^2}.\tag{3}$
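A numerical check of $(2)$ and of the $\theta=\pi/2$ cosine, assuming NumPy; the helper names are hypothetical. For $\theta=\pi/2$ it uses $(\lambda_2^2-\tan^2\phi)/(\lambda_2^2+\tan^2\phi)$, the sign pointed out in LFH's comment; a direct computation with $A=(\sin\phi,\cos\phi)$ and $B=(-\sin\phi,\cos\phi)$ gives exactly this. Sweeping $\theta$ over $[0,\pi]$ confirms that these two values are the extremes.

```python
import numpy as np

def cos_after(theta, phi, lam2):
    """cos(MA, MB) for M = diag(1, lam2) and unit A, B at angles theta -/+ phi."""
    M = np.diag([1.0, lam2])
    A = np.array([np.cos(theta - phi), np.sin(theta - phi)])
    B = np.array([np.cos(theta + phi), np.sin(theta + phi)])
    MA, MB = M @ A, M @ B
    return float(MA @ MB / (np.linalg.norm(MA) * np.linalg.norm(MB)))

def eq2(phi, lam2):
    t2 = lam2**2 * np.tan(phi) ** 2
    return (1 - t2) / (1 + t2)

def eq3(phi, lam2):
    # Sign as in LFH's comment
    t2 = np.tan(phi) ** 2
    return (lam2**2 - t2) / (lam2**2 + t2)

phi, lam2 = np.pi / 6, 0.4
print(cos_after(0.0, phi, lam2), eq2(phi, lam2))
print(cos_after(np.pi / 2, phi, lam2), eq3(phi, lam2))

sweep = [cos_after(t, phi, lam2) for t in np.linspace(0, np.pi, 721)]
print(min(sweep), max(sweep))
```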

Often it's sufficient to understand how $M$ distorts right angles. In this case, $2\phi=\pi/2$, leading to $\tan(\phi) = \cot(\phi) = 1$, which you may plug into the preceding formulas.
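Plugging $\tan\phi=1$ into $(2)$ and into the $\theta=\pi/2$ cosine (with the sign from LFH's comment) gives $\pm(1-\lambda_2^2)/(1+\lambda_2^2)$. A tiny sketch (NumPy assumed, `right_angle_extremes` is a hypothetical name) converting these to degrees for a few values of $\lambda_2$:

```python
import numpy as np

def right_angle_extremes(lam2):
    """Extreme image cosines of a right angle (2*phi = pi/2, so tan(phi) = cot(phi) = 1)."""
    c = (1 - lam2**2) / (1 + lam2**2)
    return c, -c      # (2) with tan(phi) = 1, and the theta = pi/2 case (sign as in LFH's comment)

for lam2 in [1.0, 0.5, 0.1]:
    c_max, c_min = right_angle_extremes(lam2)
    print(lam2, np.degrees(np.arccos(c_max)), np.degrees(np.arccos(c_min)))
```

For $\lambda_2 = 1/2$ the extremes are $\cos^{-1}(3/5)\approx 53.13^\circ$ and $\cos^{-1}(-3/5)\approx 126.87^\circ$, so a right angle can be narrowed or widened by roughly $37^\circ$.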

Note that the smaller $\lambda_2$ becomes, the more extreme these angles become and the greater is the distortion.

This figure shows four configurations of vectors $A$ and $B$ separated by an angle of $2\phi = \pi/3$. The unit circle and its elliptical image under $M$ are shaded for reference (with the action of $M$ uniformly rescaled to make $\lambda_1=1$). The figure headings indicate the value of $\theta$, the midpoint of $A$ and $B$. The closest any such $A$ and $B$ can come when transformed by $M$ is a configuration like the one at the left with $\theta=0$. The furthest apart they can be is a configuration like the one at the right with $\theta=\pi/2$. Two intermediate possibilities are shown.

Solution for all dimensions

We have seen how $M$ acts by expanding each dimension $i$ by a factor $\lambda_i$. This will distort the unit sphere $\{A\,|\, A^\prime A = 1\}$ into an ellipsoid. The $e_i$ determine its principal axes. The $\lambda_i$ are the distances from the origin, along these axes, to the ellipsoid. Consequently the smallest one, $\lambda_n$, is the shortest distance (in any direction) from the origin to the ellipsoid and the largest one, $\lambda_1$, is the furthest distance (in any direction) from the origin to the ellipsoid.

In higher dimensions $n\gt 2$, $A$ and $B$ lie in a two-dimensional subspace. $M$ maps the unit circle in this subspace into the intersection of the ellipsoid with the plane containing $MA$ and $MB$. This intersection, being a linear distortion of a circle, is an ellipse. Obviously the furthest distance to this ellipse is no more than $\lambda_1=1$ and the shortest distance is no less than $\lambda_n$.

As we observed at the end of the preceding section, the most extreme possibility is when $A$ and $B$ are situated in a plane containing two of the $e_i$ for which the ratio of the corresponding $\lambda_i$ is as small as possible. This will happen in the $e_1, e_n$ plane. We already have the solution for that case.
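A Monte Carlo sketch of this argument, assuming NumPy; `random_pair_with_angle` and `bounds` are hypothetical helpers. It draws many pairs at angle $2\phi$ in random planes of $\mathbb{R}^n$ and checks that the image cosines never leave the interval given by the two-dimensional formulas with $\lambda_2$ replaced by $\lambda_n/\lambda_1$ (lower bound with the sign from LFH's comment).

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def bounds(lam_ratio, phi):
    """(2) and the theta = pi/2 cosine, with lambda_2 replaced by lambda_n / lambda_1."""
    t2 = np.tan(phi) ** 2
    r2 = lam_ratio**2
    upper = (1 - r2 * t2) / (1 + r2 * t2)
    lower = (r2 - t2) / (r2 + t2)
    return lower, upper

def random_pair_with_angle(rng, n, two_phi):
    """Unit vectors A, B in R^n at angle two_phi, lying in a random 2-D plane."""
    q, _ = np.linalg.qr(rng.normal(size=(n, 2)))   # orthonormal basis of a random plane
    u, v = q[:, 0], q[:, 1]
    return u, np.cos(two_phi) * u + np.sin(two_phi) * v

rng = np.random.default_rng(3)
n, two_phi = 5, np.pi / 3
M = rng.normal(size=(n, n))
d = np.linalg.svd(M, compute_uv=False)   # singular values, decreasing
lower, upper = bounds(d[-1] / d[0], two_phi / 2)

samples = []
for _ in range(5000):
    A, B = random_pair_with_angle(rng, n, two_phi)
    samples.append(cos_sim(M @ A, M @ B))
print(lower, min(samples), max(samples), upper)
```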

Conclusions

The extremes of cosine similarity attainable by applying $M$ to two vectors having cosine similarity $\cos(2\phi)$ are given by $(2)$ and $(3)$, with $\lambda_2$ replaced by $\lambda_n = d_n/d_1$. They are attained by situating $A$ and $B$ at equal angles to a direction in which $\Sigma=M^\prime M$ maximally lengthens any vector (such as the $e_1$ direction) and separating them in a direction in which $\Sigma$ minimally lengthens any vector (such as the $e_n$ direction).

These extremes can be computed in terms of the SVD of $M$.
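A sketch of that computation, assuming NumPy: `cosine_extremes` (a hypothetical name) reads the bounds off the singular values, and the two extremal configurations are built from the first and last right singular vectors, i.e. $e_1$ and $e_n$. The lower bound uses the sign from LFH's comment.

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def cosine_extremes(M, two_phi):
    """Range of cos(MA, MB) over all A, B at angle two_phi, read off the SVD of M."""
    _, d, Vh = np.linalg.svd(M)
    lam = d[-1] / d[0]
    t2 = np.tan(two_phi / 2) ** 2
    lo = (lam**2 - t2) / (lam**2 + t2)          # theta = pi/2 case, sign as in LFH's comment
    hi = (1 - lam**2 * t2) / (1 + lam**2 * t2)  # (2)
    e1, en = Vh[0], Vh[-1]                       # first and last columns of V
    return lo, hi, e1, en

rng = np.random.default_rng(4)
M = rng.normal(size=(4, 4))
two_phi = np.pi / 2
lo, hi, e1, en = cosine_extremes(M, two_phi)
phi = two_phi / 2

# Closest pair: bisector along e1, separated along en
A_close = np.cos(phi) * e1 - np.sin(phi) * en
B_close = np.cos(phi) * e1 + np.sin(phi) * en
# Farthest pair: bisector along en, separated along e1
A_far = np.cos(phi) * en + np.sin(phi) * e1
B_far = np.cos(phi) * en - np.sin(phi) * e1

print(hi, cos_sim(M @ A_close, M @ B_close))
print(lo, cos_sim(M @ A_far, M @ B_far))
```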

— whuber

This is a fantastic answer! Thank you very much for this detailed discussion! I believe that you have a sign mistake in eqn (3) where you should just have an overall minus sign.

— LFH

I'm interested in the case where the angle $2\phi$ approaches zero, and I would like to get an inequality between $2\phi$ and $f$. Is it true that, based on your computation, I just need to find the most extreme (that is, smallest) $\lambda_n$, and in this case the asymptotic inequality is given by $2\lambda_n\phi\leq f\leq 2\lambda_n^{-1}\phi$ as $\phi\to0$?

— LFH
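A quick check of this asymptotic claim in the two-dimensional setting, assuming NumPy and writing `lam` for $\lambda_n/\lambda_1$ (hypothetical names). It uses the branch-corrected largest angle $\pi - 2\arctan(\lambda\cot\phi)$, which equals $2\arctan(\tan\phi/\lambda)$ for $0<\phi<\pi/2$; the printed ratios should approach $1$ as $\phi\to0$.

```python
import numpy as np

def extreme_angles(phi, lam):
    """Smallest and largest image angles for an original angle 2*phi, with lam = lambda_n / lambda_1."""
    smallest = 2 * np.arctan(lam * np.tan(phi))   # theta = 0 configuration
    largest = 2 * np.arctan(np.tan(phi) / lam)    # theta = pi/2, equal to pi - 2*arctan(lam*cot(phi))
    return smallest, largest

lam = 0.3
for phi in [1e-1, 1e-2, 1e-3]:
    s, l = extreme_angles(phi, lam)
    # Ratios to the proposed bounds 2*lam*phi and 2*phi/lam
    print(phi, s / (2 * lam * phi), l / (2 * phi / lam))
```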


You are probably interested in:

$(MA,MB)=A^T(M^TM)B,$

You can diagonalize $M^TM=U\Sigma U^T$ (or as you folks call it, PCA), which tells you that the similarity of $A,B$ under transformation $M$ behaves by projecting $A,B$ onto your principal components, and subsequently calculating similarity in this new space. To flesh this out a bit more, let the principal components be $u_i$ with eigenvalues $\lambda_i$. Then

$B=\sum_i(u_i,B)\,u_i, \ A=\sum_i(u_i,A)\,u_i,$

which gives you:

$(MA,MB)=\sum_{i=1}^n (u_i,A)(u_i,B)\lambda_i.$
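A NumPy sketch (assumed tooling) of this eigen-expansion: `np.linalg.eigh` diagonalizes the symmetric matrix $M^TM$, and the inner product $(MA,MB)$ is recovered from the coordinates $(u_i,A)$, $(u_i,B)$ and the eigenvalues $\lambda_i$. Note that `eigh` returns eigenvalues in ascending order, unlike the decreasing singular values used in whuber's answer.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 4))
A = rng.normal(size=4)
B = rng.normal(size=4)

# M^T M is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors u_i (columns)
lam, Uvecs = np.linalg.eigh(M.T @ M)

coords_A = Uvecs.T @ A      # the coefficients (u_i, A)
coords_B = Uvecs.T @ B      # the coefficients (u_i, B)

inner_direct = (M @ A) @ (M @ B)
inner_expanded = np.sum(coords_A * coords_B * lam)   # sum_i (u_i, A)(u_i, B) lambda_i
print(inner_direct, inner_expanded)
```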

Notice that there is a scaling going on here: the $\lambda_i$ are stretching/shrinking. When $A,B$ are unit vectors and if every $\lambda_i=1$, then $M$ corresponds to a rotation, and you get $\mbox{sim}(MA,MB)=\mbox{sim}(A,B)$, which is equivalent to saying that inner products are invariant under rotations. In general, the angle stays the same when $M$ is a conformal transformation, which in this case requires that $M$ is invertible and the polar decomposition of $M$ satisfies $M=OP$ with $P=aI$, i.e. $M^TM=a^2I$.
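A sketch (NumPy assumed, names hypothetical) contrasting a conformal map $M = aO$, with $O$ orthogonal, against a non-uniform diagonal scaling. The conformal map preserves cosine similarity while multiplying raw inner products by $a^2$, which is the normalization point raised in the comments.

```python
import numpy as np

def cos_sim(a, b):
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

rng = np.random.default_rng(6)
O, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix
a = 2.5
M_conformal = a * O                            # so M^T M = a^2 I
M_diag = np.diag([3.0, 1.0, 0.2])              # non-uniform scaling, for contrast

A = rng.normal(size=3)
B = rng.normal(size=3)
print(cos_sim(A, B))
print(cos_sim(M_conformal @ A, M_conformal @ B))   # same as the line above
print(cos_sim(M_diag @ A, M_diag @ B))             # generally different
print(A @ B, (M_conformal @ A) @ (M_conformal @ B) / a**2)  # inner product scaled by a^2
```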

— Alex R.


Your initial statement of the problem neglects the normalization of the vectors $A$, $B$, $MA$, and $MB$ required to compute the cosine similarity. It does not appear that the subsequent analysis addresses this normalization, either. Note, in particular, that the cosine similarities are preserved even when all the eigenvalues are equal to some (positive) value that differs from $1$. That demonstrates, even in this simple case, that much more can be said.

— whuber

@whuber: cosine similarity is preserved exactly when $M$ is a conformal transformation, which in this case is equivalent to requiring $M$ to be invertible and $M^TM=a^2I$, a multiple of the identity. Said another way, the polar decomposition of $M$ satisfies $M=OP$, where $P=aI$. You're right about normalization, but it seems silly to talk about cosine similarity with non-normalized vectors $A,B$.

— Alex R.


Not silly at all! Since this "similarity" is given by the cosine of the angle between the vectors, it makes sense for any two non-zero vectors. What I meant by "much more can be said" is that effective bounds on the angle between the images of $A$ and $B$ can be obtained in terms of the angle between $A$ and $B$ and the eigenvalues of $M$.

— whuber