In addition to the already-posted answers (which have been very helpful to me!), there is a geometric explanation for the connection between the L2 norm and the mean.
To use the same notation as chefwen, the formula for $L_2$ loss is:

$$L_2 = \frac{1}{k} \sum_{i=1}^{k} (y_i - \beta)^2$$

We wish to find the value of $\beta$ which minimizes $L_2$. Notice that this is equivalent to minimizing the following, since multiplying by $k$ and taking the square root both preserve order:

$$\sqrt{\sum_{i=1}^{k} (y_i - \beta)^2}$$
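As a quick numerical illustration (not part of the original answer), here is a small Python sketch, assuming NumPy, that evaluates both objectives over a grid of candidate $\beta$ values; the data vector and grid are arbitrary choices, and both curves bottom out at the same $\beta$, the sample mean:

```python
import numpy as np

y = np.array([2.0, 6.0, 7.0, 1.0])      # arbitrary example data
betas = np.linspace(0.0, 10.0, 10001)   # grid of candidate beta values

residuals = y[:, None] - betas[None, :]
l2_loss = (residuals ** 2).mean(axis=0)           # (1/k) * sum of squared residuals
distance = np.sqrt((residuals ** 2).sum(axis=0))  # Euclidean distance to (beta, ..., beta)

# Both objectives are minimized at the same beta: the sample mean.
print(betas[l2_loss.argmin()], betas[distance.argmin()], y.mean())
# -> 4.0 4.0 4.0
```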
If you consider the data vector $y$ as a point in $k$-dimensional space, this formula calculates the Euclidean distance between the point $y$ and the point $\vec{\beta} = (\beta, \beta, \ldots, \beta)$.
So the problem is to find the value of $\beta$ which minimizes the Euclidean distance between the points $y$ and $\vec{\beta}$. Since the possible values of $\vec{\beta}$ all lie on the line spanned by $\vec{1} = (1, 1, \ldots, 1)$, and the closest point on a line to $y$ is its orthogonal projection onto that line, this is equivalent to finding the vector projection of $y$ onto $\vec{1}$.
It's only really possible to visualize this when $k = 2$, but consider the example $y = (2, 6)$: projecting onto $\vec{1}$ yields $(4, 4)$, the mean in each coordinate, as we expect.
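A minimal sketch of this two-dimensional example, again assuming NumPy (the variable names are my own):

```python
import numpy as np

y = np.array([2.0, 6.0])
ones = np.ones(2)

# Vector projection of y onto the ones vector: (y . 1 / |1|^2) * 1
proj = (y @ ones) / (ones @ ones) * ones
print(proj)  # -> [4. 4.]
```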
To show that this projection always yields the mean (including when $k > 2$), we can apply the formula for projection:

$$\vec{\beta} = \operatorname{proj}_{\vec{1}}\, y = \frac{y \cdot \vec{1}}{|\vec{1}|^2}\, \vec{1} = \frac{\sum_{i=1}^{k} y_i}{k}\, \vec{1}$$

so each coordinate is $\beta = \frac{1}{k}\sum_{i=1}^{k} y_i$, the mean.
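To sanity-check the algebra for larger $k$, here is one more short sketch (again assuming NumPy; the data is randomly generated purely for illustration) verifying that the projection coefficient equals the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100)           # arbitrary data vector with k = 100
ones = np.ones_like(y)

# Scalar coefficient of the projection of y onto the ones vector
beta = (y @ ones) / (ones @ ones)

print(np.isclose(beta, y.mean()))  # -> True
```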