Does nearest neighbor distance approach farthest neighbor distance in high-dimensional space?

I was reading a classical ANN paper that discusses the cases when the distance difference between the nearest neighbor and the farthest neighbor becomes negligible in high-dimensional space. In Section 3.5.3, it provides an interesting manually constructed example, but without detailed proof steps. I was writing this blog to complement the process.

Problem: Let Xm=(X1,X2,,Xm)\vec X_m=(X_1,X_2,\cdots,X_m) be a random mm-dimensional vector, such that X1=U1X_1=U_1 and Xi=Ui+Xi1/2X_i=U_i+X_{i-1}/2, where UiUniform(0,i)U_i\sim \text{Uniform}(0,\sqrt{i}) is a continuous uniform random variable. Now, we sample two random vectors Pm,Qm\vec P_m,\vec Q_m independently from the distribution of Xm\vec X_m. What is the expected L2L_2 distance and variance of distance between Pm\vec P_m and Qm\vec Q_m?

The conclusion in the paper is 19m2+13m+o(m)\frac{1}{9}m^2+\frac{1}{3}m+o(m), but my results show the more accurate form should be 19m2+127m+o(m)\frac{1}{9}m^2+\frac{1}{27}m+o(m).

Solution: The expected distance between Pm\vec P_m and Qm\vec Q_m could be computed as follows:

E[PmQm22]=E[i=1m(PiQi)2]=i=1mE[(PiQi)2]E\left[||\vec P_m-\vec Q_m||_2^2\right]=E\left[\sum_{i=1}^m (P_i-Q_i)^2\right]=\sum_{i=1}^m E\left[(P_i-Q_i)^2\right]

Since Var(PiQi)=E[(PiQi)2](E[PiQi])2Var(P_i-Q_i)=E[(P_i-Q_i)^2]-(E[P_i-Q_i])^2 and E[PiQi]=0E[P_i-Q_i]=0, the above formulae transforms to solving:

Var(PiQi)=Var(Pi)+Var(Qi)2Cov(Pi,Qi)=2Var(Xi)Var(P_i-Q_i)=Var(P_i)+Var(Q_i)-2Cov(P_i,Q_i)=2Var(X_i)

Note that Cov(Pi,Qi)=0Cov(P_i,Q_i)=0 by independence. Then, our next step is to compute Var(Xi)Var(X_i).

It is not hard to see Xi1X_{i-1} and UiU_i are independent. Therefore, Var(Xi)=Var(Ui)+Var(Xi1)/4Var(X_i)=Var(U_i)+Var(X_{i-1})/4. Since Var(Ui)=i/12Var(U_i)=i/12, we have Var(Xi)=i/12+Var(Xi1)/4Var(X_i)=i/12+Var(X_{i-1})/4. By solving the above recurrence relation, we have:

Var(Xi)=127(1/4)i+19i127Var(X_i)=\frac{1}{27}(1/4)^i+\frac{1}{9}i-\frac{1}{27}

How to solve the recurrence (I do not know how, following is GPT’s answer but I verified):

  • We first solve the homogeneous part of the recurrence, which is Var(Xi)=Var(Xi1)/4Var(X_i)=Var(X_{i-1})/4. The characteristic equation for this homogeneous part is r=1/4r=1/4, so the general solution to the homogeneous part is C(1/4)iC(1/4)^i for some constant CC.

  • Next, we find a particular solution to the non-homogeneous recurrence. We can try a particular solution of the form Var(Xi)=Ai+BVar(X_i)=A*i+B for some constants AA and BB. Substituting this into the original recurrence gives us:

Ai+B=112i+14(A(i1)+B)Ai+B=(112+14A)i+14(BA)\begin{align} &\quad A*i+B = \frac{1}{12}i + \frac{1}{4}(A*(i-1)+B) \\&\Rightarrow A*i+B=\left(\frac{1}{12}+\frac{1}{4}A\right)i+\frac{1}{4}(B-A) \end{align}

  • We need to match the coefficients of ii and the constant terms on both sides of the equation. This gives us the following system of equations:

{A=112+14AB=14(BA)\begin{cases} A=\frac{1}{12}+\frac{1}{4}A \\ B=\frac{1}{4}(B-A) \end{cases}

  • Then A=19A=\frac{1}{9} and B=127B=-\frac{1}{27}. So the non-homogeneous solution is Var(Xi)=19i127Var(X_i)=\frac{1}{9}i-\frac{1}{27}.

  • Putting everything together, the general solution to the original recurrence is Var(Xi)=C(1/4)i+19i127Var(X_i)=C(1/4)^i+\frac{1}{9}i-\frac{1}{27}. Since Var(X1)=1/12Var(X_1)=1/12, we can solve for CC to get C=127C=\frac{1}{27}. Therefore, the final solution to the recurrence is:

Var(Xi)=127(1/4)i+19i127Var(X_i)=\frac{1}{27}(1/4)^i+\frac{1}{9}i-\frac{1}{27}

Now back to our goal, we need to summarize 2Var(Xi)2Var(X_i) for i=1,2,,mi=1,2,\cdots,m to get the expected distance between Pm\vec P_m and Qm\vec Q_m:

E[PmQm22]=i=1m2Var(Xi)=281(114m)+19m(m+1)227m=19m2+127m+o(m)\begin{align} E\left[\|\vec P_m-\vec Q_m\|_2^2\right]&=\sum_{i=1}^m 2Var(X_i)=\frac{2}{81}\left(1-\frac{1}{4^m}\right)+\frac{1}{9}m(m+1)-\frac{2}{27}m\\&=\frac{1}{9}m^2+\frac{1}{27}m+o(m) \end{align}

I am not writing the variance of distance here since it is much more complicated. Anyway, the difference of our results does not affect the main conclusion of the paper.