### Sphagetti Breakfast

Source: Very standard problem in Quant interviews (Taken from quantnet, xkcd forums)

Problem:
A bowl of spaghetti contains n strands. Thor picks two ends at random and joins them together. He does this until no ends remain.

What is the
a) expected number of spaghetti loops in the bowl?
b) expected average length of the loops? (in strands)
c) expected number of k-hoops? ( a k-hoop is a loop made from k strands)

1. Fi(k) = probability of k loops after i steps

Fi(0) = π [2(n-i)/{2(n-i)+1}]

Fi(k)= [F(i-1)(k-1)] *[2(n-i)/{2(n-i)+1}] + [F(i-1)(k)] * [1/ {2(n-i) + 1}]
..

Solve the above recursion to obtain the solution for part (a)

Still figuring out a way for rest parts

2. Here is solution for the first part of puzzle. Rest I've not been able to solve yet.

Denote f(k) as a random variable denoting number of loops formed when there are K acyclic strands left.

Now f(k) = p * ( 1 + f(k-1) ) + (1-p) * f(k-1) where p is probability of selecting both ends of the same strand.
p is k / ( 2k * (2k-1) / 2) = 1 / (2k-1)

So f(k) = 1/(2k-1) + f(k-1)
Taking expectation,
E( f(k) ) = 1/(2k-1) + E( f(k-1) )

We need E( f(n) ) which is just : 1/1 + 1/3 + .... 1/ (2n-1)

3. For the second part, what we need is n*(Summation P(k)*(1/k)) where P(k) is the probability that there are exactly k loops. Lets try to define a generating function for this.
Note that at the (i)th stage there are i open strands and 2i open ends. There is a 1/(2i-1) probability of increasing the number of loops by choosing two ends of the same strand, and (2i-2)/(2i-1) probability of not increasing the number of loops. Hence this corresponds to the expression (x + (2i-2))/(2i-1).
Therefore the entire generating function for the number of loops become F(x) = ( (x+(2n-2))*(x+(2n-4))*...*(x+2)*(x+0) )/( (2n-1)*(2n-3)*...*(3)*(1) )
The coefficient of x^k in the above gives us P(k)

Note that to obtain the answer of the first part (Summation P(k)*k), we can differentiate F(x), and put in x=1. Derivative of F(x) is F(x)/(x+2n-2) + F(x)/(x+2n-4) + ... + F(x)/(x+2) + F(x)/(x+0). F(1)=1. So the above expression becomes 1/(2n-1) + 1/(2n-3) + ... + 1/3 + 1/1.

To obtain the answer of the second part, we need to integrate F(x), but after first dividing by x, and then plugging in x=1. However I am not able to find a closed form at the moment.

4. Two small corrections.
There are i open strands and 2i open loops at the (n-i)th stage
And for the second part, we also need to multiply by n at the end. (Ans = n*(Integral of F(x)/x at 1))

5. So the most I can simplify the answer for second part is :
n * ( 1/n + S(1)/(n-1) + S(2)/(n-2) ... + S(n-1)/1 )/((2n-1)*(2n-3)*...*(3)*(1))
where S(k) is Sum of (n choose k) products of k terms each taken from {2,4,..,2n-2}
Eg. For n=5, S(3) = 2*4*6 + 2*4*8 + 2*6*8 + 4*6*8

6. Consider the random variable Xi=1/k if i'th strand is present in a loop of k strands for k={1,2,..n}.
so Xi= 1/k with prob Pk(say)
E(no of loops)=n*E(Xi) as all the strands are identically distrib.

a) First to calculate Pk let us look at I(n)-> Total no of ways of forming loops with n strands = [(2n,2)*(2n-2,2)....*(2,2)]/n!
=[2n!]/[(2^n)*n!]

b) Pk
No.of ways of forming a k-hoop is (n-1,k-1)*(k-1)!*2^(k-1)} *I(n-k))
hence Pk=above expression/I(n).

a) E(no.of loops)=n*exp(Xi)
=n*sum(Pk/K) sum for k over{1,2,3..n}

c) E(no.of k-hoops) can be computed by slightly altering Xi=1/k if it is part of a k-hoop or else 0.
so E(no.of hoops)=n*Pk/k
where as given earlier

Pk=Perm(n-1,k-1)*2^(k-1)*I(n-k)/I(n).

Comments: Closed form in 'c' but in 'a' trying to see if i can get a closed form.And numerically 'a' is matching with NG's solution.
'b' - still figuring out..

7. If we simplify
E(no.of k hoops)=1/2K)*(1+1/(2*n-1))*(1+1/(2*n-3))+(1+1/(2*n-5))+......(1+1/(2*n-2*k+1)).

8. Disclaimer: I haven't verified all the solutions above, but since this seems to be just an instantiation of a well known problem in Discrete Math/Theoretical CS, I though I would point this out.

This seems to me to be a simplified version of what is called the Configuration Model in the theory of Random Regular Graphs.

Consider each strand as a "vertex" consisting of a set of two "nodes". The process then can be described as randomly pairing these 2n nodes (2 for each strand or "vertex") into n pairs, and assuming there is an "edge" between two "vertices" if a node from paired with a node from the other. Thus, self loops are allowed in this hypergraph.

Now notice that a k-length loop in the problem just corresponds to a length k cycle in the random hypergraph constructed above. The expected number of length $k$ cycles in these hypergraphs (even when each vertex has any number $d$ of nodes) is a standard calculation in this setting, and is done roughly in the way Avinash suggested above. We call a "positioning" of a k-cycle to be a collection of paired nodes which project to some k-cycle on the vertices of the hypergraph. The number of possible "positionings" in the hypergraph of an oriented k-cycle with a distinguished first vertex is 2kN_k = n_k * 2^k, where n_k = n(n-1)...(n-k+1), and N_k is the number of positionings without a distinguished first vertex and without considering orientation. This is because you can chose the vertices in n_k ways, and then select the nodes that get paired in each direction for each of the k vertices in 2^k ways, and then you have to divide by 2k to take care of the multiple counting of the same cycle due to the orientation and the distinguished first vertices.

Now the probability p_k that these 2k nodes are actually matched as required by such a "positioning" is p_k = (2n-2k)!!/(2n)!!, where (2l)!! is the number of pairings of 2l objects, given by (2l)!! = (2l)!/(2^l l!).

Thus, the expectation of $X_k$, the number of $k$-hoops is $E[X_k] = p_k*N_k$ and expectation of the the number of hoops is $\sum_{k=1}^n p_k N_k$. The same calculation can be done if each strand has "degree" $d > 2$, and is very useful in random graph theory. A reference is the chapter on random regular graphs in the book Random Graphs by Svante Janson et al.

However, I am not sure of the vanilla calculations above apply to part (b). The reason is that we now have a very different beast: the expectation of the inverse of random variables, and I can't think of anything beyond the generating function method proposed by Rudradev above.

In fact, I am not immediately sure how to do something better even in the asymptotic (n->\infty) setting. In this case, it can be shown that for any fixed $k$, the distribution of (X_1, X_2, .....X_k) converges to the distribution of independent Poisson random variables (Z_1, Z_2, ....Z_k) where Z_i ~ Po(1/2i). However, we would need to handle non-constant k to solve b and so this does not seem to be of much help.

9. At each step, we either loop an existing strand or join two strands together. So, 1 strand goes out of the picture at each iteration. Or think of it in this way; after step i, you are left with only N-i strands. Now, probablity that you make a loop at step i, is 1/(N-i+1) where i varies from 1 to N. Summing this up gives us logN approximately. Just to be clear, I have used the logic of indicator random variables.