Painting Coloured Balls

Source: Asked to me by Sudeep Kamath (Third year PhD Student, UC at Berkeley, EE IITB Alumnus)

Problem: A box contains n balls coloured 1 to n. Each time you pick two balls from the bin - the first ball and the second ball, both uniformly at random and you paint the second ball with the colour of the first. Then, you put both balls back into the box. What is the expected number of times this needs to be done so that all balls in the box have the same colour?

1. The problem can be reduced to the following: We have been given a graph on n vertices which has nmo edges at all in the graph. There is a bag which contains all n^C_2 possible edges for the graph. We add edges one by one into the initially empty graph. At each step, we sample a random edge from the bag and add it to the graph. The corresponding sampling is with replacement. We stop as soon as the
graph becomes connected.

2. The problem can be solved by partitioning into stages i.e. initially the graph contains n components (since initially the graph has no edges at all so every vertex comprises of an independent component). Each time we add an edge the number of components either remain same or decrease by 1.

Let r.v. X_k denote the number of edges added be to reduce component size from n-k+1 to to n-k

X= total no. of edges added= X_1+X_2+...X_(n-1)

Compute E[X_k] and use linearity of expectation to find E[X]

3. @Mayank.. The flow makes sense.

Am I missing something here? How would you calculate E[X_k]?

4. Overwriting my last comment, I don't think this flow is correct. Its not that you are only adding edges and counting the component sets. Sometimes, edges have to be removed too. That creates more complications. Can you please clarify your solution?

5. @Pratik :
Under what circumstances do you want to remove an edge?

As i understand, the number of edges is equal to the number of times the original process is carried out.

6. I believe that the graph needs to a directed graph but even in that there are problems. For eg. for n=3.. even if all vertices are connected it does not mean that all have same colour. Also even if we have a directed flow which connects all vertices still all vertices may not have the same colour. An edge may be replaced by another edge of opposite direction or even of the same direction..

From the expectations for n=1,..4 I think that the answer is (n-1)^2. Cant think of a proof though

7. If you join point 1 and point 2 (both have colour 1). Then join point 2 and point 3 (both get colour 1). Then join point 3 and point 4 (both get colour 1). Now join point 3 with point 5 (so, point 3 gets colour 5, while the rest have colour 1.)

How can you show this using your approach of drawing the graph?

8. we can also have an interminable loop in the graph if the following situation should arise:

we pick up ball 1 first. Then ball 2. So ball 2 will have color of ball 1. next try, we pick up ball 3 first and then ball 2. so, ball 2 will now have color of ball 3. next try, we pick up ball 1 first and then ball 2. So, ball 2 will have color of ball 1 again. next we pick up ball 3 and the situation is repeated and ball 2 is being painted repeatedly.

9. This isn't my solution. I saw the major ideas in a related paper and I am only sketching them here.

The main idea is to note that if we concentrate on just one color c_i and ask how many balls of color c_i are there after n steps, it is a simple martingale problem. If at some stage, there are k balls of color c_i, in the next stage, there are k+1 or k-1 balls with probability p_k and k balls with probability 1-2p_k where p_k = k*(n-k)/(n*(n-1)).

Let N be the number of steps needed for all balls to be of the same color. Since we know how to handle things if we focus exclusively on one color, lets compute E(N) under the condition that c_i is the surviving color, i.e., lets compute E(N . 1(S_i)) where S_i is the event that c_i is the surviving color and 1(S_i) is the indicator random variable of S_i.

Let N_i = E(N 1(S_i)). To compute N_i, we take the familiar recursive expectation approach. Let f(k) = E(N 1(S_i)) given we start with k balls of color i, i.e., E(N 1(S_i) | k). Condition on the first pick but the recursion is slightly tricky due to the presence of the multiplication with the indicator rv. With probability p_k, it becomes E((1+N) 1(S_i) | k+1). With prob. p_k, it becomes E((1+N) 1(S_i) | k-1). With prob. (1-2p_k), it becomes E((1+N) 1(S_i) | k). Now, we use linearity of expectation and the fact that E(1(S_i)|k) = P(S_i | k). So, the recursion becomes f(k) = p_k f(k+1) + p_k f(k-1) + (1-2p_k) f(k) + p_k P(S_i | k+1) + p_k P(S_i | k-1) +(1-2p_k) P(S_i|k)

It is easy to show that P(S_i | k) = k/n. So the recursion becomes f(k) = p_k f(k+1) + p_k f(k-1) + (1-2p_k) f(k) + p_k where p_k = k(n-k)/(n*(n-1)). We have f(n) = 0 and f(0) = 0 and we need to compute f(1). The recursion works out to f(1) = (n-1)^2/n.

Since N = \sum_{i=1}^n N 1(S_i), we have E(N) = n E(N 1(S_i)) = (n-1)^2.

As you can see, this is a series of delicate arguments put together. I don't claim to fully understand the logic though I can follow it step by step. If anyone can elaborate on the approach, that would be great. Just to reiterate, I won't claim any credit for the proof.

10. One flaw in the solution posted by Dinesh:

From the way the question has been posed, I would assume that the two balls are drawn without replacement.

So, the probability from f(k) to f(k+1) and the probability from f(k) to f(k-1) are unequal - unlike what's taken in Dinesh's solution.

Another flaw that I can point to:
I am sure that
f(k) = p1_k f(k+1) + p2_k f(k-1) + p3_k f(k) + 1
But using Dinesh's method, it comes out to be
f(k) = p_k f(k+1) + p_k f(k-1) + (1-2p_k) f(k) + p_k
I am not able to find a mistake in Dinesh's argument though!

But overall, the idea looks correct to me.

The main idea is to note that if we concentrate on just one color c_i and ask how many balls of color c_i are there after n steps, it is a simple martingale problem. If at some stage, there are k balls of color c_i, in the next stage, there are k+1 with prob p1_k, or k-1 balls with probability p2_k and k balls with probability p3_k.

Let N be the number of steps needed for all balls to be of the same color. Since we know how to handle things if we focus exclusively on one color, lets compute E(N) under the condition that c_i is the surviving color, i.e., lets compute E(N . 1(S_i)) where S_i is the event that c_i is the surviving color and 1(S_i) is the indicator random variable of S_i.

Let N_i = E(N 1(S_i)). To compute N_i, we take the familiar recursive expectation approach. Let f(k) = E(N 1(S_i)) given we start with k balls of color i, i.e., E(N 1(S_i) | k). Condition on the first pick but the recursion is slightly tricky due to the presence of the multiplication with the indicator rv. With probability p1_k, it becomes E((1+N) 1(S_i) | k+1). With prob. p2_k, it becomes E((1+N) 1(S_i) | k-1). With prob. p3_k, it becomes E((1+N) 1(S_i) | k). Now, we use linearity of expectation and the fact that E(1(S_i)|k) = P(S_i | k). So, the recursion becomes f(k) = p1_k f(k+1) + p2_k f(k-1) + p3_k f(k) + p1_k P(S_i | k+1) + p2_k P(S_i | k-1) + p3_k P(S_i|k)

Simplifying,
f(k) = p1_k f(k+1) + p2_k f(k-1) + p3_k f(k) + 1

f(0)=0, f(n)=0
Assume we find f(1) as a function of n

Since N = \sum_{i=1}^n N 1(S_i), we have E(N) = n E(N 1(S_i)) = n*f(1)

The only part left is to solve the recurrence relation.

11. @ Pratik: I assumed that the balls are taken without replacement. Assume that at some point there are k balls of color c_i and n-k balls of some other color. Then, at the next picking, we either pick [c_i c_i], [c_i c_j], [c_j c_i] or [c_j c_j] where c_j is some color other than c_i. The respective probabilities are (k/n)(k-1/n-1) , (k/n)(n-k/n-1), (n-k/n)(k/n-1) and (n-k/n)(n-k-1/n-1). Of these the case [c_i c_j] increases the number of balls of color c_i to k+1 while [c_j c_i] reduces it to k-1. Both occur with the same probability, viz., k*(n-k)/(n*(n-1)). So they both do occur with the same probability. Am I missing something?

12. Also, I don't agree with the derivation :

"f(k) = p1_k f(k+1) + p2_k f(k-1) + p3_k f(k) + p1_k P(S_i | k+1) + p2_k P(S_i | k-1) + p3_k P(S_i|k)
Simplifying,
f(k) = p1_k f(k+1) + p2_k f(k-1) + p3_k f(k) + 1"

We know P(S_i | k) = k/n. So the simplification for the last 3 terms is p1_k(k+1)/n + p2_k(k-1)/n + p3_k(k/n) = (k/n) + (p1_k-p2_k)/n. Combined with my previous assertion that p1_k = p2_k, we get the recursion
f(k) = p1_k f(k+1) + p2_k f(k-1) + p3_k f(k) + (k/n).

The (k/n) additive factor is important for the recursion to work out reasonably simply. Replacing it by 1 gives a really crazy recursion.