### Distinct numbers in a sample draw

**Source:**Asked to me by Chinmay Chouhan, Junior Undergraduate, CSE IITB

**Problem:**

Given the set of numbers from

*1*to

*n*:

*{ 1, 2, 3 .. n }*

We draw

*n*numbers randomly (with uniform distribution) from this set (with replacement). What is the expected number of distinct values that we would draw?

**Update**(Oct 30, 2011):

Solution posted by Yashoteja Prabhu (RA at Microsoft Research, IITB CSE 2011 Alumnus), Garvit Juniwal (IITB CSE Senior Undergraduate), Dinesh Krithivasan (IITM Alumnus, Phd University of Michigan, Senior Qualcomm Engineer), Nikhil Garg (IITD CSE Senior Undergraduate) and Avinash in comments!

By Induction on number of draws:

ReplyDeleteLet

P(k,i): Probability that after draws there are exactly i distinct numbers

E(k) : Expectation of distinct numbers after k draws

So, P(k+1,i)=i/n*P(k,i) + (n-i+1)/n*P(k,i-1)

Writing E(k+1) using above formula, and after some manipulations,we get:

E(k+1) = (1-1/n)E(k)+1

which leads to a geometric series.

So finally E(k+1) = n-n(1-1/n)^(k+1)

Substituting k+1 = n

E(n) = n[1-(1-1/n)^n]

Hope this is correct

Let y_i be such that

ReplyDeletey_i = 1 if I draw a new number at the i th draw and 0 otherwise.

Now, you have to find E[\sigma y_i] or \sigma E[y_i].

E[y_i] = Number of instances such that a new number is drawn at i th draw / n^n

= n*(n-1)^(i-1)*n^(n-i)/n^n

=(n/n-1)*(n-1/n)^i

\sigma E[y_i] = n*(1-(1-1/n)^n)

Define I_i to be the indicator random variable of drawing the ith number. Then the number of distinct values drawn is just the summation from 1 to n of the probability of I_i. Probability of drawing the ith element in n draws is 1 - (n-1)^n/n^n. So the expected no. of distinct draws is n(1-(1-1/n)^n). The fraction of undrawn numbers approaches 1/e as n tends to infinity.

ReplyDeleteI would imagine that solving the problem more directly by inclusion-exclusion will give the nth partial sum in the infinite expansion of 1/e.

(n+1)/2

ReplyDeleteDenote by Si as indicator variable saying that ith element is selected or not.

ReplyDeleteS = S1 + S2 ... Sn

where S is the desired score( number of distinct) elements.

E(S) = E(S1) + E(S2) ....

E(Si) = prob( ith element is chosen)

= 1 - prob(ith element is not chosen in any draw)

= 1 - ( (n-1)/n )^n

So E(S) = n * ( 1 - ( (n-1) / n)^n )

Sk= 1 if the element is not drawn in k-1 earlier draws.

ReplyDeleteSk=0 else

P(Sk=1) = n*(n-1)^(k-1)]/n^k= {(n-1)/n}*[(n-1)/n]^k

E(numb of distinct elements)=sigma(Pk) for k=1,2,3..n and this forms a GP.

n*(1-([n-1]/n)^n).

indicator r.v x_{i} be 1 when the event - "a distinct number is drawn in ith turn" is true.

ReplyDeleteP(x_{1}=1)=1

P(x_{2}=1)=(n-1)/n

.

.

P(x_{n}=1}= 1/n

P(x_{n+1}=1)=0

.

.

E(no. of turns before you get a number repeated) = 1/n + 2/n + ... n/n = (n+1)/2

Solutions by Yashoteja, Garvit, Dinesh, NG(Nikhil Garg), Avinash are correct.

ReplyDeleteMistake in solution by Abhay and Rohith Mohan Suvvari:

P(X_3=1) is not equal to (n-2)/n

P(X_3=1|X_1=1,X_2=1) is equal to (n-2)/n