Suppose we want to create a random sample of the set \(\{1, 2, 3, \dots, n\}\), that is, an \(m\)-element subset \(S\), where \(0 \leq m \leq n\), such that each \(m\)-subset is equally likely to be created. One way would be to set \(A[i] = i\) for \(i = 1, 2, 3, \dots, n\), call RANDOMIZE-IN-PLACE(A), and then take just the first \(m\) array elements. This method would make \(n\) calls to the RANDOM procedure. If \(n\) is much larger than \(m\), we can create a random sample with fewer calls to RANDOM. Show that the following recursive procedure returns a random \(m\)-subset \(S\) of \(\{1, 2, 3, \dots, n\}\), in which each \(m\)-subset is equally likely, while making only \(m\) calls to RANDOM:
RANDOM-SAMPLE(\(m, n\))
1
if m == 0:
2
return Ø
3
else S = RANDOM-SAMPLE(m-1, n-1)
4
i = RANDOM(1, n)
5
if i ∈ S:
6
S = S ⋃ {n}
7
else S = S ⋃ {i}
8
return S
RANDOM-SAMPLE(\(m, n\)) can produce \({n \choose m}\) possible \(m\)-subsets. For each \(m\)-subset to be an equally likely output, then each must have a \(1 / {n \choose m}\) chance of being produced. We will prove this by induction.
When \(m = 0\) there is only \({n \choose 0} = 1\) \(0\)-subset that can be produced (the empty set) which trivially occurs with probability \(1 / {n \choose m} = 1\). When \(m = 1\) we know that \(n \notin S\) and so it is once again trivial that a valid \(1\)-subset is produced with the desired probability \(1 / {n \choose 1} = 1/n\). When \(m > 1\) the situation gets more interesting. Consider two cases, the first where \(n \in S\) and the second where \(n \notin S\).
In the first case, the final recursive call will be either selecting \(n\) or selecting one of the \(m-1\) values already chosen in a previous recursive call. So the probability for each possible combination that includes \(n\) is
\[\frac{m - 1 + 1}{n} \left(1 / {n-1 \choose m-1} \right) = 1/ {n \choose m}\]In the second case, the final recursive call will have selected one of the \(n-m\) values that has not been selected yet. The probability for each possible combination that doesn’t include \(n\) is
\[\frac{n-m}{m} \left(1 {n-1 \choose m} \right) = 1/ {n \choose m}\]So we see that all possible \(m\)-subsets generated by this procedure have an equal probability of being selected.