Problem 5-2

Searching an unsorted array

This problem examines three algorithms for searching for a value \(x\) in an unsorted array \(A\) consisting of \(n\) elements.

Consider the following randomized strategy: pick a random index \(i\) into \(A\). If \(A[i] = x\), then we terminate; otherwise, we continue the search by picking a new random index into \(A\). We continue picking random indices into \(A\) until we find an index \(j\) such that \(A[j] = x\) or until we have checked every element of \(A\). Note that we pick from the whole set of indices each time, so that we may examine a given element more than once.

a. Write pseudocode for a procedure RANDOM-SEARCH to implement the strategy above. Be sure that your algorithm terminates when all indices into \(A\) have been picked.

RANDOM-SEARCH(x, A)

01 n = A.length

02 B = new Array()

03 c = 0

04 while c < n:

05 i = RANDOM(1, n)

06 if B[i] != 1:

07 if A[i] == x:

08 return i

09 else:

10 B[i] = 1

11 c = c + 1

12 RETURN NIL

This algorithm tracks the number of checked indices (\(c\)) as well as whether or not that particular value has been checked already (\(B\)). So when we are considering a new index we first check to see if we’ve looked at that index before, moving on to the next index immediately if we have. If we haven’t, then the actual array \(A\) is consulted and the index is recorded to both \(B\) and \(c\) if the value doesn’t match the search target. The containing while loop will terminate once all indices have been checked or once the search target value is found.

b. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the expted number of indices into \(A\) that we must pick before we find \(x\) and RANDOM-SEARCH terminates?

These are simply Bernoulli trials, so the expected number of picks is \(n\). More formally, we define a random variable \(N\) that describes the number of searches required. The expected value of \(N\) is:

\[\begin{split} E[N] &= \sum_{i \ge 1} i \cdot P (i \text{ iterations are required}) \\ &= \sum_{i \ge 1} i \left( \frac{n -1}{n} \right)^{i-1} \left( \frac{1}{n} \right) \\ &= \frac{1}{n} \cdot \frac{1}{(1 - \frac{n-1}{n})^2} \\ &= n \end{split}\]

c. Generalizing your solution to part (b), suppose that there are \(k \ge 1\) indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we find \(x\) and RANDOM-SEARCH terminates? Your answer should be a function of \(n\) and \(k\).

As in b., this is intuitively \(\frac{n}{k}\). Formally:

\[\begin{split} E[N] &= \sum_{i \ge 1} i \cdot P (i \text{ iterations are required}) \\ &= \sum_{i \ge 1} i \left( \frac{n - k}{n} \right)^{i-1} \left( \frac{k}{n} \right) \\ &= \frac{k}{n} \cdot \frac{1}{(1 - \frac{n-k}{n})^2} \\ &= \frac{n}{k} \end{split}\]

d. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the expected number of indices into \(A\) that we must pick before we have checked all elements of \(A\) and RANDOM-SEARCH terminates?

Suppose we represent array \(A\) as a set of identical bins and repeatedly toss identical balls into a random bin from the array. Now all of a sudden we are dealing with the exact same balls and pins problem described earlier in section 5.4.2. Page 134 describes the steps used to obtain the answer directly, which is \(n(\ln n + O(1))\).

Now consider a deterministic linear search algorithm, which we refer to as DETERMINISTIC-SEARCH. Specifically, the algorithm searches \(A\) for \(x\) in order, considering \(A[1], A[2], A[3], \dots, A[n]\) until either it finds \(A[i] = x\) or it reaches the end of the array. Assume that all possible permutations of the input array are equally likely.

e. Suppose that there is exactly one index \(i\) such that \(A[i] = x\). What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC SEARCH?

In the average-case, the search target is equally likely to be in any position within the array, and so the expected number of checks is \(\frac{n+1}{2}\). In the worst-case, the search target is in the final position of the array meaning we will need to check every position giving us an expected run time of \(\Theta(n)\).

f. Generalizing your solution in part e., suppose that there are \(k \gt 1\) indices \(i\) such that \(A[i] = x\). What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINISTIC-SEARCH? Your answer should be a function of \(n\) and \(k\).

We want to count the number of elements examined before we find one of the \(k\) indices that matches the value we are looking for. Let \(X\) be the random variable which represents this number and \(X_i\) be the indicator variable that we have examined the \(i^{th}\) element. Initially we have:

\[E[X] = \sum\limits^{n}\_{i=1} E [X_i]\]

\(X_i\) has two distinct probabilities, one where \(A[i] = x\) and another where \(A[i] \ne x\). Let \(S=\{i\vert A[i]=x\}\) and \(S' = \{i \vert A[i] \ne x\}\). Now we can continue:

\[\sum\limits^{n}\_{i=1} E [X_i] = \sum*{i \in S} P(X_i) + \sum*{i \in S'} P(X_i)\]

In the first case (\(S\)), \(P(X_i) = \frac{1}{k}\) because only one of the \(k\) indices that represent a match will have been examined. In the other case (\(S'\)), \(P(X_i) = \frac{1}{k+1}\) because we only examine index \(i\) if it appears before every one of the \(k\) solution indices. In order to resolve the summation, note that we know that there are \(k\) indices in \(S\) and so there must be \(n-k\) indices in \(S'\):

\[\sum_{i \in S} P(X_i) + \sum_{i \in S'} P(X_i) = \frac{k}{k} + \frac{n - k}{k + 1} = \frac{n+1}{k+1}\]

This is our average-case running time. In the worst case, all the \(k\) solution indices are found at the very end of the array which has a much more simple running time of \(n - k + 1\).

g. Suppose that there are no indices \(i\) such that \(A[i] = x\). What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time of DETERMINSTIC-SEARCH?

In both cases the algorithm examines every index and then returns NIL. This means both cases have the same running time of \(n\).

Finally, consider a randomized algorithm SCRAMBLE-SEARCH that works by first randomly permuting the input array and then running the deterministic linear search given above on the resulting permuted array.

h. Letting \(k\) be the number of indices \(i\) such that \(A[i] = x\), give the worst-case and expected running times of SCRAMBLE-SEARCH for the cases in which \(k = 0\) and \(k = 1\). Generalize your solution to handle the case in which \(k \ge 1\).

5.2 Scramble Search Meme

i. Which of the three searching algorithms would you use? Explain your answer.

DETERMINISTIC-SEARCH is my choice. SCRAMBLE-SEARCH isn’t worth considering because it just adds unnecessary running time to randomize the array and then proceeds to perform exactly the same as DETERMINISTIC-SEARCH. Both DETERMINISTIC-SEARCH and RANDOM-SEARCH can produce a match early on and therefore may terminate quickly, but while DETERMINISTIC-SEARCH is guaranteed to terminate after at most \(n\) checks, RANDOM-SEARCH has no such limit and may take quite a while to return NIL or find a rare match.