Suppose that pattern \(P\) and text \(T\) are randomly chosen strings of length \(m\) and \(n\), respectively, from the \(d\)-ary alphabet \(\Sigma_d = \{0, 1, \ldots, d-1 \}\), where \(d \ge 2\). Show that the expected number of character-to-character comparisons made by the implicit loop in line 4 of the naive algorithm is
\[(n - m + 1) \frac{1-d^{-m}}{1-d^{-1}} \le 2(n - m + 1)\]over all executions of this loop. (Assume that the naive algorithm stops comparing characters for a given shift once it finds a mismatch or matches the entire pattern.) Thus, for randomly chosen strings, the naive algorithm is quite efficient.
Line 4 of the NAIVE-STRING-MATCHER algorithm runs once for each value of \(s\) between \(0\) and \(n-m\), or \(n-m+1\) times. How many times Line 4 runs depends on whether we find matches or not - more formally: The probability that line 4 will perform a comparison on the \(i\)th character of \(T\) is the probability that the first \(i - 1\) characters matched in this sequence: \(\frac{1}{d^{i-1}}\). We will compare at least one character and at most \(m\) characters, so for a given \(s\) the expected number of runs is \(\sum\limits^{m}_{i=1}\frac{1}{d^{i-1}}\).
So now we know the number of times the loops represented on lines 3 and 4 will run, we can use linearity of expectation to combine them together and then leverage the properties of a geometric series to simplify:
\[(n-m+1) \sum\limits^{m}_{i=1}\frac{1}{d^{i-1}} = (n-m+1) \frac{1 - d^{-m} }{1 - d^{-1}}\]At this point, it’d be convenient for us to drop the \(1 - d^{-m}\) factor. For any \(d \ge 2\) and \(m \ge 1\),
\[0 < d^{-m} \le \frac{1}{2} \implies 0 < 1 - d^{-m} <1\]Which means:
\[(n-m+1) \sum\limits^{m}_{i=1}\frac{1}{d^{i-1}} \le (n - m + 1) \frac{1}{1-d^{-1}}\]Furthermore, since \(d \ge 2\) then \(\frac{1}{d} \le \frac{1}{2}\), and so:
\[1-d^{-1} \ge 1 - \frac{1}{2} = \frac{1}{2}\]As well as:
\[\frac{1}{1-d^{-1}} \le \frac{1}{1 - \frac{1}{2}} = \frac{1}{\frac{1}{2}} = 2\]Therefore:
\[(n-m+1) \sum\limits^{m}_{i=1}\frac{1}{d^{i-1}} \le 2(n - m + 1)\]