CS141 BB: ClassS04CS141/Hwk4Soln

Hwk4 solutions

maximum shared subsequence

Given two sequences, a shared subsequence is any sequence that is a subsequence of both.

A maximum-size shared subsequence is a shared subsequence of maximum size.

For example, the maximum-size shared subsequence of (3,2,8,2,3,9,4,3,9) and (1,3,2,3,7,9) is (3,2,3,9).
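
To make the definition concrete, here is a minimal sketch of a checker for the "shared subsequence" property. The function names isSubsequence and isSharedSubsequence are illustrative assumptions, not part of the assignment; the check uses a simple greedy left-to-right scan.

```cpp
#include <cstddef>
#include <vector>

// Returns true if `s` can be obtained from `seq` by deleting zero or more
// elements without reordering the rest (greedy left-to-right matching).
bool isSubsequence(const std::vector<int>& s, const std::vector<int>& seq) {
    std::size_t k = 0;  // number of elements of s matched so far
    for (std::size_t i = 0; i < seq.size() && k < s.size(); ++i) {
        if (seq[i] == s[k]) ++k;
    }
    return k == s.size();
}

// Hypothetical helper: s is "shared" if it is a subsequence of both inputs.
bool isSharedSubsequence(const std::vector<int>& s,
                         const std::vector<int>& a,
                         const std::vector<int>& b) {
    return isSubsequence(s, a) && isSubsequence(s, b);
}
```

Calling isSharedSubsequence with (3,2,3,9) and the two example sequences returns true. The checker only confirms that a sequence is shared, not that it is of maximum size.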

In this homework we will develop a dynamic programming algorithm to find a maximum-size shared subsequence of two sequences (the two sequences will be given as input).

1. What is the maximum shared subsequence of each of the following pairs of sequences?

1A: (10, 3,2,8,2,3,9,4,3,9) and (10, 1,3,2,3,7,9)

: 10, 3, 2, 3, 9

1B: (1, 3,2,8,2,3,9,4,3,9) and (10, 1,3,2,3,7,9)

: 1, 3, 2, 3, 9

2. Prove the following claims:

Given two sequences A[1..n] = (A[1], A[2],..., A[n]) and B[1..m] = (B[1], B[2], ...., B[m]), define MCS(A, n, B, m) to be the maximum size of any subsequence shared by A[1..n] and B[1..m].

2A: If A[n] = B[m], then MCS(A, n, B, m) ≥ 1 + MCS(A, n-1, B, m-1).

: Let S be a maximum common subsequence of A[1..n-1] and B[1..m-1], so that |S| (the size of S) is MCS(A, n-1, B, m-1).
: Let S' be the sequence consisting of S followed by A[n].
: Then S' is a common subsequence of A[1..n] and B[1..m], because A[n] = B[m].
: Thus, MCS(A, n, B, m) ≥ |S'| = |S| + 1 = 1 + MCS(A, n-1, B, m-1).

2B: In all cases, MCS(A, n, B, m) ≤ 1 + MCS(A, n-1, B, m-1).

: Let S be a maximum common subsequence of A[1..n] and B[1..m].
: If S is empty (of size 0), then clearly the inequality holds.
: Otherwise, let S' be the sequence obtained from S by removing the last element of S.
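: Then S' is a common subsequence of A[1..n-1] and B[1..m-1], because the last element of S occupies a position of A that is at most n and a position of B that is at most m, so the elements of S' all occur strictly earlier in both.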
: Thus, MCS(A, n-1, B, m-1) ≥ |S'| = |S|-1 = MCS(A,n,B,m) - 1.
: Rewriting gives MCS(A, n, B, m) ≤ 1 + MCS(A,n-1,B,m-1) .

2C: If A[n] = B[m] then MCS(A, n, B, m) = 1 + MCS(A, n-1, B, m-1). (You may use the facts you proved in 2A and 2B.)

: From 2A and 2B, it follows directly that:

If A[n] = B[m] then MCS(A, n, B, m) ≤ 1 + MCS(A, n-1, B, m-1) and MCS(A, n, B, m) ≥ 1 + MCS(A, n-1, B, m-1).
: Thus,

If A[n] = B[m] then MCS(A, n, B, m) = 1 + MCS(A, n-1, B, m-1).

2D: In all cases, MCS(A, n, B, m) ≥ max(MCS(A, n-1, B, m), MCS(A, n, B, m-1)).

: Let S be a maximum common subsequence of A[1..n-1] and B[1..m].
: Then S is also a common subsequence of A[1..n] and B[1..m].
: Thus MCS(A,n,B,m) ≥ |S| = MCS(A, n-1, B, m).
: A similar argument shows MCS(A,n,B,m) ≥ MCS(A, n, B, m-1).
: Thus, MCS(A,n,B,m) ≥ max { MCS(A, n, B, m-1), MCS(A,n-1,B,m) }.

2E: If A[n] ≠ B[m], then MCS(A, n, B, m) ≤ max(MCS(A, n-1, B, m), MCS(A, n, B, m-1)).

: Assume A[n] ≠ B[m].
: Let S be a maximum common subsequence of A[1..n] and B[1..m].
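: Since A[n] ≠ B[m], the last element of S (if S is nonempty) differs from A[n] or from B[m] (or both).
: Thus, either S is a subsequence of A[1..n-1], or S is a subsequence of B[1..m-1] (or both), because a last element that differs from A[n] must be matched before position n of A, and likewise for B[m].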
: Suppose S is a subsequence of A[1..n-1].

Since S is also a subsequence of B[1..m], it follows that S is a common subsequence of A[1..n-1] and B[1..m].
Thus, MCS(A,n,B,m) = |S| ≤ MCS(A, n-1, B, m).
: If S is not a subsequence of A[1..n-1], then S is a subsequence of B[1..m-1].

In this case, a similar argument shows MCS(A,n,B,m) = |S| ≤ MCS(A, n, B, m-1).
: We conclude that either MCS(A,n,B,m) ≤ MCS(A, n-1, B, m)
: or MCS(A,n,B,m) ≤ MCS(A, n, B, m-1) (or both).
: Thus, MCS(A,n,B,m) ≤ max { MCS(A, n, B, m-1), MCS(A, n-1, B, m) }.

2F: If A[n] ≠ B[m], then MCS(A, n, B, m) = max(MCS(A, n-1, B, m), MCS(A, n, B, m-1)). (You may use the facts you proved in 2D and 2E.)

: From 2D and 2E it follows that

If A[n] ≠ B[m], then MCS(A, n, B, m) ≤ max{ MCS(A, n-1, B, m), MCS(A, n, B, m-1) } and MCS(A, n, B, m) ≥ max{ MCS(A, n-1, B, m), MCS(A, n, B, m-1) }.
: This is equivalent to

If A[n] ≠ B[m], then MCS(A, n, B, m) = max{ MCS(A, n-1, B, m), MCS(A, n, B, m-1) }.
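
For reference, the facts from 2C and 2F can be collected into a single recurrence. The base case is not one of the claims proved above; it is added here on the assumption that an empty prefix shares only the empty subsequence, which follows from the definition of MCS.

```latex
\[
\mathrm{MCS}(A, n, B, m) =
\begin{cases}
0 & \text{if } n = 0 \text{ or } m = 0,\\
1 + \mathrm{MCS}(A, n-1, B, m-1) & \text{if } n, m \ge 1 \text{ and } A[n] = B[m],\\
\max\{\mathrm{MCS}(A, n-1, B, m),\ \mathrm{MCS}(A, n, B, m-1)\} & \text{if } n, m \ge 1 \text{ and } A[n] \neq B[m].
\end{cases}
\]
```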

3. The facts proved in 2C and 2F lead to the following recursive algorithm to compute MCS(A, n, B, m):

 MCS(A, n, B, m)
   if (n == 0 or m ==0) return 0;
   if (A[n] == B[m]) return 1+MCS(A, n-1, B, m-1);
   return max(MCS(A, n, B, m-1), MCS(A, n-1, B, m));
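
The pseudocode above can be sketched in C++ as follows. This is a direct transcription under two assumptions: the sequences are held in std::vector<int>, and the function name mcsRecursive is made up for illustration. The text indexes from 1, so A[n] in the text is a[n - 1] in the code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Plain recursion on the prefixes a[0..n-1] and b[0..m-1].
// Exponential time in the worst case; only suitable for short inputs.
int mcsRecursive(const std::vector<int>& a, std::size_t n,
                 const std::vector<int>& b, std::size_t m) {
    if (n == 0 || m == 0) return 0;  // an empty prefix shares nothing
    if (a[n - 1] == b[m - 1]) {
        return 1 + mcsRecursive(a, n - 1, b, m - 1);
    }
    return std::max(mcsRecursive(a, n, b, m - 1),
                    mcsRecursive(a, n - 1, b, m));
}
```

On the example from the introduction (lengths 9 and 6) this returns 4, and on the pairs from 1A and 1B it returns 5, matching the answers given there.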

3A: Give the best big-Ω lower bound you can on worst-case running time of MCS(A, n, B, m) as a function of n and m. Explain your reasoning.

: In the case when A[i] ≠ B[j] for all i, j, the recursion tree for MCS(A, n, B, m) has branching factor 2
: at every node at depth less than min(n, m), since both arguments are still positive at such a node. Thus, the recursion tree contains at least 2^min(n,m) nodes.
: Thus, the worst-case running time is Ω(2^min(n,m)).

3B: Give the best big-O upper bound you can on worst-case running time of MCS(A, n, B, m) as a function of n and m. Explain your reasoning.

: The recursion tree for MCS(A, n, B, m) has branching factor at most 2 at every node,
: and has depth at most n+m. Thus, the recursion tree contains O(2^(n+m)) nodes.
: Since O(1) work is done for each node in the recursion tree,
: the worst-case running time is O(2^(n+m)).
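
The two bounds can be checked on small inputs by counting the calls the recursion makes when no element of A equals any element of B. The sketch below is not part of the assigned solution, and the function name worstCaseCalls is an assumption for illustration.

```cpp
#include <cstdint>

// Number of calls made by the recursive algorithm on prefixes of lengths
// n and m when A[i] != B[j] for all i, j: every call with n, m > 0 makes
// exactly two recursive calls, and every call with n == 0 or m == 0 is a leaf.
std::uint64_t worstCaseCalls(unsigned n, unsigned m) {
    if (n == 0 || m == 0) return 1;
    return 1 + worstCaseCalls(n - 1, m) + worstCaseCalls(n, m - 1);
}
```

For n = m = 5 this gives 503 calls, which lies between 2^5 = 32 and 2^10 = 1024, consistent with the Ω(2^min(n,m)) and O(2^(n+m)) bounds.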

4A: Precisely describe a faster algorithm (running in time O(n m)) for computing MCS(A, n, B, m).

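: Modify the recursive algorithm above to cache answers: keep a table indexed by (i, j), look up MCS(A, i, B, j) in the table before recursing, and store each value the first time it is computed.
: Alternatively, use dynamic programming to compute MCS(A, i, B, j) for 0 ≤ i ≤ n and 0 ≤ j ≤ m "bottom up", in order of increasing i and j, starting from MCS(A, 0, B, j) = MCS(A, i, B, 0) = 0.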
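
A minimal sketch of the caching variant is given below, assuming the inputs are std::vector<int> and using -1 as a "not yet computed" marker. The names mcsMemoHelper and mcsMemo are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// cache[i][j] holds MCS(A, i, B, j) once computed, or -1 if not yet known.
int mcsMemoHelper(const std::vector<int>& a, std::size_t n,
                  const std::vector<int>& b, std::size_t m,
                  std::vector<std::vector<int>>& cache) {
    if (n == 0 || m == 0) return 0;
    int& slot = cache[n][m];
    if (slot != -1) return slot;  // answer already cached
    if (a[n - 1] == b[m - 1]) {
        slot = 1 + mcsMemoHelper(a, n - 1, b, m - 1, cache);
    } else {
        slot = std::max(mcsMemoHelper(a, n, b, m - 1, cache),
                        mcsMemoHelper(a, n - 1, b, m, cache));
    }
    return slot;
}

// Entry point: allocates the (n+1) x (m+1) cache and starts the recursion.
int mcsMemo(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<std::vector<int>> cache(
        a.size() + 1, std::vector<int>(b.size() + 1, -1));
    return mcsMemoHelper(a, a.size(), b, b.size(), cache);
}
```

Each table entry is filled at most once, so the number of non-trivial calls is bounded by the table size rather than by the size of the recursion tree.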

4B: Explain why your algorithm is correct.

: The correctness of the algorithm follows from the recurrence relation proved in problem 2 (claims 2C and 2F), together with the base case MCS = 0 when n = 0 or m = 0.

4C: Give the best big-O upper bound you can on the worst-case running time of your algorithm, in terms of n and m. Explain your reasoning.

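: O(n m), because there are only (n+1)(m+1) distinct subproblems (one for each pair i, j with 0 ≤ i ≤ n and 0 ≤ j ≤ m), and each subproblem requires O(1) work once the subproblems it depends on are known.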

5. Implement your algorithm and use it to find the maximum size of any subsequence shared by the following two sequences:

int A[100] = {48, 29, 25, 7, 21, 32, 32, 13, 38, 16, 13, 29, 8, 28, 0, 21, 11, 27, 17, 44, 28, 10, 49, 23, 20, 33, 35, 40, 4, 15, 40, 34, 23, 40, 3, 39, 26, 45, 16, 23, 22, 39, 25, 32, 2, 34, 3, 46, 16, 19, 4, 25, 36, 14, 37, 30, 34, 49, 5, 9, 32, 19, 19, 6, 33, 9, 28, 32, 1, 29, 41, 42, 11, 12, 31, 13, 33, 5, 31, 6, 35, 10, 27, 36, 45, 48, 38, 5, 27, 21, 34, 23, 11, 20, 22, 25, 11, 44, 3, 32};

int B[100] = {33, 31, 9, 41, 49, 35, 12, 3, 43, 2, 47, 43, 11, 29, 11, 24, 4, 15, 28, 48, 3, 28, 9, 20, 10, 0, 1, 26, 35, 37, 48, 26, 32, 8, 14, 48, 9, 45, 16, 27, 13, 21, 6, 28, 36, 1, 16, 4, 41, 33, 49, 36, 20, 44, 46, 26, 36, 42, 22, 29, 29, 24, 30, 3, 20, 42, 3, 36, 14, 1, 44, 26, 35, 9, 47, 32, 43, 47, 29, 45, 36, 20, 0, 48, 10, 18, 40, 20, 41, 42, 11, 5, 30, 32, 46, 20, 38, 9, 19, 24};

You may use C++, Python, or Perl to implement your algorithm. Include a print-out of your algorithm and the length of the shared subsequence that it finds.

The answer I get is 21.

Here is my code:

#include <iostream>
using namespace std;

#define N 100

int A[N] = {48, 29, 25, 7, 21, 32, 32, 13, 38, 16, 13, 29, 8, 28, 0, 21, 11, 27, 17, 44, 28, 10, 49, 23, 20, 33, 35, 40, 4, 15, 40, 34, 23, 40, 3, 39, 26, 45, 16, 23, 22, 39, 25, 32,  2, 34, 3, 46, 16, 19, 4, 25, 36, 14, 37, 30, 34, 49, 5, 9, 32, 19, 19, 6, 33, 9, 28, 32, 1, 29, 41, 42, 11, 12, 31, 13, 33, 5, 31, 6, 35, 10, 27, 36, 45, 48, 38, 5, 27, 21, 34, 23,  11, 20, 22, 25, 11, 44, 3, 32};

int B[N] = {33, 31, 9, 41, 49, 35, 12, 3, 43, 2, 47, 43, 11, 29, 11, 24, 4, 15, 28, 48, 3, 28, 9, 20, 10, 0, 1, 26, 35, 37, 48, 26, 32, 8, 14, 48, 9, 45, 16, 27, 13, 21, 6, 28, 36, 1 , 16, 4, 41, 33, 49, 36, 20, 44, 46, 26, 36, 42, 22, 29, 29, 24, 30, 3, 20, 42, 3, 36, 14, 1, 44, 26, 35, 9, 47, 32, 43, 47, 29, 45, 36, 20, 0, 48, 10, 18, 40, 20, 41, 42, 11, 5, 30,  32, 46, 20, 38, 9, 19, 24};

int max(int i, int j) {
  return i > j ? i : j;
}

int main() {
  int MCS[N+1][N+1];

  int best = 0;
  int besti = 0, bestj = 0;

  for (int i = 0;  i <= N;  ++i) {
    MCS[0][i] = MCS[i][0] = 0;
  }

  for (int i = 1;  i <= N;  ++i)
    for (int j = 1;  j <= N;  ++j)
      if (A[i-1] == B[j-1]) {
        MCS[i][j] = 1 + MCS[i-1][j-1];

        if (MCS[i][j] > best) {
          best = MCS[i][j];
          besti = i;
          bestj = j;
        }
      }  
      else
        MCS[i][j] = max(MCS[i-1][j], MCS[i][j-1]);

  cout << best << endl;
  cout << "ending at A[" << besti-1 << "] = " << A[besti-1] << ", B[" << bestj-1 << "] = " << B[bestj-1] << endl;
}
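
The program above reports only the maximum size. As a sketch of how a maximum-size shared subsequence itself could be recovered, the table can be walked backwards from the last cell. This goes slightly beyond what problem 5 asks for; the function name mcsSequence and the use of std::vector are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Fills the same bottom-up table as the program above, then traces back
// from t[n][m] to collect one maximum-size shared subsequence.
std::vector<int> mcsSequence(const std::vector<int>& a,
                             const std::vector<int>& b) {
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> t(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            t[i][j] = (a[i - 1] == b[j - 1])
                          ? 1 + t[i - 1][j - 1]
                          : std::max(t[i - 1][j], t[i][j - 1]);
        }
    }
    std::vector<int> out;
    std::size_t i = n, j = m;
    while (i > 0 && j > 0) {
        if (a[i - 1] == b[j - 1]) {          // case 2C: take the match
            out.push_back(a[i - 1]);
            --i;
            --j;
        } else if (t[i - 1][j] >= t[i][j - 1]) {
            --i;                              // case 2F: value came from above
        } else {
            --j;                              // case 2F: value came from the left
        }
    }
    std::reverse(out.begin(), out.end());    // elements were collected back to front
    return out;
}
```

On the example from the introduction this returns (3,2,3,9). When several maximum-size shared subsequences exist, the tie-breaking rule in the traceback determines which one is returned.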