ClassS04CS141/Hwk4Soln


Hwk4 solutions

maximum shared subsequence

Given two sequences, a shared subsequence is a sequence that is a subsequence of both sequences.

A maximum-size shared subsequence is a shared subsequence of maximum size.

For example, the maximum-size shared subsequence of (3,2,8,2,3,9,4,3,9) and (1,3,2,3,7,9) is (3,2,3,9).
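The definition can be checked mechanically. The sketch below tests whether one sequence is a subsequence of another with a single greedy left-to-right scan, then uses that test to confirm the example above. The function names `isSubsequence` and `isSharedSubsequence` are hypothetical, chosen only for illustration, and the sequences are assumed to be stored in `std::vector<int>`.

```cpp
#include <cstddef>
#include <vector>

// Returns true if s is a subsequence of t: the elements of s appear in t
// in the same order, though not necessarily next to each other.
// Greedy scan: each element of s is matched to the earliest remaining
// position of t that holds the same value.
bool isSubsequence(const std::vector<int>& s, const std::vector<int>& t) {
  std::size_t k = 0;  // number of elements of s matched so far
  for (std::size_t i = 0; i < t.size() && k < s.size(); ++i) {
    if (t[i] == s[k]) ++k;
  }
  return k == s.size();
}

// A shared subsequence of a and b is a subsequence of both.
bool isSharedSubsequence(const std::vector<int>& s,
                         const std::vector<int>& a,
                         const std::vector<int>& b) {
  return isSubsequence(s, a) && isSubsequence(s, b);
}
```

For example, `isSharedSubsequence({3, 2, 3, 9}, {3, 2, 8, 2, 3, 9, 4, 3, 9}, {1, 3, 2, 3, 7, 9})` returns true. Matching the earliest possible position is safe because it leaves as much of t as possible for the remaining elements of s.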

In this homework we will develop a dynamic programming algorithm to find a maximum-size shared subsequence of two sequences given as input.


1. What is the maximum shared subsequence of each of the following pairs of sequences?

1A: (10, 3,2,8,2,3,9,4,3,9) and (10, 1,3,2,3,7,9)

10, 3, 2, 3, 9

1B: (1, 3,2,8,2,3,9,4,3,9) and (10, 1,3,2,3,7,9)

1, 3, 2, 3, 9
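Answers to small instances like these can be double-checked by brute force. The following sketch (the names `bruteForceMCS` and `isSubseq` are hypothetical) tries every subsequence of the second sequence, one bitmask per subsequence, and keeps the longest one that is also a subsequence of the first. Its running time is exponential in the length of the second sequence, so it is only a sanity check for tiny inputs, not a substitute for the algorithm developed below.

```cpp
#include <cstddef>
#include <vector>

// Greedy test: is s a subsequence of t?
static bool isSubseq(const std::vector<int>& s, const std::vector<int>& t) {
  std::size_t k = 0;
  for (std::size_t i = 0; i < t.size() && k < s.size(); ++i)
    if (t[i] == s[k]) ++k;
  return k == s.size();
}

// Brute force: enumerate every subsequence of b (bit i of mask selects b[i])
// and keep the first longest one that is also a subsequence of a.
std::vector<int> bruteForceMCS(const std::vector<int>& a,
                               const std::vector<int>& b) {
  std::vector<int> best;
  const std::size_t k = b.size();  // assumes k < 32 so every mask fits
  for (unsigned long mask = 0; mask < (1UL << k); ++mask) {
    std::vector<int> s;
    for (std::size_t i = 0; i < k; ++i)
      if (mask & (1UL << i)) s.push_back(b[i]);
    if (s.size() > best.size() && isSubseq(s, a)) best = s;
  }
  return best;
}
```

On the pairs from 1A and 1B it returns (10, 3, 2, 3, 9) and (1, 3, 2, 3, 9) respectively.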


2. Prove the following claims:

Given two sequences A[1..n] = (A[1], A[2], ..., A[n]) and B[1..m] = (B[1], B[2], ..., B[m]), define MCS(A, n, B, m) to be the maximum size of any subsequence shared by A[1..n] and B[1..m].

2A: If A[n] = B[m], then MCS(A, n, B, m) ≥ 1 + MCS(A, n-1, B, m-1).

Let S be a maximum common subsequence of A[1..n-1] and B[1..m-1], so that |S| (the size of S) is MCS(A, n-1, B, m-1).
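Let S' be the sequence containing S followed by A[n].
Since S is a subsequence of both A[1..n-1] and B[1..m-1], and A[n] = B[m] comes after both of those prefixes, S' is a common subsequence of A[1..n] and B[1..m].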
Thus, MCS(A, n, B, m) ≥ |S'| = |S| + 1 = 1 + MCS(A, n-1, B, m-1).

2B: In all cases, MCS(A, n, B, m) ≤ 1 + MCS(A, n-1, B, m-1).

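Let S be a maximum common subsequence of A[1..n] and B[1..m].
If S is empty (of size 0), then clearly the inequality holds.
Otherwise, let S' be the sequence obtained from S by removing the last element of S.
Fix an occurrence of S in A[1..n] and one in B[1..m]. The last element of S sits at some position ≤ n in A and some position ≤ m in B, so every element of S' sits at a position ≤ n-1 in A and ≤ m-1 in B. Hence S' is a common subsequence of A[1..n-1] and B[1..m-1].
Thus, MCS(A, n-1, B, m-1) ≥ |S'| = |S|-1 = MCS(A,n,B,m) - 1.
Rewriting gives MCS(A, n, B, m) ≤ 1 + MCS(A,n-1,B,m-1).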

2C: If A[n] = B[m] then MCS(A, n, B, m) = 1 + MCS(A, n-1, B, m-1). (You may use the facts you proved in 2A and 2B.)

From 2A and 2B, it follows directly that
if A[n] = B[m], then MCS(A, n, B, m) ≤ 1 + MCS(A, n-1, B, m-1) (by 2B) and MCS(A, n, B, m) ≥ 1 + MCS(A, n-1, B, m-1) (by 2A).
Thus,
if A[n] = B[m], then MCS(A, n, B, m) = 1 + MCS(A, n-1, B, m-1).

--

2D: In all cases, MCS(A, n, B, m) ≥ max(MCS(A, n-1, B, m), MCS(A, n, B, m-1)).

Let S be a maximum common subsequence of A[1..n-1] and B[1..m].
Then S is also a common subsequence of A[1..n] and B[1..m].
Thus MCS(A,n,B,m) ≥ |S| = MCS(A, n-1, B, m).
A similar argument shows MCS(A,n,B,m) ≥ MCS(A, n, B, m-1).
Thus, MCS(A,n,B,m) ≥ max { MCS(A, n, B, m-1), MCS(A,n-1,B,m) }.

2E: If A[n] ≠ B[m], then MCS(A, n, B, m) ≤ max(MCS(A, n-1, B, m), MCS(A, n, B, m-1)).

Assume A[n] ≠ B[m].
Let S be a maximum common subsequence of A[1..n] and B[1..m].
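Fix an occurrence of S in A[1..n] and one in B[1..m]. These occurrences cannot use both A[n] and B[m]: each could only be matched to the last element of S, which would force A[n] = B[m].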
Thus, either S is a subsequence of A[1..n-1], or S is a subsequence of B[1..m-1] (or both).
Suppose S is a subsequence of A[1..n-1].
Since S is also a subsequence of B[1..m], it follows that S is a common subsequence of A[1..n-1] and B[1..m].
Thus, MCS(A,n,B,m) = |S| ≤ MCS(A, n-1, B, m).
If S is not a subsequence of A[1..n-1], then S is a subsequence of B[1..m-1].
In this case, a similar argument shows MCS(A,n,B,m) = |S| ≤ MCS(A, n, B, m-1).
We conclude that either MCS(A,n,B,m) ≤ MCS(A, n-1, B, m)
or MCS(A,n,B,m) ≤ MCS(A, n, B, m-1) (or both).
Thus, MCS(A,n,B,m) ≤ max { MCS(A, n, B, m-1), MCS(A, n-1, B, m) }.

2F: If A[n] ≠ B[m], then MCS(A, n, B, m) = max(MCS(A, n-1, B, m), MCS(A, n, B, m-1)). (You may use the facts you proved in 2D and 2E.)

From 2D and 2E it follows that
if A[n] ≠ B[m], then MCS(A, n, B, m) ≤ max{ MCS(A, n-1, B, m), MCS(A, n, B, m-1) } and MCS(A, n, B, m) ≥ max{ MCS(A, n-1, B, m), MCS(A, n, B, m-1) }.
This is equivalent to
if A[n] ≠ B[m], then MCS(A, n, B, m) = max{ MCS(A, n-1, B, m), MCS(A, n, B, m-1) }.
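Before relying on the recurrences, they can also be spot-checked exhaustively on short sequences. This does not replace the proofs above; it only catches a wrong recurrence early. The sketch below compares a brute-force MCS, computed without using the recurrence, against the right-hand sides of 2C and 2F for every pair of sequences of length 1 to 4 over the alphabet {0, 1, 2}. The function names and the choice of lengths and alphabet are assumptions made for illustration. Sequences are 0-based `std::vector<int>` values, so the document's A[n] corresponds to `a.back()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Seq = std::vector<int>;  // 0-based: the document's A[n] is a.back()

// Greedy test: is s a subsequence of t?
static bool isSubseq(const Seq& s, const Seq& t) {
  std::size_t k = 0;
  for (std::size_t i = 0; i < t.size() && k < s.size(); ++i)
    if (t[i] == s[k]) ++k;
  return k == s.size();
}

// Brute-force MCS size that does NOT use the recurrence: the longest
// subsequence of b (one per bitmask) that is also a subsequence of a.
static std::size_t bruteMCS(const Seq& a, const Seq& b) {
  std::size_t best = 0;
  for (unsigned long mask = 0; mask < (1UL << b.size()); ++mask) {
    Seq s;
    for (std::size_t i = 0; i < b.size(); ++i)
      if (mask & (1UL << i)) s.push_back(b[i]);
    if (isSubseq(s, a)) best = std::max(best, s.size());
  }
  return best;
}

// Copy of s without its last element.
static Seq dropLast(Seq s) {
  s.pop_back();
  return s;
}

// Checks claim 2C (last elements equal) or 2F (last elements differ)
// for one pair of nonempty sequences.
bool recurrenceHolds(const Seq& a, const Seq& b) {
  const std::size_t lhs = bruteMCS(a, b);
  const std::size_t rhs =
      (a.back() == b.back())
          ? 1 + bruteMCS(dropLast(a), dropLast(b))                        // 2C
          : std::max(bruteMCS(dropLast(a), b), bruteMCS(a, dropLast(b)));  // 2F
  return lhs == rhs;
}

// Tries every pair of sequences of length 1..maxLen over {0, ..., alphabet-1}.
bool checkAllSmall(std::size_t maxLen, int alphabet) {
  std::vector<Seq> all, layer{Seq{}};
  for (std::size_t len = 1; len <= maxLen; ++len) {
    std::vector<Seq> next;
    for (const Seq& s : layer)
      for (int x = 0; x < alphabet; ++x) {
        Seq t = s;
        t.push_back(x);
        next.push_back(t);
      }
    all.insert(all.end(), next.begin(), next.end());
    layer = next;
  }
  for (const Seq& a : all)
    for (const Seq& b : all)
      if (!recurrenceHolds(a, b)) return false;
  return true;
}
```

A call such as `checkAllSmall(4, 3)` is expected to return true, since 2C and 2F have been proved.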

3. The facts proved in 2C and 2F lead to the following recursive algorithm to compute MCS(A, n, B, m):

 MCS(A, n, B, m)
   if (n == 0 or m == 0) return 0;
   if (A[n] == B[m]) return 1+MCS(A, n-1, B, m-1);
   return max(MCS(A, n, B, m-1), MCS(A, n-1, B, m));
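A runnable C++ version of this pseudocode might look like the following sketch. It assumes the sequences are stored in 0-based `std::vector<int>` values, so the document's A[n] becomes `a[n-1]`; the name `naiveMCS` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Direct translation of the recursive pseudocode (no caching).
// n and m are prefix lengths; A[n] in the document is a[n-1] here.
int naiveMCS(const std::vector<int>& a, int n, const std::vector<int>& b, int m) {
  if (n == 0 || m == 0) return 0;                                    // empty prefix
  if (a[n-1] == b[m-1]) return 1 + naiveMCS(a, n-1, b, m-1);         // claim 2C
  return std::max(naiveMCS(a, n, b, m-1), naiveMCS(a, n-1, b, m));   // claim 2F
}
```

For the example sequences (3,2,8,2,3,9,4,3,9) and (1,3,2,3,7,9), `naiveMCS(a, 9, b, 6)` returns 4.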

3A: Give the best big-Ω lower bound you can on worst-case running time of MCS(A, n, B, m) as a function of n and m. Explain your reasoning.

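In the case when A[i] ≠ B[j] for all i, j, every call with n ≥ 1 and m ≥ 1 makes two recursive calls. A node at depth d of the recursion tree has arguments n' ≥ n-d and m' ≥ m-d, so every node at depth less than min(n, m) has branching factor 2.
Thus, the recursion tree contains at least 2^{min(n,m)} nodes (at depth min(n, m) alone).
Since each call does at least constant work, the worst-case running time is Ω(2^{min(n,m)}).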

3B: Give the best big-O upper bound you can on worst-case running time of MCS(A, n, B, m) as a function of n and m. Explain your reasoning.

The recursion tree for MCS(A, n, B, m) has branching factor at most 2 at every node,
and has depth at most n+m, because each recursive call decreases n+m by at least 1. Thus, the recursion tree contains O(2^{n+m}) nodes.
Since O(1) work is done for each node in the recursion tree,
the worst-case running time is O(2^{n+m}).
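The two bounds can be observed empirically by counting calls. The sketch below (hypothetical names `countedMCS` and `worstCaseCalls`) instruments the naive recursion with a counter and runs it on sequences with no equal elements (all 0s versus all 1s), the worst case used in 3A, so every call with n, m ≥ 1 branches twice.

```cpp
#include <algorithm>
#include <vector>

// Naive recursion with a counter: calls is incremented once per node
// of the recursion tree.
int countedMCS(const std::vector<int>& a, int n, const std::vector<int>& b, int m,
               long& calls) {
  ++calls;
  if (n == 0 || m == 0) return 0;
  if (a[n-1] == b[m-1]) return 1 + countedMCS(a, n-1, b, m-1, calls);
  return std::max(countedMCS(a, n, b, m-1, calls), countedMCS(a, n-1, b, m, calls));
}

// Size of the recursion tree when no element of a equals any element of b.
long worstCaseCalls(int n, int m) {
  std::vector<int> a(n, 0), b(m, 1);  // all 0s versus all 1s: never equal
  long calls = 0;
  countedMCS(a, n, b, m, calls);
  return calls;
}
```

For n = m = 2 this reports 11 calls; for n = m = 10 the count lies between 2^{10} and 2^{20}, consistent with the lower bound in 3A and the upper bound in 3B.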


4A: Precisely describe a faster algorithm (running in time O(n m)) for computing MCS(A, n, B, m).

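Modify the recursive algorithm above to cache answers: store MCS(A, i, B, j) in a table the first time it is computed, and return the stored value on every later call with the same i and j.
Alternatively, use dynamic programming to compute MCS(A, i, B, j) "bottom up" for all 0 ≤ i ≤ n and 0 ≤ j ≤ m, in order of increasing i and, for each i, increasing j, so that MCS(A, i-1, B, j-1), MCS(A, i-1, B, j), and MCS(A, i, B, j-1) are already available when MCS(A, i, B, j) is computed.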
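The caching approach can be sketched as follows. This is one possible top-down implementation, assuming 0-based `std::vector<int>` inputs; the names `memoMCS` and `cachedMCS` are hypothetical. The table `memo` uses -1 to mark subproblems not yet solved, and each of its (n+1)(m+1) entries is filled at most once.

```cpp
#include <algorithm>
#include <vector>

// Recursive algorithm plus a cache: memo[i][j] holds MCS(A, i, B, j),
// or -1 if that subproblem has not been solved yet.
static int memoMCS(const std::vector<int>& a, int n, const std::vector<int>& b, int m,
                   std::vector<std::vector<int>>& memo) {
  int& cached = memo[n][m];  // memo is never resized, so this stays valid
  if (cached != -1) return cached;
  if (n == 0 || m == 0)
    cached = 0;
  else if (a[n-1] == b[m-1])
    cached = 1 + memoMCS(a, n-1, b, m-1, memo);                                  // 2C
  else
    cached = std::max(memoMCS(a, n, b, m-1, memo), memoMCS(a, n-1, b, m, memo));  // 2F
  return cached;
}

// Allocates the (n+1) x (m+1) cache and solves the full problem.
int cachedMCS(const std::vector<int>& a, const std::vector<int>& b) {
  const int n = static_cast<int>(a.size()), m = static_cast<int>(b.size());
  std::vector<std::vector<int>> memo(n + 1, std::vector<int>(m + 1, -1));
  return memoMCS(a, n, b, m, memo);
}
```

For example, `cachedMCS({10, 3, 2, 8, 2, 3, 9, 4, 3, 9}, {10, 1, 3, 2, 3, 7, 9})` returns 5, matching problem 1A. The recursion depth is at most n + m, which is small for the inputs in problem 5.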

4B: Explain why your algorithm is correct.

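The correctness of the algorithm follows from the recurrence relations proved in problem 2 (2C when A[i] = B[j], 2F when A[i] ≠ B[j]), together with the base case MCS(A, 0, B, j) = MCS(A, i, B, 0) = 0, since an empty sequence shares only the empty subsequence. Each entry is computed only from entries computed earlier, so by induction every entry is correct.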

4C: Give the best big-O upper bound you can on the worst-case running time of your algorithm, in terms of n and m. Explain your reasoning.

O(n m), because there are (n+1)(m+1) distinct subproblems MCS(A, i, B, j), and each requires O(1) work besides looking up previously computed subproblems.


5. Implement your algorithm and use it to find the maximum size of any subsequence shared by the following two sequences:

int A[100] = {48, 29, 25, 7, 21, 32, 32, 13, 38, 16, 13, 29, 8, 28, 0, 21, 11, 27, 17, 44, 28, 10, 49, 23, 20, 33, 35, 40, 4, 15, 40, 34, 23, 40, 3, 39, 26, 45, 16, 23, 22, 39, 25, 32, 2, 34, 3, 46, 16, 19, 4, 25, 36, 14, 37, 30, 34, 49, 5, 9, 32, 19, 19, 6, 33, 9, 28, 32, 1, 29, 41, 42, 11, 12, 31, 13, 33, 5, 31, 6, 35, 10, 27, 36, 45, 48, 38, 5, 27, 21, 34, 23, 11, 20, 22, 25, 11, 44, 3, 32};

int B[100] = {33, 31, 9, 41, 49, 35, 12, 3, 43, 2, 47, 43, 11, 29, 11, 24, 4, 15, 28, 48, 3, 28, 9, 20, 10, 0, 1, 26, 35, 37, 48, 26, 32, 8, 14, 48, 9, 45, 16, 27, 13, 21, 6, 28, 36, 1, 16, 4, 41, 33, 49, 36, 20, 44, 46, 26, 36, 42, 22, 29, 29, 24, 30, 3, 20, 42, 3, 36, 14, 1, 44, 26, 35, 9, 47, 32, 43, 47, 29, 45, 36, 20, 0, 48, 10, 18, 40, 20, 41, 42, 11, 5, 30, 32, 46, 20, 38, 9, 19, 24};

You may use C++, Python, or Perl to implement your algorithm. Include a print-out of your algorithm and the length of the shared subsequence that it finds.

The answer I get is 21.

Here is my code:

#include <algorithm>  // std::max
#include <iostream>

#define N 100

int A[N] = {48, 29, 25, 7, 21, 32, 32, 13, 38, 16, 13, 29, 8, 28, 0, 21, 11, 27, 17, 44, 28, 10, 49, 23, 20, 33, 35, 40, 4, 15, 40, 34, 23, 40, 3, 39, 26, 45, 16, 23, 22, 39, 25, 32, 2, 34, 3, 46, 16, 19, 4, 25, 36, 14, 37, 30, 34, 49, 5, 9, 32, 19, 19, 6, 33, 9, 28, 32, 1, 29, 41, 42, 11, 12, 31, 13, 33, 5, 31, 6, 35, 10, 27, 36, 45, 48, 38, 5, 27, 21, 34, 23, 11, 20, 22, 25, 11, 44, 3, 32};

int B[N] = {33, 31, 9, 41, 49, 35, 12, 3, 43, 2, 47, 43, 11, 29, 11, 24, 4, 15, 28, 48, 3, 28, 9, 20, 10, 0, 1, 26, 35, 37, 48, 26, 32, 8, 14, 48, 9, 45, 16, 27, 13, 21, 6, 28, 36, 1, 16, 4, 41, 33, 49, 36, 20, 44, 46, 26, 36, 42, 22, 29, 29, 24, 30, 3, 20, 42, 3, 36, 14, 1, 44, 26, 35, 9, 47, 32, 43, 47, 29, 45, 36, 20, 0, 48, 10, 18, 40, 20, 41, 42, 11, 5, 30, 32, 46, 20, 38, 9, 19, 24};

int main() {
  // MCS[i][j] = maximum size of a subsequence shared by A[0..i-1] and B[0..j-1]
  int MCS[N+1][N+1];

  int best = 0;
  int besti = 0, bestj = 0;

  // Base cases: an empty prefix shares only the empty subsequence.
  for (int i = 0;  i <= N;  ++i) {
    MCS[0][i] = MCS[i][0] = 0;
  }

  for (int i = 1;  i <= N;  ++i)
    for (int j = 1;  j <= N;  ++j)
      if (A[i-1] == B[j-1]) {
        // Last elements match (claim 2C).
        MCS[i][j] = 1 + MCS[i-1][j-1];

        // Record the first cell at which the current maximum is reached.
        if (MCS[i][j] > best) {
          best = MCS[i][j];
          besti = i;
          bestj = j;
        }
      }
      else {
        // Last elements differ (claim 2F).
        MCS[i][j] = std::max(MCS[i-1][j], MCS[i][j-1]);
      }

  std::cout << best << std::endl;  // equals MCS[N][N]
  if (best > 0)
    std::cout << "ending at A[" << besti-1 << "] = " << A[besti-1]
              << ", B[" << bestj-1 << "] = " << B[bestj-1] << std::endl;
  return 0;
}
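The program above reports only the size and where one maximum shared subsequence ends. To recover the subsequence itself, one common approach is to build the same table and then trace back from entry (n, m), retracing which case of the recurrence (2C or 2F) produced each entry. The sketch below assumes 0-based `std::vector<int>` inputs; the name `sharedSubsequence` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Builds the bottom-up table, then walks back from (n, m) to recover one
// maximum-size shared subsequence of a and b.
std::vector<int> sharedSubsequence(const std::vector<int>& a, const std::vector<int>& b) {
  const std::size_t n = a.size(), m = b.size();
  // T[i][j] = MCS(A, i, B, j); row 0 and column 0 are the base cases.
  std::vector<std::vector<int>> T(n + 1, std::vector<int>(m + 1, 0));
  for (std::size_t i = 1; i <= n; ++i)
    for (std::size_t j = 1; j <= m; ++j)
      T[i][j] = (a[i-1] == b[j-1]) ? 1 + T[i-1][j-1]
                                   : std::max(T[i-1][j], T[i][j-1]);

  // Traceback: follow the case of the recurrence that produced T[i][j].
  std::vector<int> s;
  std::size_t i = n, j = m;
  while (i > 0 && j > 0) {
    if (a[i-1] == b[j-1]) {              // case 2C: this element is in the answer
      s.push_back(a[i-1]);
      --i;
      --j;
    } else if (T[i-1][j] >= T[i][j-1]) { // case 2F: drop A[i]
      --i;
    } else {                             // case 2F: drop B[j]
      --j;
    }
  }
  std::reverse(s.begin(), s.end());      // elements were collected last to first
  return s;
}
```

On the sequences from 1A it returns (10, 3, 2, 3, 9). Applied to the 100-element arrays A and B above, the size of the returned vector should equal the value the program prints.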


Edited May 19, 2004 10:33 am by Neal