Question text is in black, solutions in blue.
Q1: 10 points Q2: 10 points Q3: 20 points Q4: 20 points Q5: 20 points Q6: 20 points Total: 100 points
Split: Given a list of m items it returns two
lists, one with the smallest floor(m/2) (Java "m/2") items in the input (in
some unknown order) and the other list with the rest.
If Split takes O(m) time on a list of size m, then we can create
a list containing only our desired item in O(n) time, no matter what k is.
TRUE. Once we split the list in two, we know which half contains the desired element. If k < m/2 it is the element of the lower half with k elements less than it, and if k ≥ m/2 it is the element of the upper half with k - m/2 elements less than it. So we get a recurrence of T(n) = T(n/2) + O(n), which solves to T(n) = O(n) because substitution gives us T(n) ≤ cn + cn/2 + cn/4 + ... ≤ 2cn. Some of you said that the O(log n) splits took O(n) each rather than O(m); this leads to a correct but not optimal bound of O(n log n). Getting a bound of O(n log n) does not prove that a bound of O(n) is false!
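The halving argument above can be sketched in Python. This is only an illustration: the names `split` and `select` are hypothetical, the items are assumed distinct, and `split` is implemented by sorting for simplicity, so it does not itself meet the O(m) bound that the problem takes as given.

```python
def split(items):
    """Stand-in for Split: returns (smallest floor(m/2) items, the rest).

    Assumption: done by sorting here, so this stand-in is NOT O(m);
    the problem treats Split as a given black box.
    """
    ordered = sorted(items)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def select(items, k):
    """Return the item that has exactly k items less than it."""
    while len(items) > 1:
        lower, upper = split(items)
        if k < len(lower):
            items = lower        # the desired item still has k smaller items
        else:
            k -= len(lower)      # the floor(m/2) smaller items are discarded
            items = upper
    return items[0]
```

Each pass through the loop calls Split once on a list about half the size of the previous one, which is where the sum cn + cn/2 + cn/4 + ... comes from when Split is linear.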
FALSE. The statement would be true if it said "within mΔ" rather than
"within Δ", but it's not enough to observe this -- we have to find an
example where the stated claim fails. The point is that there may be more than
one edge of the residual graph showing the potential of an extra flow less than
Δ, and it might be possible to add all these flows. A simple
counterexample to the claim is a network with four nodes s, a, b, and t, edges
(s,a), (s,b), (a,t), and (b,t) each of capacity 2, a zero flow for f, and
Δ = 3. The residual graph has four edges of capacity 2 and so
G_f
Write and solve a recurrence to determine the exact (not big-O)
length of fence required to capture the lion with algorithm R(L). Do not count
the fence enclosing the entire region at the beginning. (Hint: It may help to
first solve the problem for L = ε, L = 2ε, and L = 4ε.)
The intended recurrence for the amount of fence is F(L) = 3L/2 + F(L/2), with
a base case of F(ε) = 0 (since no more fence is needed in the case where
L = ε). From this we can derive F(2ε) = 3ε,
F(4ε) = 9ε, and F(8ε) = 21ε. If we are lucky or
clever we may notice the pattern that F(L) = 3L - 3ε, and prove that
this formula is true because it satisfies the base case and the inductive case
of the recurrence: F(L) = 3L/2 + F(L/2) = 3L/2 + (3L/2 - 3ε) by the IH,
which is 3L - 3ε as desired.
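As a sanity check on the closed form, here is a small Python sketch that evaluates the recurrence directly. It assumes L is a power of two times ε and measures lengths in multiples of ε; the function name `fence` is hypothetical.

```python
def fence(units):
    """F(L) for L = units * eps, returned in multiples of eps.

    Assumption: units is a power of two, so L/2 is always a whole number
    of eps until the base case is reached.
    """
    if units == 1:
        return 0                              # base case F(eps) = 0
    return 3 * units // 2 + fence(units // 2) # F(L) = 3L/2 + F(L/2)


# Compare against the closed form F(L) = 3L - 3*eps for L = eps, 2*eps, ..., 1024*eps.
for j in range(11):
    units = 2 ** j
    assert fence(units) == 3 * units - 3
```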
What if we aren't lucky or clever enough to see the pattern? We can derive:
Describe an algorithm that will determine the optimal set of words.
Determine the big-O running time of your algorithm as a function of n and k,
and verify that it is polynomial in them.
I intended this problem to be solved similarly to Weighted Interval Scheduling
(WIS), but unlike some of you I didn't notice that, viewed correctly, it is WIS.
Here's the reduction -- first check for every i and j whether there is a copy of
word w_i starting at letter j of z. If there is, create a job for
the scheduling problem, starting at time j, ending at time
(j + length(w_i) - 1) + 1 = j + length(w_i), and having value v_i. (Why the
"+1" after the position of the word's last letter? If
one word ends at letter k of z and another begins at letter k, we can't use
both so we want the corresponding jobs to overlap.) Now we submit this WIS
problem, with O(nm) jobs, to the WIS algorithm, which solves it in time
O(mn log(mn)) by dynamic programming. It takes us O(mn^2) to do all
the checks for inclusion of words in z, or O(mnq) if q is an upper bound on the
length of the words w_i. Actually we can do the WIS step in
O(mn) because we don't really need the sort -- the occurrences come to us with
their end times already computed, and these are integers of size O(n), so it's
easy to sort them by end time (for instance by bucketing) in O(mn) time.
If we don't notice this reduction, we can still solve the problem by dynamic
programming. We let f(j) be the optimal score from the first j letters of z,
with f(0) = 0.
If there is no instance of a scoring word ending at letter j, then f(j) =
f(j-1). Otherwise, for each such word w_i we calculate v_i
+ f(j - length(w_i)), find the maximum of all such totals and f(j-1),
and set f(j) to this maximum. This takes n phases, and in each phase we have
to compare up to m values to find the maximum. Thus we have O(mn) time if
we can do the string comparison operations to test for inclusion in O(1).
More realistically, these take O(n) time each, or O(q) if q is an upper bound
on the size of the scoring words, for a total of O(mn^2) or O(mnq).
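The dynamic program just described might look like this in Python. This is a sketch under assumptions: the function name `best_score` and the input format (a list of (word, value) pairs with non-empty words) are hypothetical, and the substring test is the naive one, so each comparison costs O(q) rather than O(1).

```python
def best_score(z, words):
    """Optimal total value of non-overlapping scoring words found in z.

    words: list of (word, value) pairs (assumed format, non-empty words).
    """
    n = len(z)
    f = [0] * (n + 1)                 # f[0] = 0: the empty prefix scores nothing
    for j in range(1, n + 1):
        f[j] = f[j - 1]               # option: no scoring word ends at letter j
        for w, v in words:
            # w ends at letter j exactly when it matches letters j-len(w)+1 .. j
            if len(w) <= j and z[j - len(w):j] == w:
                f[j] = max(f[j], v + f[j - len(w)])
    return f[n]
```

For example, with z = "catalog" and the hypothetical scoring words cat (3), at (1), log (2), and atalog (4), the best choice is cat plus log for a total of 5.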
Describe an algorithm to determine this number, and determine its running time
as a function of m and n. (I intended to insist that this running time be
polynomial in m and n, but I didn't.) If you make use of a standard algorithm
from the book, you need not describe it in detail but you must indicate
how you are transforming your given input into an input suitable for that
algorithm, and what you are doing with its output.
This problem is clearly similar to Network Flow -- the problem is to make the
reduction precise. We make a network by setting s to A, setting t to B, making
directed edges in each direction for each road, and then removing the
edges into s and out of t so that s is a source and t is a sink. Each edge
gets capacity 1. We find the size of the maximum flow in this network, and
call this number k. This turns out to be our answer.
But the Network Flow problem says nothing about roads or accidents, so we
need an argument that k is the correct number. We must show that k
accidents can separate A from B, and that k-1 accidents cannot. The first is
true because, by the max-flow min-cut theorem, there must be a cut of size k in
the network, and if we kill all k edges in that cut there cannot be a remaining
path from A to B, because every such path must cross the cut. If we
have only k-1 accidents, there must still be a path from A to B (and hence a
nonzero flow) in the remaining graph.
This is because if there were not, the set of nodes reachable from A and the
set of nodes not reachable would form a cut, and since only the k-1 removed
edges could go over the cut its size would be at most k-1, whereas we know that
the minimum cut has size k.
The network flow graph has n nodes and O(m) edges (since there are fewer
than 2m directed edges). The capacity out of s in the network flow graph is
at most n-1 = O(n), since there is at most one edge of capacity 1 to each other
vertex. So there are O(n) phases of Ford-Fulkerson, each involving a BFS of
O(m) time, and the running time of the entire algorithm is O(mn).
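A minimal Python sketch of this reduction follows, assuming vertices are numbered 0 to n-1, roads are given as pairs, and A and B are distinct. The function name `min_accidents` and the input format are hypothetical. The two opposite unit-capacity edges of a road are stored in one residual-capacity table, which does not change the value of the maximum flow.

```python
from collections import defaultdict, deque


def min_accidents(n, roads, a, b):
    """Fewest roads whose removal separates a from b (assumes a != b)."""
    cap = defaultdict(int)               # residual capacity of each directed edge
    adj = [set() for _ in range(n)]
    for u, v in roads:
        adj[u].add(v)
        adj[v].add(u)
        for x, y in ((u, v), (v, u)):
            if y != a and x != b:        # no edges into the source or out of the sink
                cap[(x, y)] += 1
    flow = 0
    while True:
        # One phase: BFS for an augmenting path in the residual graph.
        parent = {a: None}
        queue = deque([a])
        while queue and b not in parent:
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent and cap[(x, y)] > 0:
                    parent[y] = x
                    queue.append(y)
        if b not in parent:
            return flow                  # no augmenting path: the flow is maximum
        # Push one unit of flow back along the path found.
        y = b
        while parent[y] is not None:
            x = parent[y]
            cap[(x, y)] -= 1
            cap[(y, x)] += 1
            y = x
        flow += 1
```

On a four-vertex square with roads (0,1), (0,2), (1,3), (2,3), A = 0 and B = 3, there are two edge-disjoint routes, so two accidents are needed.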
Determine the big-O running time of your algorithm in terms of m and n --
it should be polynomial in them.
We use the Bellman-Ford algorithm to compute the quantity C(x,y,i) for each
vertex x, each vertex y, and each integer i with 0 ≤ i ≤ n, equal to the
minimum cost of any path from x to y that uses i or fewer hops. We do this
by dynamic programming, in O(nm) time for each source x: For i=0 we know that
the cost is 0 if x=y and infinite otherwise. For general i we set C(x,y,i) to
the minimum of C(x,y,i-1) and the sum C(x,z,i-1) + L(z,y) for each edge (z,y)
in G. (Here L(z,y) is the cost of the edge (z,y).) For a fixed x, this gives us
the correct value of all the C(x,y,i) in O(m) time from the values of all the
C(x,y,i-1), taking m ≥ n. Over all n sources the total is O(n^2 m), which is
still polynomial.
Now we must use this information to determine whether there is a negative
cycle. The solution is that there is a negative cycle if and only if for at
least one pair (x,y), we have C(x,y,n) < C(x,y,n-1). We must prove the "if
and only if". If there is no negative cycle, we know that any path of n or
more hops from x to y has a corresponding path with fewer hops from x to y and
an equal or lower cost. So there cannot be an n-hop path that has less cost
than all paths with n-1 or fewer hops. Conversely, if there is a negative
cycle there is no single minimum-cost path between any two nodes on the
cycle, because we can find a path with less cost by taking the negative cycle
one more time. But if C(x,y,n-1) = C(x,y,n) for every x and every y, this means
that an examination of all the edges has not found any improvement in cost, and
this would keep happening as we looked at C(x,y,n+1), C(x,y,n+2), and so on
forever. So there would be minimum-cost paths from each x to each y,
contradicting the existence of a negative cycle.
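The test C(x,y,n) < C(x,y,n-1) can be sketched in Python as follows. The function name `has_negative_cycle` and the edge-list input format are assumptions for illustration. The table is filled for every source x, so this sketch performs on the order of n^2 m basic steps.

```python
import math


def has_negative_cycle(n, edges):
    """edges: list of (z, y, cost) directed edges on vertices 0..n-1 (assumed format)."""
    # prev[x][y] holds C(x, y, i-1); for i-1 = 0 the cost is 0 if x == y, else infinite.
    prev = [[0 if x == y else math.inf for y in range(n)] for x in range(n)]
    for i in range(1, n + 1):
        cur = [row[:] for row in prev]          # C(x,y,i) is at most C(x,y,i-1)
        for x in range(n):
            for z, y, cost in edges:
                if prev[x][z] + cost < cur[x][y]:
                    cur[x][y] = prev[x][z] + cost
        if i == n:
            # Entries never increase, so any difference is a strict decrease:
            # some pair has C(x,y,n) < C(x,y,n-1).
            return cur != prev
        prev = cur
    return False                                # only reached when n == 0
```

Only two levels of the table are kept at a time, since level i depends only on level i-1.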
Last modified 22 November 2006