Q1: 10 points Q2: 10 points Q3: 10 points Q4: 20 points Q5: 15 points Q6: 15 points Q7: 20 points Total: 100 points
FALSE. It is quite possible for the single highest-reward job to overlap two or more other jobs that could otherwise be scheduled together for more total reward. A simple example is (start 0, finish 2, reward 2), (start 1, finish 4, reward 3), (start 3, finish 5, reward 2). This greedy algorithm schedules only the middle job and scores 3, while you could schedule the other two jobs and get 4. The counterexample also shows that this greedy algorithm can fail even in the special case of Question 2, since rewards are proportional to length.
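The counterexample can be checked mechanically. The following is only a sketch: the function names are mine, jobs are (start, finish, reward) triples, and I assume that two jobs that merely touch at an endpoint do not overlap.

```python
from itertools import combinations

def compatible(jobs):
    """True if no two (start, finish, reward) jobs overlap."""
    ordered = sorted(jobs)  # sort by start time
    return all(a[1] <= b[0] for a, b in zip(ordered, ordered[1:]))

def best_reward(jobs):
    """Brute-force optimum: try every subset of the jobs."""
    return max(
        sum(j[2] for j in subset)
        for r in range(len(jobs) + 1)
        for subset in combinations(jobs, r)
        if compatible(subset)
    )

def greedy_by_reward(jobs):
    """Take jobs in order of decreasing reward, keeping each one that fits."""
    chosen = []
    for job in sorted(jobs, key=lambda j: j[2], reverse=True):
        if compatible(chosen + [job]):
            chosen.append(job)
    return sum(j[2] for j in chosen)

jobs = [(0, 2, 2), (1, 4, 3), (3, 5, 2)]
print(greedy_by_reward(jobs), best_reward(jobs))  # greedy 3, optimum 4
```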
FALSE. The first job could overlap with a long, valuable job. A simple example has only two jobs: (start 0, finish 2, reward 2) and (start 1, finish 10, reward 9). The greedy algorithm takes the first job for 2 while we could schedule the second job alone and get 9.
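A short sketch of this second counterexample, assuming "first" means earliest start time (in this example the same job is also first by finish time). The helper names are hypothetical.

```python
def overlaps(a, b):
    """True if jobs a and b, given as (start, finish, reward), share any time."""
    return a[0] < b[1] and b[0] < a[1]

def greedy_by_start(jobs):
    """Take jobs in order of start time, keeping each one that fits."""
    chosen = []
    for job in sorted(jobs):
        if all(not overlaps(job, c) for c in chosen):
            chosen.append(job)
    return sum(j[2] for j in chosen)

jobs = [(0, 2, 2), (1, 10, 9)]
print(greedy_by_start(jobs))      # 2: the short first job blocks the long one
print(max(j[2] for j in jobs))    # 9: scheduling the second job alone
```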
TRUE. The Huffman algorithm repeatedly combines the two lowest-frequency letters in the alphabet into a single letter, adding the two frequencies to get a new frequency. As long as there are three or more letters in the alphabet, our high-frequency letter must be the most frequent and thus cannot be one of the two lowest. So it is not combined until there are only two letters left, which means that it definitely goes on level one of the tree (the first level below the root) and gets a one-bit code.
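To see the argument on a concrete alphabet, here is a minimal Huffman sketch that tracks only code lengths. The frequencies are made up for illustration, and I am assuming the high-frequency letter accounts for more than half of the total, so that no merged letter can ever outweigh it.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {letter: code length} for a Huffman code built from freqs."""
    tiebreak = count()  # unique counter so the dicts are never compared
    heap = [(f, next(tiebreak), {letter: 0}) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)   # lowest frequency
        f2, _, d2 = heapq.heappop(heap)   # second lowest
        # Merging pushes every letter in both subtrees one level deeper.
        merged = {letter: depth + 1 for letter, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical frequencies (out of 100); "a" has more than half.
freqs = {"a": 55, "b": 20, "c": 15, "d": 10}
lengths = huffman_code_lengths(freqs)
print(lengths["a"])  # 1
```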
The Cut Property says that this edge e must be contained in any
minimum spanning tree for G. (There could easily be multiple MSTs, for
example if there are multiple equally costly ways to connect up X and V-X.)
Let u be the endpoint of e in X and let v be the endpoint of e in V-X.
Let T be any spanning tree that does not contain e. The graph T ∪
{e} has exactly one cycle, formed by e and the unique path from u to v that
exists in T. Since u is in X and v is in V-X, this path in T must include at
least one edge e' = (u', v') that has one endpoint in X and one in V-X.
Let U = (T - {e'}) ∪ {e}. U has a smaller
total weight than T because e' has weight larger than e (by the assumption on
e). U is a spanning tree -- it has n-1 edges and we can show that it is
connected. Any vertex in G has a path to either u' or v' in T - {e'}, and the
part of the cycle (of T ∪ {e}) without e' forms a path in T ∪ {e} -
{e'} from u' to v'. So there is a path in U from any vertex to any other
vertex.
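The statement can also be checked by brute force on a tiny graph. This is only a sketch on a hypothetical 4-vertex example of my own (a weight-1 triangle X = {0, 1, 2} plus a vertex 3), chosen so that there are several MSTs but a single cheapest edge crossing the cut.

```python
from itertools import combinations

def is_spanning_tree(n, edges):
    """n-1 edges with no cycle on n vertices form a spanning tree."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:          # this edge would close a cycle
            return False
        parent[ru] = rv
    return len(edges) == n - 1

def minimum_spanning_trees(n, edges):
    """All spanning trees of minimum total weight, by exhaustive search."""
    trees = [t for t in combinations(edges, n - 1) if is_spanning_tree(n, t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    return [t for t in trees if sum(w for _, _, w in t) == best]

# Hypothetical graph; edges are (u, v, weight).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (2, 3, 5), (1, 3, 7), (0, 3, 9)]
X = {0, 1, 2}
crossing = [e for e in edges if (e[0] in X) != (e[1] in X)]
e = min(crossing, key=lambda edge: edge[2])   # cheapest edge across the cut
msts = minimum_spanning_trees(4, edges)
print(len(msts), all(e in t for t in msts))   # three MSTs, each containing e
```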
Some students didn't remember what the Cut Property was -- it did form the
major part of one lecture so I think it was fair game. I tried to give you
enough of the setting so that you could remember the property if you were
familiar with any of the MST arguments in the book.
Let the depth of U be d and let w be a vertex on level d of U. We know that
the BFS tree from v indicates the shortest-path distance from v to every node
(counting each edge as distance 1). Thus there is no path in G of length less
than d from v to w. If the depth of T were less than d, there would be a path
in G of length less than d from v to w, given by the path in T. This is
impossible, so T cannot have depth less than d.
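A small sketch comparing the two depths, taking U to be the BFS tree and T the DFS tree from v as in the argument above. The 4-cycle used here is my own example, and the DFS visits neighbors in adjacency-list order.

```python
from collections import deque

def bfs_depth(graph, v):
    """Depth of the BFS tree rooted at v (graph: node -> list of neighbors)."""
    depth = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    return max(depth.values())

def dfs_depth(graph, v):
    """Depth of the DFS tree rooted at v."""
    depth = {v: 0}

    def visit(u):
        for w in graph[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                visit(w)

    visit(v)
    return max(depth.values())

graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # a 4-cycle
print(bfs_depth(graph, 0), dfs_depth(graph, 0))  # BFS depth 2, DFS depth 3
```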
Many people correctly noted (from the homework) that if G is itself a tree,
then T and U are the same tree and thus have the same depth. They then went
on to the case where G has a cycle, as in another homework problem. If there
is a cycle, and x is the first node in the cycle encountered (by both searches),
then one of the neighbors of x in the cycle will have depth one greater than
that of x in U but a larger depth in T. However, this does not in itself show
anything about the depth of the trees T and U, because this depends on
the deepest node in each tree and these neighbors of x might not be those
nodes.
This problem turned out to be harder for you than I intended -- the right way
to go about it was a brute-force algorithm, which we have not
emphasized in the class so far. Neither BFS nor DFS is very helpful. Many
people gave incorrect algorithms that might find an independent set but
could easily fail to find one that was there. I gave significant partial
credit for these if there was a correct timing analysis. (I was
disappointed with the large number of people who made no attempt to time their
algorithm.)
The simple brute-force algorithm I had in mind was to consider each of the
(n choose k) subsets of V with size k, and check each one to see whether it
is an independent set. The latter test involves looking at the (k choose 2)
possible pairs of nodes in the set, and seeing whether any of them has an edge.
This takes O(k^2) time, and since (n choose k) =
Θ(n^k) (for constant k), the total time is Θ(n^k k^2).
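A minimal sketch of this brute-force search. The function name and the input representation (nodes numbered 0 to n-1, edges as pairs) are my own choices, and the edge set is hashed so that each pair test takes expected constant time.

```python
from itertools import combinations

def find_independent_set(n, edges, k):
    """Return some independent set of size k as a tuple, or None if none exists."""
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(range(n), k):          # (n choose k) subsets
        # (k choose 2) pair checks per subset
        if all(frozenset(pair) not in edge_set
               for pair in combinations(subset, 2)):
            return subset
    return None

path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # path on 5 nodes
print(find_independent_set(5, path, 3))   # (0, 2, 4)
print(find_independent_set(5, path, 4))   # None
```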
Some people found
a slightly better algorithm, placing nodes on a stack as they were found to
form an independent set with the nodes already on the stack. This algorithm
spends at most O(n) time trying to increase the stack from size 0 to 1, at most
O(2n^2) trying to increase it from 1 to 2, and so on, until the dominant
term has O(kn^k) time trying to increase it from k-1 to k. (To try
to add a new node, you must check the k-1 nodes on the stack to see if any has
an edge to the new node.) Since this is also O(n^k k^2), it
is acceptable.
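One way to read the stack idea is as a backtracking search, sketched below. This is my reconstruction rather than any particular student's answer, and the adjacency-set representation is an assumption.

```python
def find_independent_set_stack(adj, k):
    """adj: dict node -> set of neighbors. Return a list of k nodes or None."""
    nodes = sorted(adj)
    stack = []

    def extend(start):
        if len(stack) == k:
            return True
        for i in range(start, len(nodes)):
            v = nodes[i]
            # Check v against every node already on the stack.
            if all(v not in adj[u] for u in stack):
                stack.append(v)
                if extend(i + 1):
                    return True
                stack.pop()       # backtrack and try the next candidate
        return False

    return list(stack) if extend(0) else None

adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # path on 5 nodes
print(find_independent_set_stack(adj, 3))  # [0, 2, 4]
print(find_independent_set_stack(adj, 4))  # None
```

Without the pop step this would be one of the incorrect greedy algorithms: it is the backtracking that guarantees an existing independent set is found.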
Many people put the nodes in a priority queue sorted by degree and either
added low-degree nodes to the proposed independent set, eliminating nodes with
edges to the added nodes, or deleted high-degree nodes in the hope of winding
up with k nodes of degree 0. Both of these methods can fail to find an
independent set that exists -- I can show you examples. The point is that you
had no justification to say that this algorithm would always work, so no real
reason to think it was correct. You still got a fair number of points if you
described such an algorithm clearly and timed it correctly.
Consider any schedule S with a block of idle time of length x in front of some
job J. Make a new schedule S' by switching J and the block of idle time,
leaving all the other jobs in the same place. The only reward that changes is
that of J, which increases by x, so S cannot have been optimal.
If we continue making such swaps with every block of idle time that is in
front of a job, we keep increasing the reward and we eventually reach a
schedule where there is no idle time because all those blocks occur after all
those jobs. This is then a schedule T with no idle time that is better than S.
Many of you constructed this T directly from S, by sliding all jobs after each
idle time block forward by the length of the block. One thing I insisted on
if you did this was that you consider the effect on all the jobs -- in fact you
increase the net reward for every job you move forward.
As in lecture, we consider any schedule that has no idle time (using 7a) and
is not ordered by length, and consider one of its inversions where job J is
just before
job I but is longer than I. We improve this schedule by switching I and J,
leaving all the other jobs exactly where they are. We increase the reward
of I by the length of J, and decrease the reward of J by the length of I. Since
J is longer than I, this is a positive net gain.
Repeating this process leads us to a schedule that is sorted by length. Only
in such a schedule can we not improve the reward by swapping to remove an
inversion.
Note that with this reward system, the preferred finish times have no effect
on the schedule, because moving a preferred finish time of a job adds or
subtracts the same amount to the net reward of any schedule.
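The exchange argument can be checked by brute force on a small instance. This sketch assumes the reward of a job is its preferred finish time minus its actual finish time, which is consistent with the arguments above but is not spelled out here. The job data are made up.

```python
from itertools import permutations

def total_reward(order):
    """order: sequence of (length, preferred_finish), run back to back from time 0."""
    clock = 0
    reward = 0
    for length, preferred in order:
        clock += length                 # actual finish time of this job
        reward += preferred - clock     # assumed reward rule
    return reward

# Hypothetical jobs: (length, preferred finish time).
jobs = [(4, 10), (1, 3), (3, 2), (2, 8)]
best = max(total_reward(p) for p in permutations(jobs))
shortest_first = sorted(jobs, key=lambda job: job[0])
print(total_reward(shortest_first), best)  # both 3
```

Changing any preferred finish time shifts the reward of every ordering by the same amount, so the shortest-first order stays optimal.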
Last modified 9 October 2006