Romanian Informatics Olympiad - Modified Huffman Encoding

Source: (Romanian Informatics Olympiad ONI'03, extended team selection)

File:Huffman tree 2.svg

( ^ Representative diagram of Huffman Encoding. No relevance in the problem)

A telegraph machine can transmit only lines and dots; it takes 2 seconds to transmit a line, but only 1 second to transmit a dot. We generally want to transmit texts containing letters of the English alphabet, and digits (so we have N<=36 symbols in total). Therefore, a prefix-free encoding using lines and dots is needed. Given the frequencies of the N symbols in a large text, find the minimum time it takes to transmit the text using a suitable encoding. The solution should run in O(N^4) time, and use O(N^3) space.


  1. This one has been untouched almost three weeks, so here's the solution in the author's words:

    "This is a special case of Huffman coding with unequal letter costs (a polynomial solution is not known for the general case). We need to construct a modified Huffman tree, in which the right child of a node is considered at depth 2 below the node itself. If we take the level traversal of the leaves in this tree, the symbols must appear in decreasing order of frequency (by a greedy exchange argument). So we must only determine the topology of the tree. This can be done by conceptually building the tree from root to leaves using a dynamic programming algorithm; the state is given by the number of internal nodes one and two levels above, and the total number of leaves on upper levels. Note that we can construct the algorithm so that it doesn't need to include the level in the state (this is needed to get the required complexity). Constant factors are significantly improved if we implement this using a memorized recursive call (it turns out only about 10% of the array needs to be computed)."

    RIP Mihai.


Post a Comment