4. Greedy Algorithms

Greedy algorithms build a solution one step at a time, at each step making the choice that looks best locally and never revisiting it.

Examples

Minimum Spanning Tree

A proposed greedy algorithm can be disproved with a counter-example; proving one correct typically requires an exchange argument or a proof by contradiction.

Coin Changing Problem

Coins of denominations $C = \{c_1, \dots, c_n\}$. Find the minimum number of coins such that the total value is $V$. Assume an unlimited supply of each denomination and that $c_1 = 1$, so every value $V$ can be made.

Greedy strategy: repeatedly take the largest coin whose value does not exceed the remaining amount (see the sketch below).

Counter-example: $C = \{1, 10, 25\}$, $V = 30$. The optimal solution is $3 \cdot 10$ (three coins), but the greedy algorithm gives $1 \cdot 25 + 5 \cdot 1$ (six coins).

What is a sufficient property such that the greedy algorithm will always be optimal? Every coin is a multiple of the one below it, i.e. each denomination divides the next larger one (e.g. $\{1, 5, 25\}$).
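A minimal sketch of this strategy in Python (the function name and interface are illustrative, not from the notes):

```python
def greedy_change(coins, value):
    """Repeatedly take the largest coin that fits into the remaining value.

    Assumes the denominations include 1 so a solution always exists.
    Only guaranteed optimal for "canonical" coin systems, e.g. where
    each denomination divides the next larger one.
    """
    count = 0
    for coin in sorted(coins, reverse=True):
        count += value // coin   # take as many of this coin as possible
        value %= coin            # remainder still to be made up
    return count

print(greedy_change([1, 10, 25], 30))  # 6 coins (25 + 5*1); optimal is 3 (10*3)
```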

Example: Interval Scheduling

There is a set of jobs with start ($s_j$) and finish ($f_j$) times. Find the largest subset of mutually compatible (non-overlapping) jobs.

You could sort the jobs by earliest start time, shortest duration, fewest conflicts, or earliest finish time; only sorting by earliest finish time always yields an optimal solution (a sketch follows below).

To prove a greedy algorithm right, assume it is wrong and derive a contradiction. To prove it wrong, simply find a counter-example.
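A sketch of the earliest-finish-time rule, assuming jobs are given as (start, finish) pairs (the representation and function name are my choice):

```python
def max_jobs(jobs):
    """Sort jobs by finish time; take each job that starts no earlier
    than the previously selected job finished."""
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(jobs, key=lambda job: job[1]):
        if start >= last_finish:
            selected.append((start, finish))
            last_finish = finish
    return selected

# Four overlapping jobs; the greedy selects (1, 3) and (4, 7).
print(max_jobs([(1, 3), (2, 5), (4, 7), (1, 8)]))
```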

Fractional Knapsack Problem

There is a set $S$ of $n$ items. Each item has a positive value $b_i$ and weight $w_i$. Find the subset which yields the maximum value without exceeding a given weight limit $W$.

You can select a fraction $x_i$ of item $i$; its weight and value are scaled proportionally.

To solve this with a greedy algorithm, sort the items by benefit/weight ratio ($b_i / w_i$) in decreasing order and fill the knapsack with whole items in that order, taking a fraction of the final item if it does not fit completely.
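A sketch assuming each item is a (value, weight) pair:

```python
def fractional_knapsack(items, capacity):
    """Take items in decreasing order of value/weight ratio, taking a
    fraction of the last item if it does not fit whole."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)       # whole item, or whatever fits
        total += value * (take / weight)   # value scales with the fraction taken
        capacity -= take
    return total

# Ratios are 6, 5, 4: take the first two items whole and 20/30 of the third.
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))  # 240.0
```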

0/1 Knapsack Problem

You can only take the whole item or not take it at all. The greedy ratio strategy no longer works, so this requires dynamic programming.
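For contrast, a minimal dynamic-programming sketch, assuming integer weights (the one-dimensional table over capacities is one standard formulation):

```python
def knapsack_01(items, capacity):
    """0/1 knapsack in O(n * capacity): dp[c] is the best value
    achievable with total weight at most c."""
    dp = [0] * (capacity + 1)
    for value, weight in items:
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, weight - 1, -1):
            dp[c] = max(dp[c], dp[c - weight] + value)
    return dp[capacity]

# Same items as above: a fraction of an item is no longer allowed;
# the optimum now takes the last two items whole.
print(knapsack_01([(60, 10), (100, 20), (120, 30)], 50))  # 220
```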

Huffman Coding

Variable vs Fixed length

Fixed-length codes are convenient for indexing and counting (e.g. ASCII: a 7-bit binary string per character, one character per byte).

Variable-length codes may be more efficient (e.g. Morse code: frequent letters have short encodings; UTF-8: the most common characters are 1 byte long).

Variable-Length Coding

Let $a = 0$, $b = 1$, $c = 01$, etc.

Problem: there is no spacing, so it is ambiguous where a character starts and ends.

Solution: ensure that no codeword in the code is a prefix of any other codeword (a prefix-free code).

For $a$ to $f$, one possible solution is $a = 000$, $b = 0010$, $c = 0011$, $d = 01$, $e = 10$, $f = 11$.

This can be represented as a tree with two children per node and characters being found in leaf nodes:

            .
           /  \
        0 /    \ 1
         .       .
        /\       /\
     0 /  \ 1 0 /  \ 1
      .    d   e    f
     /\
  0 /  \ 1
   a    .
        /\
     0 /  \ 1
      b    c
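A small decoding sketch using the table above: because the code is prefix-free, the first codeword matching the accumulated bits is always the right one (the dictionary representation is my own choice):

```python
def decode(bits, code):
    """Decode a bit string with a prefix-free code."""
    reverse = {word: ch for ch, word in code.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:            # no codeword is a prefix of another,
            out.append(reverse[buffer])  # so the first match is unambiguous
            buffer = ""
    return "".join(out)

code = {"a": "000", "b": "0010", "c": "0011", "d": "01", "e": "10", "f": "11"}
print(decode("001000001", code))  # "bad"
```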

Encoding

The most frequent characters should have the shortest encodings. However, building the tree top-down by directly assigning short codes to frequent characters will fail to produce an optimal prefix code.

Instead, the key is that the two least frequent characters should be sibling leaves at the bottom of the tree. Hence, to create a Huffman tree: insert every character into a min-heap keyed by frequency; repeatedly pop the two least frequent nodes and merge them under a new node whose frequency is their sum; push the merged node back, and stop when only the root remains.

Popping from and inserting into a min-heap are $O(\log n)$ operations, and $n - 1$ merge iterations are required. Hence, the time complexity for building a Huffman tree is $O(n \log n)$.
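A sketch of the construction with Python's heapq; here each heap entry carries the partial code table for its subtree rather than an explicit tree (an implementation choice, not from the notes):

```python
import heapq

def huffman_codes(freqs):
    """Build a Huffman code from a {character: frequency} mapping."""
    # Heap entries: (frequency, tiebreaker, {char: code-so-far}); the unique
    # tiebreaker stops Python from ever comparing the dicts.
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + c for ch, c in left.items()}         # left edge: 0
        merged.update({ch: "1" + c for ch, c in right.items()})  # right edge: 1
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
```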

Disadvantages

The encoding table must be sent along with the encoded document. Hence, the overhead may be large for small documents.

Huffman encoding is optimal for global character frequencies. However, it considers single characters only: it cannot exploit repeated substrings, and frequencies that vary across the document are not captured by one global table.

Lempel-Ziv usually gives better compression by recognizing repeated substrings. The DEFLATE algorithm combines the LZ and Huffman algorithms.