Greedy algorithms:
- Are simple and intuitive
- Choose the locally optimal choice (for some given objective function)
- Hope it will lead to a globally optimal solution
- Do not use backtracking
Examples
Minimum Spanning Tree
- Prim’s Algorithm
- Start anywhere
- Connect the cheapest edge that doesn’t create a cycle, i.e. the cheapest edge that brings a new vertex into the tree
- Kruskal’s algorithm
- Maintain a forest of trees (initially, every vertex is its own tree)
- Take the cheapest remaining edge; if it connects two different trees, merge them (see the sketch below)
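A minimal Python sketch of Kruskal’s algorithm, assuming vertices are labelled 0..n-1 and edges are given as (weight, u, v) tuples; the union-find helper is an illustrative choice, not prescribed by these notes:

```python
def kruskal(n, edges):
    # n: number of vertices, labelled 0..n-1
    # edges: list of (weight, u, v) tuples
    parent = list(range(n))          # union-find forest: parent[x] == x means root

    def find(x):
        # Find the root of x's tree, compressing the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):    # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                 # edge joins two different trees: no cycle
            parent[ru] = rv          # merge the trees
            mst.append((u, v, w))
    return mst

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3)]
print(kruskal(4, edges))   # [(0, 1, 1), (2, 3, 2), (1, 2, 3)]
```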
A greedy algorithm can be disproved with a single counter-example; proving one correct requires a proof by contradiction (an exchange argument).
Coin Changing Problem
Coins of denominations d_1 > d_2 > … > d_k are given; make up an amount n using as few coins as possible.
- You have an unlimited amount of each coin
- The denominations are sorted in decreasing order: d_1 > d_2 > … > d_k
- The smallest denomination is d_k = 1, and hence there is a solution for every integer amount n
Greedy strategy:
- While n > 0, select the largest coin d_i such that d_i ≤ n
- Add the coin to the list and decrement n by d_i
Example with coins {50, 10, 5, 1} and n = 67: greedy picks 50, 10, 5, 1, 1, which is optimal.
Counter-example: with coins {1, 3, 4} and n = 6, greedy picks 4 + 1 + 1 (three coins), but 3 + 3 uses only two.
What is a sufficient property such that the greedy algorithm will always be optimal? Every coin is a multiple of the one below it (which {1, 3, 4} violates: 4 is not a multiple of 3).
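A minimal Python sketch of the greedy strategy; the function name and the example denominations are illustrative:

```python
def greedy_change(n, coins):
    # coins must be sorted in decreasing order and end with 1,
    # so a solution exists for every integer amount n
    result = []
    for d in coins:
        while d <= n:            # take the largest coin that still fits
            result.append(d)
            n -= d
    return result

print(greedy_change(67, [50, 10, 5, 1]))   # [50, 10, 5, 1, 1] -- optimal
print(greedy_change(6, [4, 3, 1]))         # [4, 1, 1] -- but 3 + 3 uses fewer coins
```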
Example: Interval Scheduling
There is a set of jobs, each with a start time s_i and a finish time f_i; schedule as many non-overlapping jobs as possible.
You could:
- Sort by start times
- Sort by job length
- Sort by finish times
- This is correct, but it is hard to prove it is optimal (see the sketch below)
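A minimal Python sketch of the finish-time greedy, assuming jobs are given as (start, finish) pairs:

```python
def interval_schedule(jobs):
    # jobs: list of (start, finish) pairs
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(jobs, key=lambda j: j[1]):   # earliest finish first
        if start >= last_finish:         # compatible with everything selected so far
            selected.append((start, finish))
            last_finish = finish
    return selected

jobs = [(0, 6), (1, 4), (3, 5), (5, 7), (5, 9), (8, 9)]
print(interval_schedule(jobs))   # [(1, 4), (5, 7), (8, 9)]
```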
To prove a greedy algorithm right, assume it’s wrong and find a contradiction. To prove it wrong, simply find a counter-example.
Fractional Knapsack Problem
There is a set S of n items, where item i has benefit b_i and weight w_i, and a knapsack of capacity W.
You can select any fraction x_i of item i (0 ≤ x_i ≤ 1), gaining benefit x_i · b_i at weight x_i · w_i.
To solve this with a greedy algorithm, sort the items by their benefit/weight ratio (b_i / w_i) and take them from the highest ratio down, taking a fraction of the last item if it does not fit whole (see the sketch below).
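A minimal Python sketch, assuming items are given as (benefit, weight) pairs and the function returns the total benefit:

```python
def fractional_knapsack(items, capacity):
    # items: list of (benefit, weight) pairs
    total = 0.0
    # Highest benefit/weight ratio first
    for benefit, weight in sorted(items, key=lambda i: i[0] / i[1], reverse=True):
        if capacity <= 0:
            break
        take = min(weight, capacity)     # whole item, or the fraction that fits
        total += benefit * (take / weight)
        capacity -= take
    return total

print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))   # 240.0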
0/1 Knapsack Problem
You can only take the whole item or not take it at all; the ratio-based greedy is no longer optimal, so this requires dynamic programming (a sketch follows).
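For contrast, a minimal dynamic-programming sketch for the 0/1 version, assuming integer weights and (benefit, weight) pairs as above:

```python
def knapsack_01(items, capacity):
    # best[c] = maximum benefit achievable with total weight at most c
    best = [0] * (capacity + 1)
    for benefit, weight in items:
        # Iterate capacities downwards so each item is used at most once
        for c in range(capacity, weight - 1, -1):
            best[c] = max(best[c], best[c - weight] + benefit)
    return best[capacity]

print(knapsack_01([(60, 10), (100, 20), (120, 30)], 50))   # 220
```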
Huffman Coding
Variable vs Fixed length
Fixed-length codes are convenient for indexing and counting (e.g. ASCII: a 7-bit binary string per character, one character per byte).
Variable-length codes may be more efficient (e.g. Morse code: frequent letters have short encodings; UTF-8: the most common characters are 1 byte long).
Variable-Length Coding
Let each character be encoded as its own binary string, e.g. a = 0, b = 1, c = 01.
Problem: there is no spacing between codewords, so it is ambiguous where a character starts and ends (01 could be read as ab or as c).
Solution: ensure no binary string in the code is a prefix of any other binary string.
For example, the characters {a, b, c, d, e, f} can be given the prefix-free code a = 000, b = 0010, c = 0011, d = 01, e = 10, f = 11.
This can be represented as a tree with two children per node and characters being found in leaf nodes:

            .
       0 /     \ 1
        .       .
     0 / \ 1 0 / \ 1
      .   d   e   f
   0 / \ 1
    a   .
     0 / \ 1
      b   c
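A small Python sketch of why the prefix-free property makes decoding unambiguous, using the code read off the tree above (the helper name is illustrative):

```python
def decode(bits, codes):
    # codes: character -> codeword; prefix-free, so we can match greedily
    decode_map = {codeword: char for char, codeword in codes.items()}
    out, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in decode_map:     # a complete codeword has been read
            out.append(decode_map[buffer])
            buffer = ""
    return "".join(out)

codes = {"a": "000", "b": "0010", "c": "0011", "d": "01", "e": "10", "f": "11"}
print(decode("00100111", codes))   # bdf
```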
Encoding
The most frequent characters should have the shortest encodings. However, assigning codes top-down, starting from the most frequent character, fails to produce an optimal tree.
Instead, the key is that the two least frequent characters should be leaves at the bottom of the tree. Hence, to create a Huffman tree:
- Create a min-heap where the value of a node is:
  - the frequency of the character, for a leaf node, or
  - the sum of the frequencies of all its children, for an internal node
- While more than one node remains:
  - pop the two smallest nodes and join them under a new parent
  - add the combined node back into the min-heap
Popping and insertion into a min-heap are O(log n) operations, so building the tree for n characters takes O(n log n) time.
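A minimal Python sketch of the construction using heapq; the counter is a tie-breaker so the heap never has to compare tree nodes, and representing internal nodes as (left, right) pairs is an illustrative choice:

```python
import heapq
from itertools import count

def huffman(freqs):
    # freqs: character -> frequency
    tiebreak = count()
    heap = [(f, next(tiebreak), c) for c, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two smallest nodes...
        f2, _, right = heapq.heappop(heap)
        # ...joined under a parent whose value is the sum of the children
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse on children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: record the codeword
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
# {'a': '0', 'c': '100', 'b': '101', 'f': '1100', 'e': '1101', 'd': '111'}
```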
Disadvantages
The encoding table must be sent along with the encoded document. Hence, the overhead may be large for small documents.
Huffman encoding is optimal for global character frequencies. However:
- The whole document must be processed
- Local optimization is not possible (e.g. if character frequencies change halfway through the document)
- Can’t recognize patterns; it only recognizes letters, not words
Lempel-Ziv usually gives better compression because it recognizes repeated substrings. The DEFLATE algorithm combines LZ77 and Huffman coding.