Memoisation
Fibonacci can be defined as a recurrence relation:
One solution to this is memoisation: the caching of results.
General Purpose Memoiser
def memoise(f):
known = {}
def new_func(*params):
if param not in known:
known[param] = f(*params)
return known[params]
return new_func
@memoise
def function_definition():
# Or use the built in tools:
from func_tools import lru_cache
# LRU: least recently used; gets rid of oldest result once number of results stored reaches `maxsize`
@lru_cache(maxsize = None)
def function_definition():
// ...
Notes:
- A nested function has access to variables within the outer function’s scope
*paramin the function definition returns a tuple,*parampasses the elements of the tuple individually as arguments@funcacts as a decorator that takes the function as an argument and modifies it
Fibonacci, Iteratively
Bottom-up iteration:
def fibonacci_i(n):
a, b = 0, 1
for i in range(n):
a, b = b, a + b
return a
This approach is
Top-down Dynamic Programming
Recursion and memoisation.
Bottom-up Dynamic Programming
- Store a table of solutions, solving the simplest sub-problems first
- Fill out the table by combining sub-problems until target problem is reached
Example: Minimum Cost Path
Rectangular grid of cells, each with a weight. You can move to any adjacent cell (including diagonals) from each cell. Find the minimum cost needed to get from top to bottom.
Recurrence formula
Hence in the normal case, the cost of the cell is the cell weight plus minimum cost to get to cell above it (left, directly above, or right).
Using a top-down approach:
- Infinite cost if outside grid
- Top row: just get the weight of the cell
- Otherwise
- Check cache, return if found
- Find the minimum cost of the cells above/neighboring it
- Add the weight of the cell
- Save it to cache and return
- Find the cost for each cell in the bottom row, and find the minimum from there
Using a bottom-up approach::
- Create a table of all cells
- Get weights for top row
- For all cells in second row, find the minimum cost to get to that cell and add the cost of that cell
- Repeat for all rows
- Find the minimum-cost path by backtracking
- This is easier if you store the column of the cell you came from while generating the table
If the grid is square, it is
This problem can also be found using Dijkstra’s shortest path by converting the problem to a multi-stage graph (with weights moved to edges). However, this is harder to implement and less efficient.
Suitably of DP
DP approaches are likely to work when the problem:
- Is combinatorial: brute force solution of
either or - Involves only integers or slices of fixed strings with well-define bounds: a finite search domain
- Is looking for the optimal solution
Then build the solution from solutions to sub-problems:
-
Requires optimal substructure
-
Expect overlapping subproblems
- Otherwise, the cache is useless
- NB: cache will often require the path to the result
Optimal Substructure
When solutions to subproblems can be reused (multiple times!) to find the optimal solution to the problem.
For example, in the shortest path problem, the shortest path for a sub-problem will be inside the shortest path for the larger problem.
The longest path problem does not have an optimal substructure.
Properties of Recursive Solutions
- Want to make a cache of all the calls you make
- How many unique function calls is it possible to make? How big could the cache be?
- For
, the cache would be - Hence, this can become very large
- For
Coin Changing Problem
A greedy approach does not produce optimal results. Hence, DP must be used.
Top-Down
Given coinage = [1, 10, 25] and amount = 80, the solution will always be minimum of:
1 + change(80 - 25, coinage, 3): the solution given all three coins are available- OR
change(80, coinage, 2): the solution given only the first two coins are available
Either amount or coinage decreases to decrease the size of the problem.
An alternative approach would be finding the minimum of:
1 + change(80 - 1, coinage)1 + change(80 - 10, coinage)1 + change(80 - 25, coinage)
That is 1 + min(coinage_calculate(amount - coin, coinage) for coin in coinage)
Without memoisation, there are a maximum of len(coinage) sub-calls per call. With memoisation, the graph collapses to amount + 1 calls:
- The number of coins is the same for every call (given the second approach). Hence, the only varying argument is the amount.
- This would be inefficient using LRU cache as it would save both arguments for each call. Hence:
- Pass home-grown cache as an argument or
- Define an inner function and use LRU cache on that instead
0/1 Knapsack
No fractional amounts. For each item you have two options:
- Pick up the item: get a smaller knapsack that can carry
and have a current worth of - Don’t pick up the item: get a knapsack that can carry
and have a current worth of
Repeat for each item, picking the option that gives the greater worth.
Edge cases:
- No more items
- Item can’t fit
Cache using index of item (
Top-Down
The number of possible calls is
If using LRU cache, run the computation in in a nested function using LRU cache, and store the array of items outside the scope of the function.
Bottom-Up
Generate table of knapsack weight (column) and index (row), filling table row-wise. Go backwards to reconstruct solution.
1 indexing is also useful in this case, in which an extra row and column (0) will be needed.
For each cell, you have two options:
- Don’t pick up: go to the value of cell directly above
- Pick up: value of item plus value of the cell one row above and
to the left
Take the maximum of these two options.
This has a time and space complexity of
NB: this is pseudo-polynomial time - it is polynomial with the size of the knapsack, not the number of items.
Assuming the path is required, use backtracking to reconstruct the solution: start at the end, and If the value of the item directly above is the same, it means the optimal solution came from not picking up the item. Else, go up one and left by the weight of the item. This has a complexity of
Subset Sum Problem
Given a set of integers, find a subset that sums to zero.
Generalize this to finding a subset that sums to some value
def subset_sum_exists(numbers, n, required_sum)
# returns true iff subset within first n elements that sum to required_sum
Longest Common Subsequence
Given two strings
Brute forcing this is not possible - there are
Create a function,
Bottom-Up
Create a table of 1-origin subscripts for all possible values of
- If
, add 1 to the value of the cell one row up and one column to the left - Else, get the max of either the cell to the left or the cell above
Edit Distance
The diff algorithm. Transform source string
When a transformation is not needed, it is called the alignment operation, and has zero cost.
Work from end of string to start.
- Copy/Alignment: If
, - Otherwise, use the minimum of:
- Deletion:
- Insertion:
- Substitution:
- Prefer substitution over deletion/insertion $$ D_{i, j} = \begin{cases} 0 & i = j = 0 \ i & j = 0 \ j & i = 0 \ D_{i-1, j-1} & a_i = b_j \ 1 + \min(D_{i-1, j}, D_{i, j-1}, D_{i-1, j-1}) & \text{otherwise} \end{cases} $$
- Deletion:
Longest Increasing Subsequence
One solution is to sort, remove duplicates, and then run LCS on the two sequences. However, a more efficient solution is possible.
Let
Recurrence relation: find the maximum of LIS for all solutions before the current index, provided the last (and so biggest) element is smaller than the current element.
This can be further optimized into an