Definitions
Graph:
- A set of vertices and a set of edges, where each edge connects a pair of vertices
Sparse graph:
- A graph with relatively few edges (far fewer than the maximum possible)
Dense graph:
- A graph where the number of edges is close to the maximum possible (roughly the square of the number of vertices)
Directed graph: edges are ordered pairs (u, v), so travel along an edge is one-way.
Undirected graph: edges are unordered pairs {u, v}, so travel is possible in both directions.
Weighted graphs: edges have a weight
Path: a sequence of vertices where:
- Each consecutive pair of vertices is connected by an edge
- Each edge is not used more than once
Length of path: number of edges.
Cycle: a path where the source and destination vertices are the same:
- Cyclic if graph contains at least one cycle
- Acyclic if not
- If it is also directed, it is called a directed acyclic graph (DAG)
- Loop: cycle of length one
Empty graph: a graph with no edges.
Subgraph: a graph formed from a subset of the vertices and edges of another graph.
Component: a non-empty subgraph where every pair of vertices is connected by a path. If there is a path between two vertices, they are in the same component.
Tree: a graph with a root vertex such that there is exactly one path between each vertex and the root.
Directed tree: tree where all edges point either towards or away from the root.
Forest: graph where every component is a tree
Textual Representation of Graphs
Header:
- D or U for directed/undirected
- n: number of vertices (which are numbered from 0)
- W: appears if it is a weighted graph
All other lines after the header define an edge: V1 V2, or V1 V2 Weight if the graph is weighted.
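A minimal parsing sketch (an illustration, not part of the notes), assuming the header fields sit on one whitespace-separated line in the order direction, n, optional W, and that the graph is returned as an adjacency list:

def parse_graph(text):
    lines = text.strip().splitlines()
    header = lines[0].split()
    directed = header[0] == 'D'
    n = int(header[1])
    weighted = 'W' in header[2:]
    adj = [[] for _ in range(n)]  # adjacency list, one sub-list per vertex
    for line in lines[1:]:
        parts = line.split()
        u, v = int(parts[0]), int(parts[1])
        entry = (v, float(parts[2])) if weighted else v
        adj[u].append(entry)
        if not directed:
            back = (u, float(parts[2])) if weighted else u
            adj[v].append(back)
    return adj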
Data Structures
Adjacency Matrix:
- An n × n binary matrix, where entry [u][v] is 1 if u and v are connected by an edge and 0 otherwise
- If undirected, the matrix is symmetric along the main diagonal
- If the graph is weighted, the value is the weight, or null if there is no edge
Adjacency List:
- A list of length n, each element being a list for every vertex
- Each sub-list contains the vertices it is connected to
  ○ If directed, v is in the list for u if there is an edge (u, v)
  ○ If the graph is weighted, each item in the list is of the form (vertex, weight)
Adjacency matrices are faster for checking whether a given edge exists (constant time), but adjacency lists use less space and are faster to iterate over for sparse graphs.
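For instance, an undirected, unweighted graph with edges 0-1, 0-2, and 1-2 (a made-up example) can be stored either way:

# Adjacency matrix: entry [u][v] is 1 if u and v share an edge
adj_matrix = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

# Adjacency list: adj_list[u] holds the vertices adjacent to u
adj_list = [
    [1, 2],  # neighbours of vertex 0
    [0, 2],  # neighbours of vertex 1
    [0, 1],  # neighbours of vertex 2
]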
Forests:
- Each vertex has zero or one parents
- Store as an array: if vertex v has parent u, then parent[v] = u
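A tiny made-up example: a forest with two trees, rooted at vertices 0 and 3, stored as a parent array:

# parent[v] is the parent of vertex v, or None if v is a root
parent = [None, 0, 0, None, 3]  # 1 and 2 are children of 0; 4 is a child of 3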
Graph Traversal
Each vertex can be in one of three states:
- Undiscovered
- Discovered
- Processed
Predecessor Tree:
- Root: starting vertex
- For each vertex u, in the order the traversal visits them:
  - If there is an edge (u, v) or (v, u), where v is undiscovered:
    - Add v to the tree: v is the child of u (even if the graph is undirected)
    - Set parent[v] = u. Note that this is different from the forest map: a vertex may be adjacent to several potential parents, but the tree will only ever record one parent, whichever discovered it first
    - Set state[v] = DISCOVERED
    - Add v to the queue/stack
BFS
Two procedures: BFS-Tree initializes the data structure and calls BFS-Loop:
from collections import deque

# States used by the traversal code below
UNDISCOVERED, DISCOVERED, PROCESSED = 0, 1, 2

def BFSTree(adj, index):
    # Where:
    # - adj is the adjacency list
    # - index is the index of the root
    n = len(adj)
    state = [UNDISCOVERED] * n
    state[index] = DISCOVERED
    parent = [None] * n
    Q = deque()
    Q.append(index)
    return BFSLoop(adj, Q, state, parent)

def BFSLoop(adj, Q, state, parent):
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            # For each vertex u is connected to
            if state[v] == UNDISCOVERED:
                state[v] = DISCOVERED
                parent[v] = u
                Q.append(v)
        state[u] = PROCESSED
    return parent
Analysis
Size of the graph: n vertices and m edges.
- Time complexity of BFSTree: O(n), to initialize the state and parent lists
- Time complexity of BFSLoop: proportional to the number of times the inner loop runs, which is at most the total number of edges in the graph: O(m)
So the overall complexity is O(n + m).
DFS
def DFSLoop(adj, index, state, parent):
    for v in adj[index]:
        # For each vertex the given index is connected to
        if state[v] == UNDISCOVERED:
            state[v] = DISCOVERED
            parent[v] = index
            DFSLoop(adj, v, state, parent)
    state[index] = PROCESSED
    return parent
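The notes give only the recursive loop; a minimal initialization wrapper analogous to BFSTree (an assumed addition, not in the original) might look like:

def DFSTree(adj, index):
    # adj: adjacency list; index: index of the root vertex
    n = len(adj)
    state = [UNDISCOVERED] * n
    state[index] = DISCOVERED
    parent = [None] * n
    return DFSLoop(adj, index, state, parent)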
Properties of DFS and BFS
- Works with directed and undirected graphs, although weights are ignored
- Only creates a single tree
- DFS can be done by replacing BFS queue with stack, and replacing enqueue and dequeue with push and pop
- BFS: vertices discovered in order of increasing distance from start
- DFS: vertex only marked as processed once all vertices reachable from it are discovered
Shortest Path
Use BFS to produce parent/predecessor tree, working backwards from the end until you get to the start.
def treePath(parentTree, start, end):
    if start == end:
        return [start]
    path = treePath(parentTree, start, parentTree[end])
    path.append(end)
    return path
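For example, assuming the BFSTree and treePath definitions above and made-up vertex numbers, the shortest path (by edge count) from vertex 0 to vertex 5 is recovered as:

parent = BFSTree(adj, 0)       # predecessor tree rooted at vertex 0
path = treePath(parent, 0, 5)  # vertices along the path 0 -> ... -> 5
# (assumes vertex 5 is reachable from vertex 0; otherwise the recursion
#  would walk past the root)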
Connected Components
When BFS or DFS is run, all vertices reachable from the start vertex are processed; for an undirected graph, this is exactly the component containing the start vertex.
Thus, to find all components, run BFS or DFS from each vertex that has not yet been discovered. To make sorting the vertices into components easier, keep an array that stores each vertex's component number: the number of the BFS/DFS run on which the vertex was first discovered.
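A self-contained sketch of this approach, assuming an undirected adjacency list (the component array records which BFS run first discovered each vertex):

from collections import deque

def connected_components(adj):
    # component[v] = number of the BFS run that first discovered v
    component = [None] * len(adj)
    count = 0
    for s in range(len(adj)):
        if component[s] is None:
            # New component: BFS from s and label everything reachable
            component[s] = count
            Q = deque([s])
            while Q:
                u = Q.popleft()
                for v in adj[u]:
                    if component[v] is None:
                        component[v] = count
                        Q.append(v)
            count += 1
    return component, count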
Strong Connectivity
A directed graph is strongly connected if there is a path between every ordered pair of vertices.
This can be determined inefficiently by running BFS from every possible starting vertex and checking that all vertices are processed each time.
Transpose operation: reverse the direction of every edge.
- Adjacency matrix: mirror along the main diagonal.
- Adjacency list: rebuild the lists so that u appears in the list for v whenever v appeared in the list for u.
Hub: a vertex h that can reach every vertex and that every vertex can reach. If there is such a vertex, then the graph must be strongly connected, as any vertex can reach any other vertex by going through h.
- Run a traversal starting from a chosen vertex h
- Return false if any vertex is undiscovered
- Construct the transpose of the graph
- Run a traversal on the transpose starting at h
- Return false if any vertex is undiscovered
- Otherwise return true: every vertex is reachable from h and can reach h, so h is a hub
Thus, the time complexity is the same as graph traversal: O(n + m).
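A sketch of the hub idea, assuming a directed, unweighted adjacency list; vertex 0 plays the role of the hub candidate, and a simple stack-based traversal is used instead of the routines above:

def is_strongly_connected(adj):
    n = len(adj)
    if n == 0:
        return True

    def reaches_all(graph, s):
        # True if a traversal from s discovers every vertex of `graph`
        seen = [False] * n
        seen[s] = True
        stack = [s]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        return all(seen)

    # Transpose: reverse the direction of every edge
    transpose = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            transpose[v].append(u)

    # Vertex 0 can reach everything, and everything can reach vertex 0
    return reaches_all(adj, 0) and reaches_all(transpose, 0)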
Topological Ordering (DAGs only)
Ordering of vertices such that if there is an edge (u, v), then u appears before v in the ordering.
All DAGs have at least one such ordering.
Programs such as make will use it to find a valid order in which to compile rules; package installers will use it to find a correct order in which to install dependencies.
- Initialize empty stack
- For each vertex:
  - If undiscovered, start a modified DFS: when a vertex is processed, push it to the stack
    - The vertex at the end will be pushed to the stack first
    - This ordering will continue until it gets back to the start vertex
    - When the DFS is run again with another start vertex, if it is 'before' the previous start vertex, the vertices between the two will get added to the stack
- Return the stack from top to bottom (see the sketch below)
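A sketch of the modified DFS described above, assuming the DAG is given as an adjacency list (a Python list is used as the stack):

def topological_order(adj):
    discovered = [False] * len(adj)
    stack = []  # vertices are pushed here when they are processed

    def dfs(u):
        discovered[u] = True
        for v in adj[u]:
            if not discovered[v]:
                dfs(v)
        stack.append(u)  # processed: everything reachable from u is already on the stack

    for s in range(len(adj)):
        if not discovered[s]:
            dfs(s)

    return stack[::-1]  # read the stack from top to bottom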
Cycle Detection
Directed graphs: run DFS: if a vertex that is currently discovered (but not yet processed) is reached again, there is a cycle. Run this with each possible starting vertex, or until a cycle has been found.
Undirected graphs: run DFS: if, while on vertex u, an edge leads to a discovered vertex other than u's parent, there is a cycle.
If the graph has multiple components, the DFS must be run multiple times, using a similar algorithm to when finding connected components.
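A sketch for the directed case, using the three traversal states (reaching a vertex that is discovered but not yet processed means a back edge, hence a cycle); this is an illustrative implementation, not taken verbatim from the notes:

def has_cycle_directed(adj):
    UNDISCOVERED, DISCOVERED, PROCESSED = 0, 1, 2
    state = [UNDISCOVERED] * len(adj)

    def dfs(u):
        state[u] = DISCOVERED
        for v in adj[u]:
            if state[v] == DISCOVERED:
                return True  # back edge: v is still on the current DFS path
            if state[v] == UNDISCOVERED and dfs(v):
                return True
        state[u] = PROCESSED
        return False

    # Handle multiple components, as described above
    return any(state[s] == UNDISCOVERED and dfs(s) for s in range(len(adj)))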
Weighted Algorithms
Minimum Spanning Tree
Tree that includes all vertices of the graph and has the lowest total weight among all spanning trees. The graph must be undirected.
Prim’s Algorithm
- Create a list: the parent of each vertex
- Create a list: whether each vertex is in the tree
- Create a list containing the closest distance of each vertex to the tree, each with a starting value of infinity
- While there are vertices remaining:
  - Find the vertex v that is closest to the tree
    - For the first loop, pick vertex 0 (or anything, really)
  - Add v to the list of vertices in the tree
  - For all vertices still not in the tree:
    - If the distance between the vertex and v is smaller than the distance stored in the list, update the distance and set the parent of the vertex to v
- Return the parent and distance lists
The algorithm is O(n²) as written: each of the n iterations does a linear scan to find the closest vertex (see the note in the code about using a min-heap instead).
def prim(adj):
    # Prim's algorithm: minimum spanning tree
    in_tree = [False] * len(adj)
    distance = [float('inf')] * len(adj)  # Stores distance from tree
    parent = [None] * len(adj)
    distance[0] = 0
    while False in in_tree:
        u = next_vertex(in_tree, distance)
        # Pick closest vertex not in tree. If distance is a min heap with edit
        # support, then this function is log(n) and the algorithm is n*log(n)
        in_tree[u] = True
        for (vertex, weight) in adj[u]:
            if not in_tree[vertex] and weight < distance[vertex]:
                distance[vertex] = weight
                parent[vertex] = u
    return (parent, distance)
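The next_vertex helper is not defined in the notes; a straightforward linear-scan version consistent with the comment above (O(n) per call, giving the O(n²) total) could be:

def next_vertex(in_tree, distance):
    # Return the vertex not yet in the tree with the smallest distance
    best = None
    for v in range(len(in_tree)):
        if not in_tree[v] and (best is None or distance[v] < distance[best]):
            best = v
    return best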
Shortest Path Trees
The shortest path tree rooted at vertex s contains, for every reachable vertex, a shortest path from s to that vertex.
Dijkstra’s Algorithm
NB: this will not always produce correct results on graphs with negative weights.
def dijkstra(adj, s):
    # adj = adjacency list
    # s = index of the start vertex
    in_tree = [False] * len(adj)
    distance = [float('inf')] * len(adj)
    parent = [None] * len(adj)
    distance[s] = 0
    while False in in_tree:
        u = next_vertex(in_tree, distance)
        # Pick the closest vertex not yet in the tree. Efficiency can improve
        # if a priority queue (min heap) is used
        in_tree[u] = True
        for (vertex, weight) in adj[u]:
            if not in_tree[vertex] and distance[u] + weight < distance[vertex]:
                # If the path to the vertex through `u` is shorter than the
                # best known distance, update the tree
                distance[vertex] = distance[u] + weight
                parent[vertex] = u
    return (parent, distance)
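For example (illustrative vertex numbers, assuming `adj` is a weighted adjacency list of (vertex, weight) pairs), the output can be combined with treePath from the Shortest Path section:

parent, distance = dijkstra(adj, 0)
# distance[5] is the total weight of the shortest path from vertex 0 to vertex 5
# treePath(parent, 0, 5) recovers the vertices along that path
# (assumes vertex 5 is reachable; otherwise distance[5] remains infinity)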
All-Pairs Shortest Path
Dijkstra gives the shortest path from a given source vertex to every vertex in the graph.
All-pairs gives the shortest path between every pair of vertices in the graph. This can be done by running Dijkstra once for each vertex.
Floyd-Warshall Algorithm
DP algorithm for the all-pairs problem. Solutions are limited to graphs without negative cycles: the sum of the weights along any cycle must not be negative.
Intermediate vertices: the vertices along a path between the source and destination vertices.
Properties of Shortest Paths
Simple: no vertex is repeated along the path (assuming there are no negative cycles).
Any sub-path of a shortest path is also a shortest path. If this was false, that sub-path could be used to make a shorter shortest path.
The Algorithm
Use recursion on k: let shortest(i, j, k) be the length of the shortest path from i to j using only vertices 0, ..., k − 1 as intermediate vertices.
- If vertex k − 1 is an intermediate vertex on the shortest path:
  - It is equivalent to joining the shortest paths from i to k − 1, and from k − 1 to j (each using only vertices 0, ..., k − 2 as intermediates)
  - That is, adding the lengths of shortest(i, k − 1, k − 1) and shortest(k − 1, j, k − 1)
- If k − 1 is not an intermediate vertex: the same as shortest(i, j, k − 1)
So shortest(i, j, k) = min(shortest(i, j, k − 1), shortest(i, k − 1, k − 1) + shortest(k − 1, j, k − 1)).
There are three base cases (k = 0: no intermediate vertices are allowed):
- 0
  - When i and j are the same vertex
- Infinity
  - When i and j are different and there is no edge connecting i and j
- The weight of the edge
  - When there is an edge connecting i and j: set the distance to the weight of that edge
Top-Down
All three parameters can take on O(n) different values, so there are O(n³) distinct subproblems; with memoization each is solved once in constant time, giving O(n³) time and O(n³) space for the memo table.
Bottom-up
Uses a table of size n × n holding the current shortest distance between each pair of vertices, updated in place for each value of k.
Thus, the space complexity is O(n²) (the time complexity is still O(n³)).
Run the algorithm with k increasing from 0 to n − 1, updating every pair (i, j) at each step, as in the sketch below.
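A bottom-up sketch under these assumptions (the graph is a weighted adjacency list of (vertex, weight) pairs, and the n × n table is updated in place for each value of k):

def floyd_warshall(adj):
    n = len(adj)
    INF = float('inf')
    # Base cases: 0 on the diagonal, edge weights where edges exist, infinity otherwise
    dist = [[INF] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u in range(n):
        for (v, w) in adj[u]:
            dist[u][v] = min(dist[u][v], w)
    # Allow one more intermediate vertex (k) at a time
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist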
Backtracking
Technique to generate all or some solutions to a problem incrementally. Generate a candidate solution; if it isn't a complete, correct solution, use it to generate child candidates and repeat.
def dfs_backtrack(candidate, input, output):
    if is_solution(candidate, input):
        add_to_output(candidate, output)
    else:
        for child in children(candidate, input):
            dfs_backtrack(child, input, output)
- candidate: vertex of the tree
- input: information about the instance of the problem, i.e. the desired output (e.g. size of the solution)
- output: list of valid solutions
- is_solution: returns true if it is a leaf node: valid solutions won't have valid children, so there is no need to go further down the tree
- children: returns the list of possible child candidates
In some cases such as search, only one solution may be required.
Pruning
In some cases, it may be known that it is impossible for any children (and their children etc.) of a node to produce a solution, meaning making further calls is unnecessary.
Check whether the candidate can be pruned at the start of the DFS/BFS backtrack function.
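As a concrete, illustrative instance of the template with pruning (the problem and helper names are invented for this example): find all subsets of a list of non-negative numbers that sum to a target, pruning any candidate whose running total already exceeds the target:

def subsets_with_sum(numbers, target):
    # numbers: non-negative values; output: every subset whose sum equals target
    output = []

    def backtrack(index, chosen, total):
        if total > target:
            return  # prune: adding more non-negative numbers cannot fix this candidate
        if index == len(numbers):
            if total == target:
                output.append(list(chosen))  # leaf node that is a valid solution
            return
        # child 1: include numbers[index]
        chosen.append(numbers[index])
        backtrack(index + 1, chosen, total + numbers[index])
        chosen.pop()
        # child 2: exclude numbers[index]
        backtrack(index + 1, chosen, total)

    backtrack(0, [], 0)
    return output

For example, subsets_with_sum([3, 1, 4, 2], 5) returns [[3, 2], [1, 4]]; branches such as [3, 1, 4] are abandoned as soon as their running total exceeds 5.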