6. Graph Theory

Definitions

Graph: $G=(V,E)$, where $V$ is a set of vertices and $E$ is a set of edges, each relating two vertices. $$n=|V| \qquad m=|E| \qquad m \le n^2$$

Weighted graphs: edges have a weight $\omega$.

Path: a sequence of vertices where each consecutive pair is connected by an edge.

Length of path: number of edges.

Cycle: a path where the source and destination vertex are the same.

Empty graph: $(\{\}, \{\})$.

Subgraphs

$G'=(V',E')$ where $V' \subseteq V$ and $E' \subseteq E$ (every edge in $E'$ must join vertices in $V'$).

Component: a maximal non-empty subgraph in which every pair of vertices is connected by a path. If there is a path between two vertices, they are in the same component.

Tree: a graph with a root vertex such that there is exactly one path between each vertex and the root.

Directed tree: tree where all edges point either towards or away from the root.

Forest: a graph where every component is a tree.

Textual Representation of Graphs

Header:

All other lines after the header define an edge: V1 V2 or V1 V2 Weight

Data Structures

Adjacency Matrix:

Adjacency List:

Adjacency matrices are faster ($\Theta(1)$ vs $\Theta(n)$) for operations such as checking for the existence of an edge and adding or deleting edges, but use $\Theta(n^2)$ space compared with $\Theta(n+m)$ for adjacency lists.
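As a sketch, the same small undirected graph in both representations (the vertex numbering and example edges here are assumed for illustration):

```python
# Example graph: 4 vertices, edges (0,1), (0,2), (1,2), (2,3)
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: Theta(n^2) space, Theta(1) edge lookup
matrix = [[False] * n for _ in range(n)]
for (u, v) in edges:
  matrix[u][v] = matrix[v][u] = True  # undirected: set both directions

# Adjacency list: Theta(n + m) space, Theta(n) worst-case edge lookup
adj = [[] for _ in range(n)]
for (u, v) in edges:
  adj[u].append(v)
  adj[v].append(u)

print(matrix[0][1])  # → True  (O(1) check)
print(1 in adj[0])   # → True  (scans a list)
```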

Forests:

Graph Traversal

Each vertex can be in one of three states: undiscovered, discovered (reached but not fully explored), or processed (all of its neighbours have been examined).

Predecessor Tree:

BFS

Two procedures: BFSTree initializes the data structures and calls BFSLoop:

from collections import deque

UNDISCOVERED, DISCOVERED, PROCESSED = 0, 1, 2

def BFSTree(adj, index):
  # Where:
  # - adj is the adjacency list
  # - index is the index of the root
  n = len(adj)
  state = [UNDISCOVERED] * n
  state[index] = DISCOVERED

  parent = [None] * n
  Q = deque()
  Q.append(index)
  return BFSLoop(adj, Q, state, parent)

def BFSLoop(adj, Q, state, parent):
  while Q:
    u = Q.popleft()
    for v in adj[u]:
      # For each vertex u is connected to
      if state[v] == UNDISCOVERED:
        state[v] = DISCOVERED
        parent[v] = u
        Q.append(v)
    state[u] = PROCESSED
  return parent

Analysis

Size of $G$: $|V|=n$, $|E|=m$, where $m \le n^2$.

Each vertex is enqueued at most once and each edge is examined a constant number of times, so complexity is $O(n+m)=O(\max(n,m))$.

DFS

def DFSLoop(adj, index, state, parent):
  for v in adj[index]:
    # For each vertex the given index is connected to
    if state[v] == UNDISCOVERED:
      state[v] = DISCOVERED
      parent[v] = index
      DFSLoop(adj, v, state, parent)
  state[index] = PROCESSED
  return parent

Properties of DFS and BFS

Shortest Path

Use BFS to produce the parent/predecessor tree, then work backwards from the end vertex until you reach the start.

def treePath(parentTree, start, end):
  if start == end:
    return [start]
  path = treePath(parentTree, start, parentTree[end])
  path.append(end)
  return path
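An equivalent iterative sketch, using a hypothetical parent array as a BFS rooted at vertex 0 might produce:

```python
def tree_path(parent, start, end):
  # Walk backwards from end to start via the parent array, then reverse
  path = [end]
  while path[-1] != start:
    path.append(parent[path[-1]])
  return path[::-1]

# Hypothetical predecessor tree: 0 -> 1 -> 3, 0 -> 2
parent = [None, 0, 0, 1]
print(tree_path(parent, 0, 3))  # → [0, 1, 3]
```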

Connected Components

When BFS or DFS is run, all vertices reachable from the start vertex are processed: in an undirected graph, that is exactly the component containing the start vertex.

Thus, to find all components, run BFS or DFS from each vertex that has not yet been discovered. To make sorting the vertices into components easier, an array can store each vertex's component number: the index of the DFS/BFS run on which the vertex was first discovered.
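A sketch of this labelling, assuming an undirected graph given as an adjacency list:

```python
from collections import deque

def components(adj):
  # comp[v] = index of the BFS run on which v was first discovered
  comp = [None] * len(adj)
  count = 0
  for s in range(len(adj)):
    if comp[s] is None:        # undiscovered: start a new traversal
      comp[s] = count
      Q = deque([s])
      while Q:
        u = Q.popleft()
        for v in adj[u]:
          if comp[v] is None:
            comp[v] = count
            Q.append(v)
      count += 1               # next run gets the next label
  return comp
```

For example, a graph with edges 0–1 and 2–3 plus an isolated vertex 4 yields labels `[0, 0, 1, 1, 2]`.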

Strong Connectivity

A directed graph is strongly connected if there is a path between every ordered pair of vertices.

This can be determined inefficiently by running BFS from every possible starting vertex and checking that all vertices are processed each time: $O(n(n+m))$.

Transpose operation, $G^T$: reverse the direction of every edge.

Hub: vertex $h$ is a hub if $h$ is reachable from every vertex AND $h$ can reach every vertex.

If there is such a vertex, then the graph must be strongly connected as any vertex can go through $h$ to get to any other vertex. Thus, if there is a hub, every vertex is a hub. To determine this:

  1. Run a traversal starting at $h$
  2. Return false if any vertex is undiscovered
  3. Construct the transpose of the graph
  4. Run a traversal on the transpose starting at $h$
  5. Return false if any vertex is undiscovered

Thus, time complexity is the same as graph traversal: $O(n+m)$
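The steps above might be sketched as follows (BFS-based, adjacency-list input assumed; `h` defaults to vertex 0 since any vertex works):

```python
from collections import deque

def reachable_from(adj, h):
  # Set of vertices reachable from h, by BFS
  seen = {h}
  Q = deque([h])
  while Q:
    u = Q.popleft()
    for v in adj[u]:
      if v not in seen:
        seen.add(v)
        Q.append(v)
  return seen

def strongly_connected(adj, h=0):
  n = len(adj)
  if len(reachable_from(adj, h)) < n:    # h cannot reach every vertex
    return False
  transpose = [[] for _ in range(n)]     # reverse every edge
  for u in range(n):
    for v in adj[u]:
      transpose[v].append(u)
  # Reachability from h in the transpose = reachability *to* h in G
  return len(reachable_from(transpose, h)) == n
```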

Topological Ordering (DAGs only)

Ordering of vertices such that if the edge $(u,v)$ exists, $u$ is ordered before $v$.

All DAGs have at least one such ordering.

Programs such as make use it to find a valid order in which to build rules; package installers use it to find a valid order in which to install dependencies.

  1. Initialize empty stack
  2. For each vertex:
    • If undiscovered, start modified DFS: when vertex is processed, push it to the stack
      • The vertex at the end will be pushed to the stack first
      • This ordering will continue until it gets to the start vertex
      • When the DFS is run again with another start vertex, if it is ‘before’ the previous start vertex, vertices between the two will get added to the stack
  3. Return the stack from top to bottom
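The steps above can be sketched as (adjacency-list DAG input assumed):

```python
def topological_order(adj):
  # adj: adjacency list of a DAG. Push each vertex onto the stack when
  # it is processed (all descendants done), then read top to bottom.
  n = len(adj)
  discovered = [False] * n
  stack = []

  def dfs(u):
    discovered[u] = True
    for v in adj[u]:
      if not discovered[v]:
        dfs(v)
    stack.append(u)          # pushed only after all children

  for s in range(n):
    if not discovered[s]:    # restart DFS from each undiscovered vertex
      dfs(s)
  return stack[::-1]         # top of the stack first
```

For edges 0→1, 0→2, 1→3, 2→3 this returns `[0, 2, 1, 3]`, a valid ordering.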

Cycle Detection

Directed graphs: run DFS; if, while on vertex $u$, $u$ has an edge to a vertex $v$ that has been discovered but not yet processed, there is a cycle (such an edge is a back edge). Run this from each possible starting vertex, or until a cycle has been discovered.

Undirected graphs: the same applies, except that the edge back to the parent, $(u, parent[u])$, is ignored.

If the graph has multiple components, the DFS must be run multiple times, using a similar algorithm to when finding connected components.
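A sketch of the directed case, using the three vertex states (a back edge to a discovered-but-unprocessed vertex signals a cycle):

```python
UNDISCOVERED, DISCOVERED, PROCESSED = 0, 1, 2

def has_cycle_directed(adj):
  n = len(adj)
  state = [UNDISCOVERED] * n

  def dfs(u):
    state[u] = DISCOVERED
    for v in adj[u]:
      if state[v] == DISCOVERED:
        return True                  # back edge: v is still on the DFS path
      if state[v] == UNDISCOVERED and dfs(v):
        return True
    state[u] = PROCESSED             # u and its descendants are done
    return False

  # Restart from every undiscovered vertex to cover all components
  return any(state[s] == UNDISCOVERED and dfs(s) for s in range(n))
```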

Weighted Algorithms

Minimum Spanning Tree

Tree that includes all vertices of the graph and has the lowest total weight among all spanning trees. The graph must be connected and undirected.

Prim’s Algorithm

The basic algorithm is $O(n^2)$, but if a heap-based priority queue is used for the distance array, the time complexity of finding the next vertex is $O(\log{n})$, bringing the overall algorithm to $O(m+n\log{n})$.

def next_vertex(in_tree, distance):
  # Linear scan for the closest vertex not yet in the tree: O(n).
  # A min-heap with decrease-key support would make this O(log n),
  # and the overall algorithm O(m + n log n)
  best = None
  for v in range(len(distance)):
    if not in_tree[v] and (best is None or distance[v] < distance[best]):
      best = v
  return best

def prim(adj):
  # Prim's algorithm: minimum spanning tree
  in_tree  = [False]        * len(adj)
  distance = [float('inf')] * len(adj) # Stores distance from tree
  parent   = [None]         * len(adj)
  distance[0] = 0

  while False in in_tree:
    u = next_vertex(in_tree, distance)
    in_tree[u] = True
    for (vertex, weight) in adj[u]:
      if not in_tree[vertex] and weight < distance[vertex]:
        distance[vertex] = weight
        parent  [vertex] = u

  return (parent, distance)

Shortest Path Trees

The shortest path tree rooted at vertex $s$ is the spanning tree in which the distance from the root to every other vertex is the smallest possible.

Dijkstra’s Algorithm

NB: this will not always produce correct results on graphs with negative weights.

def dijkstra(adj, s):
  # adj = adjacency list
  # s = index for start vertex
  in_tree  = [False]         * len(adj)
  distance = [float('inf')]  * len(adj)
  parent   = [None]          * len(adj)
  distance[s] = 0
  while False in in_tree:
    u = next_vertex(in_tree, distance)

    # Pick the closest vertex not yet in the tree. Efficiency improves
    # if a priority queue (min-heap) is used
    in_tree[u] = True
    for (vertex, weight) in adj[u]:
      if not in_tree[vertex] and distance[u] + weight < distance[vertex]:
        # If the path to `vertex` through `u` is shorter than the best
        # distance found so far, update the tree
        distance[vertex] = distance[u] + weight
        parent  [vertex] = u

  return (parent, distance)

All-Pairs Shortest Path

Dijkstra gives the shortest path from a given source vertex to every vertex in the graph.

All-pairs gives the shortest path between every pair of vertices in the graph. This can be done by running Dijkstra once per vertex: $O(n^3)$. However, Dijkstra is greedy, so this will not work on graphs with negative edge weights.

Floyd-Warshall Algorithm

DP algorithm for all-pairs shortest paths. Solutions are limited to graphs without negative cycles, i.e. the sum of weights along any cycle is non-negative.

Intermediate vertices: vertices between the source and destination vertices. For $p=(v_s,v_1,v_2,\dots,v_l,v_d)$, the path has $l$ intermediate vertices.

Properties of Shortest Paths

Simple: no vertex repeated along the path (assuming the condition that there are no negative cycles).

Any sub-path of a shortest path is also a shortest path. If this was false, that sub-path could be used to make a shorter shortest path.

The Algorithm

$n$ vertices, numbered $0\dots n-1$, with $d(i,j,k)$ being the shortest distance from $i$ to $j$ if the intermediate vertices are all in $0 \dots k$.

Use recursion on $d(i,j,k)$, using results from $d(i,j,k-1)$: either vertex $k$ is not on the shortest path, or the path goes through $k$. $$d(i,j,k) = \min\bigl(d(i,j,k-1),\ d(i,k,k-1)+d(k,j,k-1)\bigr)$$

There are three base cases, for $k=-1$ (no intermediate vertices allowed): $d(i,j,-1)$ is $0$ if $i=j$; the weight of the edge $(i,j)$ if it exists; and $\infty$ otherwise.

Top-Down

All three parameters can take on $n$ values, so the size of the cache is $O(n^3)$, and each individual call runs in constant time. Thus, the algorithm is $O(n^3)$.

Bottom-up

Uses a table of size $n\times n$, with the columns and rows being $i$ and $j$ respectively. Initially, the table contains the weight of the edge between $i$ and $j$, infinity if it doesn't exist, or $0$ if $i = j$.

Thus, the space complexity is $\Theta(n^2)$. This works because the value of $d(i,j,k)$ depends only on values of the form $d(i,j,k-1)$, so only the $k-1$ layer is needed, and the matrix can be updated incrementally.

Run the algorithm with $k=0$, and once all cells are filled in, move on to $k=1, 2, \dots$.
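The bottom-up version can be sketched as follows; here the input is the initialized $n\times n$ weight matrix described above:

```python
def floyd_warshall(w):
  # w: n x n matrix with w[i][j] = edge weight, float('inf') if no
  # edge, and 0 on the diagonal (the k = -1 base case)
  n = len(w)
  d = [row[:] for row in w]      # work on a copy
  for k in range(n):             # allow intermediate vertices 0..k
    for i in range(n):
      for j in range(n):
        if d[i][k] + d[k][j] < d[i][j]:
          d[i][j] = d[i][k] + d[k][j]   # routing through k is shorter
  return d
```

Updating `d` in place is safe because row and column $k$ do not change during pass $k$ (since $d(k,k)=0$).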

Backtracking

Technique to generate all or some solutions to a problem incrementally. Generate one possible solution, and if it isn’t correct, then use it to generate children, and repeat.

def dfs_backtrack(candidate, input, output):
  if is_solution(candidate, input):
    add_to_output(candidate, output)
  else:
    for child in children(candidate, input):
      dfs_backtrack(child, input, output)

In some cases such as search, only one solution may be required.

Pruning

In some cases, it may be known that it is impossible for any children (and their children etc.) of a node to produce a solution, meaning making further calls is unnecessary.

Check whether the candidate can be pruned at the start of the DFS/BFS backtracking function, before generating any children.
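For example, a subset-sum search where candidates whose running total already exceeds the target are pruned (an illustrative sketch; positive values assumed, since the pruning rule relies on sums only growing):

```python
def subset_sums(values, target):
  # Backtracking: extend a candidate subset one value at a time;
  # abandon it as soon as its running total exceeds the target
  solutions = []

  def backtrack(candidate, start, total):
    if total > target:
      return                       # prune: no child can be a solution
    if total == target:
      solutions.append(candidate[:])
      return
    for i in range(start, len(values)):
      candidate.append(values[i])  # generate one child...
      backtrack(candidate, i + 1, total + values[i])
      candidate.pop()              # ...then undo and try the next

  backtrack([], 0, 0)
  return solutions

print(subset_sums([1, 2, 3], 3))  # → [[1, 2], [3]]
```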