Week 11: Games - Non-cooperative Multi-agent Systems

Many problems can be modelled as games: multiple agents with (possibly competing) interests. They can be described by:

Example: papers scissors rock:

Game Trees

In perfect-information games, a game tree is a finite tree where the nodes are states and the arcs correspond to actions by the agents.

Hence:

Perfect-information, Zero-sum, Turn-based games

Properties:

These properties creates adversary.

Optimal Strategy: Min-Max Function

Designed to find the best move at each stage:

  1. Generate the whole game tree
  2. Apply the utility function to each leaf
  3. Back-up values from the leaves through branch nodes
    • A max node computes the max of its child values, and vice-versa for a min node
  4. If a max node at the root, choose the move leading to the value with the largest value (and vice-versa if root is a min node)

This is optimal if both players are playing optimally.

def min_max_decision(state, root_is_max = True):
  if root_is_max:
    return min([(min_value(child), child) for child in state.children()])[1]
  else:
    return max([(max_value(child), child) for child in state.children()])[1]

def max_value(state):
  if state.is_terminal:
    return state.calculate_utility()

  return max([(min_value(child), child) for child in state.children()])[1]

def min_value(state):
  if state.is_terminal:
    return state.calculate_utility()

  return min([(max_value(child), child) for child in state.children()])[1]

Reducing Search Space

Game tree size:

Hence, the search space needs to be reduced:

Alpha-Beta Pruning

If an option is bad compared to other available options, there is no point in expending search time to find out exactly how bad it is.

Example:

MAX           3
           /     \
          /       \
Min      3         ?
       / | \     /   \
Leaf  3  9  6   2     ?

We have computed that the value for the first child (min node) is 3 and that the first child of the second child has a value of 2. Hence, the second child has a maximum value of 2, so regardless of the true value of the second child, the outcome of the max will not change.

More generally:

(player)   MAX              (player)   MIN
           / \                         / \
min       m   ...            max     m    ...
               ...                         ...
                 \                           \
min               ?          max              ?
                /   \                       /   \
max    m > n   n     ?       min    m < n  n     ?

If m > n, the max node is guaranteed to get a utility of at least m - hence, the min node with utility n or less will never be reached. Thus, it does not need to be evaluated further. The opposite occurs for a min node.

The algorithm has:

These two values should be passed down to the child nodes during search. If the value returned by the child node is greater than α\alpha for a max node, then α\alpha should be updated, and vice versa if it is a min node. If αβ\alpha \geq \beta, the node can be pruned.

from math import inf

def alpha_beta_search(tree, is_max_node = True, alpha = -inf, beta = inf):
  # Input: nested array. Numbers are leaf nodes, arrays are inner nodes
  # Returns a tuple of the best utility and path that gets that utility (index numbers)
  # If path is none, pruning occurred or it is a leaf node
  best_path = None
  best_utility = -inf if is_max_node else inf

  if isinstance(tree, int):
    return (tree, None)

  for (i, child) in enumerate(tree):
    utility, path = alpha_beta_search(child, not is_max_node, alpha, beta)
    path = [i] if path is None else ([i] + path) # Append index of child to path received by child

    if is_max_node:
      if utility > best_utility:
        (best_utility, best_path) = (utility, path)

        if utility > alpha:
          # This child is now the largest child encountered by this max node or any parent max nodes
          alpha = utility
    else:
      if utility < best_utility:
        (best_utility, best_path) = (utility, path)

        if utility < beta:
          beta = utility

    if alpha >= beta:
      return (utility, None)
      # In the case that this is a max node:
      # The child node is min node and its value is larger than or equal to the smallest value
      # encountered so far by any parent min nodes. Hence, this (max) node pick this child or
      # a larger one, so this node's parent will reject it. Thus, this node can be pruned
      # Similar logic applies if it is a min node

  return (best_utility, best_path)
Effectiveness

Worst case: if branches are ordered, no pruning takes place.

Best case: each player’s best move is the left-most child/evaluated first.

Good move ordering improves effectiveness. Examples:

Alpha-beta search often gives us O(bd2)O(b^\frac{d}{2}); an improvement over O(bd)O(b^d) (i.e. take the square root of the branching factor).

Static (Heuristic Evaluation) Function

Estimates how good the current board configuration (a non-terminal state) is for a player.

A typical function could evaluate how good the state is for the player subtract the opponent’s score. If the board evaluation is XX for one player, it is X-X for its opponent.

This can be used to perform a cut-off search: after a maximum depth is reached, use the heuristic evaluation instead of finding the actual utility.

Sidenote: three players. We now have two degrees of freedom; the utility is a tuple, not a scalar. Each player will attempt to maximize it’s own dimensions.