8. Computational Geometry II

Range Searching

Introduction

There are a large number of geometric objects. How do you find all objects intersecting/within a given area/volume?

We assume:

Objects are points
Range is rectangular
There are many queries on the same point set
- Hence, we can pre-process the data

1D range searching

Determine which points lie in $[r_{min}, r_{max}]$ :

matches = [x for x in nums if rmin <= x <= rmax]

To improve performance, sort the items: $O(n\log{n})$ . Each query will use binary search to find the smallest point in the range, and then iterate through the list until the item is out of the range. Hence, the cost of each query is $O(\log{n} + m)$ , where $m$ is the number of points in the range.

However, this approach does not generalize to multiple dimensions.

1D Binary Search Tree (1D k-d tree)

A balanced binary search tree with numbers at the leaves
Internal nodes are medians - (almost) len(data)//2 - 1
Left subtree contains $\text{numbers} \le \text{median}$
Right subtree contains $\text{median} < \text{numbers}$
Assume no duplicates

To create the search tree, sort the elements; the left subtree should contain the lower half of the list, plus element with index len(list)//2 - 1. Repeat the process with the subtrees until you reach a leaf node.

from collections import namedtuple
Node = namedtuple("Node", ["value", "left", "right" # Cheap way to make a class

def search_tree(nums, sorted=False):
  if not sorted:
    nums = sorted(nums)

  if len(nums) == 1:
    tree = Node(nums[0], None, None) # Leaf node
    else:
      midpoint = len(nums) // 2 # Halfway, rounding down
      left = search_tree(nums[:midpoint], True)
      # Up to but not including midpoint
      right = search_tree(nums[midpoint:], True)
      # Including the midpoint and the end
      tree = Node(nums[midpoint - 1], left, right)
      # Left child will be equal to parent, which meets the
      # less than or equal to criteria

    return tree

Search

def range_query(tree, low, high):
  # Inclusive range
  matches = []
  if tree.left is None and tree.right is None:
    if low <= tree.value <= high:
      matches.append(tree.value)
  else:
    if low <= node.value and tree.left is not None:
      matches += query(tree.left, low, high)
    if node.value < high and tree.right is not None:
      # Range may include both children, so use if and not else
      matches += range_query(tree.right, low, high)
  return matches

The algorithm visits at least one leaf node, so the minimum cost is $O(\log{n})$ .

If it visits $m$ leaves, then it also traversed $m/2$ parents, $m/4$ , grandparents, $\dots$ , 1 root node.

$\frac{m}{2} + \frac{m}{4} + \frac{m}{8} + \dots \approx 2m$ , so the order of complexity is $O(\log{n+m})$ .

k-d trees

Spatial Subdivision

Regular grid: regular divisions.

Quad tree: regular grid, but each cell contains a sufficiently small number of cells; if it has more, recursively subdivide it into quarters.

k-d tree: divide cell into halves along the median (each half contains the same number of cells) on one axis. For each half, divide it again into equal halves again, but this time on the other axis. Rinse and repeat until there a sufficiently small number of cells in each cell or a maximum depth is reached.

Differences:

The leaf nodes can contain multiple points
Nodes labelled with the axis it is subdividing on
- Subscript is a list showing path from root to node (0 for left, 1 for right)
  - e.g. 01 means root/left/right
Only worth rebuilding the entire tree if there are significant additions/removals

def kd_tree(points, depth = 0):
  if len(points) <= threshold or depth >= max_depth:
    return Leaf(points)
  else:
    axis = depth % num_axes
    points.sort(key=lambda point: point[axis])
    mid_index = len(points) // 2
    midpoint = points[mid_index - 1][axis]
    # Subtract 1 since left <= midpoint, left does not include midpoint
    left =  kd_tree(points[:mid_index], depth + 1)
    right = kd_tree(points[mid_index:], depth + 1)
    return Node(axis, midpoint, left, right)

k-d Range Search:

def points_in_range(node, query_shape):
  if node is leaf:
    matches = points in node.points within shape
  else:
    matches = []
    if intersection of shape and lower half space not empty:
      matches += points_in_range(node.left, query_shape)
    if intersection of space and upper half space not empty:
      matches += points_in_range(node.right, query_shape)
  
  return matches

Unlike in the one dimensional case, multiple points can have the same value for a given dimension as when the split occurs on another dimension, it could end up on either side. Hence, both the lower and upper half of the shape can include points on the line.

Closest Neighbor

For a given point (that does not need to be in the tree), find the closest point within the tree.

Use the k-d tree to find the cell the point would be in - it is guaranteed to contain at least one point
Find the closest point in the cell, and take that distance as the upper bound on the minimum distance
Search for cells that intersect with the square (notes)/circle (optimal) centred around the original point
- Check if any of the points inside the cells are closer

Quadtrees

Split each region into 4 sub-regions, partitioned evenly on both axes
Repeat until the maximum depth is reached, or the region contains fewer than some specified number off points
Node contains coordinates of centre, width/height of the square, and their children
Node labels: path from root e.g. 0/1/1 would be root/quadrant 1/quadrant 1

This only works in 2D. Performance:

Best case: equally distributed, $T(n) = n + T(\frac{n}{4}) = n\log{n}$
Worst case: all points in the same cell at depth $d$ : $O(nd)$

Line Sweep

Simulates a vertical line sweeping across objects on $xy$ plane on one direction
Sort points by the $x$ coordinate
At any point, solution to problem for all points to the left is known
When point added, check if solution changes

Data structures:

Event queue
- Positions of event points
- Visited points deleted from queue
Status structure
- State of algorithm at each position
Solution set
- Solution at current position

Closest Point-Pair

$n$ points on a plane: find the points $P$ and $Q$ such that they are the minimum distance from each other.

In the 1D case, simply sort then search through the list sequentially: $O(n\log{n})$ .

2D Case

Idea:

Store current closest distance $d$ , current point $P$
Frontier set: set of points where the $x$ coordinate is in the range $(P_x - d, P_x]$
- Kind-of-but-not-really points where the $y$ coordinate is also in the range $(P_y - d, P+y + d)$
Only need to the check distance $P$ is to for points in the frontier set - any point further away cannot be the closest point

Data structure:

Point list
- Points sorted by x coordinate
State
- Current sweep position - index in point list
  - Initially $P_2$
- Current best point pair and distance (NOT $x$ distance)
  - Initially $(P_0, P_1)$
- Frontier set, $F$
  - Includes current point
  - Initially $\{ P_0, P_1, P_2 \}$

At each step:

Get the current point, add it to the frontier set
Remove elements from $F$ where the horizontal distance from the current point is greater than $d$
- DO NOT remove elements further away than $d$ vertically
If they are closer than $d$ vertically, calculate the distance and update $d$ if necessary

The frontier set must be sorted by $y$ to make it efficient and have efficient addition and removal. Hence, a self-balancing binary tree or SortedContainer could be used.

Complexity:

Sorting by $x$ : $O(n\log{n})$
Each point added and removed from frontier set: $O(n\log{n})$ total
Search for closest pair takes total $O(n\log{n})$ time ○ $O(\log{n})$ to select the range of points $d$ from the point in the $y$ axis
Hence, the algorithm is $O(n\log{n})$ in terms of time