Range Searching
Introduction
There are a large number of geometric objects. How do you find all objects intersecting/within a given area/volume?
We assume:
- Objects are points
- Range is rectangular
- There are many queries on the same point set
- Hence, we can pre-process the data
1D range searching
Determine which points lie in
matches = [x for x in nums if rmin <= x <= rmax]
To improve performance, sort the items:
However, this approach does not generalize to multiple dimensions.
1D Binary Search Tree (1D k-d tree)
- A balanced binary search tree with numbers at the leaves
- Internal nodes are medians - (almost)
len(data)//2 - 1 - Left subtree contains
- Right subtree contains
- Assume no duplicates
To create the search tree, sort the elements; the left subtree should contain the lower half of the list, plus element with index len(list)//2 - 1. Repeat the process with the subtrees until you reach a leaf node.
from collections import namedtuple
Node = namedtuple("Node", ["value", "left", "right" # Cheap way to make a class
def search_tree(nums, sorted=False):
if not sorted:
nums = sorted(nums)
if len(nums) == 1:
tree = Node(nums[0], None, None) # Leaf node
else:
midpoint = len(nums) // 2 # Halfway, rounding down
left = search_tree(nums[:midpoint], True)
# Up to but not including midpoint
right = search_tree(nums[midpoint:], True)
# Including the midpoint and the end
tree = Node(nums[midpoint - 1], left, right)
# Left child will be equal to parent, which meets the
# less than or equal to criteria
return tree
Search
def range_query(tree, low, high):
# Inclusive range
matches = []
if tree.left is None and tree.right is None:
if low <= tree.value <= high:
matches.append(tree.value)
else:
if low <= node.value and tree.left is not None:
matches += query(tree.left, low, high)
if node.value < high and tree.right is not None:
# Range may include both children, so use if and not else
matches += range_query(tree.right, low, high)
return matches
The algorithm visits at least one leaf node, so the minimum cost is
If it visits
k-d trees
Spatial Subdivision
Regular grid: regular divisions.
Quad tree: regular grid, but each cell contains a sufficiently small number of cells; if it has more, recursively subdivide it into quarters.
k-d tree: divide cell into halves along the median (each half contains the same number of cells) on one axis. For each half, divide it again into equal halves again, but this time on the other axis. Rinse and repeat until there a sufficiently small number of cells in each cell or a maximum depth is reached.
Differences:
- The leaf nodes can contain multiple points
- Nodes labelled with the axis it is subdividing on
- Subscript is a list showing path from root to node (0 for left, 1 for right)
- e.g.
01meansroot/left/right
- e.g.
- Subscript is a list showing path from root to node (0 for left, 1 for right)
- Only worth rebuilding the entire tree if there are significant additions/removals
def kd_tree(points, depth = 0):
if len(points) <= threshold or depth >= max_depth:
return Leaf(points)
else:
axis = depth % num_axes
points.sort(key=lambda point: point[axis])
mid_index = len(points) // 2
midpoint = points[mid_index - 1][axis]
# Subtract 1 since left <= midpoint, left does not include midpoint
left = kd_tree(points[:mid_index], depth + 1)
right = kd_tree(points[mid_index:], depth + 1)
return Node(axis, midpoint, left, right)
k-d Range Search:
def points_in_range(node, query_shape):
if node is leaf:
matches = points in node.points within shape
else:
matches = []
if intersection of shape and lower half space not empty:
matches += points_in_range(node.left, query_shape)
if intersection of space and upper half space not empty:
matches += points_in_range(node.right, query_shape)
return matches
Unlike in the one dimensional case, multiple points can have the same value for a given dimension as when the split occurs on another dimension, it could end up on either side. Hence, both the lower and upper half of the shape can include points on the line.
Closest Neighbor
For a given point (that does not need to be in the tree), find the closest point within the tree.
- Use the k-d tree to find the cell the point would be in - it is guaranteed to contain at least one point
- Find the closest point in the cell, and take that distance as the upper bound on the minimum distance
- Search for cells that intersect with the square (notes)/circle (optimal) centred around the original point
- Check if any of the points inside the cells are closer
Quadtrees
- Split each region into 4 sub-regions, partitioned evenly on both axes
- Repeat until the maximum depth is reached, or the region contains fewer than some specified number off points
- Node contains coordinates of centre, width/height of the square, and their children
- Node labels: path from root e.g.
0/1/1would beroot/quadrant 1/quadrant 1
This only works in 2D. Performance:
- Best case: equally distributed,
- Worst case: all points in the same cell at depth
:
Line Sweep
- Simulates a vertical line sweeping across objects on
plane on one direction - Sort points by the
coordinate - At any point, solution to problem for all points to the left is known
- When point added, check if solution changes
Data structures:
- Event queue
- Positions of event points
- Visited points deleted from queue
- Status structure
- State of algorithm at each position
- Solution set
- Solution at current position
Closest Point-Pair
In the 1D case, simply sort then search through the list sequentially:
2D Case
Idea:
- Store current closest distance
, current point - Frontier set: set of points where the
coordinate is in the range - Kind-of-but-not-really points where the
coordinate is also in the range
- Kind-of-but-not-really points where the
- Only need to the check distance
is to for points in the frontier set - any point further away cannot be the closest point
Data structure:
- Point list
- Points sorted by x coordinate
- State
- Current sweep position - index in point list
- Initially
- Initially
- Current best point pair and distance (NOT
distance) - Initially
- Initially
- Frontier set,
- Includes current point
- Initially
- Current sweep position - index in point list
At each step:
- Get the current point, add it to the frontier set
- Remove elements from
where the horizontal distance from the current point is greater than - DO NOT remove elements further away than
vertically
- DO NOT remove elements further away than
- If they are closer than
vertically, calculate the distance and update if necessary
The frontier set must be sorted by SortedContainer could be used.
Complexity:
- Sorting by
: - Each point added and removed from frontier set:
total - Search for closest pair takes total
time ○ to select the range of points from the point in the axis - Hence, the algorithm is
in terms of time