Scale and rotation invariant descriptors.
Correspondence using window matching: matching individual points is highly ambiguous, so windows of pixels are compared instead.
Stereo Cameras
Baseline: distance between the cameras. A wider baseline gives greater depth accuracy at long range, while a smaller baseline keeps the two views overlapping at close range. Increasing camera resolution also improves depth resolution overall.
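For reference, the standard rectified pinhole-stereo relation below makes this trade-off concrete (these symbols are not defined in the notes: $f$ is the focal length in pixels, $B$ the baseline, $d$ the disparity, $Z$ the depth):
$$ Z = \frac{fB}{d} \qquad \Rightarrow \qquad \left| \Delta Z \right| \approx \frac{Z^2}{fB} \left| \Delta d \right| $$
Depth error therefore grows quadratically with distance and shrinks with a wider baseline or a finer disparity step (higher resolution).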
Rectification: transformation of the images onto a common image plane (i.e. the sensor plane, but inverted). The image planes of the two cameras must be parallel, so that corresponding points lie on the same image row.
Correspondence using window matching:
- For each window and for every pixel offset, determine how well the two match; the minimum-cost offset gives the pixel offset (disparity) and thus the distance
- For a window $W_m(x, y)$ of size $m \times m$, the matching metric is the sum of squared pixel differences (SSD; see the sketch below)
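A minimal NumPy sketch of this brute-force search, assuming already-rectified grayscale float images `left` and `right` (the array names, window size, and disparity range are illustrative, not from the notes):

```python
import numpy as np

def ssd_disparity(left, right, window=5, max_disp=64):
    """Brute-force window matching: for each pixel, pick the horizontal
    offset (disparity) whose window has the smallest sum of squared
    differences. Assumes rectified, same-size grayscale float images."""
    half = window // 2
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                cost = np.sum((ref - cand) ** 2)  # sum of squared pixel differences
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity
```

Real implementations vectorize this search or use a library routine such as OpenCV's StereoBM; the loops above are only meant to show the cost minimization.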
Image normalization: variation in sensor gain/sensitivity means normalization is recommended.
Window magnitude: $$ \left\Vert I \right\Vert_{W_m(x, y)} = \sqrt{\sum_{(u, v) \in W_m(x, y)}{\left[I(u, v)\right]^2}} $$
Average: $$ \bar{I} = \frac{1}{\left| W_m(x, y) \right|} \sum_{(u, v) \in W_m(x, y)}{I(u, v)} $$
Normalized pixel: $$ \hat{I}(u, v) = \frac{I(u, v) - \bar{I}}{\left\Vert I - \bar{I} \right\Vert_{W_m(x, y)}} $$
Vectorization: convert the window matrix into a vector by unwrapping it, concatenating its horizontal lines (rows). Denote the result as $\mathbf{w}(x, y)$.
Normalization scales the window vector to unit magnitude: $\hat{\mathbf{w}} = \mathbf{w} / \left\Vert \mathbf{w} \right\Vert$.
Distance ((normalized) sum of squared differences): $\left\Vert \hat{\mathbf{w}}_1 - \hat{\mathbf{w}}_2 \right\Vert^2$
Normalized correlation: $\hat{\mathbf{w}}_1 \cdot \hat{\mathbf{w}}_2 = \cos \theta$
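A small NumPy sketch of these normalized comparisons (function and variable names are illustrative; the mean subtraction follows the normalized-pixel definition above):

```python
import numpy as np

def normalize_window(window):
    """Unwrap a window into a vector, subtract its average, and scale it
    to unit magnitude so the metric is insensitive to gain/offset."""
    w = np.asarray(window, dtype=np.float64).ravel()   # vectorization (row by row)
    w = w - w.mean()                                   # remove the window average
    norm = np.linalg.norm(w)                           # window magnitude
    return w / norm if norm > 0 else w

def normalized_ssd(win1, win2):
    w1, w2 = normalize_window(win1), normalize_window(win2)
    return float(np.sum((w1 - w2) ** 2))   # small value = good match

def normalized_correlation(win1, win2):
    w1, w2 = normalize_window(win1), normalize_window(win2)
    return float(np.dot(w1, w2))           # close to 1 = good match (cos theta)
```

For unit vectors the two metrics are equivalent up to a monotonic transform: $\left\Vert \hat{\mathbf{w}}_1 - \hat{\mathbf{w}}_2 \right\Vert^2 = 2 - 2\, \hat{\mathbf{w}}_1 \cdot \hat{\mathbf{w}}_2$.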
Local Features
Aperture problem and normal flow: if you only have a partial view of a moving, one-dimensional object, you cannot always tell how it is moving (e.g. a moving line whose ends are outside the viewport).
Given velocities $u$ and $v$ of a pixel, brightness constancy gives a single linear constraint, the BCCE: $I_x u + I_y v + I_t = 0$ (one equation, two unknowns).
Normal flow, the vector representing translation of the line in the direction of its normal, can be written as: $$ \mathbf{u}_\perp = \frac{-I_t}{\left\Vert \nabla I \right\Vert} \cdot \frac{\nabla I}{\left\Vert \nabla I \right\Vert} $$
By considering multiple moving points (multiple constraints), the full velocity $(u, v)$ can be recovered.
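For reference, the BCCE follows from a first-order Taylor expansion of the brightness constancy assumption (a standard derivation, reconstructed here rather than copied from the notes):
$$ I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) \approx I(x, y, t) + I_x u\,\delta t + I_y v\,\delta t + I_t\,\delta t $$
Brightness constancy sets the left-hand side equal to $I(x, y, t)$; dividing by $\delta t$ gives $I_x u + I_y v + I_t = 0$.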
Lucas-Kanade
Assumes the same velocity for all pixels within the window, and that pixel intensities do not change between frames.
https://docs.opencv.org/4.5.0/d4/dee/tutorial_optical_flow.html
Solve using: $$ \begin{bmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix} $$ (sums taken over the window)
LHS: the sum over the window of the outer product of the gradient vector with itself, $\sum \nabla I \, \nabla I^\top$ (the structure tensor).
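A minimal sketch of solving this system for one window, assuming the spatial gradients `Ix`, `Iy` and the temporal difference `It` have already been computed over the window (the names and the conditioning threshold are illustrative):

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Least-squares flow (u, v) for one window: build the structure
    tensor (sum of gradient outer products) and the right-hand side,
    then solve the 2x2 normal equations."""
    Ix, Iy, It = (np.asarray(a, dtype=np.float64).ravel() for a in (Ix, Iy, It))
    A = np.stack([Ix, Iy], axis=1)          # N x 2 matrix of gradients
    AtA = A.T @ A                           # sum of outer products (structure tensor)
    Atb = -A.T @ It                         # right-hand side
    if np.linalg.cond(AtA) > 1e6:           # poorly conditioned window: unreliable
        return None
    u, v = np.linalg.solve(AtA, Atb)
    return u, v
```

Singular or poorly conditioned windows are exactly the cases ruled out by the 'good features' criteria below.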
Good features:
- Satisfy brightness constancy
- Has sufficient texture variation
- But not too much texture variation: too many edges is also a problem, as it is hard to tell how the window moves along an edge (the aperture problem again)
- Corresponds to a ‘real’ surface patch (e.g. shadows do not correspond to a real surface)
- Does not deform too much over time
The previous equation can be written as $A^\top A \, \mathbf{v} = A^\top \mathbf{b}$, where $A$ stacks the gradient rows $\nabla I^\top$ of the window pixels, $\mathbf{b}$ stacks the corresponding $-I_t$ values, and $\mathbf{v} = (u, v)^\top$.
For this to be solvable:
- $A^\top A$ should be invertible
- $A^\top A$ should not be too small (signal-to-noise ratio): eigenvalues $\lambda_1$ and $\lambda_2$ should not be too small
- $A^\top A$ should be well-conditioned: $\lambda_1 / \lambda_2$ should not be too large (where $\lambda_1$ is the larger eigenvalue)
- Original (Harris) scoring function: $\lambda_1 \lambda_2 - k (\lambda_1 + \lambda_2)^2 = \det(A^\top A) - k \operatorname{trace}^2(A^\top A)$
- Shi-Tomasi scoring function: $\min(\lambda_1, \lambda_2)$, accepted if above a threshold
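The OpenCV tutorial linked above follows exactly this recipe (Shi-Tomasi corners tracked with pyramidal Lucas-Kanade); a condensed, hedged version, with placeholder file names and parameter values:

```python
import cv2
import numpy as np

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Shi-Tomasi corners: keep points whose smaller eigenvalue is large enough.
pts0 = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)

# Pyramidal Lucas-Kanade: track those points into the next frame.
pts1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, pts0, None,
                                             winSize=(15, 15), maxLevel=2)

good_new = pts1[status.flatten() == 1]
good_old = pts0[status.flatten() == 1]
flow = good_new - good_old   # per-feature displacement vectors
print(f"tracked {len(good_new)} features")
```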
Harris Detector
Uses auto-correlation to find ‘interesting’ points, i.e. points where there are significant intensity differences in all directions.
For a point $(x, y)$ and an image shift $(\Delta x, \Delta y)$, the auto-correlation function is: $$ E(\Delta x, \Delta y) = \sum_{(u, v) \in W(x, y)}{w(u, v) \left[ I(u + \Delta x, v + \Delta y) - I(u, v) \right]^2} $$
Avoiding discrete shifts: approximate the shifted image with a first-order Taylor expansion, $I(u + \Delta x, v + \Delta y) \approx I(u, v) + I_x(u, v)\, \Delta x + I_y(u, v)\, \Delta y$.
Auto-correlation matrix: $$ E(\Delta x, \Delta y) \approx \begin{bmatrix} \Delta x & \Delta y \end{bmatrix} M \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}, \qquad M = \sum_{(u, v) \in W(x, y)}{w(u, v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}} $$
The matrix captures the structure of the local neighborhood. Interest can be measured using the matrix’s eigenvalues:
- 2 strong eigenvalues: interesting point
- 1 strong eigenvalue: contour
- 0 strong eigenvalues: uniform region
Interest point detection can be done by thresholding the interest measure, with local maxima (non-maximum suppression) used for localization.
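A short sketch of this measure using the matrix $M$ above, with a Gaussian window as the weighting (the constant $k \approx 0.04$ is the usual choice but is not given in these notes); OpenCV's cv2.cornerHarris computes the same response directly:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image, sigma=1.5, k=0.04):
    """Harris interest measure R = det(M) - k * trace(M)^2, where M is the
    Gaussian-weighted auto-correlation (structure) matrix at each pixel."""
    img = np.asarray(image, dtype=np.float64)
    Ix = sobel(img, axis=1)                     # horizontal gradient
    Iy = sobel(img, axis=0)                     # vertical gradient
    Sxx = gaussian_filter(Ix * Ix, sigma)       # windowed sums of gradient products
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2                 # threshold + local maxima afterwards
```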
Feature distortion:
- Model as affine transforms: parallel lines are preserved
- OpenCV: findFeatures
- Affine transforms: $u(x, y) = a_1 + a_2 x + a_3 y$, $v(x, y) = a_4 + a_5 x + a_6 y$
- Six parameters, min. six pixels per window
- Pass into the BCCE and minimize the error
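If two windows are assumed to be related by such an affine transform, the six parameters can also be estimated directly from point correspondences; a hedged OpenCV sketch (the point coordinates are made up for illustration):

```python
import cv2
import numpy as np

# Matched point locations in two frames (illustrative values only).
src = np.array([[10, 10], [50, 12], [30, 40], [70, 60], [20, 80], [90, 30]], dtype=np.float32)
dst = np.array([[12, 11], [52, 14], [33, 42], [73, 63], [23, 82], [93, 33]], dtype=np.float32)

# Robustly fit a 2x3 affine matrix [A | t] such that dst is approximately A @ src + t.
M, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC)
print(M)
```

Subtracting the identity from the 2x2 part of M gives the displacement coefficients $a_2, a_3, a_5, a_6$ in the model above, with the translation column giving $a_1, a_4$.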
Invariant Local Features
Local features that are invariant to translation, rotation, scale etc… They should have:
- Locality: features are local; robust to occlusion/clutter
- Distinctiveness: individual features can be matched to a large database of objects
- Quantity: even small objects should yield many features
- Efficiency: close to real-time performance
- Extensibility: can be extended for a wide range of differing feature types
SIFT: Scale-Invariant Feature Transform.
- Scale invariance:
- Gaussian pyramid
- Repeatedly blur, then halve the dimensions; the set of blurred images at one resolution is an octave
- Compute Difference of Gaussian (DoG): difference between neighboring Gaussian layers
- Approximation of Laplacian of Gaussian
- Compare each pixel against its 8 neighbors at the same scale, plus the 9 neighbors in each of the DoG scales directly above and below; use it as a keypoint if it is a minimum/maximum across all 26 neighbors
- Rotation invariance:
- Create histogram of local (i.e. neighboring pixel) gradient directions; each bin covers 10 degrees (36 bins)
- Canonical orientation = peak of the histogram (see the sketch after this list)
- Descriptor:
- 16x16 region in scale space around keypoint
- Rotate region to match canonical orientation
- Create orientation histograms on 4x4 pixel neighborhoods; 8 bins/orientations each
- Hence 16 neighborhoods with 8 bins each: 128 element vector
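A simplified sketch of the canonical-orientation step referenced above (full SIFT also Gaussian-weights contributions by distance from the keypoint and interpolates the peak; those refinements are omitted here):

```python
import numpy as np

def canonical_orientation(patch):
    """36-bin (10-degree) histogram of gradient directions over a patch
    around a keypoint, weighted by gradient magnitude; the peak bin gives
    the canonical orientation in degrees."""
    patch = np.asarray(patch, dtype=np.float64)
    gy, gx = np.gradient(patch)                            # row, column derivatives
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, _ = np.histogram(angle, bins=36, range=(0.0, 360.0), weights=magnitude)
    return 10.0 * np.argmax(hist) + 5.0                    # centre of the peak bin
```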
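Putting the whole pipeline together, OpenCV ships a SIFT implementation; a brief usage sketch with placeholder file names and a standard ratio-test threshold (0.75, not specified in these notes):

```python
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)    # placeholder file names
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                                  # DoG keypoints + 128-D descriptors
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Match descriptors and keep matches passing Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches")
```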