06. Local Features

Scale and rotation invariant descriptors.

Correspondence using window matching: if single points are used, matching is highly ambiguous, so windows of surrounding pixels are compared instead.

Stereo Cameras

Baseline: the distance between the cameras. A wider baseline gives greater depth accuracy at larger distances, while a smaller baseline gives more overlap at closer distances. Increasing camera resolution increases depth resolution overall.

Rectification: transformation of the images onto a common image plane (i.e. the sensor plane, but inverted). The image planes of the two cameras must be parallel.
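A minimal sketch of rectification using OpenCV's `cv2.stereoRectify` and `cv2.initUndistortRectifyMap`, assuming the cameras have already been calibrated; all calibration values below are placeholders:

```python
import cv2
import numpy as np

# Placeholder calibration: intrinsics K, distortion D for each camera, and the
# rotation R / translation T from the left camera to the right camera.
K1 = np.array([[700.0, 0, 320], [0, 700.0, 240], [0, 0, 1]])
K2 = K1.copy()
D1 = np.zeros(5); D2 = np.zeros(5)
R = np.eye(3)
T = np.array([[-0.1], [0.0], [0.0]])   # ~10 cm baseline along x (assumed)
size = (640, 480)                      # (width, height)

# Rectifying rotations R1/R2 and new projection matrices P1/P2 put both image
# planes onto a common, parallel plane with row-aligned epipolar lines.
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)

# Per-pixel lookup maps, then warp each image onto the common plane.
map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
# left_rect  = cv2.remap(left_img,  map1x, map1y, cv2.INTER_LINEAR)
# right_rect = cv2.remap(right_img, map2x, map2y, cv2.INTER_LINEAR)
```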

Correspondence using window matching:

Image normalization: because sensor gain/sensitivity varies between cameras, the windows should be normalized before comparison.

Window magnitude: $$ \left\Vert I \right\Vert_{W_m(x, y)} = \sqrt{\sum_{(u, v) \in W_m(x, y)}{\left[I(u, v)\right]^2}} $$

Average:

$$ \bar{I} = \frac{1}{\vert W_m(x, y) \vert}\sum_{(u, v) \in W_m(x, y)}{I(u, v)} $$

Normalized pixel:

$$ \hat{I}(x, y) = \frac{I(x, y) - \bar{I}}{\left\Vert I - \bar{I}\right\Vert_{W_m(x, y)}} $$
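A minimal NumPy sketch of this normalization; `window` is an arbitrary $m \times m$ patch $W_m(x, y)$ and the function name is a placeholder:

```python
import numpy as np

def normalize_window(window):
    """Zero-mean, unit-magnitude normalization of an m x m patch W_m(x, y)."""
    w = window.astype(float)
    centered = w - w.mean()                     # I - I_bar
    magnitude = np.sqrt((centered ** 2).sum())  # ||I - I_bar|| over the window
    return centered / magnitude if magnitude > 0 else centered
```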

Vectorization: convert the window matrix into a vector by unwrapping it row by row (horizontal lines concatenated). Denote this vector as $\omega$.

Normalization scales the vector in the $m^2$-dimensional space to unit length. Two metrics are then possible for comparing two windows: distance and angle.

Distance (the (normalized) sum of squared differences):

$$ C_\textrm{SSD}(d) = \Vert \omega_L - \omega_R(d) \Vert^2 $$

where $\omega_R(d)$ is the window centered around $(x - d, y)$.

Normalized correlation:

$$ C_\textrm{NC}(d) = \omega_L \cdot \omega_R(d) = \cos(\theta) $$
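A minimal NumPy sketch of correspondence along the epipolar line (same row) of a rectified pair, comparing normalized, vectorized windows; the function names, window size and disparity range are assumptions:

```python
import numpy as np

def omega(window):
    """Vectorize a window row by row and normalize it to unit length."""
    v = window.astype(float).ravel()
    v = v - v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def best_disparity(left, right, x, y, m=11, d_max=64):
    """Search along the same row of the rectified right image for the window
    that best matches the left window centered at (x, y)."""
    h = m // 2
    w_l = omega(left[y - h:y + h + 1, x - h:x + h + 1])
    best_d, best_c = 0, -np.inf
    for d in range(d_max + 1):
        if x - d - h < 0:                       # window would leave the image
            break
        w_r = omega(right[y - h:y + h + 1, x - d - h:x - d + h + 1])
        c_nc = w_l @ w_r                        # normalized correlation = cos(theta)
        if c_nc > best_c:                       # maximizing C_NC == minimizing C_SSD,
            best_d, best_c = d, c_nc            # since ||w_L - w_R||^2 = 2 - 2 cos(theta)
    return best_d
```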

Local Features

Aperture problem and normal flow: with only a partial view of a moving one-dimensional structure, you cannot always tell how it is moving (e.g. a moving line whose ends are outside the viewport); only the motion normal to the structure is observable.

Given velocities $u$ and $v$ and partial derivatives $I_x$, $I_y$ and $I_t$ for a given pixel, the brightness change constraint equation (BCCE), which states that the brightness of a given point should stay constant over time, can be approximated (first-order Taylor series) as:

$$ \begin{aligned} I_x u + I_y v + I_t &= 0 \\ \nabla I \cdot \vec{U} + I_t &= 0 \end{aligned} $$

Normal flow, the vector representing the translation of the line in the direction of its normal, can be written as:

$$ u_\perp = - \frac{I_t}{\vert \nabla I \vert} \frac{\nabla I}{\vert \nabla I \vert} $$
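A minimal sketch of computing the normal flow at one pixel from finite-difference gradients of two consecutive frames; the variable names and differencing scheme are assumptions:

```python
import numpy as np

def normal_flow(frame0, frame1, x, y):
    """Normal flow u_perp = -(I_t / |grad I|) (grad I / |grad I|) at pixel (x, y)."""
    I = frame0.astype(float)
    Ix = (I[y, x + 1] - I[y, x - 1]) / 2.0   # central difference in x
    Iy = (I[y + 1, x] - I[y - 1, x]) / 2.0   # central difference in y
    It = float(frame1[y, x]) - I[y, x]       # temporal difference
    g = np.array([Ix, Iy])
    mag2 = g @ g
    if mag2 == 0:
        return np.zeros(2)       # no spatial gradient: normal flow is undefined
    return -(It / mag2) * g      # same as -(I_t / |grad I|) * (grad I / |grad I|)
```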

By considering multiple moving points, the velocity $U$ can be found:

$$ \begin{aligned} \nabla I^1 \cdot U &= -I_t^1 \\ \nabla I^2 \cdot U &= -I_t^2 \\ &\dots \end{aligned} $$

Lucas-Kanade

Assumes the same velocity for all pixels within the window, and that pixel intensities do not change between frames.

https://docs.opencv.org/4.5.0/d4/dee/tutorial_optical_flow.html

$$ E(u, v) = \sum_{x, y \in \Omega}{ \left( I_x(x, y)u + I_y(x, y)v + I_t \right)^2 } $$

Setting the derivatives with respect to $u$ and $v$ to zero gives the linear system:

$$ \begin{bmatrix} \sum{I_x^2} & \sum{I_x I_y} \\ \sum{I_x I_y} & \sum{I_y^2} \end{bmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = -\begin{pmatrix} \sum{I_x I_t} \\ \sum{I_y I_t} \end{pmatrix} $$

LHS: the sum of the outer products of the gradient vector with itself (the structure tensor):

$$ \left( \sum{\nabla I \nabla I^T} \right) \vec{U} = -\sum{\nabla I I_t} $$
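A minimal sketch of building and solving this 2×2 system for one window with NumPy; the window size and finite-difference gradients are assumptions (in practice, OpenCV's `cv2.calcOpticalFlowPyrLK` from the tutorial above would be used):

```python
import numpy as np

def lucas_kanade_window(frame0, frame1, x, y, m=15):
    """Solve (sum grad I grad I^T) U = -sum grad I I_t over an m x m window."""
    I0, I1 = frame0.astype(float), frame1.astype(float)
    h = m // 2
    ys, xs = np.mgrid[y - h:y + h + 1, x - h:x + h + 1]

    # Central differences for I_x, I_y; frame difference for I_t.
    Ix = (I0[ys, xs + 1] - I0[ys, xs - 1]) / 2.0
    Iy = (I0[ys + 1, xs] - I0[ys - 1, xs]) / 2.0
    It = I1[ys, xs] - I0[ys, xs]

    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])

    # A must be well-conditioned (both eigenvalues large) for a reliable estimate.
    return np.linalg.solve(A, b)    # (u, v)
```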

Good features:

The previous equation can be written as $\bold{A}\vec{u} = -\vec{b}$.

For this to be solvable, $\bold{A}$ must be invertible and well-conditioned: both of its eigenvalues should be large and of similar magnitude. This fails in flat regions (both eigenvalues small) and along edges (one large, one small), which is why corners make good features to track.

Harris Detector

Uses auto-correlation to find 'interesting' points: points where the intensity varies significantly in all directions.

For a point $(x, y)$ and shift $(\Delta x, \Delta y)$, the auto-correlation is:

f(x,y)(xk,yk)W(I(xk,yk)I(xk+Δx,yk+Δy))2 f(x, y) \sum_{(x_k, y_k) \in W}{ \left( I(x_k, y_k) - I( x_k + \Delta x, y_k + \Delta y ) \right)^2 }

Avoiding discrete shifts by approximating the shifted intensity with a first-order Taylor expansion:

$$ I(x_k + \Delta x, y_k + \Delta y) \approx I(x_k, y_k) + \begin{pmatrix}I_x(x_k, y_k) & I_y(x_k, y_k) \end{pmatrix} \begin{pmatrix}\Delta x \\ \Delta y \end{pmatrix} $$

$$ f(x, y) \approx \sum_{(x_k, y_k) \in W}{ \left( \begin{pmatrix}I_x(x_k, y_k) & I_y(x_k, y_k) \end{pmatrix} \begin{pmatrix}\Delta x \\ \Delta y \end{pmatrix} \right)^2 } $$

Auto-correlation matrix:

$$ f(x, y) \approx \begin{pmatrix}\Delta x & \Delta y\end{pmatrix} \begin{bmatrix} \sum_{(x_k, y_k) \in W}{\left(I_x(x_k, y_k)\right)^2} & \sum_{(x_k, y_k) \in W}{I_x(x_k, y_k) I_y(x_k, y_k)} \\ \sum_{(x_k, y_k) \in W}{I_x(x_k, y_k) I_y(x_k, y_k)} & \sum_{(x_k, y_k) \in W}{\left(I_y(x_k, y_k)\right)^2} \end{bmatrix} \begin{pmatrix}\Delta x \\ \Delta y\end{pmatrix} $$

The matrix captures the structure of the local neighborhood. Interest can be measured using the matrix's eigenvalues $\lambda_1, \lambda_2$: two small eigenvalues indicate a flat region, one large and one small an edge, and two large eigenvalues a corner (an interest point).

Interest point detection can then be done by thresholding the interest measure and keeping local maxima for localization.
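A minimal NumPy sketch of one standard interest measure, the Harris response $R = \det(M) - k\,\mathrm{trace}(M)^2$, computed from the auto-correlation matrix above; the box window, $k$ value and gradient scheme are assumptions (OpenCV provides `cv2.cornerHarris` for the real thing):

```python
import numpy as np

def harris_response(image, half_window=2, k=0.04):
    """Harris interest measure R = det(M) - k * trace(M)^2 at every pixel, where
    M is the auto-correlation matrix summed over a (2*half_window+1)^2 window."""
    I = image.astype(float)

    # Central-difference estimates of I_x and I_y.
    Ix = np.zeros_like(I); Iy = np.zeros_like(I)
    Ix[:, 1:-1] = (I[:, 2:] - I[:, :-2]) / 2.0
    Iy[1:-1, :] = (I[2:, :] - I[:-2, :]) / 2.0

    def window_sum(a, h):
        # Sum over the window by adding shifted copies (zero padding at borders).
        out = np.zeros_like(a)
        p = np.pad(a, h)
        for dy in range(2 * h + 1):
            for dx in range(2 * h + 1):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    Sxx = window_sum(Ix * Ix, half_window)   # entries of M at every pixel
    Sxy = window_sum(Ix * Iy, half_window)
    Syy = window_sum(Iy * Iy, half_window)

    det = Sxx * Syy - Sxy * Sxy              # lambda_1 * lambda_2
    trace = Sxx + Syy                        # lambda_1 + lambda_2
    return det - k * trace ** 2              # large only when both eigenvalues are large

# Detection: threshold the response and keep local maxima for localization.
```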

Feature distortion:

Invariant Local Features

Local features that are invariant to translation, rotation, scale, etc. They should have:

SIFT: scale-invariant feature transform.

SIFT explanation
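A minimal usage sketch with OpenCV's SIFT implementation (available as `cv2.SIFT_create` in recent OpenCV builds); the image path is a placeholder:

```python
import cv2

# Load a grayscale image (the path is a placeholder).
img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# Detect scale- and rotation-invariant keypoints and compute 128-D descriptors.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries its position, scale and orientation; the descriptors
# can be matched across images, e.g. with cv2.BFMatcher and a ratio test.
```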