03. Cameras and Lenses

Pinhole Cameras

Projection Equation

Let $f$ be the focal distance - distance between the pinhole and sensor. A point $(x, y, z)$ will be projected onto the sensor at:

\begin{aligned} u = f \frac{x}{z} \\ v = f \frac{y}{z} \end{aligned}

and $z = f$ . Although all values would be multiplied by $-1$ for a real camera, we can model the sensor plane as being in front of the pinhole. This projection can be represented as a matrix equation:

\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}

Lenses

Pinhole cameras must balance diffraction (aperture too small) and light ray convergence (aperture too large). Lenses allow far more light to pass through while still allowing light rays to converge on the same point on the sensor plane for light from a specific distance.

The most basic approximation of a lens is the thin lens, which assumes that the lens has zero thickness. More accurate models:

Assume finite lens thickness
Use higher-order approximation for $\sin{\theta}$
Take chromatic aberrations, where different wavelengths have different refractive indices and thus focus at different points on the sensor, into account
- Coatings can be used to minimize this
Consider the impacts of reflection on the lens surface
- Once again, coatings can minimize this and maximize light transmission
Consider vignetting

Camera Calibration

Used to determine relationship between image coordinates and real-world coordinates - geometric camera calibration.

Intrinsic Parameters

Improvements are needed to the matrix above to consider:

Scaling: convert real-world coordinates on the sensor to pixel space by multiplying the $xyz$ values by $\alpha$ and $\beta$ , taking into account that the pixels may not be square
Origin: need to translate by $(u_0, v_0)$ as usually, the center of the sensor is not the origin
Skew: camera pixel axes may be skewed by angle $\theta$

\begin{aligned} u &= \alpha \frac{x}{z} - \alpha\cot(\theta)\frac{y}{z} + u_0 \\ v &= \frac{\beta}{\sin(\theta)} \frac{y}{z} + v_0 \end{aligned}

As a matrix:

\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{z} \begin{pmatrix} \alpha & -\alpha\cot(\theta) & u_0 & 0 \\ 0 & \frac{\beta}{\sin(\theta)} & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}

In more compact notation: $$ \overrightarrow{p} = \frac{1}{z} \begin{pmatrix} K & \overrightarrow{0} \end{pmatrix} \overrightarrow{P} $$

Where $\overrightarrow{P}$ are the world coordinates and $\overrightarrow{p}$ are the pixel coordinates.

Then, extrinsic parameters: translation and rotation of the camera frame, must be taken into account, further complicating things: $$ ^C{P} = ^{C}{W}{R} + ^{W}{P} + ^{C}O{W} $$ Combining the two: $$ \overrightarrow{p} = \frac{1}{z} K\begin{pmatrix} {C}{W}{R} & ^{C}O{W} \end{pmatrix} \overrightarrow{P} = \frac{1}{z} M \overrightarrow{P} $$

\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{z} \begin{pmatrix} \cdotp & m_1^T & \cdotp & \cdotp \\ \cdotp & m_2^T & \cdotp & \cdotp \\ \cdotp & m_3^T & \cdotp & \cdotp \\ \end{pmatrix} \overrightarrow{P}

$1 = \frac{m_3 \cdot \overrightarrow{P}}{z}$ and hence, $u = \frac{m_1 \cdot \overrightarrow{P}}{m_3 \cdot \overrightarrow{P}}$ and $v = \frac{m_2 \cdot \overrightarrow{P}}{m_3 \cdot \overrightarrow{P}}$

By using these equations on many features, we can find the value of $m$ that minimizes error to determine the intrinsic and extrinsic parameters.