03. Cameras and Lenses

Pinhole Cameras

Projection Equation

Let $f$ be the focal distance: the distance between the pinhole and the sensor. A point $(x, y, z)$ will be projected onto the sensor at:

$$ \begin{aligned} u &= f \frac{x}{z} \\ v &= f \frac{y}{z} \end{aligned} $$

and $z = f$. Although all values would be multiplied by $-1$ for a real camera (the image is inverted), we can model the sensor plane as being in front of the pinhole. This projection can be represented as a matrix equation:

$$ \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} \sim \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} $$
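As a quick check of the two forms above, here is a minimal sketch in plain Python. The function names `project_pinhole` and `project_pinhole_matrix` are made up for illustration. It assumes the point is given in the camera frame with $z > 0$, and it divides by the third homogeneous component to turn $\sim$ into an equality.

```python
def project_pinhole(point, f):
    """Project a camera-frame point (x, y, z) using u = f*x/z, v = f*y/z."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the pinhole (z > 0)")
    return f * x / z, f * y / z


def project_pinhole_matrix(point, f):
    """Same projection via the 3x4 matrix and homogeneous coordinates."""
    x, y, z = point
    P = [x, y, z, 1.0]
    M = [[f, 0, 0, 0],
         [0, f, 0, 0],
         [0, 0, 1, 0]]
    # Matrix-vector product gives (f*x, f*y, z); divide by the last entry.
    uh, vh, w = (sum(m * p for m, p in zip(row, P)) for row in M)
    return uh / w, vh / w


direct = project_pinhole((2.0, 4.0, 10.0), f=5.0)
via_matrix = project_pinhole_matrix((2.0, 4.0, 10.0), f=5.0)
```

Both calls should agree, since the matrix form is only a restatement of the two scalar equations.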

Lenses

Pinhole cameras must balance diffraction (aperture too small) against blur from rays that no longer converge to a single point (aperture too large). Lenses allow far more light to pass through while still making light rays converge on the same point on the sensor plane, but only for light from a specific distance.

The most basic approximation of a lens is the thin lens, which assumes that the lens has zero thickness. More accurate models:

Camera Calibration

Geometric camera calibration is used to determine the relationship between image coordinates and real-world coordinates.

Intrinsic Parameters

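Improvements are needed to the matrix above to consider the pixel scaling along each axis ($\alpha$, $\beta$), the angle $\theta$ between the sensor axes (skew), and the offset of the image origin ($u_0$, $v_0$):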

$$ \begin{aligned} u &= \alpha \frac{x}{z} - \alpha\cot(\theta)\frac{y}{z} + u_0 \\ v &= \frac{\beta}{\sin(\theta)} \frac{y}{z} + v_0 \end{aligned} $$

As a matrix:

$$ \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{z} \begin{pmatrix} \alpha & -\alpha\cot(\theta) & u_0 & 0 \\ 0 & \frac{\beta}{\sin(\theta)} & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} $$

In more compact notation: $$ \overrightarrow{p} = \frac{1}{z} \begin{pmatrix} K & \overrightarrow{0} \end{pmatrix} \overrightarrow{P} $$

Where $\overrightarrow{P}$ holds the homogeneous coordinates of the point (in the camera frame, before extrinsics are applied) and $\overrightarrow{p}$ holds the homogeneous pixel coordinates.
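A minimal sketch of the intrinsic mapping, using only the standard library. The helper names `intrinsic_matrix` and `to_pixels` and the numeric values are illustrative assumptions, not calibrated data. With $\theta = \pi/2$ the skew term vanishes (up to floating-point rounding) and $K$ reduces to the usual scale-and-offset form.

```python
import math


def intrinsic_matrix(alpha, beta, theta, u0, v0):
    """Build the 3x3 matrix K from the intrinsic parameters."""
    cot = math.cos(theta) / math.sin(theta)
    return [[alpha, -alpha * cot, u0],
            [0.0, beta / math.sin(theta), v0],
            [0.0, 0.0, 1.0]]


def to_pixels(K, point):
    """Map a camera-frame point (x, y, z) to pixel coordinates (u, v)."""
    # K @ (x, y, z) gives (z*u, z*v, z); dividing by z is the 1/z factor.
    uh, vh, w = (sum(k * c for k, c in zip(row, point)) for row in K)
    return uh / w, vh / w


# Hypothetical parameters: square pixels, no skew, origin offset (320, 240).
K = intrinsic_matrix(alpha=800.0, beta=800.0, theta=math.pi / 2, u0=320.0, v0=240.0)
u, v = to_pixels(K, (0.5, -0.25, 2.0))
```

Here $u = 800 \cdot 0.5 / 2 + 320 = 520$ and $v = 800 \cdot (-0.25) / 2 + 240 = 140$, matching the scalar equations term by term.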

Then the extrinsic parameters, the rotation and translation of the camera frame relative to the world frame, must be taken into account:

$$ {}^{C}P = {}^{C}_{W}R \, {}^{W}P + {}^{C}O_{W} $$

Combining the two, with $\overrightarrow{P}$ holding the homogeneous world coordinates:

$$ \overrightarrow{p} = \frac{1}{z} K \begin{pmatrix} {}^{C}_{W}R & {}^{C}O_{W} \end{pmatrix} \overrightarrow{P} = \frac{1}{z} M \overrightarrow{P} $$
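The combined mapping can be sketched as follows. The helper names (`matmul`, `projection_matrix`, `project`) and all numbers are hypothetical: `R` is a 90° rotation about the $z$ axis and `t` stands in for ${}^{C}O_{W}$.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def projection_matrix(K, R, t):
    """Form the 3x4 matrix M = K [R | t]."""
    Rt = [list(R[i]) + [t[i]] for i in range(3)]
    return matmul(K, Rt)


def project(M, world_point):
    """Project a world point (x, y, z) to pixel coordinates (u, v)."""
    P = list(world_point) + [1.0]
    # The third component is the depth z in the camera frame.
    uh, vh, z = (sum(m * p for m, p in zip(row, P)) for row in M)
    return uh / z, vh / z


K = [[100.0, 0.0, 50.0],
     [0.0, 100.0, 40.0],
     [0.0, 0.0, 1.0]]
R = [[0.0, -1.0, 0.0],   # 90 degree rotation about the z axis
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]

M = projection_matrix(K, R, t)
u, v = project(M, (1.0, 2.0, 5.0))
```

The world point $(1, 2, 5)$ becomes $(-2, 1, 10)$ in the camera frame, so $u = 100 \cdot (-2)/10 + 50 = 30$ and $v = 100 \cdot 1/10 + 40 = 50$.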

$$ \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{z} \begin{pmatrix} m_1^T \\ m_2^T \\ m_3^T \end{pmatrix} \overrightarrow{P} $$

The third row gives $1 = \frac{m_3 \cdot \overrightarrow{P}}{z}$, so $z = m_3 \cdot \overrightarrow{P}$ and hence $u = \frac{m_1 \cdot \overrightarrow{P}}{m_3 \cdot \overrightarrow{P}}$ and $v = \frac{m_2 \cdot \overrightarrow{P}}{m_3 \cdot \overrightarrow{P}}$.

By using these equations on many features whose world coordinates are known, we can find the value of $m$ (the rows $m_1$, $m_2$, $m_3$ of $M$) that minimizes the error, and from it determine the intrinsic and extrinsic parameters.
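One way to set this up, sketched below under stated assumptions: multiplying each equation through by $m_3 \cdot \overrightarrow{P}$ makes it linear in the entries of $M$. Because $M$ is only defined up to scale, this sketch fixes the last entry $m_{34} = 1$ (which assumes $m_{34} \neq 0$) and solves the normal equations by Gaussian elimination. This minimizes an algebraic error rather than pixel error, and an SVD-based solver would usually be more robust. All function names and the synthetic data are illustrative only.

```python
def project(M, P):
    """Project world point P = (x, y, z) with the 3x4 matrix M."""
    Ph = list(P) + [1.0]
    a, b, c = (sum(m * p for m, p in zip(row, Ph)) for row in M)
    return a / c, b / c


def solve_linear(A, b):
    """Solve the square system A x = b by elimination with partial pivoting."""
    n = len(A)
    aug = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def estimate_projection(world_pts, pixel_pts):
    """Least-squares estimate of M with M[2][3] fixed to 1 (an assumption)."""
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(world_pts, pixel_pts):
        # From u * (m3 . P) = m1 . P and v * (m3 . P) = m2 . P with m34 = 1.
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        rhs.append(u)
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        rhs.append(v)
    n = 11
    # Normal equations: (A^T A) m = A^T b.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    m = solve_linear(AtA, Atb) + [1.0]
    return [m[0:4], m[4:8], m[8:12]]


# Synthetic, noise-free data from a made-up camera (not real measurements).
M_true = [[100.0, 0.0, 50.0, 250.0],
          [0.0, 100.0, 40.0, 200.0],
          [0.0, 0.0, 1.0, 5.0]]
world_pts = [(sx, sy, sz) for sx in (-1.0, 1.0) for sy in (-1.0, 1.0) for sz in (-1.0, 1.0)]
world_pts += [(0.5, 0.2, -0.3), (-0.4, 0.7, 0.1)]
pixel_pts = [project(M_true, P) for P in world_pts]

M_est = estimate_projection(world_pts, pixel_pts)
```

At least six points that do not all lie in one plane are needed for the eleven unknowns. With noise-free data the estimate should equal `M_true` divided by its last entry, and reprojecting the world points with `M_est` should reproduce the input pixels.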