Weighting:
- 50%: research project presented as a conference style paper, including a relevant literature review (due week 8)
- 15-25 references in the paper
- Deep learning: use when it will give you a better result
- No novelty required
- Possible projects:
- Automatic timecode/GPS coordinate extraction from dashcam/surveillance cameras?
- Censoring faces in real-time (except specific people)
- AR angry birds with finger dragging on table
- Automatically turning screen off when no one is looking at the screen
- Othello/chess state detection
- Russian flag replacement (with Ukraine)
- No-mask detection (stretch goal: if holding drink/food, they get a pass)
- Finger tracking for cursor movement (e.g. hand wet)
- Bread baking? Does rate at which bread rises drop near the end of the baking cycle
- Port/cable detection (e.g. USB-C, HDMI)
- Virtual touch bar: mirror on webcam to view keyboard area, add stickers and detect touch to change brightness
- Drink detection: detect when you take a drink, alert if you haven’t taken a sip in an hour
- Deep learning: feel free to use it where beneficial
- Classification: is there an x (and possibly a y or z) in the image or not?
- Object detection: where is the object in the image?
- Dense/semantic segmentation: recognizing what ‘object’ (e.g. sky) each pixel belongs to
- Instance segmentation: recognizing and locating multiple instances of an object
- Tracking: tracking the motion of objects
- 10%: class participation/presentations (presentations in final weeks of semester)
- 40%: final exam
Goal: to recognize objects and their motions.
Signal processing on 1D data; computer vision on 2D data.
Image processing on still images; computer vision on video and still images.
Difficulties
The sensory gap: gap between reality and what a recording of the scene can capture.
The semantic gap: lack of contextual knowledge about a scene.
The human visual system is very good:
- ~50% of cerebral cortex used for vision
- Emualating the operation of human behaviours:
- Perception based on relative brightness (e.g. Checker shadow illusion)
- Logarithmic response to brightness
- Low-light performance (for passive systems)
Recovering 3D information
Several cues available:
- Motion
- Stereo (< 3m)
- Texture
- Shading
- Contours
Or actual depth hardware:
- Active stereo (IR - brightness limited for safety)
- Structured light: dot projection
- Time of flight
- RF-modulated
- Direct timing (LIDAR)
- Brightness not limited as eye exposure duration so short
Labs: Intel Realsense D435.
Processing
- Pre-processing: noise reduction, contrast etc.
- Low-level: color, boundary/edge, shape, texture detection
- High-level: object detection, determining spatial relationships, meanings
The higher-level the processing, the less generic and the more domain-specific knowledge is required.
Low-Level Image Processing:
- Image compression
- Noise reduction (while maximizing information kept)
- Edge extraction
- Contrast enhancement (only helps humans)
- Segmentation
- Thresholding
- Morphology
- Image restoration (e.g. deblurring using knowledge of speed of subject)
Approaches
- Build a simple model of the world; constrain the environment (e.g. fixed lighting environment)
- Focus on definite tasks with clear requirements
- Find provably good algorithms
- Try ideas based on theory
- Experiment on real world
- Solutions may not generalize to a more complex environment
- Update the model