Andy Cockburn: Room 313, working Thursdays and Fridays
Tutors: team368@cosc.canterbury.ac.nz
Course breakdown:
- Labs: 9%, 1% per lab
- Usability analysis and storyboard
- 25%, 5pm 22 September
- Design specification and rationale
- 15%, 5pm 20 October
- Exam: 51%
Goals:
- Understand key human factors influencing HCI
- Know and apply guidelines, models, methods that aid in interface design
HCI: discipline concerned with the design, evaluation and implementation of interactive computing systems for human use.
There should be a cycle of designing, implementing, and evaluating.
Usability
Three key pillars:
- Learnability: rapid attainment of some level of performance
- Can be modelled as the inverse of time spent on the interface
- Efficiency: can get a lot of work done per unit time
- Subjective satisfaction: how much you enjoy using it
Two minor pillars:
- Errors: should be few errors in an efficient interface.
- Memorability: should be memorable if the interface is learnable.
Trade-offs: efficiency and learnability (inverse of time spent) are often at odds with each other. The performance/efficiency ceiling is often lower for more learnable interfaces.
Preliminary Factors
- Safety considerations
- Need for throughput (efficiency)
- Frequency of use
- Physical space, lighting, noise, pollution
- Social context
- Cognitive factors: age, fatigue, stress, focus
Usability is like oxygen: you only notice it when it’s absent. See: doors with handles that you need to push.
Managing Complexity
The job of HCI is to manage complexity: designing an object to be simple and clear; the relentless pursuit of simplicity.
Interface
complexity
^
|          _/ Poorly designed UIs:
|        _/   complexity amplified
|       /
|      /                              ____
|     /                          ____/
|    /                      ____/  Well designed UIs
|   /                  ____/
|  /              ____/
| /          ____/
|/      ____/
|  ____/
+--------------------------------------------------> Domain
   Door    Word       CAD      Nuclear               complexity
           processor           power plant
Models
Models are simplifications of reality that (should) help with the understanding of a complex artifact.
Don Norman’s Model of Interaction
From ‘The Psychology/Design of Everyday Things’, 1988.
This helps understand the designer’s role in creating a system that is used by a thinking person.
                 constructs
Designer/      -------------> System/
designer model                system image
                                |      ^
                      Provides  |      | Provides input based on
                      feedback/ |      | their prediction of how
                      output    |      | to achieve their goal
                                v      |
                               User/
                               user model
The designer constructs a system they have not fully defined: the designer's model is their conception of the interaction, and it is often incomplete, fuzzy, or compromised in the actual implementation.
System image: how the system appears to be used (by the user); this does not necessarily reflect the truth of the system.
The user's model begins very weak, coming from familiarity with the real world or other similar systems. They use this experience to try to interact with the system, building their model based on feedback from the system.
Ideally, there should be conformance between the designer and user’s model.
There is no direct communication between the designer and user; the designer can only communicate with the user through the system.
Execute-Evaluate Cycle
Execute:
- Goal -> Intention -> Actions -> Execution
- The user has a goal and knows the outcome they want
- They form an intention to complete the goal with the system and translate this to the language of the user interface; one or more actions
- They then execute the actions
- ‘Gulf of Execution’: problems executing intentions/actions
Evaluate:
- Perceive -> Interpret -> Evaluate
- Perceive the response/feedback by the system to their actions
- Evaluate; determine the effect of their action. Did it meet their goal?
- ‘Gulf of Evaluation’: problems assessing state, determining effect etc.
UISO Interaction Framework
Abowd and Beale, 1991.
User, System, Input and Output.
Emphasizes translation during interaction:
- Articulation: user translates task from task language to input language
- Performance: system acts on the user input (callbacks etc.); translates input language into core language and modifies the system state
- Presentation: show the new state to the user; translate the core (system) state into output language
- Observation: user interprets the new system output
User has some low level task (e.g. saving a file); they need to translate their intention to an input language; this is one of the most difficult parts of user interface design.
                ---> Output ---
   Presentation /              \ Observation
               /                \
              /                  v
          System                User
          (Core)                (Task)
              ^                  /
  Performance \                 / Articulation
               \--- Input <----/
Mappings
Good mappings, i.e. natural relationships between controls and their effects, increase usability.
Affordances
Objects afford particular actions to users; there is a strong correlation between how it looks like it should be used and how it is used:
- Door handles afford pulling
- Dials afford turning
- Buttons afford pushing
- Bus shelters:
- Glass affords smashing
- Plywood affords graffiti
Poor affordances encourage incorrect actions, but strong affordances may stifle efficiency.
Over-/Under-determined Dialogues
- Well-determined: natural translation from task to input language
- Under-determined: user knows what they want to do but not how to do it
- e.g. command line
- Over-determined: user forced through unnecessary or unnatural steps
- e.g. ‘Click OK to proceed’, lengthy wizards
- User turns into a robot; no freedom in what to do
Beginner user interface designers tend to think about the interface in terms of system requirements: the system needs x, y, z information, so let's ask the user about these things up-front. These over-determined dialogues lead to horrible design.
Direct Manipulation
- Visibility of objects
- Direct, rapid, incremental and reversible actions:
- Reversibility allows users no-risk exploration of the user interface
- Rapid feedback
- Syntactic correctness
- Disable illegal actions (e.g. greying buttons out when action not available)
- Tooltips can help with the problem of not knowing why the action is not available
- Replace language with action
- Language needs to be learned and remembered (e.g. command lines)
- Actions; see and point
Advantages:
- Easy to learn
- Low memory requirements
- Easy to undo
- Immediate feedback to user actions
- Users can use spatial cues
Disadvantages:
- Consumes more screen real estate
- High graphical system requirements
- May trap users in ‘beginner mode’
The Human
- Input: vision, hearing, haptics
- Output: pointing, steering, speech, typing etc.
- Processing: visual search (slow), decision times (fast), learning
- Memory
- Phenomena and collaboration
- Error (predictably irrational behavior)
Fun Example
A trivial task that many humans will get wrong.
Count the number of occurrences of the letter ‘f’ given a set of words:
Finished files are the results of years of scientific study combined with the experience of many years
The three phonetic Fs, in 'finished', 'files', and 'scientific', are easily found.
But the three non-phonetic Fs in 'of' are often missed.
Click
Even a blank graphic has affordances on where people usually click: on or near the center, or along the diagonals or corners.
Human Factors
Psychological and physiological abilities have implications for design:
- Perception: how we perceive things
- Cognitive: how we process information
- Motor: how we perform actions
- Social: how we interact with others
The Human Information Processor
Card, Moran, Newell 1983.
            Eyes/Ears
                │
                ▼
     ┌── Perceptual Processor ──┐
     │                          │
     ▼                          ▼
Visual Image              Auditory Image
  Storage                    Storage
     │                          │
     └────────────┬─────────────┘
                  ▼
           Working Memory ◄──────────┐
             │         ▲             ▼
             │         └─────► Cognitive ◄────► Long-Term
             ▼                 Processor         Memory
       Motor Processor
             │
             ▼
      Movement Response
Human Input
Vision
Cells:
- Rods: low light, monochrome, 100 million rods across the retina
- Cones: color, 6 million cones, concentrated in the fovea
- S/M/L cones for short/medium/long wavelengths: approx. blue/green/reddish-yellow sensitivity
Areas:
- Retina: ~120 degree range, sensitive to movement
- ~210 degrees with both eyes
- Notifications popping up in corners etc. will distract user
- Fovea: detailed vision, area of ~2 degrees
1 degree = 60 arcminutes, 1 arcminute = 60 arcseconds
Visual Acuity:
- Point acuity: minimum angle of separation at which two dots can be distinguished: ~1 arcminute
- Grating acuity: minimum angle between alternating bars at which they remain distinct: ~1-2 arcminutes
- Letter acuity: ~5 arcminutes
- Vernier acuity: minimum offset, normal to their axis, at which two nearly-collinear line segments (e.g. ---___) are seen as misaligned rather than as one continuous line: ~10 arcseconds
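To make these thresholds concrete, here is a small sketch converting visual angles to physical sizes; the 600 mm viewing distance is an assumed value for illustration:

```python
import math

def angle_to_size_mm(arcminutes: float, distance_mm: float) -> float:
    """Physical size subtending a given visual angle at a viewing distance."""
    return 2 * distance_mm * math.tan(math.radians(arcminutes / 60) / 2)

# Assumed viewing distance of 600 mm (a typical desktop monitor distance):
for label, arcmin in [("point acuity", 1.0), ("letter acuity", 5.0),
                      ("vernier acuity", 10 / 60)]:
    print(f"{label}: {angle_to_size_mm(arcmin, 600):.3f} mm")
# point ≈ 0.17 mm, letter ≈ 0.87 mm, vernier ≈ 0.03 mm
```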
Eye movement:
- Fixations: visual processing occurs only when the eye is stationary
- Saccades: rapid eye movements; about 900 degrees per second
- Blind while saccades are in progress
- Eye movement as input; difficult as people don’t have much control over where they are looking (e.g. accidentally looking at ‘delete all my files’ button)
- Smooth-pursuit: ability to track moving objects (up to 100 degrees per second)
- Cannot be induced voluntarily - can’t imagine a moving dot and track it
- Relevant in scrolling
Size/depth cues:
- Familiarity
- Linear perspective; straight lines getting closer together
- Horizontal distance
- Size constancy: if object gets bigger/smaller, it’s probably the object moving closer/further away, not the object changing size
- Texture gradient: texture getting bigger/smaller
- Occlusion: occluded items further away
- Depth of focus: blurrier the further you go away
- Aerial perspective: blurrier and bluer from atmospheric haze
- Shadows/shading
- Stereoscopy (best within 1m, ineffective beyond 10m)
Muller-Lyer illusion:
<--->
>---<
Both lines subtend the same visual angle, but the bottom one resembles a receding (inside) corner and so looks further away; the brain therefore perceives it as bigger.
3D, depth-based UIs:
- The world is 3D so all interaction should be 3D, right?
- Occlusion, far-away things being smaller, navigation/orientation etc. impede usability unless the domain is 3D (e.g. gaming, 3D modelling)
- Zooming is useful though
- Overview of the data first
- Zoom in to progressively add detail about what they are interested in and filter information they are not
- Allows UI to provide details on demand.
Color:
- 8% of males and 0.4% of females have some form of color deficiency
- Types:
- Protanomaly: red
- Deuteranomaly: green
- Tritanomaly: blue
- Least sensitive to blue
Reading:
- Saccades, fixations (94% of the time), regression
- Approx. 250 words/minute initially
- READING SPEED REDUCED BY ALL CAPS
Auditory
- Used dramatically less than vision
- About 20 Hz to 15-20 kHz
- Can adjust many parameters; amplitude, timbre, direction
- Filtering capabilities (e.g. cocktail party effect)
- Problems with signal interference and noise
Haptics
- Proprioception: sense of limb location
- Kinaesthesia: sense of limb movement, often more of a conscious decision
- Tactition: skin sensations
Haptic feedback: any feedback providing experience of touch
Human Output
Motor response time depends on stimuli:
- Visual: ~200 ms
- Audio: ~150 ms
- Haptics: ~700 ms
- Faster for combined signals
Muscle actions:
- Isotonic: little resistance to movement (e.g. mouse)
- Isometric: force but little motion (e.g. keyboard, ThinkPad TrackPoint™)
- Better for velocity/rate control (e.g. self-centering joysticks)
Fitts’ Law
A very reliable model of rapid, aimed human movement.
- Predictive of tasks, descriptive of devices
- Derived from Shannon's theory of the capacity of information channels
- Signal: amplitude A of the movement (the distance to the middle of the target)
- Noise: width W of the target
Index of difficulty (ID) measures the difficulty of a rapid aimed movement, in 'bits':
ID = log2(A/W + 1)
Fitts' law: movement time (MT) is linear in ID:
MT = a + b·ID
- a is typically 200-500 ms
- b is typically 100-300 ms/bit
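As a worked example, a small sketch of the Shannon formulation; the constants a and b are illustrative values picked from the typical ranges above:

```python
import math

def fitts_mt(a_ms: float, b_ms_per_bit: float, amplitude: float, width: float) -> float:
    """Predicted movement time (ms) under Fitts' law: MT = a + b * log2(A/W + 1)."""
    index_of_difficulty = math.log2(amplitude / width + 1)  # ID, in bits
    return a_ms + b_ms_per_bit * index_of_difficulty

# A 512-pixel movement to a 16-pixel target: ID = log2(33) ≈ 5.04 bits
print(fitts_mt(a_ms=300, b_ms_per_bit=200, amplitude=512, width=16))  # ≈ 1309 ms
```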
Typical velocity profile, validated for many types of aimed pointing:
Speed
^
| Open-loop,
|ballistic impulse
| /\
| / \ slow, closed-loop
| / \ corrections
| / \ /\
|/ \/ \/\___
+------------------------>
Time
Input Devices: Pointing & Scrolling
Human output is received as system input. Some form of translation hardware is needed to achieve this; input devices have many properties:
- Direct vs indirect
- Touchscreens are direct: perfect one-to-one correspondence between input and display space
- Trackpads are indirect: finger movement on the pad does not directly map to cursor position
- Absolute vs relative
- Touchscreens and pen-tablets are absolute
- Trackpads (mostly) relative; finger location on the trackpad does not matter when moving the cursor
- Control (see the sketch after this list)
- Position (zero-order) e.g. absolute pointing, dragging the scrollbar
- Rate (first-order) e.g. holding down on mouse wheel and dragging up/down on Windows
- Acceleration (second-order)
- Note: having lots of modes, and hence complexity, may decrease the number of interactions required while making the task take longer due to the overhead of making decisions
- Isotonic: force with movement
- Isometric: force without movement e.g. 3D touch to control object size
- Control-display gain/transfer functions
- Magic sauce of iOS inertial scrolling, Mac trackpads, etc.
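Before moving on to transfer functions: a minimal sketch contrasting the first two orders of control listed above. The gain value and frame time are illustrative assumptions:

```python
GAIN = 10.0  # assumed display units per device unit

def zero_order(positions):
    """Position (zero-order) control: device position maps directly to cursor position."""
    return [p * GAIN for p in positions]

def first_order(deflections, dt=0.016):
    """Rate (first-order) control: device deflection sets cursor velocity, so the
    cursor keeps moving for as long as the device is held off-centre."""
    x, path = 0.0, []
    for d in deflections:
        x += d * GAIN * dt  # integrate velocity into position each frame
        path.append(x)
    return path

print(zero_order([0.0, 1.0, 2.0]))   # cursor jumps with the device
print(first_order([1.0, 1.0, 1.0]))  # cursor drifts while the stick is deflected
```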
The control-display transfer function:
- The input device (e.g. capacitive trackpad) sends device units
- The gain function scales the input in accordance with the user or environment settings
- Persistence is used to continue output even when there is no ongoing input, adding features such as inertia
Transfer Function
+-------------------------------------------------------------------------+
| e.g. scroll inertia |
| device --------------- display -------- --------------- |
Device -+--------> | Translation | ---------> | Gain | ------> | Persistence | --+---> Output
Input | units --------------- units -------- --------------- |
| ^ ^ ^ |
| ------- Environment/User Settings ------------ |
+-------------------------------------------------------------------------+
Scrolling transfer function for iOS:
- When the finger is on the screen there is direct mapping
- After the finger leaves, the speed slowly decays
- After four scroll gestures in quick succession, the scroll rate increases almost vertically after the finger leaves
- For all subsequent scroll gestures, the maximum velocity (immediately after the finger leaves) increases until a maximum scroll velocity is reached
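A toy sketch of such a transfer function, with gain plus exponential inertia decay; the structure and constants are illustrative assumptions, not Apple's actual function:

```python
def scroll_pipeline(device_deltas, gain=2.5, decay=0.95, min_velocity=0.01):
    """Translate device units to display units via gain, then persist output
    as decaying inertia once the finger lifts. Yields per-frame scroll offsets."""
    velocity = 0.0
    for delta in device_deltas:          # finger down: direct, gained mapping
        velocity = delta * gain
        yield velocity
    while abs(velocity) > min_velocity:  # finger up: velocity decays each frame
        velocity *= decay
        yield velocity

offsets = list(scroll_pipeline([4.0, 6.0, 5.0]))
print([round(v, 2) for v in offsets[:8]])  # direct phase, then coasting
```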
Input Devices: Text Input
- Alternative keyboards (e.g. Dvorak)
- Chord keys (e.g. stenographers)
- Constrained keyboards (e.g. T9 keyboards on old mobile phones)
- Reactive/predictive systems (autocomplete)
- Gestural input (e.g. swipe keyboards)
- Handwriting recognition
Input expressibility: how well can you discriminate inputs? e.g. Google Glass had a tiny capacitive surface; doing text entry on that posed challenges.
Steering Law
Model of continuously controlled 'steering': moving a pointer along a given path, called a 'tunnel':
MT = a + b·(A/W)
for a straight tunnel of length A and constant width W. More generally, MT = a + b·∫ ds/W(s), integrating the inverse of the tunnel width along the path.
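A small sketch of the straight-tunnel case; the constants a and b here are illustrative, not from the notes:

```python
def steering_mt(a_ms: float, b_ms: float, path_length: float, width: float) -> float:
    """Steering law for a straight tunnel of constant width: MT = a + b * (A/W)."""
    return a_ms + b_ms * path_length / width

# Halving the tunnel width doubles the difficulty term:
print(steering_mt(200, 100, path_length=300, width=30))  # 1200 ms
print(steering_mt(200, 100, path_length=300, width=15))  # 2200 ms
```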
This is important in cascading context menus, where hovering over an item opens a submenu to the left or right. Done naïvely, while travelling to the newly-opened submenu, the cursor must always stay over the item or the submenu will disappear. macOS appears to take the angle of travel into account when deciding whether to hide the submenu.
Human Processing
Visual Search Time
If a person has to pick out a particular item out of n candidates, search time grows approximately linearly with n: T = a + b·n.
This is slow, so the UI should aim to reduce the amount of searching the user must do. To achieve this, ensure there is spatial stability; items appear in the same place every time.
Hick/Hyman Law of Decision Time
Choice reaction time when optimally prepared:
T = a + b·H
Where H is the information content (entropy) of the decision, in bits.
For n equally probable items: H = log2(n + 1)
For items with probability p_i: H = Σ p_i · log2(1/p_i + 1)
Implications:
- Decisions are fast: T grows logarithmically with n, whereas visual search grows linearly
- Applies to name retrieval (commands) and location retrieval
- In GUIs, replace visual search (linear in n) with decision (logarithmic in n) through spatial stability
- Don't order commands by most recently/commonly used - this destroys spatial stability and forces the user to visually search
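To see why spatial stability pays off, a quick comparison under assumed constants; the per-item and per-bit costs below are illustrative, not measured values:

```python
import math

def visual_search_ms(n: int, a: float = 0, b: float = 240) -> float:
    """Linear visual search over n items: T = a + b*n."""
    return a + b * n

def hick_decision_ms(n: int, a: float = 200, b: float = 150) -> float:
    """Hick/Hyman decision among n equally likely items: T = a + b*log2(n + 1)."""
    return a + b * math.log2(n + 1)

for n in (4, 16, 64):
    print(n, visual_search_ms(n), round(hick_decision_ms(n)))
# Search time grows linearly; decision time only logarithmically.
```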
Spatially Consistent User Interfaces
Pie menu: items are sectors making up a circle centered around the cursor (possibly with multiple layers of items through nesting):
- Minimum of one pixel of cursor movement required for fast selection
- Allows for easy advancement from visual search to muscle memory
Ribbon: spatial stability within each tab, but requires visual search and mechanical interactions to find a new item. ‘Solution’: show all tabs at once.
Search: macOS menu bar search does not run the searched command; it only shows you where the item is located. Menu items also show their keyboard shortcuts.
Torus pointing: wraps the cursor around the screen edges, giving multiple straight paths to an item. Giving users choice may help with Fitts' law, but increases decision time.
Power Law of Practice
Performance rapidly speeds up with practice:
T_n = T_1 · n^(-α)
Where:
- T_n is the time taken for trial n
- T_1 is the time taken on the first trial
- α is the learning rate
This applies both to simple and complex tasks.
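A tiny sketch of the curve; the first-trial time and learning rate are assumed values:

```python
def trial_time(t1: float, n: int, alpha: float) -> float:
    """Power law of practice: T_n = T_1 * n**(-alpha)."""
    return t1 * n ** -alpha

# With an assumed 10 s first trial and learning rate alpha = 0.4:
for n in (1, 2, 5, 10, 100):
    print(n, round(trial_time(10.0, n, 0.4), 2))  # 10.0, 7.58, 5.25, 3.98, 1.58
```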
Novice to Expert Transitions
People use the same tools for years/decades, but often continue to use inefficient strategies.
Shortcut vocabularies are small and are used infrequently. Factors:
- Satisficing; good enough
- Lack of mnemonics (for keyboard shortcuts)
- Lack of visibility
How do you support transitions to experts?
When switching between modes, there is a performance dip. Since people use software to do their jobs, not use software as their jobs, this causes a chasm that the user must take the time to cross.
^ Performance Modality
│ Switch
│ | xxxxx
│ xxxxxxxxx
│ | xxxxx
│ xxxx
│ | xxx
│ xx
│ | xx
│ Ultimate xxx
│ xxxxxxxxxxxx| xx ─┐
│ xxxxxx Performance x │ Performance
│ xxxx |x │ Dip
│ xx Extended x │
│ x Learnability |x ─┘
│ xx
│ x |
│x
│x Initial |
│Performance
│ |
│ First Modality Second Modality
└──────────────────────────-────────────────────────────>
Time
Domains of Interface Performance Improvement
- Intra-modal improvement
- Make the user an expert within the mode the user is comfortable working in
- e.g. guidance techniques where you show items the user is likely to use
- Inter-modal improvement
- Make the user aware of faster ways of doing the task (e.g. File → Print to Ctrl+P)
- e.g. skillometers
- e.g. AutoCAD shows the text command being used in the background when using UI buttons
- Vocabulary extension
- e.g. track and show community command use to let users learn the most useful commands
- Task strategy:
- An intelligent UI that picks up on the task the user is trying to do and suggests more efficient sequences of commands to achieve it
Human Pattern of Behavior
Zipf's Law: given a corpus of text, a word's frequency is inversely proportional to its rank:
f(r) ∝ 1/r
Pareto Principle/80-20 Rule: 80% of usage is made up of only 20% of items.
The UI should attempt to surface these 20% of items.
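A quick sketch showing how a Zipf distribution concentrates usage in the top-ranked items; the exponent of 1 and the 100-item vocabulary are assumptions for illustration:

```python
def zipf_frequencies(n_items: int, exponent: float = 1.0) -> list[float]:
    """Normalised relative frequency of each rank under Zipf's law: f(r) ∝ 1/r**s."""
    weights = [1 / r ** exponent for r in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

freqs = zipf_frequencies(100)
print(f"Top 20% of items account for {sum(freqs[:20]):.0%} of usage")  # ≈ 69%
```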
Human Memory
                maintenance
                rehearsal
                  ┌────┐
                  │    │
                  │    ▼       elaborative
Sensory memory:       Short-term    rehearsal     Long-term
iconic, echoic ─────► memory      ────────────►   memory
and haptic              │         ◄────────────
      │                 │           retrieval
      ▼                 ▼
   masking,        displacement or
   decay           interference, decay
Sensory memory: stimulation decays over a brief period of time; loud noises, bright lights, and pain persist for some time after the stimulus disappears.
Short-Term Memory
- Input from sensory or long term memory
- Capacity of 7 ± 2 ‘chunks’/abstractions
- Chunks aid storage and reconstruction
- Fast access: ~70 ms
- Rapid decay: ~200 ms
- Constant update and interference
- Maintenance rehearsal: e.g. repeating a number a few times in your mind
Long-Term memory
- Input through elaborative rehearsal and extensive repetition
- Elaborative rehearsal: restructuring information instead of just mindlessly repeating it
- Slow access: > 100 ms, sometimes days (tip of the tongue phenomenon)
- Decay?
- Good at recognition but bad at recall
- Supports spatial processing
Human Error
Mistakes
Errors of conscious decision-making: the user acts according to an incomplete or incorrect model.
Only detected with feedback.
Human Error: Slips
Errors of automatic and skilled behavior.
Capture error:
- Two action sequences with common starting point(s)
- Captured into the wrong (and usually more frequent) path
- Used to be common e.g. in dialogue boxes with generic button labels (‘Cancel’ and ‘Ok’)
Description error:
- More than one object allowing the same/similar action
- Execute the right action on the wrong object
- e.g. lighting panel with multiple switches
Data-driven error:
- External data interfering with short-term memory
- e.g. entering unrelated file name when saving a document
Loss-of-activation error:
- Goal is displaced/decayed from short-term memory before it is completed
- e.g. walking into room then forgetting why you entered
- Want to complete task that requires subtasks and sub-subtasks to be completed, overflowing short-term memory
Mode error:
- Right action in the wrong system state
- Modes are system partitions with:
- Different set of commands
- Different interpretation of the same commands/actions
- Different display methods
- Ensure modes are visible and noticeable
- Modal dialogues are an example of bad modes
Motor slip:
- Pointing/steering/keying error
Premature closure error:
- ‘Dangling’ UI actions required after perceived goal completion
- e.g. forgetting to save, or forgetting to attach files to emails
Human Phenomena
Homeostasis
People maintain equilibrium:
- If a system makes something easier, people will use it to do more difficult things
- If a system makes something safer, people will use it to do more dangerous things
Satisficing
People are satisfied with what they can do now and don’t bother to optimize:
- People that ‘hunt-and-peck’ instead of learning to touch type
- People that don’t bother to learn keyboard shortcuts for tasks they do frequently
Hawthorne Effect
The act of measuring changes results (the Heisenberg uncertainty principle of HCI).
People like being involved in experiments and change their behavior during experiments, complicating results.
Explaining Away Errors
The user is often the easiest party to blame, but they may have made the mistake because the interface is poorly designed.
Peak-End Effects
Peak effect: people's memories of experiences are influenced by the peak/most intense moments of an experience (e.g. combo attacks in games, casino games).
End effect: people's memories of experiences are predominantly influenced by the terminating moments (e.g. a good vacation ruined by a missed flight home, or a survey with many questions on the last page).
Negativity Bias
The magnitude of sensation from a loss is greater than from a gain of the same amount: bad is stronger than good.
e.g. a single coin toss where you win $110 on heads but lose $100 on tails still feels unattractive; autocorrect 'correcting' an already-correct word feels much worse than a good correction of a mis-spelt word feels good.
Communication Convergence
Similarity in pace, gestures, phrases, etc. enhances communication. Could interfaces that measure the user (e.g. long-press duration, mouse speed) and match them (e.g. animation speed, timeouts, speech rate) help?