01. Introduction to Human-Computer Interaction
Andy Cockburn: Room 313, working Thursdays and Fridays
Tutors: team368@cosc.canterbury.ac.nz
Course breakdown:
- Labs: 9%, 1% per lab
- Usability analysis and storyboard
- 25%, 5pm 22 September
- Design specification and rationale
- 15%, 5pm 20 October
- Exam: 51%
Goals:
- Understand key human factors influencing HCI
- Know and apply guidelines, models, methods that aid in interface design
HCI: discipline concerned with the design, evaluation and implementation of interactive computing systems for human use.
There should be a cycle of designing, implementing and evaluating.
Usability
Three key pillars:
- Learnability: rapid attainment of some level of performance
- Can be modelled as the inverse of time spent on the interface
- Efficiency: can get a lot of work done per unit time
- Subjective satisfaction: how much you enjoy using it
Two minor pillars:
- Errors: should be few errors in an efficient interface.
- Memorability: should be memorable if the interface is learnable.
Trade-offs: efficiency and learnability (inverse of time spent) are often at odds with each other. The performance/efficiency ceiling is often lower for more learnable interfaces.
Preliminary Factors
- Safety considerations
- Need for throughput (efficiency)
- Frequency of use
- Physical space, lighting, noise, pollution
- Social context
- Cognitive factors: age, fatigue, stress, focus
Usability is like oxygen: you only notice it when it’s absent. See: doors with handles that you need to push.
Managing Complexity
The job of HCI is to manage complexity: designing an object to be simple and clear; the relentless pursuit of simplicity.
Interface
Complexity
^
| ____
| ____/
| Poorly designed ____/
| UIs; complexity ____/
| amplified ____/
| ____/ Well designed UIs
| ____/
| ____/
| ____/
| ____/
|/
+--------------------------------------------------> Domain
Door Word CAD Nuclear Complexity
Processor power plant
Models
Models are simplifications of reality that (should) help with the understanding of a complex artifact.
Don Norman’s Model of Interaction
From ‘The Psychology/Design of Everyday Things’, 1988.
This helps understand the designer’s role in creating a system that is used by a thinking person.
constructs
Designer/ -------------> System/system image
designer model ^
Provides | Provides input based on
feedback/ | their prediction of how
output | to achieve their goal
v
User/
user model
The designer tries to construct a system that they have not fully defined. The designer’s model is their conception of interaction; often incomplete, fuzzy or compromised in the actual implementation.
System image: how the system appears to be used (by the user); this does not necessarily reflect the truth of the system.
The user’s model begins very weak, coming from familiarity with the real world or other similar systems. They use this experience to interact with the system, building their model from the system’s feedback.
Ideally, there should be conformance between the designer and user’s model.
There is no direct communication between the designer and user; the designer can only communicate with the user through the system.
Execute-Evaluate Cycle
Execute:
- Goal -> Intention -> Actions -> Execution
- The user has a goal and knows the outcome they want
- They form an intention to complete the goal with the system and translate this to the language of the user interface; one or more actions
- They then execute the actions
- ‘Gulf of Execution’: problems executing intentions/actions
Evaluate:
- Perceive -> Interpret -> Evaluate
- Perceive the response/feedback by the system to their actions
- Evaluate; determine the effect of their action. Did it meet their goal?
- ‘Gulf of Evaluation’: problems assessing state, determining effect etc.
The Interaction Framework
Abowd and Beale, 1991.
User, System, Input and Output.
Emphasizes translation during interaction:
- Articulation: user translates task from task language to input language
- Performance: system acts on the user input (callbacks etc.); translates input language into core language and modifies the system state
- Presentation: show the new state to the user; translate the core (system) state into output language
- Observation: user interprets the new system output
User has some low level task (e.g. saving a file); they need to translate their intention to an input language; this is one of the most difficult parts of user interface design.
--> Output ---
Presentation / \ Observation
/ \
/ v
System User
(Core) (Task)
^ /
Performance \ / Articulation
\--- Input <---/
Mappings
Good mappings (the relationship between controls and their effects) increase usability.
Affordances
Objects afford particular actions to users; there is a strong correlation between how it looks like it should be used and how it is used:
- Door handles afford pulling
- Dials afford turning
- Buttons afford pushing
- Bus shelters:
- Glass affords smashing
- Plywood affords graffiti
Poor affordances encourage incorrect actions, but strong affordances may stifle efficiency.
Over-/Under-determined Dialogues
- Well-determined: natural translation from task to input language
- Under-determined: user knows what they want to do but not how to do it
- e.g. command line
- Over-determined: user forced through unnecessary or unnatural steps
- e.g. ‘Click OK to proceed’, lengthy wizards
- User turns into a robot; no freedom in what to do
Beginner user interface designers tend to think about the interface in terms of system requirements: the system needs x, y, z information, so let’s ask the user about these things up-front. These over-determined dialogues lead to horrible design.
Direct Manipulation
- Visibility of objects
- Direct, rapid, incremental and reversible actions:
- Reversibility allows users no-risk exploration of the user interface
- Rapid feedback
- Syntactic correctness
- Disable illegal actions (e.g. greying buttons out when action not available)
- Tooltips can help with the problem of not knowing why the action is not available
- Replace language with action
- Language needs to be learned and remembered (e.g. command lines)
- Actions; see and point
Advantages:
- Easy to learn
- Low memory requirements
- Easy to undo
- Immediate feedback to user actions
- Users can use spatial cues
Disadvantages:
- Consumes more screen real estate
- High graphical system requirements
- May trap users in ‘beginner mode’
The Human
- Input: vision, hearing, haptics
- Output: pointing, steering, speech, typing etc.
- Processing: visual search (slow), decision times (fast), learning
- Memory
- Phenomena and collaboration
- Error (predictably irrational behavior)
Fun Example
A trivial task that many humans will get wrong.
Count the number of occurrences of the letter ‘f’ given a set of words:
Finished files are the results of years of scientific study combined with the experience of many years
Three phonetic Fs: ‘finished’, ‘files’, ‘scientific’, are easily found.
But three non-phonetic Fs in ‘of’ are often forgotten.
Click
Even a blank graphic has affordances on where people usually click: on or near the center, or along the diagonals or corners.
Human Factors
Psychological and physiological abilities have implications for design:
- Perception: how we perceive things
- Cognitive: how we process information
- Motor: how we perform actions
- Social: how we interact with others
The Human Information Processor
Card, Moran, Newell 1983.
Eyes/Ears
│
▼
┌──── Perceptual Processor ────┐
│ │
▼ ▼
Visual Image ──────┬─────── Auditory Image
Storage │ Storage
│
▼
┌─────Working Memory ◄─────────┐
▼ ▲ ▼
Motor │ Long-Term
Processor │ Memory
│ │ ▲
| | |
▼ │ ▼
Movement └──────► Cognitive
Response Processor
Human Input
Vision
Cells:
- Rods: low light, monochrome, 100 million rods across the retina
- Cones: color, 6 million cones concentrated in the fovea
- S/M/L for short/medium/long approx blue/green/reddish-yellow sensitivity
Areas:
- Retina: ~120 degree range, sensitive to movement
- ~210 degrees with both eyes
- Notifications popping up in corners etc. will distract user
- Fovea: detailed vision, area of ~2 degrees
1 degree = 60 arcminutes, 1 arcminute = 60 arcseconds
Visual Acuity:
- Point acuity: minimum angle of separation at which two dots are seen as distinct: ~1 arcminute
- Grating acuity: minimum angle between alternating bars before they become indistinct: 1-2 arcminutes
- Letter acuity: ~5 arcminutes
- Vernier acuity: minimum detectable offset between two collinear line segments (e.g. ---___) before they are perceived as one continuous line: ~10 arcseconds
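These thresholds relate to physical sizes through simple geometry. A minimal R sketch (my own illustrative helper, not from the lecture) computing the visual angle subtended by an object of size s at viewing distance d:
# Visual angle (in arcminutes) subtended by an object of size s
# viewed at distance d (both in the same units)
visual_angle <- function(s, d) 2 * atan(s / (2 * d)) * (180 / pi) * 60
visual_angle(s = 0.25, d = 600)  # a 0.25 mm pixel at 60 cm subtends ~1.4 arcmin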
Eye movement:
- Fixations: visual processing occurs only when the eye is stationary
- Saccades: rapid eye movements; about 900 degrees per second
- Blind while saccades are in progress
- Eye movement as input; difficult as people don’t have much control over where they are looking (e.g. accidentally looking at ‘delete all my files’ button)
- Smooth-pursuit: ability to track moving objects (up to 100 degrees per second)
- Cannot be induced voluntarily - can’t imagine a moving dot and track it
- Relevant in scrolling
Size/depth cues:
- Familiarity
- Linear perspective; straight lines getting closer together
- Horizontal distance
- Size constancy: if object gets bigger/smaller, it’s probably the object moving closer/further away, not the object changing size
- Texture gradient: texture getting bigger/smaller
- Occlusion: occluded items further away
- Depth of focus: blurrier the further you go away
- Aerial perspective: blurrier and bluer from atmospheric haze
- Shadows/shading
- Stereoscopy (best within 1m, ineffective beyond 10m)
Müller-Lyer illusion:
<--->
>---<
The two lines are the same length, but the bottom one looks further away while subtending the same visual angle, so the brain perceives it as longer.
3D, depth-based UIs:
- The world is 3D so all interaction should be 3D, right?
- Occlusion, far-away things being smaller, navigation/orientation etc. impedes usability unless the domain is 3D (e.g. gaming, 3D modelling)
- Zooming is useful though
- Overview of the data first
- Zoom in to progressively add detail about what they are interested in and filter information they are not
- Allows UI to provide details on demand.
Color:
- 8% males, 0.4% females have some form of color-deficiency:
- Types:
- Protanomaly: red
- Deuteranomaly: green
- Tritanomaly: blue
- Least sensitive to blue
Reading:
- Saccades, fixations (94% of the time), regression
- Approx. 250 words/minute initially
- READING SPEED REDUCED BY ALL CAPS
Auditory
- Used dramatically less than vision
- About 20 Hz to 15-20 kHz
- Can adjust many parameters; amplitude, timbre, direction
- Filtering capabilities (e.g. cocktail party effect)
- Problems with signal interference and noise
Haptics
- Proprioception: sense of limb location
- Kinaesthesia: sense of limb movement, often more of a conscious decision
- Tactition: skin sensations
Haptic feedback: any feedback providing experience of touch
Human Output
Motor response time depends on stimuli:
- Visual: ~200 ms
- Audio: ~150 ms
- Haptics: 700 ms
- Faster for combined signals
Muscle actions:
- Isotonic: little resistance to movement (e.g. mouse)
- Isometric: force but little motion (e.g. keyboard, ThinkPad TrackPoint™)
- Better for velocity/rate control (e.g. self-centering joysticks)
Fitts’ Law
A very reliable model of rapid, aimed human movement.
- Predictive of tasks, descriptive of devices
- Derived from Shannon’s theory of the capacity of information channels
- Signal: amplitude $A$ of movement (distance to the middle of the target)
- Noise: width $W$ of the target
Index of difficulty (ID) measures the difficulty of rapid aimed movement, in ‘bits’:
$ID = \log_2(A/W + 1)$
Fitts’ law: movement time (MT) is linear with ID:
$MT = a + b \cdot ID$
- $a$ is typically 200-500 ms
- $b$ is typically 100-300 ms/bit
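A minimal R sketch of these formulas (the constants a and b below are illustrative mid-range values, not fitted data):
# Fitts' law (Shannon formulation)
fitts_mt <- function(A, W, a = 0.3, b = 0.2) {  # a in s, b in s/bit
  ID <- log2(A / W + 1)                         # index of difficulty (bits)
  a + b * ID                                    # predicted movement time (s)
}
fitts_mt(A = 512, W = 16)  # ID of ~5 bits -> ~1.3 s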
Typical velocity profile, validated for many types of aimed pointing:
Speed
^
| Open-loop,
|ballistic impulse
| /\
| / \ slow, closed-loop
| / \ corrections
| / \ /\
|/ \/ \/\___
+------------------------>
Time
Input Devices; Pointing & Scrolling
Human output is received as system input. Some translation hardware must sit between the two, and these devices have many properties:
- Direct vs indirect
- Touchscreens have perfect one-to-one correspondence
- Trackpads indirect: mouse movement does not directly map to cursor movement
- Absolute vs relative
- Touchscreens, pen-tablets
- Trackpads (mostly) relative; finger location on the trackpad does not matter when moving the cursor
- Control
- Position (zero-order) e.g. absolute pointing, dragging the scrollbar
- Rate (first-order) e.g. holding down on mouse wheel and dragging up/down on Windows
- Acceleration (second-order)
- Note: having lots of modes and hence complexity may decrease number of interactions while making the task take longer due to the overhead of making decisions
- Isotonic: force with movement
- Isometric: force without movement e.g. 3D touch to control object size
- Control-display gain/transfer functions
- Magic sauce of iOS inertial scrolling, Mac trackpads, etc.
The control-display transfer function:
- The input device (e.g. capacitive trackpad) sends device units
- The gain function scales the input in accordance with the user or environment settings
- Persistence is used to continue output even when there is no ongoing input, adding features such as inertia
Transfer Function
+-------------------------------------------------------------------------+
| e.g. scroll inertia |
| device --------------- display -------- --------------- |
Device -+--------> | Translation | ---------> | Gain | ------> | Persistence | --+---> Output
Input | units --------------- units -------- --------------- |
| ^ ^ ^ |
| ------- Environment/User Settings ------------ |
+-------------------------------------------------------------------------+
Scrolling transfer function for iOS:
- When the finger is on the screen there is direct mapping
- After the finger leaves, the speed slowly decays
- After four scroll gestures in quick succession, the scroll rate increases in an almost vertical line after the finger leaves
- For all subsequent scroll gestures, the maximum velocity (immediately after the finger leaves) increases until a maximum scroll velocity is reached
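A toy R sketch of the translation → gain → persistence pipeline above, assuming a constant gain and a simple exponential decay for inertia (real systems such as iOS use more elaborate, hand-tuned curves):
# Toy control-display transfer function: device units -> display units
gain_fn <- function(device_delta, gain = 2.5) gain * device_delta
# Persistence: output continues after input stops, decaying each frame
inertia <- function(v0, decay = 0.95, frames = 10) v0 * decay^(0:frames)
inertia(gain_fn(40))  # scroll velocity after the finger leaves the surface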
Input Devices: Text Input
- Alternative keyboards (e.g. Dvorak)
- Chord keys (e.g. stenographers)
- Constrained keyboards (e.g. T9 keyboards on old mobile phones)
- Reactive/predictive systems (autocomplete)
- Gestural input (e.g. swipe keyboards)
- Hand-writing recognition
Input expressibility: how well can you discriminate inputs? e.g. Google Glass had a tiny capacitive surface; doing text entry on that posed challenges.
Steering Law
Model of continuously controlled ‘steering’: moving an item across a given path, called a ‘tunnel’:
$MT = a + b \int_C \frac{ds}{W(s)}$
Where $W(s)$ is the tunnel width at position $s$ along the path $C$. For a straight tunnel of constant width $W$ and length $A$, this reduces to $MT = a + b(A/W)$.
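A minimal R sketch for the constant-width case (the constants are illustrative, not fitted):
# Accot-Zhai steering law for a straight tunnel of constant width
steering_mt <- function(A, W, a = 0.2, b = 0.1) a + b * (A / W)
steering_mt(A = 300, W = 20)  # narrower tunnels take disproportionately longer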
This is important in cascading context menus, where hovering over an item opens a submenu to the left or right. Done naïvely, while travelling to the newly-opened submenu, the cursor must always stay above the item or the submenu will disappear. macOS appears to take the angle of travel into account when deciding whether the submenu should be hidden.
Human Processing
Visual Search Time
If a person has to pick out a particular item out of $n$ candidates, search time grows roughly linearly with $n$.
This is slow, so the UI should aim to reduce the amount of searching the user must do. To achieve this, ensure there is spatial stability: items appear in the same place every time.
Hick/Hyman Law of Decision Time
Choice reaction time when optimally prepared: $T = a + b \cdot H$
Where $H$ is the information (entropy) of the decision.
For item $i$ with probability $p_i$: $H = \sum_i p_i \log_2(1/p_i + 1)$
For $n$ equally probable alternatives: $H = \log_2(n + 1)$
Implications:
- Decisions are fast: time grows only logarithmically with the number of alternatives
- Applies to name retrieval (commands) and location retrieval
- In GUIs, replace visual search (linear in $n$) with decision (logarithmic in $n$) through spatial stability (contrasted in the sketch below)
- Don’t order commands by most recently/commonly used - this forces the user to visually search
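A minimal R sketch contrasting the two growth rates (the constants are illustrative):
# Hick/Hyman decision time vs linear visual search
hick_t   <- function(n, a = 0.2, b = 0.15) a + b * log2(n + 1)  # stable layout
search_t <- function(n, t_item = 0.25) t_item * n               # unstable layout
hick_t(16); search_t(16)  # decision scales far better than search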
Spatially Consistent User Interfaces
Pie menu: items are sectors making up a circle centered around the cursor (possibly with multiple layers of items through nesting):
- Minimum of one pixel of cursor movement required for fast selection
- Allows for easy advancement from visual search to muscle memory
Ribbon: spatial stability within each tab, but requires visual search and mechanical interactions to find a new item. ‘Solution’: show all tabs at once.
Search: macOS menu bar search does not run the searched command; it only shows you where the item is located. Menu items also show the keyboard shortcut.
Torus pointing: wraps the cursor around the screen edges, giving multiple straight paths to an item. Giving users choice may help with Fitts’ law, but increases decision time (Hick’s law).
Power Law of Practice
Performance rapidly speeds up with practice: $T_n = T_1 \cdot n^{-a}$
Where:
- $T_n$ is the time taken for trial $n$
- $T_1$ is the time taken on the first trial
- $a$ is the learning rate
This applies both to simple and complex tasks.
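A minimal R sketch of the curve ($T_1$ and $a$ are illustrative):
# Power law of practice: rapid early gains, diminishing returns
practice_t <- function(n, T1 = 10, a = 0.4) T1 * n^(-a)
practice_t(c(1, 10, 100))  # trial times: 10 s, ~4 s, ~1.6 s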
Novice to Expert Transitions
People use the same tools for years/decades, but often continue to use inefficient strategies.
Shortcut vocabularies are small and are used infrequently. Factors:
- Satisficing; good enough
- Lack of mnemonics (for keyboard shortcuts)
- Lack of visibility
How do you support transitions to experts?
When switching between modes, there is a performance dip. Since people use software to do their jobs, not use software as their jobs, this causes a chasm that the user must take the time to cross.
^ Performance Modality
│ Switch
│ | xxxxx
│ xxxxxxxxx
│ | xxxxx
│ xxxx
│ | xxx
│ xx
│ | xx
│ Ultimate xxx
│ xxxxxxxxxxxx| xx ─┐
│ xxxxxx Performance x │ Performance
│ xxxx |x │ Dip
│ xx Extended x │
│ x Learnability |x ─┘
│ xx
│ x |
│x
│x Initial |
│Performance
│ |
│ First Modality Second Modality
└──────────────────────────-────────────────────────────>
Time
Domains of Interface Performance Improvement
- Intra-modal improvement
- Make the user an expert within the mode the user is comfortable working in
- e.g. guidance techniques where you show items the user is likely to use
- Inter-modal improvement
- Make the user aware of faster ways of doing the task (e.g. File > Print to Ctrl+P)
- e.g. skillometers
- e.g. AutoCAD shows the text command being used in the background when using UI buttons
- Vocabulary extension
- e.g. track and show community command use to let users learn the most useful commands
- Task strategy:
- Intelligent UI that picks up on the task the user is trying to do suggests more efficient sequences of commands to achieve this
Human Pattern of Behavior
Zipf’s Law: given a corpus of text, a word’s frequency is inversely proportional to its rank: $f(r) \propto 1/r$
where $r$ is the word’s rank in frequency order.
Pareto Principle/80-20 Rule: 80% of usage is made up of only 20% of items.
The UI should attempt to surface these 20% of items.
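A minimal R sketch of how Zipf-distributed usage produces this pattern (idealized 1/r frequencies, invented for illustration):
# Zipf frequencies for 100 ranked items
f <- 1 / (1:100)
sum(f[1:20]) / sum(f)  # top 20% of items -> ~70% of all usage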
Human Memory
maintenance
rehearsal
┌────┐
│ │
│ ▼ elaborative
Sensory Memory Short-term rehearsal Long-term
iconic, echoic,──────► memory ──────────────► memory
and haptic │ ◄──────────────
│ │ retrieval
│ │
│ │
│ │
│ │
▼ ▼
masking decay displacement or
interference decay
Sensory memory: stimulation decays over a brief period of time; loud noises, bright lights, pain persists for some time after the stimulation disappears.
Short-Term Memory
- Input from sensory or long term memory
- Capacity of 7 ± 2 ‘chunks’/abstractions
- Chunks aid storage and reconstruction
- Fast access: ~70 ms
- Rapid decay: ~200 ms
- Constant update and interference
- Maintenance rehearsal: e.g. repeating a number a few times in your mind
Long-Term memory
- Input through elaborative rehearsal and extensive repetition
- Elaborative rehearsal: restructuring information instead of just mindlessly repeating it
- Slow access: > 100 ms, sometimes days (tip of the tongue phenomenon)
- Decay?
- Good at recognition but bad at recall
- Supports spatial processing
Human Error
Mistakes
Errors of conscious decision-making: the user acts according to an incomplete or incorrect mental model.
Only detected with feedback.
Human Error: Slips
Errors of automatic and skilled behavior.
Capture error:
- Two action sequences with common starting point(s)
- Captured into the wrong (and usually more frequent) path
- Used to be common e.g. in dialogue boxes with generic button labels (‘Cancel’ and ‘Ok’)
Description error:
- More than one object allowing the same/similar action
- Execute the right action on the wrong object
- e.g. lighting panel with multiple switches
Data-driven error:
- External data interfering with short-term memory
- e.g. entering unrelated file name when saving a document
Loss-of-activation error:
- Goal is displaced/decayed from short-term memory before it is completed
- e.g. walking into room then forgetting why you entered
- Want to complete task that requires subtasks and sub-subtasks to be completed, overflowing short-term memory
Mode error:
- Right action in the wrong system state
- Modes are system partitions with:
- Different set of commands
- Different interpretation of the same commands/actions
- Different display methods
- Ensure modes are visible and noticeable
- Modal dialogues are an example of bad modes
Motor slip:
- Pointing/steering/keying error
Premature closure error:
- ‘Dangling’ UI actions required after perceived goal completion
- e.g. forgetting to save, or to add attachments to emails
Human Phenomena
Homeostasis
People maintain equilibrium:
- If a system makes something easier, people will use it to do more difficult things
- If a system makes something safer, people will use it to do more dangerous things
Satisficing
People are satisfied with what they can do now and don’t bother to optimize:
- People that ‘hunt-and-peck’ instead of learning to touch type
- People that don’t bother to learn keyboard shortcuts for tasks they do frequently
Hawthorne Effect
The act of measuring changes results (the Heisenberg uncertainty principle of HCI).
People like being involved in experiments and change their behavior during experiments, complicating results.
Explaining Away Errors
The user is often the easiest party to blame, but the mistake may well stem from a poorly designed interface.
Peak-End Effects
Peak effect: people’s memories of experiences are influenced by the peak/most intense moments of an experience (e.g. combos attacks in games, casino games).
End effect: people’s memories of experiences are predominantly influenced by the terminating moments (e.g. a good vacation ruined by a missed flight home, a survey with many questions on the last page).
Negativity Bias
The magnitude of sensation from a loss is greater than from an equal gain: bad is stronger than good.
e.g. single coin toss, win $110 on heads but lose $100 on tails, autocorrect ‘correcting’ a correct word feels much worse than how good it feels when it corrects a mis-spelt word.
Communication Convergence
Similarity with pace, gestures, phrases, etc. enhances communication. Could interfaces measuring (e.g. long press duration, mouse speed) and matching (e.g. animation speed, timeout, speech rate) help?
02. Interface Design
Design -> Implementation -> Evaluation -> Design -> …
Design Process
Saul Greenberg
Articulate
Articulate:
- Who the users are
- Their key tasks
Then design:
- Task-centred system design
- Participatory design
- User-centred design
This should lead to user and task descriptions.
Then, evaluate the tasks and repeat the process, refining goals.
Brainstorm Designs
When designing, consider:
- The psychology of everyday things
- User involvement
- Representation and metaphors
Create low-fidelity prototyping methods.
Then, create throw-away paper prototypes.
NB: ‘prototype’ has multiple meanings, one of which implies executability.
Evaluate the designs:
- With respect to the tasks identified
- Participant interaction: get users involved
- Task scenario walk-through: in order to do X, Mary will press this button …
Repeat steps if required, further brainstorming more designs.
A reviewer should be able to unambiguously understand how the interface operates.
Refined Designs
Create:
- Graphical screen design
- Interface guidelines
- Style guides
Then use high-fidelity prototype methods and create testable prototypes.
Use usability testing and heuristic evaluation to further refine design if required.
Completed designs
Create alpha/beta systems or complete specifications. Do field testing if necessary.
Iterative Design
Iteratively refine design based on evaluative feedback.
A common mistake is to get an idea and hill climb on that single idea. Leads to:
- Tunnel vision
- Premature commitment
- Local maxima
- Stops early bad decisions from being fixed
Elaborative/Reduction Tension
Elaboration: get the Right Design; explore the full space of possible designs.
Reduction: get the Design Right; polish the solution. This may be done on the best solutions simultaneously.
The Design Funnel
[Design funnel sketch: parallel streams of ideas from Design, Management/Marketing, Engineering and Sales feed into a funnel that progressively narrows toward the final shipped design.]
Supporting Rapid Iterations
Fudd’s first law of creativity: to get a good idea, get lots of ideas.
Lots of ideas take lots of time to build/test, so we need rapid creation, evaluation and prototyping.
Prototyping
After user/task identification, prototyping can occur.
Low-fidelity paper prototypes (elaboration):
- Brainstorm different representations
- Choose a presentation
- Rough out interface style
- Task-centred walk-through and redesign
Medium-fidelity prototypes (reduction):
- Fine-tune interface, screen design
- Heuristic evaluation and redesign
High-fidelity prototypes/restricted systems:
- Usability testing and redesign
Working systems:
- Limited field testing
- Alpha/beta testing
Low-Fidelity Prototypes: Sketches
Outward appearance and structure of intended design.
Necessarily crude and scruffy:
- Focus on high-level concepts
- Fast to develop
- Fast to change
- Low change resistance; you only put in a few minutes of effort
- Delays commitment
Use annotations/sequences to show UI progression.
Cross reference with other zoomed in/out sketches.
Sequential sketches: show state transitions; what interaction causes the state change?
Focus on the main transactions (Zipf’s law) - clearly convey how the user achieves the 20% of the most frequent interactions.
Medium-Fidelity Prototypes: Wizard of Oz
Have a person emulate the functionality.
IBM speech editor (1984): the user gave audible commands to edit a text document, which the ‘wizard’ would carry out. This gave IBM a good understanding of the user experience, letting them see whether the idea was any good without investing a large amount of effort into actually implementing it.
Walk-through evaluation:
- Facilitator gives the user tasks and prompts them for their thoughts
- User looks at current system state
- The ‘computer’ (a person playing the system) updates system state following some pre-determined algorithm
- All UI states/components must be sketched/printed out
- Observer takes notes
Refinement (e.g. PowerPoint):
- Facilitates motion paths
- Links between states etc.
- Many wireframing tools available (e.g. Moqups, Balsamiq, Axure)
Precise medium-fidelity prototypes:
- For very small but important portions of the UI
- e.g. slide to unlock animations etc.
Photo traces:
- If you suck at sketching
- Take a photo, trace it out; captures the essence of the interaction without the exact representation
Simulations and animations:
- Works well for second round evaluation
- Horizontal prototype: surface-layer/sketch prototype of entire range of functionality
- Vertical prototype: much of the functionality for a small set of features
- Scenario: intersection of horizontal and vertical prototypes
- Beware of:
- Inflated expectations - perception of it being ‘nearly completed’
- Reluctance to change - the more it looks finished, the less willing stakeholders may be to recommend changes
- Excessive focus on presentation rather than approach
Task-Centered System Design (TCSD)
TCSD is the HCI equivalent of requirements analysis/use cases.
It asks exactly and specifically who are the users and what will they use the system for? There is a critical difference between:
- The User - a pretend person who will adapt to the system and go on a two week training session to live with the designer’s pet system
- A real, busy person doing their job
TCSD acts as a reality-based sanity check for designers.
Good book on system design: Task-Centered User Interface Design by Clayton Lewis and John Rieman.
How NOT to approach design:
- Focus on system and designer needs
- Ask what can we easily build
- Ask what is possible/easy with the tools we know/have?
- Ask the programmers what they find interesting
UC SMS
UC’s student management system (from the mid 2000s) was a multi-million dollar, unusable disaster.
Example task: Andy is teaching COSC225 next semester; he wants to know how many students are enrolled to see how many printouts he needs. To achieve this:
- Click on ‘Navigate’ button in the toolbar; opens ‘System Navigator’ window
- Expand ‘Searches’ menu (hierarchical menu system)
- Click on ‘Course Occurrence Search’; opens new window
- Enter course code, hit return
- Select the right occurrence
- A window with a huge mess of text fields (mostly disabled) and 13 tabs opens
- …
The company that delivered it had a system that was similar to what UC needed; they did what was easy, not what the end user needed.
TCSD Phase 1: User Identification
Identify categories of end-users with specific exemplars - typical and extremes.
Talk to them!
- If they won’t give you the time to talk, they probably won’t use your system either
- If they really don’t exist (no existing system):
- Worry
- Describe your assumed users and tasks
- Learn about people in the task chain: who do inputs come from, where do outputs go?
- Why does the user need to do this? What do they do with the information?
TCSD Phase 2: Task Identification
- Record what the user wants to do, minimizing the description of how they do it
- No interface assumptions; tasks are independent of the interface they will use to complete it
- Can be used to compare alternative designs
- Don’t write ‘display ${something} to the user’, write ‘do ${something}’: the user wants to get information about something; the system displaying it is just a way they can do it
- Record the complete task: input source, output identification
- Identify users
- Design success depends on what users know
- Test against specific individuals; name names
- Uniquely enumerate tasks for identification
- Giving tasks a unique identifier helps with communicating problematic tasks with the team
- Identified tasks can be circulated for validations
- Interview the users with the tasks you identified; they can help spot omissions, corrections, clarifications and unrealistic tasks
- Identify broad coverage of users and tasks
- Create matrix with the axes of unimportant/important and infrequent/frequent tasks/users
Example: John Smith arrives at student services after trying to enrol in a course online, but being refused as he lacked a pre-requisite course. He has a letter from the HoD allowing him to enrol. He has forgotten his ID card and cannot remember his student ID or user code (<- this is an interface assumption; does the system have IDs or user codes?).
TCSD Phase 1/2 Outcomes
A report should state:
- User categories (and their priorities)
- Specific personas exemplifying each category
- Task categories and priorities
- Concrete representative task scenarios (with name of the owner)
- Enumerated with unique identifiers for use in UI validation
- Explicit identification of groups/tasks that will not be supported and reasons for this
TCSD Phase 3: Design
Use task categories/scenarios to generate and evaluate designs.
Strive to make the workflow natural to the user. For each design and task scenario ask how the user would complete the task.
TCSD Phase 4: Walk-through Evaluation
Interface design debugging: select a task scenario and for each step:
- Ask what the user would do given what they know
- Ask if the task is believable
- If not, it is an interface bug. Record it and assume it is fixed when going through the next steps
Cautions on TCSD
- It is hard to record and identify task scenarios that are independent of the interface
- The more the interface and task are interlinked, the more difficult it is to identify alternative/better ways of achieving the task
- It can be hard to find people ‘responsible’ for new tasks in a system: who do you interview, how do you validate the interface?
User-Centred System Design
Know the user: design should be based around user needs, abilities, context, tasks etc., and users should be involved in all stages of design: requirements, analysis, storyboards, prototypes etc.
UCSD/Participative Design: Involving the User
Talk to users:
- Interview them about culture, requirements, expectations
- Contextual inquiry: observe them doing their job; a few hours of observations can give a lot of insight
- Explain designs: get input at all stages, show visual prototypes and demos
- Walk-throughs: the user knows what they will do the best
UCSD: Participatory Design
Problem:
- Designers’ intuitions can be wrong
- Interviews lack precision/context and can mislead
- Designers cannot know user needs well enough to answer all questions that are likely to arise during design
Solution:
- Designers having access to a pool of representative end users: not management; real users
- These users are full members of the design process
The users:
- Are excellent at responding to suggested designs (they must be concrete and visible)
- Bring in important knowledge of work context that only someone that has lived in the role can learn
- Will often have greater buy-in into the system
However:
- It is difficult (and expensive) to get a good pool of representative end users - you are taking people out of their regular jobs
- They are not expert designers - they probably won’t be able to come up with design ideas from scratch with an understanding of the constraints of the technology, budget, time etc.
- The user is not always right - they may not know what they want
Erskine: members of the Math/COSC departments became part of the design and judging team, giving suggestions to the architects (e.g. less glass - too much glare). When finished, the staff had buy-in: it was their building, not one built by management.
Usability Heuristics
AKA User-Interface Guidelines, Style Guides.
Usability heuristics:
- Encapsulates best practices and ‘rules of thumb’
- Identify common pitfalls
- Define simple ‘thinking hats’ - specific areas (e.g. memory load) to evaluate the interface
Formative heuristics guide design decisions while summative heuristics evaluate existing systems.
Advantages:
- Minimalist: easily remembered and applied, with just a few guidelines covering most problems
- Cost: cheap and fast, and can be done by novices (e.g. end users)
Disadvantages:
- Heuristics can be broad, redundant and obvious
- Some subtleties in their application
Nielsen’s Ten
The original set, defined in Jakob Nielsen’s Usability Engineering:
01. Simple and Natural Dialogue
Manage complexity: make it as simple as possible, but no simpler (match the complexity of the domain).
Organization of the interface: make the presentation (appearance of each state) and navigation (between states) simple and natural.
Graphic design: organize, economize, communicate.
Use windows frugally - fewer windows are almost invariably better.
See: Google vs Yahoo search page, iPhone vs feature phones
02. Speak the User’s Language
Affordances, mappings and metaphors.
Terminology (words, colors, graphics, animations etc.) should be based on the user’s task language (and not based on system internals).
e.g. error messages should be useful to the user, not just the programmer/designer.
‘Language’ is textual and iconic (e.g. ‘Save’ (natural language) can be Ctrl-S , floppy disk icon).
03. Minimize The User’s Memory Load
Recall is slow and fragile; use recognition wherever possible:
- In font menus, show the font name using that font
- Show input formats and provide defaults
- e.g. date inputs with defaults tell you the format the date should be in
- Support reuse and re-visitation
- e.g. browsers show commonly visited pages in omni-bar
- Support exchange of units - don’t force the user to do unit conversion themselves
- Support generalization techniques:
- The same command should be able to be applied to all objects (e.g. cut/copy/paste on characters, text boxes)
- The same method/modifier being generalized (e.g. circles are constrained ellipses, squares constrained rectangles)
04. Consistency
Consistency everywhere:
- Graphic design
- Command structure (e.g. always select object then command to act on it)
- Internally (within the application)
- Externally (within the platform)
- Beyond computing (e.g. red for stop, green for go)
05. Feedback
Continually inform the user about what the system is doing and the system’s interpretation of their input.
e.g. in PS, cursor icon matches selected tool
The feedback should:
- Be specific (e.g. name of file being opened/saved)
- Consider the context of the action - only disrupt the user when necessary
- e.g. save progress bar at the bottom of the window
- Consider feed-forward: show the effect of the action before they commit to it
- e.g. in Word, on hover over font, update selected text with that font (although this particular case was distracting)
- Offer choices based on partial task completion
- e.g. autocomplete
- This should be relatively stable and predictable, allowing the user to act on muscle memory rather than reading
Response times:
- < 0.1s: perceived as instantaneous
- < 1s: delay noticed, flow of thought uninterrupted
- ~10s: limit for keeping attention on the dialogue
- 1-5s: show a busy indicator (e.g. spinning cursor)
- > 5s: show a percent-done progress bar
- If just guessing progress, prefer a speed-up near the end rather than a slow-down
- ‘Working’ dialogues for unknown delays (e.g. throbbers)
- > 10s: user will want to perform other tasks and may have lost their train of thought
Consider feedback persistence: how heavy/disruptive and enduring should it be?
06. Clearly Marked Exits
Avoid trapping the user; offer a way out whenever possible:
- Cancel button
- Universal undo (return to previous state)
- Interrupt (mostly for longer operations)
- Higher precedence for more recent actions - if user does one action then another action that overrides the previous one, fulfil the latter action
- Quit
- Sensible defaults (counter-example: losing all form data after submitting with one bad field)
e.g. ‘Do you want to save the changes made to ${}’: Don’t Save, Cancel, Save (don’t just use ‘yes’/‘no’/‘cancel’)
Windows 10 volume control: the area around the volume bar is untouchable for a few seconds. It is also placed in the top-left corner, where many important UI elements live.
07. Shortcuts
Enable high performance for experienced users:
- Keyboard accelerators
- Command completion
- Function keys
- Double clicking (shortcut for some menu item)
- Type-ahead (offer most likely prediction)
- Gestures
- History (repeat actions done by the user previously)
- Customizable toolbars
08. Prevent Errors and Avoid Modes
People will make errors:
- Mistakes: conscious deliberation leading to incorrect action (bad mental model)
- Slips: unconscious behavior that gets misdirected (or mis-click/typo)
General rules:
- Prevent slips before they occur (e.g. syntactic correctness, disable items that can’t currently be used)
- Feedback: allow slips to be detected when they occur
- Support easy correction (e.g. universal undo)
- Commensurate effort: difficult states (e.g. a document with unsaved work) should be hard to irreversibly leave (e.g. warning dialog box)
Forcing functions (syntactic correctness):
- Prevent continuation of a wrong action
Warnings:
- Can be irritating when overused
- Can be ‘heavy’ (e.g. alert box)
- Make them subtle unless there is a really good reason for it to be heavy
Ignore illegal actions:
- Not great as user must infer what happened
- e.g. typing alphabetical character in number input
Mode errors:
- Have as few modes as possible
- Distinct states of the system where the commands available to the user are different or where the commands produce different results
- Allow user to easily determine current mode
- Spring-loaded modes: ongoing action maintains mode
- e.g. user must hold down control key to stay in a mode
- Good solution to people forgetting they are not in the default mode
Bad behavior:
- UC SMS web student search has radio buttons for searching by username or student number - two different modes, even though the program could easily determine whether the input is an 8-digit student number or a username beginning with alphabetical characters.
Good example:
- When the world was used to the ‘unnatural’ scrolling direction, the iPhone’s rubber-banding acted as feedback when the user accidentally scrolled down from the top of a list
- User can swipe between photos and also drag when zoomed into a photo - what should the behavior be when swiping to the edge of a photo when zoomed in?
Possible solutions:
- Self-correct/auto-correct
- Requires trust in the system
- Negativity bias - incorrectly correcting correct input is far worse than not correcting incorrect input
- Auto-suggest
- Dialog that allows the user to fix an issue
- e.g. Squiggly line under mis-spelt text
- User instructs system
- System asks if the input was intended
- e.g. add to dictionary
- System instructs user
- System guesses user intentions and instructs user on the proper way to achieve it
- e.g. Clippy! - condescending, wrong, tedious, boring
09. Deal with Errors in a Positive and Helpful Manner
Error messages should:
- Use clear language, not codes
- Be precise - rephrase user input (e.g. cannot open ${document name} because ${it is not a supported file})
- Be constructive - suggest and offer solutions where possible
10. Help and Documentation
Documentation and manuals:
- Documentation is no excuse for interface hacks
- Write the manual before the system
- Task-centred manuals (especially for beginners)
- Quick reference cards as a reference to aid novice to expert transition
Tutorials:
- Short introductory guides and overviews
- Video walk-throughs
- Simple task walk-throughs
Reminders:
- Tooltips
- Short reference cards
Wizards:
- Walk user through typical tasks
- Don’t overuse - system in control
- Dangerous if the user gets stuck
Māori Issues and User Interface Design
Te Taka Keegan - University of Waikato
Usability principles
Shneiderman and Nielsen have a few relevant usability principles:
- Strive for universal usability
- Match between system and the real world
- Recognition, not recall
Know your audience:
- Ethnicity: group people who identify with each other by genealogy, language/dialect, history, society, culture etc.
- How is the Māori world view different from the Pākehā perspective? A few important values:
- Manaakitanga: showing respect, care for others
- Whanaungatanga: building up relationships/kinship/closeness (e.g. introductions include mountain/rivers as a point of similarity and a way to bond with each other)
- Tiakitanga: looking after the world and each other
- Rangatiratanga: acknowledging/respecting chieftainship
- Aroha: love
- Language:
- ‘Soft’ - every syllable ends with vowel
- 10 vowels: a/e/i/o/u/ā/ē/ī/ō/ū (short and long/accented with macron)
- ‘au’ sounds more like an ‘o’
- 10 consonants: h/k/m/n/ng/p/r/t/w/wh (wh sound differs with dialect)
- Vowel length important for pronunciation and meaning
Something is usable if a person of average ability and experience can accomplish the task without more trouble than it’s worth.
- Default languages are important
- Interface language affects software usage patterns
- Lack of vocabulary is barrier to using Māori interfaces
- Māori has long words which can cause UI issues
- When using Māori imagery, get feedback and ensure it is appropriate
Inspection Methods
Systematic inspection of a user interface. It:
- Attempts to find usability problems
- Works at any stage in the design process
- Most commonly heuristic evaluation, where 3-5 evaluators inspect the system
Heuristic Evaluation
Each inspector initially works alone. They traverse the interface several times with a specific scenario/task in mind and:
- Inspect UI components and workflow
- Flow between UI states
- Compare them with heuristics
- Find non-compliance/problems
- Add notes, magnitude of problems, frequency
It often uses a two pass approach, focusing on specific UI elements/states in the first pass while the second focuses on integration and flows between states.
Results Synthesis
After each inspector does their individual evaluation, the inspectors come together and assess the overlap in problems they found.
Severity rankings can be reviewed and compared, and problems ranked in order of importance.
Severity: (small impact on those encountering it, few users) = low severity. (large impact, many users) = high severity.
Inspectors
Different perspectives will catch different problems so the inspector team should be diverse.
Example:
- Developer
- Designer
- Beware of vested interest in their design
- Usability expert
- Domain expert
- User
All inspectors should be trained in Nielsen’s heuristics.
Nielsen claims 3 inspectors should be able to find ~60% of problems, and 5 around ~70%.
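These figures are broadly consistent with the Nielsen-Landauer model of problem discovery; a sketch in R, assuming each inspector independently finds a given problem with probability lambda (the commonly cited value is around 0.31, though it varies by study):
# Nielsen-Landauer: proportion of problems found by i independent inspectors
found <- function(i, lambda = 0.31) 1 - (1 - lambda)^i
found(c(3, 5))  # ~0.67 and ~0.84 with lambda = 0.31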
Graphical Screen Design
Gestalt Laws of Perceptual Organization
How humans detect visual groups/relationships/patterns:
- Proximity
- Similarity
- In shape, color etc.
- Continuity
- Dots placed along a curve: the brain sees the curve as an object
- Symmetry
- Objects seen as closed when placed in symmetric boundaries
- Closure
- Brains automatically attempt to ‘close’ objects e.g. semi-circle
Smooth continuity (e.g. smooth curves vs straight lines with right angles) easier to perceive, but less ‘neat’.
PARC Principles
From The Non-Designer’s Design Book by Robin Williams.
PARC:
- Proximity
- Group related elements
- Separate unrelated elements
- Instead of putting ugly borders around two groups, separate them!
- Alignment
- Visually connect elements to create a visual flow
- This is why grids are useful
- And mis-align unconnected elements (use with caution)
- Visually connect elements to create a visual flow
- Repetition
- Repeat design aspects (e.g. font, color, shape) throughout the interface for unity/consistency
- Contrast
- Different things should look different
- Bring out dominant elements, mute lesser ones
Misc
Grids:
- Use horizontal and vertical alignment to group related components
- Make minimal use of explicit structure (i.e. borders and boxes)
Navigational Cues:
- Provide an initial focus (top left for western cultures)
- Group related items
- Visual flow should follow logical flow
Economy of visual elements:
- Minimize the number of controls
- Include only those necessary and relegate others
- Minimize clutter
- Experiment with whitespace
- e.g. headings/labels above or to the left
03. User Interface Evaluation
Designers have complete and comprehensive knowledge of their interface and hence are uniquely unqualified to assess usability.
This makes them blind to the mismatch between the user and designer models. In order to find these, it is important to record realistic interactions; simple observation is insufficient.
Designers must mistrust their interfaces; what is difficult for a user may be obvious to them.
“Think Aloud” Evaluation
Prompt subjects to verbalize their thoughts as they work through the system:
- What they are trying to do
- Why they did the action they did
- How they interpret feedback from the systems
It is hard to talk and concentrate on the task at the same time - you may get a lot of incomprehensible mumbling so the facilitator must ensure they give good and continual prompts to the user.
Apart from the prompts, it should be one-way communication from the subject - otherwise, you will pollute the user’s model.
It is also likely to be very uncomfortable, unpleasant and difficult for the subjects - do your best to make them comfortable.
Cooperative Evaluation
A variation of “think aloud”. In “think aloud”, it feels as if the user is being studied while with cooperative evaluation, two subjects study the system together (with natural two-way communication).
Sometimes, one of the subjects is a confederate - someone involved with the system.
The two subjects work together to solve the problem. It is more comfortable to the subjects and comments about failures of the system emerge much more naturally.
Interviews
The more obvious the technique appears, the less preparation designers intuitively think it needs: designing good interviews (and questionnaires) is difficult, and they are expensive in terms of time for both designers and users.
Interviews are:
- Good for probing particular issues
- Can lead to constructive suggestions
- Prone to post-hoc rationalization
Plan a central set of questions for consistency between interviewees and to focus the interview, but still be willing to explore interesting leads.
Questionnaires
Expensive to prepare but cheap to administer - evaluator not required.
NB: ~20% response rate.
Questionnaires can give quantitative results (e.g. 30% of users did xyz) and qualitative results (e.g. why did you like x?). Question types:
- Open-ended comments give important insights
- Closed questions restrict responses and give quantitative data - make sure there is no ambiguity in the options
- Likert items: level of agreement with a statement
- Ranked choice questions are good for forcing comparisons
- e.g. ‘Was A better than B?’ is preferred over ‘How much did you like A?’ and ‘How much did you like B?’ asked together; comparing the latter pair often contains a lot of noise
Questionnaires are over-determined user interfaces - a badly-designed question may ‘box in’ the user. Hence, when designing questions:
- What purpose does the question serve? What information are you hoping to get?
- Know how you will analyze the results
- For each quantitative question, consider adding a qualitative one asking why they picked the result
- Iterate
- Know the dissemination method
Continuous Evaluation
Monitoring actual system use:
- Field studies
- Design team goes to users and see if they use the system as you expected
- Diary studies
- Users write out a few lines describing their experience with the system over the last few hours
- Logging and ‘Customer Experience Programs’
- LOG EVERYTHING!
- Exploratory questions: hope something interesting shows up
- Difficult to analyze
- Aside: in controlled experiments, log everything (until the point at which it slows down the UI)
- Targeted data collection
- How often are specific features used?
- Characterize their activities
- LOG EVERYTHING!
- User feedback and gripe lines
Crowd-Sourced Experiments
Mechanical turk et al.:
- Workers complete ‘Human intelligence tasks’
- They have a HIT approval rating that can be used for filtering
- Problems with noisy data and criteria for exclusion
- Include ‘attention check’ questions
- A significant proportion of ‘workers’ are bots
- Great with COVID - can’t do face-to-face studies
Formal Empirical Evaluation
When you want to see how a small number of competing solutions perform.
This requires strict, statistically testable hypotheses: better/worse or no evidence/difference.
Measure the participants’ response to manipulation of experimental conditions.
The results should be repeatable - the experimental methods must be defined rigorously, but are also time-consuming and expensive.
Ethics
Testing can be distressing.
As an experimenter you care about overall, not individual, results; but if a subject makes a mistake, it can make them feel embarrassed and inadequate, especially if other subjects can see what they are doing.
Treat subjects with respect; at the very least, ensure the experience is not negative.
Before the test:
- Don’t waste their time; use pilots to debug experiments/questionnaires and ensure everything is ready when they arrive
- Make them comfortable
- Emphasize that the system, not the user, is being tested
- Let them know they can stop at any time
- Privacy: let them know individual test results will be confidential
- Inform: explain what is being monitored and answer their questions
- Only use volunteers: informed consent form required
During the test:
- Make them comfortable
- Relaxed atmosphere
- Never indicate displeasure with the subject’s performance
- Avoid disruptions
- Stop the test if it becomes too unpleasant
- Privacy: do not allow management to observe the test
Controlled Experiments
Characteristics:
- Lucid and testable hypothesis
- Know exactly why you are conducting it and what data you are hoping to get out of it to expose the success/failure of the hypothesis
- Quantitative measurements
- Measure of confidence in results (statistics)
- Is A > B, A < B, or is there no discernible difference?
- Does the experiment successfully discriminate between outcomes?
- Replicability
- Control of variables and conditions
- Removal of experimenter bias; ensure it is objective
Research Questions
Congratulations! You have invented ABC. Now you need a research question/hypothesis:
- ‘Let’s do a user study of ABC because it’s required for my PhD’
- Is ABC any good?
- Does ABC beat the competition?
- Is ABC faster than the competition?
- Is ABC faster than XYZ after 10 minutes of use?
- Is ABC faster and less error prone than XYZ after 10 minutes of use?
Most research questions are comparative:
- Is it faster, more accurate, preferred etc. (in relation to the baseline(s))
- Is there a difference when compared to the baseline?
- How big is the difference (and is it a practical difference)?
- How likely is it that the results were due to chance
Null Hypothesis Significance Testing (NHST):
- Widely used set of techniques for dichotomous testing
- The hypothesis should be expressed as a negative (e.g. XYZ is not faster than the baseline, ABC)
- $H_0: \mu_1 = \mu_2$ (average performance of 1 and 2 are the same)
- $H_1: \mu_1 \neq \mu_2$
- Reject the null hypothesis, $H_0$, when $p < \alpha$
- Given that the null hypothesis is true, the probability of observing data as extreme as what we saw should be very low ($p < \alpha$)
- $\alpha$ is usually 0.05
- Failure to reject does not mean that ‘they are the same’
- It could be that they are the same or that the experiment was not sensitive enough (e.g. too few participants)
- Reject or fail to reject; never accept the null hypothesis
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s / \sqrt{n}}$
Where:
- $\bar{x}_1 - \bar{x}_2$ is the signal: the magnitude of difference between the means
- $s$ is the standard deviation (the noise)
- $n$ is the number of data points
We want to increase the signal-to-noise ratio, so we need to reduce the denominator:
- Reduce $s$ (the noise):
- Better training:
- There will be a large amount of variance in the first few trials (power law of learning)
- If you only care about performance of proficient users, the first few trials are just noise
- Hence, more training will get the participants past this region, reducing noise
- Outlier removal
- Log transformation
- Increase $n$:
- Diminishing returns due to square root
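A quick R sketch of these levers, using the signal-to-noise form above (the numbers are illustrative):
# t grows as noise (s) shrinks or sample size (n) grows
t_ratio <- function(signal, s, n) signal / (s / sqrt(n))
t_ratio(signal = 0.5, s = 2, n = c(10, 40, 160))  # 4x the n -> only 2x the t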
Aside - the ‘file drawer’ effect:
- ‘Unsuccessful’ experiments - those that fail to reject the null hypothesis, are ‘uninteresting’ and tend to go unpublished
- Survivorship bias: 19 studies correctly failing to reject the null hypothesis go unpublished, while one study that (by chance) claims a significant effect gets published
- Do enough experiments, some will get lucky
- https://xkcd.com/882
- https://xkcd.com/1478
Internal vs external validity:
- External validity:
- Broad truth of the interface: if people used ABC, would the world be better?
- Findings are broad/real (e.g. is ABC any good?)
- Makes the world better
- Internal validity:
- Precise and replicable, but gets away from the fundamental truth we are trying to get to
- Findings are valid under specific circumstances that may not reflect real world usage by the general population
- e.g. valid for undergraduate psychology students at UC
Using multiple experiments, some with high internal validity and others with high external validity, can be used to overcome the shortcomings of both.
Be careful in generalizing conclusions:
- ABC was better than XYZ; ensure you identify the right cause for the improvement
- When generalizing, identify the human factor underlying the difference and rephrase the research question around that human factor
- e.g. list of bookmarks vs 3D, spatial layout where all the items were shown at once
- Can’t conclude that 3D is better than 2D; would need to compare against a 2D, spatial layout rather than a list
Point analysis versus depth/theory/model:
- Identify and include salient secondary factors: is the result generally true or only true under the tested conditions?
Experimental Terminology
Independent variables:
- Controlled conditions
- Manipulated independent of behavior
- May arise from participant classification
- e.g. male/female
- Discrete values: independent variable levels
- Called ‘Factors’ in ANOVA
Dependent variables:
- Measured variables
- Dependent on participant’s response to manipulation of IVs
Within vs. between subjects:
- How IVs are administered between/within subjects
- Within subjects: each participant tested on all levels
- Use this whenever you can
- Participants act as a control for their own variability (some people are just fast, some just slow)
- Can measure relative performance for each subject
- Fewer participants required
- But need to account for learning/fatigue effects
- Every participant must be tested on every single level; otherwise their data must be thrown out
- Between subjects: each participant tested on a single level
- Sometimes necessary if using participant classification (e.g. male/female)
- Unmoderated variability between participants
- Don’t mix within and between subject treatments within a single factor
Counterbalancing:
- When using within-subjects, need to control order of exposure to control for learning/fatigue effects
- Participants divided into groups; different order for each group
- Group becomes a between subjects factor
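A sketch of how the groups might be formed in R, assuming three conditions and twelve participants (names and sizes hypothetical):
conditions <- c('A', 'B', 'C')
orders <- list(c('A', 'B', 'C'),  # a Latin square: each condition appears
               c('B', 'C', 'A'),  # in each position exactly once
               c('C', 'A', 'B'))
participants <- 1:12
group <- ((participants - 1) %% length(orders)) + 1  # round-robin assignment
orders[[group[1]]]  # presentation order for participant 1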
              Tiny + noise
Population ----------------> Sample
    ^                           |
    |  Inference                |
    |  about the                |
    |  population               |
    |                           v
    +-------- Statistics <------+
Data Analysis
T-Test
Determines whether two samples are likely to be from different populations.
Paired T-Test (within subjects): each participant is tested under both conditions.
Unpaired T-Test (between subjects): independent samples; each participant is only tested under one condition.
data <- read.table('filename', header=TRUE)
# paired=TRUE for within-subjects designs, paired=FALSE for between-subjects
t.test(data$conditionA, data$conditionB, paired=TRUE)
# If paired=TRUE, values on each row must belong to the same participant
# t-ratio: signal to noise. The bigger (the absolute value), the better
# p-value: can reject null hypothesis if p is less than $\alpha = 0.05$
Lots of additional information available through pairing, dramatically increasing sensitivity: t-ratio will usually be much larger and p-value smaller.
Correlation: Relating Datasets
Determining the strength of the relationship between variables (e.g. is typing and pointing speed correlated?).
Many different models available (e.g. linear, power, exponential), but always look at the graph to see if the model fits.
Common models:
- Pearson’s $r$ (for linear correlation)
- Correlation coefficient between -1 and 1
- Cohen’s rule of thumb:
- 0.1 - 0.3 is ‘small’
- 0.3 - 0.5 is ‘medium’
- 0.5 - 1.0 is ‘large’
- Spearman’s $\rho$ (for ranked data)
Remember that correlation does not mean causation.
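A minimal R sketch for the typing/pointing example above (column names hypothetical):
# One row per participant: typing speed and pointing speed
cor.test(data$typingSpeed, data$pointingSpeed, method='pearson')
cor.test(data$typingSpeed, data$pointingSpeed, method='spearman')  # ranked data
plot(data$typingSpeed, data$pointingSpeed)  # always look at the graph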
Regression: Relating Datasets
Predicting one value from another.
Line of best fit:
- Linear: $y = mx + c$
- $R^2$: coefficient of determination
- (same as Pearson’s $r$, but upper case for some reason)
- Between 0 and 1
- Proportion of variability explained by the model
- A value of 0.8 or larger is good for human performance
- Fitts’ law experiments usually give values around 0.95
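A minimal R sketch of fitting a line of best fit and reading off $R^2$ (column names hypothetical):
model <- lm(time ~ difficulty, data=data)  # linear model: y = mx + c
summary(model)$r.squared                   # proportion of variability explained
plot(data$difficulty, data$time)           # check that the model actually fits
abline(model)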
Analysis of Variance (ANOVA)
T-tests allow us to compare between two samples with different values for an independent variable. But what about if the independent variable (factor) can take on more than two values?
We could simply exhaustively compare all pairs, but if the IV can take on $k$ levels there are $k(k-1)/2$ pairwise tests, and each additional test at $\alpha = 0.05$ increases the chance of at least one false positive.
ANOVA supports factors with more than two levels and handles multiple factors, while reducing the risk of incorrectly rejecting the null hypothesis by asking whether all conditions are from the same population.
If there is only one factor (independent variable), it is called one way ANOVA. Factors can be either within or between subjects (although you cannot do both within a factor).
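A sketch of a one-way, within-subjects ANOVA in R (long-format data assumed, with hypothetical columns participant, technique and time):
d$participant <- factor(d$participant)
d$technique <- factor(d$technique)
model <- aov(time ~ technique + Error(participant/technique), data=d)
summary(model)  # reports the F-ratio and p-value for the technique factor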
COSC368 Exam Notes
Pillars of usability:
- Learnability: users rapidly attain some level of performance
- Efficiency: users can get a lot of work done per unit time
- Often, more learnable = lower efficiency ceiling
- Subjective satisfaction: users enjoy using the software
HCI should aim for simplicity: the UI’s complexity should match, not amplify, the complexity of the domain.
Don Norman’s Model of Interaction:
                 constructs
Designer/    --------------->  System/system image
designer model                   |          ^
                       Provides  |          |  Provides input based on
                       feedback/ |          |  their prediction of how
                       output    |          |  to achieve their goal
                                 v          |
                                User/
                              user model
Designer model:
- Conception of how the interface works
- May be fuzzy and not fully defined
- May be compromised in the actual system
User model:
- User’s conception of how the system works
- Initially based on previous experiences with similar systems
- Grows with use and feedback from the system
System Image:
- How the system appears to be used, from the user’s perspective
- The system itself is the actual hardware and software
Execute-Evaluate Cycle:
- Execute:
- User has goal and forms an intention to complete the goal
- Intention translated to multiple actions in the language of the user interface
- The user executes the actions
- Gulf of Execution: problems executing intention/action
- Evaluate:
- User perceives response from the system
- User interprets the response
- User evaluates the response with respect to their goal and expectations
- Gulf of Evaluation: problems assessing the system’s state, determining its effect
UISO (User, Input, System, Output):
- Task is a low-level task (e.g. save file as PDF)
- Articulation: user task language -> system input language
- Performance: system acting on the user input
- Presentation: system updates its state (and visible state)
- Observation: user views and interprets the new visible state
Mappings:
- Affordances:
- How it looks and how it works are similar
- e.g. a door handle affords pulling, a plate affords pushing
- Over/Under-determined dialogues:
- Under-determined: gulf of execution (e.g. CLI)
- Over-determined: forced through lengthy, unnatural or unnecessary steps (e.g. wizards)
- Direct Manipulation
- Rapid, incremental, reversible actions
- Encourage exploration
- Syntactic correctness: disable illegal actions
- Fast to learn but not always the most efficient
- Requires more screen space and system resources
Human Input:
- Eyes:
- Sensitive to movement
- Fixations: when the eye is stationary
- Saccades: rapid eye movements; blind
- Smooth-pursuit: tracking moving objects
- Reading speed reduced by all caps
- Auditory:
- 20 Hz to 15-20 kHz
- Filtering (e.g. cocktail party effect)
- Haptics:
- Proprioception: sense of limb location (mostly unconscious)
- Kinaesthesia: sense of limb movement
- Tactition: skin sensations
Human output:
- Response time: ~200 ms for visual, ~150 ms for auditory, ~700 ms for haptics
- Isotonic: input through movement (e.g. moving a mouse)
- Isometric: input through force (e.g. keyboard)
Fitts’ Law:
- $A$ is amplitude/distance of movement
- $W$ is width of the target
- Index of Difficulty: $ID = \log_2(A/W + 1)$
- Movement Time: $MT = a + b \cdot ID$
- $1/b$ is called throughput or bandwidth
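A sketch of fitting Fitts’ law in R (hypothetical data frame fitts with columns A, W and MT):
fitts$ID <- log2(fitts$A / fitts$W + 1)  # index of difficulty per trial
model <- lm(MT ~ ID, data=fitts)         # MT = a + b*ID
coef(model)                              # intercept a and slope b
1 / coef(model)['ID']                    # throughput = 1/b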
Steering Law:
- Steering a mouse cursor along a path/tunnel of width $W$ and length $A$
- $MT = a + b \cdot \frac{A}{W}$
Hick/Hyman Law of Decision Time:
- Visual search time is usually linear in the number of items $n$; that is, $T = a + b \cdot n$
- Hick/Hyman models reaction time when optimally prepared (i.e. expert with a spatially stable UI)
- $T = a + b \cdot H$, where $H$ is the information content of the decision (in bits)
- For $n$ equally probable items, $H = \log_2(n + 1)$
- To pick item $i$ with probability $p_i$: $h_i = \log_2(1/p_i + 1)$
- Average time: $H = \sum_i p_i \cdot \log_2(1/p_i + 1)$
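A quick R sketch of the average decision time computation (probabilities made up):
p <- c(0.5, 0.25, 0.125, 0.125)  # made-up selection probabilities (sum to 1)
sum(p * log2(1/p + 1))           # average information per decision (bits)
log2(length(p) + 1)              # compare: same number of equally probable items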
Power Law of Practice:
- $T_n = T_1 \cdot n^{-a}$: time taken on the $n$th trial of a task
- $a$ is the learning curve
- Applies to both simple and complex tasks
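A sketch in R of estimating the learning curve by fitting a line in log-log space (data synthesized for illustration):
# Since Tn = T1 * n^(-a), log(Tn) is linear in log(n)
practice <- data.frame(trial = 1:50)
practice$time <- 10 * practice$trial^(-0.4) * exp(rnorm(50, sd=0.05))
fit <- lm(log(time) ~ log(trial), data=practice)
-coef(fit)['log(trial)']  # recovers the learning-curve exponent a (~0.4)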
Novice to Expert:
- Stagnation at some point:
- Satisficing: good enough
- A performance dip when switching to a new mode
- Lack of mnemonics
- Lack of visibility
- Supporting transitions:
- Intra-modal: guidance to help user move towards the ceiling of performance within a mode
- Inter-modal: make user aware of existence of different, faster modes
- Vocab expansion: make user aware of most common commands
- Task strategy: intelligent UIs that figure out what the user is trying to do and suggests more efficient strategies to achieve it
Human Memory:
- Short-term:
- $7 \pm 2$ ‘chunks’
- Fast access: ~70 ms
- Rapid decay: ~200 ms
- Maintenance rehearsal: repeat chunk a few times to prevent decay
- Displacement/interference decay
- Long-term:
- Short-term -> long-term through elaborative rehearsal + extensive repetition
- Slow access: > 100 ms
- Good at recognition but not recall
- Spatial processing
Slips:
- Mistake is a conscious decision; bad user model
- Slip is automatic behavior:
- Capture error:
- Two action sequences, user captured into wrong (more frequent) sequence
- Description error:
- Multiple objects allowing same/similar action
- Right action, wrong object
- Data-driven error:
- Correct value kicked out of short-term memory by external data
- Incorrect value entered
- Loss-of-activation error:
- Forget what you are doing mid-flow
- Mode error:
- Right action, wrong state
- Make states highly visible and noticeable
- Reduce states where possible
- Motor slip:
- Problem between brain and input device
- Premature closure error:
- ‘Dangling’ UI action after user’s perceived goal completion
Human phenomena:
- Homeostasis; equilibrium
- Make a task easier; people will attempt harder tasks with the system
- Satisficing
- Making do; why improve?
- e.g. hunt-and-peck typing, not bothering to learn keyboard shortcuts
- Hawthorne effect:
- People like being involved in experiments; behavior here is not reflective of behavior in the real world
- Peak-end effects
- Most intense or terminating moments of an experience have an excessive influence over people’s memories of the experience
- Negativity bias:
- Bad is stronger than good
- Communication convergence
- Similarity in pace, gestures, phrases etc. enhances communication
Top-level design process:
- Articulate
- Who are the users and what are the key tasks?
- Task-centered, participatory and/or user-centered design
- Generate user and task descriptions, then evaluate
- Brainstorm
- User involvement, representations/metaphors, the psychology of everyday things
- Low-fidelity sketches:
- Focus on high-level concepts
- Fast to develop and change; little resistance to change
- Delays commitment
- Sequential sketches: shows state transitions and actions that trigger state change
- Zipf’s law (or Pareto principle): focus on 20% of most frequent interactions; they account for 80% of usage
- The $n$th most frequent item appears with probability $P(n) = c/n$, where $c$ is a normalizing constant (~0.1 for English words)
- Medium-fidelity, paper prototypes:
- Fine-tune interface, screen design
- Do heuristic evaluation and redesign
- Walk-through evaluation:
- User tasked to do some task
- Is the story believable?
- If so, ask how they will do it
- Further evaluation:
- Participatory interaction
- Task scenario walk-through: to do X, A will press this button then…
- Refinement
- Graphical screen design, interface guidelines and style guides
- Generate high-fidelity, testable prototypes, then:
- Usability testing
- Heuristic evaluation
- Completion
- Generate alpha/beta systems or a complete specification
- Then do field testing
Iterative design: don’t settle on a single idea and improve only on that; this leads to premature commitment, local maxima, and tunnel vision
Elaboration/reduction: first explore the full design space (elaboration), then refine the design(s) (reduction)
Task-Centered System Design (TCSD):
- User identification:
- Talk to users
- Difficult if the system/task is new
- Learn about the task chain; what are the inputs, where do the outputs go?
- What purpose does the task achieve?
- Task identification
- What the user wants to do
- Not a description of how they (will) do it
- Identify users
- Name individuals
- Give each task a unique ID
- Validate tasks: talk to relevant users to help spot issues
- Determine what tasks and users will be covered; rank based on importance and task frequency
- Design:
- Iterative design, walk-through evaluations
User-Centered System Design (UCSD):
- Users know their own needs better than anyone else
- Involve representative end-users as full members of the design process
- Great at:
- Responding to suggested designs
- Bringing in invaluable knowledge of work context
- Leading to greater user buy-in
- Not so great at coming up with new designs
- The user is not always right - they may not know what they want
Nielson’s Ten Heuristics:
- Simple and natural dialogue
- Make it as simple as possible but no simpler
- Presentation + navigation should be natural and consistent
- Design: organize, economize, communicate
- Speak the user’s language
- Affordances (it is used the way it looks like it should be used)
- Mappings
- Metaphors
- Base terminology on user’s task language, not implementation
- Minimize memory load
- Recall slow; use recognition where possible
- Show input formats, provide defaults (e.g. date fields - what format is it supposed to be entered in, can a sensible default be provided?)
- Support reuse/re-visitation (e.g. show a few of the most commonly or recently used)
- Support unit exchange
- Support generalization: universal commands, modifiers
- Consistency
- In graphic design
- In command structure (e.g. pick command then select object or select object and run command)
- Internally
- Externally (within the platform)
- Beyond computing
- Feedback
- Continuous feedback about the system state and system’s interpretation of user input
- Feedback should be:
- Specific
- Consider feed-forward: show effect of action before it is committed
- Autocomplete
- Must be stable and predictable - muscle memory, not reading
- Consider persistence: how disruptive and enduring should the feedback be?
- Clearly-marked exits; don’t trap the user
- Cancel buttons, universal undo, interrupt long-running operations etc.
- More recent actions should override older ones
- Quit
- ‘Do you want to save changes to ${filename}?’: ‘Don’t Save’, ‘Cancel’, ‘Save’; should be specific
- Shortcuts
- Keyboard accelerators
- Command completion, type-ahead
- Function keys
- Double clicking
- Gestures
- History
- Customizable toolbars
- Prevent errors, avoid modes
- Syntactic correctness - disable items that aren’t valid
- Feedback reduces chance of slips
- Easy correction - universal undo
- Commensurate effort: states difficult to get to should be difficult to irreversibly leave
- Forcing functions: prevent behavior until problem corrected
- Interlocks: force right order of operations (e.g. remove card before ATM dispenses cash)
- Lock-ins: force user to remain in space (e.g. would you like to save changes dialog on close)
- Lock-outs: force the user out of a space or prevent an event from occurring
- Don’t just ignore illegal actions - otherwise the user must infer what is wrong
- Mode errors:
- Have as few modes as possible
- Make current mode easily apparent
- Spring-loaded modes: ongoing action required to stay in mode
- Deal with errors positively and helpfully
- Clear language, not codes
- Precise
- Constructive - offer solutions
- Help and documentation
- Documentation is not permission to design a crappy UI
- Write the manual before the system
- Reminders: tooltips
- Wizards: put the system, not the user, in control; don’t overuse
- Tutorials
Heuristic evaluation:
- Inspectors: developers, usability experts, domain experts, users, designers
- Warning for designers: vested interest in their own designs
- With a specific scenario in mind:
- Inspect UI components, workflow, state transition
- Compare against heuristics
- Two-pass approach: focus on specific UI elements on first pass, then integration and state transitions
- Result synthesis: inspectors come together and assess overlap
Gestalt Laws of Perceptual Organization:
- Proximity
- Similarity (color, shape, etc.)
- Continuity: the brain sees dots etc. as forming a larger shape
- Symmetry: objects seen as being ‘closed’ when placed in symmetric boundaries
- Closure: brain automatically ‘closes’ objects
PARC Principles:
- Proximity
- Group related elements, separate unrelated
- Use whitespace over borders
- Alignment
- Grids, tables etc. visually connect elements
- Mis-align unconnected elements
- Repetition
- For consistency
- Contrast
- Different things should look different
Misc:
- Visual flow should follow logical flow
- Controls: minimize, include only what is necessary
- Smooth continuity (e.g. smooth curves vs right-angled lines) is less ‘neat’ but easier to parse.
UI Evaluation:
- Designers uniquely unqualified to assess usability; can’t fathom what a typical user’s model is like
- ‘Think Aloud’ evaluation:
- Subjects prompted to verbalize thoughts while using a system
- What they are trying to do
- What the action did
- How they interpret feedback from the system
- Cooperative evaluation:
- Feels less like the subject is being studied; the subject and evaluator study the system together
- Interview:
- Prepare: have a central set of questions for consistency between interviews
- Be willing to explore interesting leads
- Good for probing particular issues
- Prone to post-hoc rationalization