01. Introduction
Course Administration
Course objectives:
- MR theory
- MR technology (mainly VR + AR)
- Designing and building MR experiences
- Learn about the theories and technologies used to create MR experiences
Topics:
- Introduction to Mixed Reality
- AR development tools, and designing effective AR experiences
- VR development tools, and designing effective VR experiences
- Tracking, calibration and registration for AR
- With Richard Green
- Mixed reality displays
- Interaction in VR
- Interaction in AR
- Collaboration in mixed reality
- Creating multiple-sensory VR experiences
- Haptics etc.
- Human Perception and Presence in mixed reality
- Data Visualization in Mixed Reality
- Multi-dimensional data sets
- Evaluating immersive experiences
Staff:
- Stephan Lukosch:
- Course coordinator, main lecturer, currently acting director for HIT lab
- Adrian Clark
- Rory Clifford
- Richard Green
- Rob Lindeman (HIT lab director, on sabbatical)
- Tham Piumsomboon
- Lecturer in product design
- Yuanjie Wu
Labs:
- HIT lab 2nd floor (John Britten building)
- Limited lab assignments: focus on project work
- 9 workstations (27 students, 3 people per group)
- Admin rights
- Must sign HIT lab equipment-use policy
- TAs will provide technical support for setting up the development environment (Unity 3D)
- Extremely full-featured: focus on the features that are important
- Stephan will give general feedback on research project
Assessment:
- Research project:
- 30% of course grades
- Max. 3 students
- Commented, documented source code
- Demonstration/video of project
- Research paper:
- 30% of course grades
- 6-page conference-style paper (< 4000 words)
- No contribution sheet for groups: assume equal effort
- Exam: 2 hour open-book exam
Research project:
- Teams of 3
- Hybrid tabletop game: physical game elements augmented with virtual information (e.g. 3D objects, animations)
- Requirements:
- Visualize several digital game elements anchored in the real world
- Support some form of interaction with the digital game elements (touch interaction, interaction between multiple targets, based on distance between device and marker)
- Players can see and interact with the digital game elements via their smartphone
- Not enough HMDs for everyone
- Unity runs on macOS, Linux, Windows
- Vuforia runs on Android and iOS, but iOS deployment requires a Mac
- Unity integrates with Plastic SCM: free for <= 3 people
- Can borrow a webcam if required
- Can try to use HMDs, but probably difficult
Mixed Reality
A continuum:
- Real environment
- AR: augmented reality
- Add digital information to the real world
- AV: augmented virtuality
- In a virtual world, apart from a few things
- e.g. VR car simulator, but user can see the real steering wheel
- VR: virtual environment
- Everything is virtual
In terms of interactions:
- Reality: ubiquitous computers which you interact with alongside the real world
- AR: augmented using input from both the user and the environment
- VR: completely cut off from the real world: only interaction with the computer
Virtual Reality
VR:
- Replicates an environment, real or imagined
- Simulates a user’s physical presence and environment to allow for user interaction
Defining characteristics of VR:
- Environment simulation
- Presence
- Interaction
AIP Cube (Zeltzer, 1992)
Three axes:
- Autonomy:
- User can react to events and stimuli
- Head tracking, body input
- User can change their viewpoint
- Interaction
- User can interact with objects in the environment
- User input devices, HCI
- Presence
- User feels immersed through sensory input and output channels
VR is at extreme end of all three axes of the AIP cube.
Very hyped in 1980s/1990s:
- Lagging technology
- Lack of understanding, usability
- No ‘killer app’
- Except some specific scenarios
- Surgical simulation
- Military training
- Phobia therapy
Keys to success:
- High fidelity/realism: graphics, audio, haptics, behaviors
- Low latency: tracking, collision detection, rendering, networking
- Ease of use: for programmers and users
- Compelling content
- Responsive expressiveness (natural behaviors)
Current state of senses:
- Visual: good
- Hard to match eye’s FoV though
- Aural: good spatialized audio
- Olfactory (Smell): too many types of receptors; very hard
- Haptics: application-specific and cumbersome
- Gustatory (taste): base tastes are known, but very hard
Simulator sickness:
- General discomfort
- Fatigue
- Headache
- Eye strain
- Difficulty focusing
- Increased salivation
- Sweating
- Nausea
- Difficulty concentrating
- ‘Fullness of the head’
- Blurred vision
- Dizziness with eyes open
- Dizziness with eyes closed
- Vertigo
- Stomach awareness
- Burping
Factors negatively influencing VR:
- Latency
- Mis-calibration of tracking
- Low tracking accuracy
- Low tracking precision
- Limited FoV
- Low refresh rate
- Low resolution
- Flicker/stutter
- Real-world stimuli
- Lack of depth cues
- Device weight
- Heat
- Fogging of screens
Delay/latency is one of the main contributing factors to simulator sickness. The system must complete several tasks in series, which can lead to noticeably high latency:
- Tracking delay
- Application delay
- Rendering delay
- Display delay
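As an illustrative sum (all numbers hypothetical): 5 ms tracking + 10 ms application + 11 ms rendering + 11 ms display scan-out ≈ 37 ms motion-to-photon latency, well above the ~20 ms often quoted as a comfortable upper bound.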
VR Output
Sound:
- Display techniques:
- Multi-speaker output
- Headphones
- Bone conduction
- Spatialization vs localization
- Spatialization: processing of sound signals to make them seem to emanate from a specific point in space
- Localization: our ability to identify the source position of a sound
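A minimal Unity sketch of spatialization, assuming an AudioSource with a clip is already attached to the emitting GameObject (class name and values are illustrative):

using UnityEngine;

// Make the attached AudioSource fully 3D so the sound appears to emanate
// from this GameObject's position in the scene.
public class SpatializedSound : MonoBehaviour {
    void Start() {
        AudioSource source = GetComponent<AudioSource>();
        source.spatialBlend = 1.0f;                        // 0 = 2D, 1 = fully spatialized
        source.rolloffMode = AudioRolloffMode.Logarithmic; // distance-based attenuation
        source.Play();
    }
}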
Smell:
- Two main problems
- Scent generation
- The nose has tens of thousands of receptor types
- Delivery
- How to deliver the scent to the user (and hopefully only to them) and remove it quickly
Touch:
- Haptic feedback comes in many different senses:
- Force/pressure
- Slipperiness
- Vibration
- Wind
- Temperature
- Pain
- Proprioception
- Balance (?)
- Most densely populated areas: fingertips, lips, tongue
- Two-point discrimination: how far away do two points need to be in order to sense them as two separate touches rather than one single touch?
- 2-3 mm on the fingertip
- 6 mm on the cheek
- 39 mm on the back
- Cyberglove:
- ~100K
- Tracks hand motion
- Contains motors to block finger movement: creates impression of actually grabbing something
- Force-feedback arms:
- Stylus attached to robot arm
- Can be used for sculpting: resistance varies with the material
VR Interaction
Interaction with VR:
- Keyboard/mouse not very attractive:
- Cannot see them
- Don’t want to be anchored to a desk: want to move around
- No good 3D mappings
Basic VR interaction tasks:
- Object selection and manipulation
- Problems:
- Ambiguity
- Judging distance
- Selection approaches
- Direct/enhanced grabbing
- Ray-casting techniques
- Image-plane techniques
- Manipulation approaches:
- Direct position/orientation control
- Worlds in miniature (God mode)
- Skewers
- Surrogates
- Navigation
- Wayfinding: how do I know where I am, how do I get there?
- People get lost/disoriented easily: need maps
- Limited physical space; possibly infinite virtual space
- Not a 1:1 mapping between their physical and virtual position, making it easy to get disoriented
- Different types of travel
- Walking/running
- Turning
- Side-stepping
- Back-stepping
- Crawling
- Quick start/stop
- Driving
- Flying
- Teleporting
- Need to do other things while traveling
- Impossible spaces
- Change blindness redirection: change the geometry of the space behind them
- Suma, E. A.; Lipps, Z.; Finkelstein, S. L.; Krum, D. M. & Bolas, M. T., Impossible Spaces: Maximizing Natural Walking in Virtual Environments with Self-Overlapping Architecture, IEEE Trans. Vis. Comput. Graph., 2012, 18, 555-564
- Humans trust their visual sense more than their memory
- Changing rotation angle?
- System control:
- Changing settings
- Manipulating widgets:
- Lighting effects
- Object representation
- Data filtering
- Approaches:
- Floating windows
- Hand-held windows
- Gestures
- Menus on fingers
- Symbolic input: typing/inputting text/numbers
- Avatar control
- Body sensors to accurately map avatar to real user?
- Or approximate with head and hand position?
The “optimal” interface depends on:
- The capabilities of the user
- Dexterity
- Level of expertise
- The nature of the task being performed
- Granularity
- Complexity
- The constraints of the environment
- Stationary, moving, noisy, etc.
Augmented Reality
Azuma (1997):
- Fundamental article on AR
- Defined AR as:
- Combining real and virtual images
- Interactive, real-time
- Registered in 3D: positioned at a real position in the world
AR feedback loop:
- User:
- Observes AR display
- Controls the viewpoint
- Interacts with the content
- System:
- Tracks the user’s viewpoint
- Registers the pose in the real world with the virtual environment
- Presents situated visualization
Requirements:
- Display: must combine real and virtual images
- Interactive in real-time
- Registered in 3D: viewpoint tracking
History:
- 1968: Sutherland HMD system
- System would hang from the ceiling
- 1970-80s: US Air Force Super Cockpit program
- 1990s: Boeing wire harness assembly
- Now:
- Magic books: virtual content shown over pages
- Magic mirror: ‘mirror’ overlays X-ray image over color image
- Remote support
Display types:
- Head-attached
- Head-mounted display/projector
- Two types:
- Occluded/video: essentially a VR headset with a camera feed streamed through it
- e.g. Varjo XR-1:
- Low-latency (~20 ms) cameras
- 1080p resolution per eye
- Tethered
- 87 degree FoV
- Optical see-through: transparent display that overlays content onto the real world
- No/lower distortion
- Safer: user will always be able to see the real world
- No latency for real content
- Images will be a bit transparent as well
- More connected with the real world
- e.g. Hololens, Magic Leap
- Hololens has ~30 degree FoV: very limited
- Body-attached
- Hand-held display/projector (smartphones)
- Spatial
- Spatially-aligned projector/monitor
- Project images onto a real object
- e.g. pool table
- Everyone can see the content: not as awkward
Tracking:
- Continually locating the user’s viewpoint when moving
- Position (XYZ) and orientation (RPY)
Registration:
- Positioning virtual objects in relation to the real world
- Anchoring a virtual object to a real object when a view is fixed
Tracking requirements:
- Augmented reality information display
- World-stabilized: hardest, must track position + rotation
- Body stabilized: fixed distance from your body: must track rotation
- Head stabilized: easiest
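A minimal Unity sketch of the head-stabilized case (class and field names are illustrative); world-stabilized content would instead keep a fixed world-space or marker-anchored transform, and body-stabilized content would follow a tracked torso transform:

using UnityEngine;

// Keep the object at a fixed offset in front of the tracked camera so it
// moves with the user's head.
public class HeadStabilized : MonoBehaviour {
    public Transform head;                              // assign the AR/VR camera transform
    public Vector3 offset = new Vector3(0f, 0f, 1.5f);  // 1.5 m in front of the head

    void LateUpdate() {
        transform.position = head.TransformPoint(offset);
        transform.rotation = head.rotation;
    }
}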
Tracking technologies:
- Active:
- Mechanical, magnetic, ultrasonic
- GPS, Wi-Fi, cellular
- Passive
- Inertial sensors (IMU: compass, accelerometer, gyro)
- Computer vision:
- Marker-based tracking
- ARToolkit
- Research project will use marker-based tracking for reliability
- Natural feature tracking
- Vuforia texture tracking
- Can handle partially-occluded markers
- Hybrid tracking
- Combined sensors
- e.g. MonoSLAM
Evolution of AR interfaces (more expressive/intuitive going down):
- Browsing:
- Simple input:
- Very limited modification of virtual content
- e.g. placing furniture in room: can control position and rotation
- Viewpoint control
- Handheld AR displays
- Information registered to real-world context:
- e.g. AR map UI
- 3D AR:
- 3D UI
- Often use HMDs, 6DoF head-tracking
- Dedicated controllers (6DoF)
- 3D interaction: manipulation, selection, etc.
- Tangible UI
- Augmented surfaces
- Object interaction
- Familiar controllers
- Indirect interaction
- Based on Tangible Bits vision (Ishii and Ullmer, 1997):
- Give physical form to digital information
- Make bits directly manipulable and perceptible
- Seamless coupling between physical objects and virtual data
- Tangible AR
- Tangible input principles applied to AR
- AR overlay
- Direct interaction
- Physical controllers for moving virtual content
- Support for spatial 3D interaction techniques
- Time and space multiplexed interaction
- Multi-hand interactions possible
- Natural AR
- Interacting with AR content in the same way as real world objects
- Natural user input: body motion, gesture, gaze, speech
- e.g. overhead depth sensing camera
- Create real-time hand model, point-cloud
- Overlay graphics (spider)
- Gesture interaction
- Demo: spider on desk: occluded by hand, can crawl over hand
02. Developing Augmented Reality Experiences
Adrian Clark, senior lecturer, School of Product Design.
Introduction to Unity
Unity: ‘real time development platform’. Not just for games.
Unity is so big, no one knows the full extent of what it can do.
Resources:
- Unity Learn
- AR stuff is now under the XR section
- Community
- User manual
- Pretty decent docs
- Asset Store
- Also third-party stores like Turbo Squid
- Prefer fbx files
- To download assets purchased from the store: Window -> Package manager -> My assets
Unity:
- Use 2020.3 LTS release
- Use 3D template
Editor:
- Many windows
- Scene
- See and position all GameObjects in the scene
- Top left, switch between translation, rotation and scale tools
- Add Rigidbody to a GameObject to add physics
- Game
- What the camera sees
- Any changes made during play mode are not saved
- Project
- Assets folder: contents update automatically when FS changes
- Hierarchy
- All game objects in the scene in a hierarchy
- Parent nodes affect child nodes
- Inspector
- Modification of GameObject properties
- Components: behaviors or extensions to the game objects
- Game objects are essentially containers for components
- UI objects have their own event systems: cannot attach click listeners etc. to 3D objects
- Examples:
- Renderer
- MeshFilter
- Camera
- Light
- Directional by default:
- Infinitely far away: rays are parallel
- Node position does not matter: only rotation
- Spotlight: cone of influence
- Point: sphere of influence
- Collider
- Console
- Warnings, errors
Shader rendering mode:
- Opaque: alpha ignored
- Cutout: binary transparency; on or off. Use if the object has holes
- Fade: change transparency of all aspects of the material based on alpha
- Transparent: for realistic transparent materials (e.g. glass); reflections and specular highlights remain visible even where the surface is transparent
Scripting:
using UnityEngine;

// Behavior script is another component that attaches to a GameObject
// https://docs.unity3d.com/Manual/ExecutionOrder.html
// A massive number of lifecycle callbacks
public class NodeBehaviorScript : MonoBehaviour {
    // Called before first frame update
    void Start() {
        Debug.Log("Instantiated");
    }

    // Called every frame
    void Update() {
        if (Input.GetKey(KeyCode.UpArrow)) {
            // transform: transform of the object the script is attached to
            // localPosition: position relative to parent
            transform.localPosition += new Vector3(0, 0, 0.1f);
        }
    }

    void OnMouseDown() {
        // Requires a collider on the game object,
        // e.g. box collider: invisible box (hopefully) around the object
    }

    void OnCollisionEnter(Collision collision) {
        // Use a collider that is larger than the object: when two objects
        // come close together you can add custom behavior (e.g. 'picking up'
        // the object in AR)
    }
}
Unity Remote:
- App installed on your phone
- Project Settings -> Editor -> Unity Remote
- Set game resolution to match device screen
- Game view is streamed to your phone screen
- But not the camera
- Touch events etc. on the phone are sent to the host computer
AR
Many different SDKs available. Some deciding factors:
- Price (free, paid per scan/month/app/licence period)
- Supported hardware platforms (iOS, Android, desktop, HMDs, web)
- Tracking (Fiducial, 2D natural feature, SLAM, 3D object, face, GPS/IMU)
- Performance
Unity also has AR foundation: a common interface to platform-specific AR frameworks. No way of running it in the editor, which makes development very frustrating (although there are some rumblings of a Unity Remote-like app which does on-device processing).
This course will use Vuforia:
- Initially developed by Qualcomm (and optimized for their chips), bought in 2015 by PTC and slowly becoming monetized
- Tracking works on black-and-white (grayscale) images
- Available as a UnityPackage
- Each target is its own separate game object
- Add game objects as children of the target to anchor them to the target
- Except for ground/mid-air planes: ‘finder’ objects to find the plane, and ‘stage’ objects containing content
- Target types:
- 2D Image
- Single image: import image into Unity, drag image into texture field
- These have an Image Target Preview component which displays a preview of the image
- Can create databases in the cloud, then download
- Can have cloud image target which does processing online:
- Useful for databases with large (hundreds) numbers of images
- Requires paid license
- Cylinder
- Multi
- Box: six images, one for each face
- Add occlusion object: create 6 planes with depth mask material which is rendered before the game objects
- Add target representation: renders the box on top of (where Vuforia thinks) the actual box (is)
- 3D models:
- CAD models (model targets)
- Being deprecated
- Scanned 3D objects (object targets)
- Supposedly getting better
- Scanned 3D environments (area targets)
- Doesn’t work that well
- Ground planes (ground or mid-air)
- VuMarks
- Vuforia’s custom fiducial markers
- Can have multiple VuMarks with similar visual content but with different data
- Created as SVG files: Illustrator template available
- Ground plane targets
- Anchoring virtual content to ‘ground planes’ - horizontal planes in the environment
- Uses SLAM: requires IMUs etc., so not supported on all devices
- Can emulate in editor with a PDF print-out of a texture
- Project -> Packages -> Vuforia Engine AR -> Vuforia -> Database -> ForPrint -> Emulator -> Emulator Ground Plane pdf file
- Requires ‘track device pose’ to be enabled in Vuforia engine configuration
- Ground plane finder:
- Places a reticle on ground planes (think crosshair)
- Interactive hit test: on tap, instantiates a prefab on the ground plane
- Can also use automatic hit tests
- Create ground plane stage:
- Link to the ground plane finder (content positioning behavior, anchor stage)
- Can enable duplication to have multiple copies of the ground stage content
- Size is in real-world units (1m x 1m)
- Can save the ground plane as a prefab (works somewhat)
- Mid-air positioner:
- Fixed distance from the ground
- Position is tracked relative to the ground plane
- No automatic hit-tracking
- Ground plane stage: change anchor behavior to MID_AIR
- Adding custom targets:
- Requires license
- In AR camera, go to Vuforia configuration
- Enter license
- Can also change settings such as scale
Vuforia AR camera:
- GameObject -> Vuforia Engine -> Camera
- Replaces default camera
- Vuforia engine configuration
- Can also just edit configuration as a text file
- Asset/Resources/VuforiaConfiguration.asset
- Global settings: apply to all scenes
- World center mode:
- First target: first target detected is the world origin
- Device: camera always at origin
- Need to turn off track device pose
- Origin impacts things such as physics simulations
Targets:
- Default observer event handler:
- Responsible for turning on/off AR content when target is visible
- Can run custom scripts or change properties of GameObjects when a target is found or lost (see the sketch after this list)
- Can choose definition of Visible for each target:
- Tracked: visible to the camera
- Extended Tracked: the area immediately surrounding the target is visible
- Limited: vague idea of position using IMU or something
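A minimal sketch for the 'custom scripts when a target is found or lost' point above, assuming the Default Observer Event Handler exposes On Target Found / On Target Lost events in the inspector that these public methods can be wired to (class, field and method names are my own):

using UnityEngine;

// Wire OnTargetFound/OnTargetLost to the events exposed by the target's
// Default Observer Event Handler component in the inspector.
public class TargetVisibilityHandler : MonoBehaviour {
    public GameObject content;      // the augmentation to show/hide
    public AudioSource foundSound;  // optional feedback; may be left unassigned

    public void OnTargetFound() {
        content.SetActive(true);
        if (foundSound != null) foundSound.Play();
        Debug.Log("Target found");
    }

    public void OnTargetLost() {
        content.SetActive(false);
        Debug.Log("Target lost");
    }
}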
UI:
- Can position element on screen anchored to a corner, center etc.
- Elements not scaled for different pixel densities: may appear tiny on phones
- Game view, set display resolution to some high portrait resolution to preview
- UI elements automatically have a ‘Canvas Scaler’ component
- Change UI scale mode from ‘constant pixel size’ to ‘scale with screen size’
- Set a reference resolution and an axis which it scales along
- Can also use ‘constant physical size’
- Button click events:
- Can call any public method from any (instance of a) script assigned to a GameObject
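A minimal sketch (names are illustrative): the public method below can be selected in the Button's OnClick() list in the inspector, with the GameObject holding this script dragged into the event slot.

using UnityEngine;

// Example of a public method that a UI Button's OnClick() event can call.
public class ResetButtonHandler : MonoBehaviour {
    public Transform target;

    public void ResetTargetPosition() {
        target.localPosition = Vector3.zero;
    }
}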
Prefabs:
- Saved chunk of a scene (like a snapshot)
- Drag the game object into the project window to create a new prefab
- Saves components, children etc.
- Creating elements from script:
- GameObject -> new empty object
- Add component -> new script

using UnityEngine;

public class SomeCreator : MonoBehaviour {
    // In the inspector, the script component can be assigned any GameObject
    // (including prefabs) for this property
    public GameObject SomeGameObject;
    // Works with a lot more types too (e.g. int, Rect)

    void Start() {
        // Create a new copy of the `SomeGameObject` object
        GameObject someNewCopy = GameObject.Instantiate(SomeGameObject);
        // Can optionally pass in position, rotation (as a quaternion)

        // Primitives can also be created programmatically
        GameObject cube = GameObject.CreatePrimitive(PrimitiveType.Cube);

        // Can set the material properties for the cube's default MeshRenderer.
        // GameObjects start off sharing the same default material;
        // setting the color creates a new instance of the material.
        cube.GetComponent<Renderer>().material.color = Random.ColorHSV();
        // `Color` components have a [0, 1] floating point range,
        // whereas `Color32` has a [0, 255] integer range

        // Use `sharedMaterial` to modify the material instead, (potentially)
        // affecting multiple objects. Don't use it if the object still uses
        // the default material:
        // cube.GetComponent<Renderer>().sharedMaterial.color = Random.ColorHSV();

        // For complex models (or prefabs) composed of multiple GameObjects,
        // we need to access the child objects/components.
        // Find the first child object in the tree with the given name
        // (`baseObject` being some parent GameObject):
        // baseObject.transform.Find("ChildName");
    }
}
-
Building for Mobile:
- File -> Build Settings
- Set to iOS/Android
- Click switch platform (and wait a while)
- iOS:
- Builds: recommend new folder for each project
- Set signing team
- Project settings -> player
- Can set company, product name, icons etc.
- Device orientation: typically just portrait
- Other settings:
- Can set build number
- Configuration:
- Camera usage description: camera privacy description text
- Scripting backend: IL2CPP (compiles the C# intermediate language ahead of time rather than interpreting it)
- For Vuforia:
- iOS:
- Target iOS version >= 11
- Architecture ARM64
- Product name cannot be ‘vuforia’ (there is a library called ‘vuforia’ which it gets mixed up with)
- Android:
- ARCore: if available, Vuforia will use it
- Minimum API level 24, remove Vulkan (check if this is still the case)
- Android logs:
adb logcat -s "Unity:*" (the -s filter silences all logs except those matching the given tag)
Adrian:
- Co-founded AR company
- AR overused and often misunderstood
- Before developing AR applications, ask if there is benefit to doing it in AR:
- Could it be done in VR?
- As a desktop/mobile app?
- As a webpage?
- AR useful for visualizing spatial data, especially if it has an intrinsic link to the real-world environment
- If it fails the latter, it should be done in VR instead (or even just on a flat screen)
- Data should have at least 3 spatial dimensions
- You move around in 3 dimensions and hence, the data should have at least that many dimensions
- Awkward interactions:
- Holding your phone up with one hand while tapping the screen
- Requires powerful phones and drains battery
- Trying to tap targets in mid air while wearing a heavy HMD
- Hard to find a balance for visualization realism:
- Shouldn’t perfectly match the environment: people need to know they can interact with it
- Shouldn’t be so out-of-place to be jarring
- AR visualization considerations:
- Real/virtual object occlusion
- Never pixel perfect
- Lighting which matches the real world (brightness, color, reflection)
- Can’t get high-quality reflections: don’t know what’s behind the camera
- Shadowing to give the perception of distance
- Clutter/contrast between real/virtual objects
- Interactions
- Still in its infancy
- With touch:
- How do we ensure people can accurately touch?
- Fat fingers, and phone is held in one arm outstretched
- How do we choose where they touch in 3D space?
- With gestures:
- What gestures are intuitive? How does it vary by culture
- How do we combat fatigue?
- How do we stop it from looking embarrassing?
- How do we deal with a lack of haptic response?
- Is my finger past the target? In front of it? On it?
- Without good occlusion, haptics is important
- Forget about the WIMP (windows, icons, menus and pointers) metaphor
- Why bring an intrinsically 2D metaphor into an intrinsically 3D experience?
- Think about new interactions and visual affordances
- Tangible user interfaces
- AR should be seamless: we should forget it even exists
03. Developing Virtual Reality Experiences
Dr Tham Piumsomboon, School of Product Design.
Current VR Development Tools
2016: rise of consumer HMDs. Oculus, HTC Vive.
XR Fragmentation: different vendors all had their own proprietary APIs (e.g. Steam VR, Hololens, Oculus, HTC Vive, Magic Leap).
Khronos Group (which created OpenGL, Vulkan, etc.) developed the OpenXR standard: a cross-platform API supported by many hardware vendors.
Toolkits:
- VRTK
- Open source
- MRTK
- Open source, by Microsoft
- XRTK
- Fork of MRTK
- [Oculus Interaction SDK]
- May be deprecated
Game Engines:
- HITLab using Unity engine
- A few others available: Unreal, CRYENGINE, GameMaker Studio, Amazon Lumberyard
- Social platforms (e.g. Breakroom)
Developing VR Experiences
Immersion:
- Feel like you are physically and mentally in the virtual world
- How to simulate a large world in a limited physical space?
- In non-VR games:
- Suspension of disbelief: enough realism in the experience that you can ignore the issues
Models of immersion:
- Three types
- Sensory immersion:
- Disassociation with the real world
- Challenge-based immersion:
- Control:
- Ask what made great games from the past (e.g. Mario) great?
- Input mechanisms (e.g. game controllers, hand gestures)
- Challenge:
- Challenges must be achievable: too difficult -> frustrated; too easy -> bored
- Hence, the difficulty must be balanced to make the experience rewarding
- Cognitive involvement:
- What you get out of the experience:
- Fun
- Social aspects
- Learning/training
- Work
- Imaginative immersion:
- Emotional involvement
Unity:
- Unity Settings -> XR Plug-in Management: use OpenXR (e.g. to support Windows MR headset)
- Windows MR app needs to be running in the background
- Add interaction profile
- Package manager -> XR Interaction Toolkit (com.unity.xr.interaction.toolkit)
- Add XR Origin GameObject: virtual camera maps to headset position and orientation
- Add input action component: XR default input action
- Also adds controller game objects
Analysis -> Profiler
- Visualize frame rate of application and components (e.g. rendering, scripting, physics, display sync) that make up the processing time
- High frame rate important to prevent motion sickness
- Project Settings -> Quality to increase/reduce quality and decrease/increase frame rate
Game engine components:
- Core functionality:
- Rendering engine
- Physics engine
- And collision detection
- Sound, scripting, animation, networking, streaming, memory management etc.
Game loop:
- Read HID (human input device) state
- Update scene state
- Physics engine
- User input
- Multiplayer networking
- Collisions
- Animations
- NPC state
- Audio
- Render the scene
The subsystems will often update at different rates (e.g. NPC behavior may update at ~1 FPS, physics engine at 120 FPS, renderer at 60 FPS).
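An engine-agnostic sketch of this idea (not Unity's actual loop; rates and names are illustrative): physics stepped at a fixed rate, a slow subsystem on its own timer, rendering once per frame.

// Game loop sketch with subsystems updating at different rates.
class GameLoop {
    const float PhysicsStep = 1f / 120f;   // physics at 120 Hz
    const float AiStep = 1f;               // NPC behaviour at ~1 Hz
    float physicsAccumulator = 0f;
    float aiAccumulator = 0f;

    public void Frame(float deltaTime) {
        ReadInput();                        // HID state

        physicsAccumulator += deltaTime;    // fixed-timestep physics
        while (physicsAccumulator >= PhysicsStep) {
            StepPhysics(PhysicsStep);
            physicsAccumulator -= PhysicsStep;
        }

        aiAccumulator += deltaTime;         // slow subsystem on its own timer
        if (aiAccumulator >= AiStep) {
            UpdateNpcBehaviour();
            aiAccumulator -= AiStep;
        }

        UpdateAnimationAndAudio(deltaTime);
        Render();                           // once per frame (e.g. 60-90 Hz)
    }

    void ReadInput() { }
    void StepPhysics(float dt) { }
    void UpdateNpcBehaviour() { }
    void UpdateAnimationAndAudio(float dt) { }
    void Render() { }
}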
Camera placements and control mappings:
- Relatively easy to convert 2.5D to 3D
- FPS games: camera mapping can be done easily by adding XR Origin
- Controllers need to be mapped to existing controls
- XROrigin:
- Copy tracking origin, put it in player
- Put controllers under player
- Controllers: Add default SOMETHING
- Player: add locomotion, input system
- XR origin set
- Add reference to XRI default input actions
- Player: add continuous move provider, left hand XRI LeftHand LocomotionMove
- Player: add snap turn provider, left hand XRI SnapMove
- XR Grab Interactable: add to a game object to allow a user to grab it from afar
- XR Socket Interactor: allows interactables to be ‘docked’ to the game object
- Box collider: disable Is Trigger (TODO)
- Scripts:
using UnityEngine.InputSystem; create an InputActionReference property (or a private property with the [SerializeField] attribute)
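A minimal sketch of such a script, assuming the new Input System package; the action assigned in the inspector (e.g. an XRI move action) and the class/field names are illustrative:

using UnityEngine;
using UnityEngine.InputSystem;

// Read a Vector2 input action assigned in the inspector every frame.
public class MoveInputLogger : MonoBehaviour {
    [SerializeField] private InputActionReference moveAction;

    void OnEnable()  { moveAction.action.Enable(); }
    void OnDisable() { moveAction.action.Disable(); }

    void Update() {
        Vector2 value = moveAction.action.ReadValue<Vector2>();
        if (value != Vector2.zero)
            Debug.Log($"Move input: {value}");
    }
}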
ProBuilder package: allows you to create new primitive shapes.
VR Interaction Design
Even if the graphics are good, you need to be able to interact with the environment.
Seven principles of fundamental design (Norman):
- Discoverable: users can discover what actions are possible
- Feedback: full and continuous feedback about the results of your actions and the current state of the world
- Conceptual model: design informs the user of the system’s conceptual model, making it seem intuitive
- Affordances:
- The perceived affordance should match the actual affordance
- Clues as to how something works:
- Signifiers: something to indicate the existence of affordances (e.g. arrows, sound)
- Mappings: relationship between the controls and actions is understood
- Constraints: physical, logical, semantic, cultural constraints which guide actions
VR: mappings are from games, not reality (e.g. using scissors: click a button to use them rather than picking them up with your fingers). Chairs: you cannot sit on a VR chair in real life.
VR affordances:
- Use visual cues to show possible affordances
- Perceived affordances should match actual affordances
- Good cognitive model: map object behavior to expected behavior
- May vary by culture
- Controllers have different controls:
- Some have joysticks, some have touchpads, some have buttons
- Trigger buttons
- Examples:
- Buttons that can be pushed
- Objects that can be picked up
- Doors that can be opened and walked through
- Mutual human actuation
User groups:
- Age:
- Children require different interface designs
- Older people have different needs
- Prior experiences with HMDs
- Different physical characteristics: left/right-handed, height, arm reach
- Perceptual/cognitive/motor abilities
- e.g. color perception
- Cognitive/motor disabilities
Whole user needs:
- Social: don’t make them look stupid
- Cultural: follow local cultural norms
- Physical: can they physically use the interface
- Cognitive: can they understand the interface
- Emotional: make the user feel good and in control
Summary:
- High-fidelity graphics in VR is possible if we can afford it computationally
- But not sufficient for immersion
- SCI immersive model: engage user through sensory, challenge-based, and imaginative immersion
- General design principles can be applied to VR design
- Plenty of opportunities for richer VR interaction beyond what general design principles cover
04. AR Tracking, Calibration and Registration
Optical tracking:
- Specialized
- e.g. IR lights for VR controllers
- Marker-based
- TODO
- Markerless:
- Edge-based
- Template-based
- Interest point
Trackable managers:
- AR Foundation: a Trackable is anything that can be detected and tracked in the real world
- Planes, point clouds, anchors, images, environment probes, faces, 3D objects
- We are interested in planes, point clouds, anchors, and images
- Each Trackable has a Trackable Manager, which is on the same GameObject as the AR Session Origin
- Each Trackable Manager keeps a list of its Trackables
Computer vision: detecting objects and tracking their movement in 6 degrees of freedom
Vision is inferential: context, prior knowledge, etc. are required to come up with a reasonable interpretation of the scene; an infinite number of 3D scenes can produce the same image.
3D information recovery:
- Motion
- Stereo vision
- Works up to ~3m (see the depth sketch after this list)
- Texture
- Shading
- Contour
- Can understand depth from a line drawing
- Time-of-flight sensors
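For the stereo case above, depth follows from triangulation (pinhole model, rectified cameras); a small sketch with illustrative numbers. Depth error grows roughly with the square of the distance, which is why passive stereo is only reliable for the first few metres:

// Depth from disparity: Z = f * B / d
// f = focal length in pixels, B = baseline in metres, d = disparity in pixels.
static class StereoDepth {
    public static float Depth(float focalPx, float baselineM, float disparityPx) {
        return focalPx * baselineM / disparityPx;
    }
    // e.g. Depth(700f, 0.12f, 28f) ≈ 3.0 m
}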
TODO
The human visual system is really good and is often taken for granted; replicating this with a computer is very difficult.
Cognitive processing of color is dependent on context: neighboring colors, not absolute values. Hence, using a color mask for filtering is likely to fail unless you have control over lighting.
Low-level image processing:
- Image compression
- Noise reduction
- Edge extraction
- Contrast enhancement
- Good for humans but for computers, it is just throwing away information
- Segmentation
- Thresholding
- Morphology
- Image restoration
- e.g. if camera velocity known, can correct for motion blur
TODO
Recognition:
- Shading
TODO:
- Sports:
- American football: touchdown line
- Swimming: flags
Perfect 3D point cloud -> 3D model is very difficult
Modelling the natural world: extremely difficult as there is variation. Manufacturing produces many copies of a single product, but nature does not.
Vision systems:
- Active:
- Laser scanner
- Structured light
- Project lots of dots; use dot size to determine distance
- Time of flight:
- Use time it takes for light to return to camera to determine distance: gives distance value for every single pixel
- Passive
- Stereo
- Cheap, works well in good lighting
- Structure from motion/3D reconstruction:
- Deep learning with moving camera to reconstruct 3D scene
Color:
- Visible spectrum is a tiny part of the electromagnetic spectrum
- Sun: greatest energy output at visible wavelengths
TODO
- Natural Feature Tracking:
- Keypoint detection
- SIFT, SURF, GLOH, BRIEF, FREAK etc.
- Descriptor creation and matching
- Outlier removal (e.g. RANSAC; see the sketch after this list)
- Pose estimation and refinement
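A deliberately simplified sketch of the RANSAC-style outlier removal mentioned above: here the fitted 'model' is just a 2D translation between matched keypoints, whereas real natural feature tracking fits a homography or full 6-DoF pose (names and thresholds are illustrative):

using System.Collections.Generic;
using UnityEngine;

// RANSAC sketch: repeatedly fit a model to a minimal random sample of matches
// and keep the hypothesis with the most inliers.
static class RansacTranslation {
    public static Vector2 Estimate(List<(Vector2 a, Vector2 b)> matches,
                                   int iterations = 100, float inlierThreshold = 3f) {
        var rng = new System.Random();
        Vector2 best = Vector2.zero;
        int bestInliers = -1;

        for (int i = 0; i < iterations; i++) {
            // Minimal sample: one correspondence fully determines a translation.
            var sample = matches[rng.Next(matches.Count)];
            Vector2 candidate = sample.b - sample.a;

            // Count matches consistent with this hypothesis.
            int inliers = 0;
            foreach (var (a, b) in matches)
                if ((b - a - candidate).magnitude < inlierThreshold) inliers++;

            if (inliers > bestInliers) { bestInliers = inliers; best = candidate; }
        }
        return best;  // a real implementation would refine on the inlier set
    }
}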
Fiducial markers:
- No databases required
- Intrusive in the environment
- Must be fully in-view
05. Mixed Reality Displays
Rob Lindeman: Professor & Director of HIT Lab.
Displays
Definitions
Virtual Reality
Rob first defined VR as:
Fooling the senses into believing they are experiencing something they are not actually experiencing
Lindeman, 1999 (PhD)
Today, he has a new definition:
Fooling the brain into believing it is experiencing something it is not actually experiencing
Mixed Reality
Mixing (not overlaying) of real-world (RW) and computer-generated (CG) stimuli.
This requires matching attributes such as:
- Visual: Lighting, shadows, occlusion, level of fidelity
- Aural: Sound occlusion, reflection
- Other senses?
Milgram’s Reality-Virtuality continuum: different displays influence the quality of the experience.
General Display Types
NB: humans are animals and, as such, evolutionary pressures have guided the development of our senses. Displays that leverage the different strengths and weaknesses of those senses are more likely to be effective.
Senses:
- Visual
- Very good visuals: high framerate, good lighting simulation
- Auditory
- Very good spatialized audio
- Haptic
- Application-specific, cumbersome
- Catch-all for many different senses:
- Force/pressure
- Slipperiness
- Vibration
- Wind
- Temperature
- Pain
- Proprioception
- Sensitivity varies greatly
- Haptics is bidirectional:
- Tight coupling between sensing and acting on the environment
- e.g. picking up a cup: use haptics to tread the line between slipping and crushing the cup
- Tactile/force devices:
- Pin arrays for the fingers: individually actuated pins
- Force-feedback ‘arms’
- ‘Pager’ motors
- Particle brakes: stopping motion
- Passive haptics
- Most successful haptics are very application-specific (e.g. surgical devices)
- Virtual contact
- What should we do when contact has been made with a virtual object?
- Should the virtual hand continue to mirror the pose of physical hand, or be blocked by the wall?
- The output of collision detection is the input to virtual contact
- Cues for understanding the nature of contact with objects are typically over-simplified (e.g. sound)
- Vibrotactile displays:
- Use of vibration motors as a display
- US Navy TSAS project: communicate which direction is ‘down’ to pilots during maneuvers
- Haptic vest: communicate collision direction, strength to users
- Wind feedback: head tracking + fans
- Olfactory
- Very hard - too many types of receptors
- Almost all human-perceivable colors can be produced from just three sub-pixel types
- Nose has ~15,000 types of receptors
- Gustatory
- Know the base tastes, but no way of producing or delivering them
- Meta cookie: AR display, air pumps with different smells, (tasteless?) cookie with marker burned into it
Display anchoring:
- World-fixed
- View-fixed
- Body-worn
- Hand-held
Visual display types:
- World-fixed displays
- Fishtank/desktop VR
- Projection AR
- Body-worn displays:
- Opaque HMDs (VR)
- Transparent HMDs (AR)
- Hand-held displays:
- Tablet/phone VR/AR
- Boom-mounted screens (not too common today)
Mixing Reality
Visual
NB: we don’t need to simulate reality, just need to make it good enough to make the brain believe it is physically correct.
Direct:
Real-world signal ----> Environment ----> Human sensory subsystem ----> Nerves ----> Brain
Possible mixing points: a display (environment), the retina (sensory subsystem), the optic nerve (nerves), direct cranial stimulation (brain)
Captured/mediated
Real-world ----> Environment ----> Capture device ----> Post-processing ----> Captured signal
Audio
Real-world ----> Environment ----> Outer ear ----> Middle ear ---> Inner ear ----> Nerves ----> Brain
- Typical AR/VR systems use speakers (environment) or headphones (outer ear)
- Mixing could also be performed at the middle/inner ear using bone conduction
Mic-through AR:
- Microphone glued to earbuds
- PC mixes audio for virtual user
Hear-through AR:
- Acoustic-hear-through AR: multiple speakers placed around the room
- Bone-conduction: ears are not covered so can continue to hear
- Mixing at the sensory subsystem
- Own voice: combination of sound reaching the ears through the air, plus vibration reaching the cochlea through bone conduction
Visual Mixing
Projection:
- Project virtual content on top of the physical world
- Examples:
- Microsoft IllumiRoom (2013):
- Use projector to ‘extend’ TV content
- Can also distort and re-project room texture
Optical-see-through AR:
- HMD with transparent display
- e.g. Microsoft Hololens, Magic Leap
Optical-see-through Projective AR:
- Projection onto retro-reflective surfaces: only visible to the user wearing the projector
Video-see-through AR:
- Camera on headset: camera feed mixed with virtual content and displayed in headset display
- Benefit: easy to remove things from reality: hard/impossible in optical-see-through systems
- e.g. Varjo XR-1
Visual Cues
Do we need stereo, which is one of the major things added by VR compared to traditional displays?
Monoscopic cues:
- Overlap (interposition)
- Shading/shadows
- Size
- Linear perspective
- Texture gradient
- Height
- Atmospheric effects
- Brightness
Stereoscopic cues:
- Parallax between two images
- Only good for within a few meters of the cameras
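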
Motion depth:
- Changing relative position of head and objects
- User (e.g. head movement) and/or object movement
- Proprioception can disambiguate between these two cases
Physiological cues:
- The eye changes during viewing
- Accommodation: muscular changes of the eye
- Convergence: movements to bring images to the same location on both retinas
Masking/Occlusion
Making a physical object block a virtual one.
- CAVE (CAVE Automatic Virtual Environment)
- Projection of VR content onto room surface
- Need to create mask to prevent projection on physical objects
- HMD: masking not necessary; mixing is done virtually
- Fishtank VR: display edge/bezels can break effect
Real-world Problems with Immersion
- Feeling sick after using VR for a prolonged period
- Popcorn problem: can’t interact with physical objects (or eat) without taking the headset on and off
- Communication: very difficult to talk with someone using a VR headset
Dynamic immersion:
- Open VR headsets which allow the user to see the real world in their peripheral vision
- Lindeman: replaced part of Google Cardboard's frame with LCD panels (no backlight) that could be turned on and off
- Later version added eye tracker + tiny LCDs with eyes on the outside
Visuals & Sound
Visuals and sound are non-intrusive senses; touch etc. requires something on or in your body.
Final Thoughts
- Real world stimuli: high fidelity/low control
- CG stimuli: lower fidelity but complete control
- Far, far future: a 3D printer and robot that quickly creates objects and puts them in the environment
- Later mixing point = more ‘personal’ stimuli (closer to the brain)
- Multi-sensory approaches compensate for weaknesses in one sense with another
06. Interaction in VR
Rob Lindeman, Director of HITLab NZ.
User interaction:
- How can we do things in VR environments?
- What controllers/inputs do they support?
- As a user, what kinds of things would I even want to do?
- Is the task fatiguing? Will it make me feel sick?
The state of VR:
- 1980s/1990s:
- Much hype
- No inroads into everyday life:
- Lagging technology
- But the technology we have today seems close
- Lack of understanding of usability issues
- Lack of ‘killer apps’ (games?)
- Some use in specific scenarios:
- Surgical simulation
- Military training
- Phobia therapy
- Oil/gas visualization
- Automotive design
- 2000:
- Growth of video games led to:
- Immense increase in hardware power
- Reduction in hardware costs
- Immense growth in number of users/gamers
- Many ‘new’ interface devices
- Better understanding of 3D-UI
- Gaming emerged as the killer app (?) (at least for now)
- 2010s:
- Lots of new hardware:
- WiiMote, WiiMotion Plus
- Kinect
- Powerful smartphones (economies of scale)
- ~12 VR/AR headsets announced
- Lots of controllers
- 2015:
- A lot of VR headsets
- Very few AR headsets
- Many phone-integrated headsets (e.g. Google Cardboard, Samsung Gear VR)
- Some locomotion controllers (e.g. treadmills), but no consumer hardware
- A lot of companies go out of business: no use case for the products
Motivation for studying VR interaction:
- Mouse + Keyboard great for general desktop UI tasks:
- Text entry, selection, drag/drop, scrolling, rubber banding etc.
- Fixed computing environment
- 2D windows, 2D mouse
- But how do we design effective techniques for 3D?
- With a 2D device?
- With multiple n-D devices?
- New devices?
- 2D interface widgets?
- With a new language; new interaction techniques that can support the new environment
- Gaming:
- Tight coupling between action and reaction
- Requires precision
- VR gives real first-person experiences, not just views
- HMDs:
- Look behind by turning your head
- Selecting/manipulating objects:
- Reach out with your hand and grab it
- Travel:
- Walk
- Except that you are still in a confined physical space with objects and cats in the way
- Doing things that have no physical analog is more problematic
- How do you change font size?
Existing input methods:
- Joystick, trackballs, trackpoints, trackpads, tablets, gaming controllers
- General vs purpose-built controllers:
- General purpose:
- Single device used for many things
- Mouse, joystick, gamepad, game controllers, Vive/Touch controllers, WiiMote etc.
- Okay for many tasks, but not optimal
- Special purpose:
- Typically used for a specific task (e.g. driving, playing guitar)
- Very effective for a given task
- Current devices:
- PlayStation
- Vibration motors with different weights in each wing to allow varying vibration strength
- PS5: shoulder/trigger buttons have motors for force feedback (variable resistance)
- Xbox controllers
- WiiMote
- Leap Motion hand tracking
- Kinect body tracking
- Hand-held devices:
- Smartphones
- Tablets
- Nintendo DS
- Sony PSP
Classification Schemes
Relative vs absolute movement:
- Mice return a delta: relative motion
- Touch screens, pen tablets return absolute position
Integrated vs separable degrees of freedom:
- e.g. Etch-a-sketch has separate X and Y controls
- Motions that are easy with one are hard with the other
Analog vs digital:
- Continuous vs discrete input: prefer the former
Isometric vs isotonic:
- Isometric: infinite resistance (no motion), but force sensing
- e.g. ThinkPad TrackPoint
- Isotonic: zero resistance
- In reality, devices exist on a continuum of elasticity
- Mice are mostly isotonic, but they do have mass and hence inertia
- Some controls (e.g. joysticks) are self-centering
Rate control vs position control:
- Mice
- Usually position control
- Scrolling:
- Scroll wheel: position control
- Windows middle click and drag: rate-controlled scrolling
- Trackballs: usually position control
- Joysticks: usually position (cross-hair) or velocity (e.g. aircraft)
- Rate control eliminates need for clutching/ratcheting
- Isotonic-rate and isometric-position control usually poor
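A small Unity sketch contrasting the two mappings for a single joystick axis (uses the legacy 'Horizontal' input axis; names and constants are illustrative). Rate control removes the need for clutching because holding the stick keeps moving the cursor:

using UnityEngine;

public class ControlMappings : MonoBehaviour {
    public Transform cursor;
    public float range = 0.5f;   // position control: metres per full deflection
    public float speed = 1.0f;   // rate control: metres per second at full deflection

    void Update() {
        float axis = Input.GetAxis("Horizontal");   // -1 .. 1

        // Position control: deflection maps directly to a position.
        // cursor.localPosition = new Vector3(axis * range, 0f, 0f);

        // Rate control: deflection maps to a velocity.
        cursor.localPosition += new Vector3(axis * speed * Time.deltaTime, 0f, 0f);
    }
}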
Special-purpose vs general-purpose:
- Game controllers must support many types of games
- Few ‘standard’ mappings: each game can do things differently (for the most part)
- Some special-purpose devices:
- Mostly based on things that already exist in the real world
- Examples:
- Guitar controllers
- Steering wheels
- RPG keyboard/joystick
- Drum kits, dance pads, bongos etc.
Direct vs indirect:
- Direct:
- Click and drag with mouse/stylus/finger
- Touch screen gestures: swipe, two-finger rotate
- Problems:
- Works well for things that have a physical analogue, but not for those without
- May have low precision
- Selection/de-selection may be messy
- Indirect
- Use some widget to indirectly change something
3D Input Devices:
- SpaceBall/SpaceMouse
- Isometric device which senses force
- CyberGlove II
- From the 1990s
- Strain sensors used to sense hand movement
- PHANTOM Omni Haptic Device
- Stylus attached to robot arm
- Force feedback allows user to feel surface
- Limited working volume, only one point of contact, very limited use case
- HMD with 3/6-DOF trackers (e.g. Oculus Quest)
3D Spatial Input Devices:
- Microsoft Kinect
- Leap Motion
- Oculus Touch
- HTC Vive
- Used touch surfaces which could be displaced (clicked) slightly, rather than a joystick
- Very different experience
- Valve Index
- Strapped to your hand: don’t need to actively hold the controller
- Had finger tracking
Motion-Capture/Tracking Systems:
- Probably won’t be that common in the future: people will just use inside-out tracking built into headsets
- Used in movies/TV and games
- Capture actual motion and re-use
- Can be done interactively or offline
- Can capture three or more DoF
- Position, orientation, limbs
- No good general purpose approaches to high-fidelity, full-body tracking without markers
- Attempts:
- Magnetic tracking
- Transmitter creates a magnetic field
- Wired receivers stuck to clothes
- Receivers tracked (relative to the transmitter) using changes in magnetic field
- Pros:
- Fairly lightweight
- Six DoF
- Cons:
- Very noisy near ferrous metal (e.g. rebar)
- Limited range
- Ultrasonic tracking
- Speakers emitting ultrasonic sound, captured by receivers containing microphones
- Used to compute distance
- Receivers had to point in the right direction (hemisphere)
- High resolution and accuracy
- Requires ‘line-of-sight’
- Inertial trackers
- Accelerometers, gyroscopes
- Pros:
- Lightweight
- Wireless
- Cons:
- Error accumulates
- Only moderate accuracy
- e.g. Wii MotionPlus
- Optical tracking
- Multiple fixed markers
- Known camera parameters
- Inside-out tracking: camera attached to user, lab-mounted fixed landmarks
- Outside-in tracking: cameras attached to lab, landmarks attached to user
- Active vs passive:
- Active: markers are lights
- Passive: reflective markers with external light
- PlayStation MOVE:
- Stick with illuminated ball of known size and color
- Camera tracker + internal tracker used for tracking
- Leap Motion:
- Three IR LEDs for illumination
- Stereo cameras used for depth
- Hybrid tracking
- Compensate negative characteristics of one approach with another
Other Input Devices:
- Speech input
- Gestures (e.g. pointing at an object)
- Device actions (e.g. buttons, joysticks)
- Head/gaze, eye blinks
- ‘Put that, there’: hybrid speech + gesture
Special Purpose Input Devices:
- Some applications are more ‘real’ with a device that matches the real action
- Examples:
- Light gun
- Flight simulator motion platform
- Snowboard/surfboard
- Pod racer
- Motorcycle
- Sensors are very cheap today: you may be able to simply attach some sensors to a passive object
Interaction in VR
Mapping Devices to Actions:
- For each (user, task, environment)
- For the four basic VR tasks
- For each device DOF
- Choose a mapping to an action
VR interaction:
- Must take advantage of people’s real-world experience
- And for those without real-world analogues, allow users to express their intent
- Without making people tired
- Without making people sick
- While making it easy to learn and use
Main interaction tasks (Bowman et al.):
- Object selection/manipulation:
- How does the user select the object they wish to manipulate?
- How do they actually manipulate it?
- Navigation:
- Wayfinding (mental): where am I now, and how do I get to where I am going?
- Locomotion (motor): how do I travel there?
- System control:
- Changing system parameters
- Manipulating widgets
- Lighting effects
- Object representation
- Data filtering
- Approaches:
- Floating windows
- Hand-held windows
- Gestures
- Menus on fingers
- Symbolic input:
- Text/number input
- Avatar control (Lindeman):
- How do you control you?
- And throwing things 😆
Objects:
- Issues:
- Ambiguity when there are multiple objects the user could be pointing to
- Distance
- Selecting multiple objects
- Releasing objects
- Selection approaches:
- Direct/enhanced grabbing (latter: items further away than arm's reach)
- Ray-casting
- Image-plane
- Manipulation approaches:
- World in miniature (WIM): miniature world representing the world you are in
- Can you pick yourself up?
- Skewers
- Surrogates
- Modifying objects:
- Choose among object properties
- Natural mappings of actions to changes
- Arbitrary mappings
Object selection in the real world:
- Touching/grabbing
- Pointing
- Finger: direct
- Pointer: extended
- Mouse: indirect
- Voice: ask someone
- Context
- Eye gaze
Selection-task decomposition:
- Indicate:
- Denote which object we intend to select
- On desktop: move mouse
- In VR:
- Avatar hand-movement
- Device movement
- Virtual ‘beam’ (ray casting from the hand/controller; see the sketch after this list)
- Confirm:
- On desktop: mouse click
- In VR:
- Click
- Dwell (timeout)
- Verbal cue
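A minimal Unity sketch of 'indicate + confirm' with a virtual beam (the pointer transform, max distance and confirm button are illustrative; scene objects need colliders to be hit):

using UnityEngine;

// Ray-cast from a tracked hand/controller transform to indicate an object,
// then confirm the selection with a button press.
public class RaycastSelector : MonoBehaviour {
    public Transform pointer;        // tracked controller/hand transform
    public float maxDistance = 10f;
    private Transform indicated;

    void Update() {
        indicated = null;
        if (Physics.Raycast(pointer.position, pointer.forward, out RaycastHit hit, maxDistance))
            indicated = hit.transform;                          // indicate

        if (indicated != null && Input.GetButtonDown("Fire1"))  // confirm
            Debug.Log($"Selected {indicated.name}");
    }
}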
Reaching objects:
- Indicating at a distance
- Go-go: greater than 1:1 mapping when the arm is extended beyond a certain distance (see the sketch after this list)
- Two-handed pointing
- World in miniature
- Flashlight
- Voodoo dolls
- Image plane technique: user pinches the object; determine the XY location on the image plane and then select the front-most object at that point
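A small sketch of the Go-Go mapping mentioned above (Poupyrev et al., 1996): 1:1 within distance D of the torso, then pushed out by k·(r − D)²; the D and k values here are illustrative:

using UnityEngine;

// Non-linear arm extension: the virtual hand matches the real hand up to D,
// then reaches further the more the real arm is extended.
public class GoGoHand : MonoBehaviour {
    public Transform torso;        // approximate chest position
    public Transform realHand;     // tracked hand/controller
    public Transform virtualHand;  // rendered hand
    public float D = 0.4f;         // threshold distance in metres
    public float k = 10f;          // gain

    void Update() {
        Vector3 offset = realHand.position - torso.position;
        float r = offset.magnitude;
        float rv = r < D ? r : r + k * (r - D) * (r - D);
        virtualHand.position = torso.position + offset.normalized * rv;
        virtualHand.rotation = realHand.rotation;
    }
}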
Manipulation:
- Typical tasks:
- Repositioning
- Rotation
- Property modification
- Approaches
- WIM
- 3D widgets:
- Virtual sphere for rotation
- Jack for scaling
- Non-isomorphic translation/rotation
- Skewers
- 2D widgets
Design Guidelines:
- Use existing techniques unless you really need to
- Match the interaction technique with the device
- Use task analysis (e.g. does the task need high precision?)
- Use techniques that can help reduce clutching
- When dragging with mice: if you reach the end of the mouse pad, you need to grip and pick up the mouse, then move it to the opposite end of the mouse pad
- Non-isomorphic techniques are more useful and intuitive
- Use pointing techniques for selection; virtual hand techniques for manipulation
- Use grasp-sensitive object selection
- Constrain degrees of freedom when possible
- Fewer mistakes, less annoyance
- There is no single best interaction technique: just test, test, test
Research papers:
- Object Impersonation (Wang, IEEE VR 2015)
- Open headset: user can simultaneously use tablet
- User becomes the object:
- e.g. moving a light: user’s head becomes the light, can turn head to position beam
- e.g. paving a road: the path you move in becomes the road
- Navigation: wayfinding
- Easy to get lost in VR
- Traditional tools:
- Maps (North vs forwards up)
- Landmarks
- Spoken directions
- Non-traditional
- Callouts
- Zooming
- Navigation: travel
- Limited physical space, possibly infinite virtual space
- Different travel types:
- Walking, running, turning, strafing, back stepping, crawling, quick start/stop, driving, flying, teleporting
- Also lying down, kneeling, ducking, jumping
- Travel isn’t the goal: usually doing other things while travelling
- Initial Exploration of a Multi-Sensory Design Space:
- Spinning chair: user leans forwards/backwards to move
- Fans opposing direction to travel to simulate feeling of movement
- Finger Walking (Yan et al., 2016)
- Short distance: finger tapping on a touchpad to ‘walk’
- Long distance: two fingers (like feet on a hoverboard), force determines speed and angle between fingers determines angle
- TriggerWalking (Sarupuri et al., 2016):
- Tap controller trigger to walk: controller orientation controls movement direction
- System Control using Hybrid Virtual Environments (Wang, 3DUI 2013):
- Open headset
- Tablet taped to arm used for selecting objects
- Drop items in the scene in VR
- Avatar Control
- Paul Yost
- IMU controllers attached to arms, feet for full body tracking
The ‘optimal’ interface depends on:
- The capability of the user:
- Dexterity
- Expected level of expertise
- The task:
- Task complexity
- Granularity: how precise does the input need to be?
- The environment:
- Stationary, moving, noisy environments
07. Interaction in AR
Stephan Lukosch
AR Interface Foundations:
- AR requirements:
- Combining real/virtual images: display technologies
- Interactive in real-time: input & interactive technologies
- Registered in 3D: viewpoint tracking technologies
- AR feedback loop:
- User input, camera movement
- Pose tracking
- Registration of virtual content
- Situated visualization
- Augmentation Placement:
- Relative to:
- Head
- Body
- Hand
- Environment: tables, walls, mid-air
- Displays:
- Head-mounted (glasses)
- Hand-held projector
- Hand-held display (smartphones)
Designing AR system = interface design which satisfies the user and allows real-time interaction
Interacting with AR content:
- Augmented reality content is spatially registered: how do you interact with it?
- By touch:
- Hololens clicker: one button controller
- By raycasting:
- Cast a ray passing through eye and controller
- By hand tracking:
- Hand is recognized and mapped to a hand model
- Gestures (e.g. pinch) allow interaction
- Body tracking
- Skeleton tracking provides whole-body input
- Requires some sort of tracking system (e.g. external cameras)
Evolution of AR interfaces
Expressiveness and intuitiveness has increased over time:
- Browsing:
- 2D elements registered to real-world content
- For visualizing; limited interaction with the content
- Mostly hand-held devices
- 3D AR:
- Allows manipulation of 3D objects anchored in the real world
- Dedicated controllers, head-mounted displays, 6DOF tracking
- One of the most important interaction classes within AR
- No tactile feedback: just visual
- Tangible UI:
- Rekimoto, Saitoh, 1999
- Virtual objects projected onto a surface
- Physical objects used as controls for virtual objects
- Supports collaboration
- Ishii and Ullmer, 1997
- Tangible bits
- [Augmented Groove, 2000]:
- Mapping physical actions to MIDI
- Limitations:
- Difficult to change object properties
- Limited display: projected onto a surface or screen
- Separation between object and display
- Advantages:
- Natural: user’s hands can be used to interact with both real and virtual objects
- No need for special purpose input device
- User intuitively knows how to use the interface
- Tangible AR
- Tangible interfaces have a tangible gap: interaction and presentation are on 2D surfaces.
However, there is no interaction gap: same input devices can be used for physical and virtual objects.
Tangible AR tries to close both gaps:
- Physical controllers for moving virtual content
- Support spatial 3D interaction
- Support multi-handed interaction
- Time and space multiplex interaction
- Space-multiplexed:
- Many devices with one function
- More intuitive, quicker to use
- e.g. (physical) toolbox
- Poupyrev et al., 2003
- Different functionality assigned to markers
- Opaque functionality: couldn’t tell what the marker would do by looking at it
- Time multiplexed
- One device with many functions
- Space efficient
- e.g. mouse
- VOMAR:
- Catalog book; tap a paddle against a page/section to choose the functionality of the paddle
- Natural AR:
- Use of natural user input: freehand gestures, body motion, gaze, speech
- Multimodal input: not all input methods are appropriate in all situations
- HITLabNZ spider demo:
- Overhead camera with depth captures real-time hand model
- Can get spider to crawl over your hand
- Presence: how believable the virtual content is to the user
- Hololens 2:
- Continuous 3D hand tracking
- Hololens sometimes overlays a blue hand over the user's hand to reassure them that tracking is working, rather than overlaying a virtual hand the whole time (which would be distracting): less is better
- Gesture-driven interface
- Speech input:
- Commands applied to the object you are currently looking at (gaze tracking)
- Good for quantitative input (numbers, text)
- Precise input difficult in AR
Designing AR Systems
Basic design guidelines:
- Provide a good conceptual model and metaphor
- True for any kind of user interface
- Make things visible:
- If an object has a function, then the user interface should show it
- Even if a function is obvious, the user may not realize that the system supports this
- Map interface controls to the customer’s model
- Not that of the system implementation
- Provide feedback: WYSIWYG
- Interface components:
- Physical objects
- Interaction metaphor
- Virtual objects
Affordances
Objects are purposely built: they include affordances and make them obvious.
Affordances: an attribute of an object that allows people to know how to use it
Physical affordances:
- Chairs are to sit
- Handles are to twist and pull
- Scissors are to cut
- Surface Dial: we expect circular objects to be spun
Interfaces:
- Virtual objects do not have ‘real’ affordances
- They are better conceptualized as ‘perceived’ affordances
- Based on people’s prior experiences
- Common/repeated metaphors become ingrained in users
Augmented reality:
- Physical: tangible controllers and objects
- Virtual: virtual graphics and audio
Case Studies
Navigating a spatial interface:
- Menu displayed over hand; other hand used as pointer
- Menu attached to marker held in one hand; other hand used as pointer
- Interaction with cylinder: rotate to select
- One hand interaction: gesture to select?
- Place marker on surface: another marker used for selection
- Menu fixed at one location: stable
Workspace awareness in collaborative AR:
- Local player solving puzzle
- Remote instructor gave advice in AR
- Knew solution, gave advice either visually or aurally
- Audio made users more aware of the instructor’s actions, but was also more distracting
- Visual allowed players to follow instructions better than audio
- HMD had very limited FoV: local player may not notice when instructor marked a piece
- Object selection with HMD
- Tried reducing brightness of background and blurring: neither worked out
- Lighting plays an important role in depth perception
3D AR lens:
- Magnifying glass with physical handle and a marker where the lens would be
- When in AR, acted like a real magnifying glass
Magic book:
- 3D model shown in book using marker/texture tracking
Interaction Design
The process of:
- Discovering requirements, designing to fulfil them, producing prototypes and evaluating them
- Often requirements will be conflicting: you must make trade-offs that will best suit your future users
- Focus on users and their goals
- Trade-offs to balance conflicting requirements
- Approaches:
- User-centered design: user knows the best and guides the designer; the designer translates user needs and goals
- Activity-centered design: focus on user behavior around tasks: their behavior determines the goals
- System design: system is in the focus and sets the goal
- Genius design/rapid expert design: design based on the experience and creativity of the designer
Solving the right problem:
- Engineers and business people are trained to solve problems
- Designers are trained to discover problems
- We should rather have no solution than a brilliant solution to a non-existent problem
- Designers should:
- Never start by trying to solve the problem
- Start by trying to understand what the real issues are
- Diverge before converging on a solution
- Study people, their needs and their goals
Double diamond of design:
- Four phases of design:
- Diamond 1:
- Discovery:
- Understand the problem. Never assume what the problem is
- Talk to the users
- Define: use insights from discovery phase to describe and define the problem
- Diamond 2:
- Develop:
- Explore alternative solutions to the problem
- Seek inspiration from elsewhere
- Deliver:
- Test out the solutions; give them to the user and gather feedback
- Reject under-performing solutions; improve promising ones
- Loop through each diamond as many times as required, and return to the start if required
- Principles:
- Put people first: understand the people using the service; their needs, strengths, aspirations
- Communicate visually and inclusively: help people gain a shared understanding of the problem and ideas
- Collaborate and co-create: work with others
- Iterate: spot errors early
Involving users:
- Expectation management:
- Must give them realistic expectations: no surprises, no disappointments
- Timely training
- Communication, but no hype
- Ownership:
- Make users active stakeholders
- More likely to forgive/accept problems
- Can make a big difference in the acceptance and success of the project
Interaction design:
- Discover requirements
- Design alternatives
- Which may lead to requirements being refined
- Prototype alternatives
- Evaluate the product and its user experience throughout
- If it sucks, this means that the requirements were probably incorrect
Practical issues:
- Who are the users?
- What are the users’ needs?
- How do you generate alternative designs?
- How do you choose among alternatives?
- Who are the stakeholders?
- They can influence the success/failure of the project, so involve them and keep them happy
What are the users’ needs?
- Users don’t know what is possible
- Instead:
- Explore the problem space
- Investigate user activities: see what can be improved
- Try out potential ideas
Alternative generation:
- Humans tend to stick with what works
- Considering alternatives (the design space) helps identify better designs
- Where do alternative designs come from?
- Flair and creativity: research and synthesis
- Cross-fertilization of ideas from different perspectives
- From users
- Product evolution based on changed use
- Inspiration from similar and different products/domains
- Balance constraints and trade-offs
- Morphological charts:
- List the functions: what does the product need to do?
- e.g. for a beverage container, it must contain the beverage, provide access to the contents, and display product information
- Then for each function, list its means:
- Methods of addressing the functions/user needs
- e.g. for the beverage container, access to the contents could be done through a pull tab, straw, or a cap
- Pick one means for each function (see the sketch after this list)
- Not every combination will be practical or possible
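A minimal sketch of a morphological chart using the beverage-container example above: enumerate every combination of means with one means per function, then filter out impractical combinations. The feasibility rule is an arbitrary placeholder.

```python
from itertools import product

# Functions (what the product must do) and candidate means for each.
chart = {
    "contain beverage": ["rigid can", "flexible pouch", "bottle"],
    "access contents": ["pull tab", "straw", "screw cap"],
    "display information": ["printed label", "embossing", "sleeve"],
}

def feasible(combo):
    """Placeholder rule: e.g. a pull tab does not fit a flexible pouch."""
    return not (combo["contain beverage"] == "flexible pouch"
                and combo["access contents"] == "pull tab")

functions = list(chart)
combinations = [dict(zip(functions, means)) for means in product(*chart.values())]
designs = [c for c in combinations if feasible(c)]
print(len(combinations), "combinations,", len(designs), "feasible")
for design in designs[:3]:
    print(design)
```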
Choosing between alternatives:
- Interaction design focuses on externally-visible and measurable behavior
- Technical feasibility
- Evaluation with users or peers
- Use prototypes, not static documentation: behavior is key
- A/B testing:
- Defining appropriate metrics is non-trivial
- Quality thresholds:
- Different stakeholder groups have different quality thresholds
- Use usability and user experience goals to define criteria
Prototyping:
- Allows the designer and their users to explore interactions and capture key interactions
- Focuses on user experience
- Communicates design ideas
- Learn through doing
- Avoids premature commitment
Typical development:
- Sketching
- Helps to express, develop and communicate design ideas
- Storyboards
- UI mockups
- Interaction flows
- Video prototypes
- Interaction prototypes
- Final native application
Low fidelity prototypes:
- Low development cost allows evaluation of multiple design concepts
- Limits the feedback you can get: error checking, navigational and flow limitations
High fidelity prototypes:
- Fully interactive
- Has look and feel of the final product
- Clearly defined navigational scheme
- Much higher development cost
- Sunk cost bias: more reluctant to make changes given the time/effort
08. Collaboration in Mixed Reality
Tuckman’s model of group formation:
- Forming
- Orientating themselves around the task at hand
- Become acquainted with each other
- Testing group behaviors
- Establishing common viewpoints, values
- Establishing initial ground rules
- Storming
- Marked by intense team conflicts
- Leadership and roles determined
- Project and tasks redefined
- Characteristics:
- Disagreements
- Resistance to task demands
- Venting of disagreements
- High level of uncertainty about the goals
- Norming
- Team roles cleared up
- Agreement on how the team can work with each other
- Clear expectations and consensus on group behaviors and norms
- Consensus on group goals, quality standards
- Forming the basis for behavior for the remainder of the project
- Performing
- Active work on a project
- Clearly understood roles, tasks, and well-defined norms
- Sufficient interest and energy from all team members
- Adjourning
- Dissolution of the team: team tasks are accomplished and the team disbands
- Possible feelings of regret
Drexler’s team performance model:
- Orientation: why am I here?
- Trust building: who are you?
- Goal clarification: what are we doing?
- Commitment: how are we doing it?
- Implementation: who does what, when, where?
- High performance
- Renewal
Collaboration
Definitions
- Wood and Gray, 1991: a process that occurs when a group of stakeholders engage in an interactive process using shared rules, norms and structures to act or decide on issues related to that domain
- Terveen, 1995: a process in which two or more agents work together to achieve a shared goal
- Knoll and Lukosch, 2013: an interactive process in which a group of individual group members use shared rules, norms and structures to create or share knowledge in order to perform a collaborative task
Designing collaboration
Collaboration is affected by internal and external factors:
- The group: size, proximity, experience
- Task: type and complexity
- Context: organizational culture and environment
- Process: interactive process, shared rules, norms
- Tools: technology and their limitations
Collaboration outcomes:
- Creative ideas for activities
- Shared understanding
- Commitment
- Consensus
- Sharing perspectives and visions
- More objective evaluation
- Acceptance
- Mutual learning
- Shared responsibility
e.g. using AR to help people understand impacts of climate change.
Collaboration Challenges
Piirainen et al., 2012 - group perspective:
- Shared understanding:
- Ensure the team has a shared understanding and mental models of:
- The problem
- The current state of the system
- The envisioned solution
- Satisfying quality requirements/constraints
- Balancing rigor and relevance
- The more formal the process, the slower you go but the more you can involve and understand stakeholders
- Organizing and ensuring effective, efficient interaction between actors
- Ensuring ownership
- Team members must pick up tasks and take ownership of them
Nunamaker et al. 1997 - process perspective:
- Free riding
- Especially in larger groups
- Dominance
- Both the amount of work done and of decision-making power
- Group think
- Hidden agenda
- Fixed design
- Process limits the design space the group can explore
- Lack of expert facilitators
Haake et al., 2010 and Olson and Olson, 2000 - tool perspective:
- Google Docs, email, video conferencing, etc.
- No regular use
- Variety
- Not intuitive
- Difficult to adapt to group needs
- Collaboration awareness
- Being aware of when other people have made changes
- Co- and spatial referencing
Collaboration Design from a Tool Perspective
Time-space matrix of Computer-Supported Cooperative Work (CSCW):
- Same place, same time (synchronous interaction): face-to-face interaction
- Same place, different time (asynchronous interaction): shared files, team rooms etc.
- Different place, same time (synchronous distributed): video calls, shared editors etc.
- Different place, different time (asynchronous distributed): email, newsgroups etc.
In AR:
- Synchronous, co-located: AR shared space
- Synchronous, remote: AR telepresence
- Asynchronous, co-located: AR annotations/browsing (in-situ)
- Asynchronous, remote: generic sharing
3C model:
- Communication: information exchange to facilitate a shared understanding
- Coordination: arranging task-oriented activities
- Collaboration: working together towards a shared goal
- Group awareness mediates the relationship between the 3Cs: none of them are possible without it
Human-Computer-Human Interaction Design
- Software design: software interacts with other software
- Human-computer interaction design: humans interacting with computers
- Human-computer-human interaction design: several humans in front of several computing devices working together towards a shared task
- Computers must interact with each other, and humans must interact with each other as well
Oregon Software Development Process (OSDP) (Lukosch, 2007):
- Oregon Experiment, Christopher Alexander:
- University campus did not put down any footpaths initially, but waited to see what trails the students would make
- Patterns for computer-mediated interaction:
- High-level patterns:
- Focus on issues and solutions targeted at end users
- Empower end users to shape their groupware application
- Low-level patterns:
- Describe issues/solutions targeted at software developers
- Focus on system implementation and includes technical details
- Example: remote field of vision
- Collaborative whiteboard/canvas: need to know where team members are and where they are looking
- Possible solution: multi-user scrollbar (multiple narrow scrollbars)
- Understand where their team members are both globally and relative to themselves
- Users can see roughly how much of their screen space intersects with another user's, and where (see the sketch after this list)
- Iterations follow design -> implementation -> test/usage -> planning cycle
- Conceptual iteration
- Talking to users, understanding the problem space, creating prototypes
- Use of patterns to discuss high-level ideas with users
- Developers use the low-level patterns to plan the implementation
- Development iteration:
- Requirements analysis
- Low-level patterns used to plan and design groupware
- Functional tests
- Tailoring iteration:
- Users have used the prototypes and have provided feedback
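A minimal sketch of the 'remote field of vision' pattern: compute how much of one user's viewport on a shared canvas overlaps another user's, which is roughly the information a multi-user scrollbar would convey. Rectangles and names are illustrative.

```python
def viewport_overlap(a, b):
    """Overlap of two viewports given as (x, y, width, height) on a shared canvas.

    Returns the fraction of viewport `a` that is also visible to `b`.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # horizontal overlap
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # vertical overlap
    return (ix * iy) / (aw * ah)

local  = (0, 0, 800, 600)
remote = (400, 300, 800, 600)
print(f"{viewport_overlap(local, remote):.0%} of my view is shared")  # 25%
```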
Workspace Awareness in Collaborative AR
Types:
- Informal awareness:
- General sense of who is around and what they are up to
- Not necessarily related to project work
- Social awareness:
- Understanding of the person:
- What they are interested in
- Their emotional state
- What they are paying attention to
- Group-structural awareness:
- Knowledge about the group structure:
- Roles/responsibilities/status
- Positions on issues
- Group processes
- Workspace awareness:
- Understanding of the task space
- Interaction of others with the space and its artifacts
Awareness categories and elements:
- Who:
- Presence: is anyone in the workspace?
- Identity: who is participating?
- Authorship: who is doing that?
- What:
- Action: what are they doing?
- Intention: what is their goal?
- They are doing x in order to achieve y
- Artifact: what object are they working on
- Where:
- Location: where are they working?
- Gaze: where are they looking?
- View: where can they see?
- Reach: where can they reach?
- Can children or short people access it?
Workspace awareness:
- Knowledge: who/what/where/when/how
- is used to determine what to look for next
- Exploration
- is used to gather perceptual information
- The environment
- aids in interpreting the perceptual information
- Knowledge
- is used to help with collaboration
- Coordination of activities
- Anticipation of events
- which impacts the environment
Case Studies
Workspace awareness in collaborative AR:
- Remote expert can see what the player can see and give hints on how to complete the puzzle
- Expert given a gray box which represents the size of the Hololens' display
- Expert can freeze the view:
- The view continually changes, which makes it difficult to focus
- Hence, they should be able to freeze the view (and annotate it), possibly in a separate window
- The remote person must be able to communicate to the local person that they have made changes or annotated something: workspace awareness
- They can, of course, talk, but the paper tried adding automatic notifications:
- Aural: TTS when the remote user adds/selects/deletes an object, or when they freeze/unfreeze the view
- Visual: small blinking icon
- Results:
- Audio is much more noticeable, but also more annoying
- Participants preferred visual notifications
- Game played in two environments (physical/augmented reality)
- Investigate how a remote person can try to help a local team
- Three players had to build a Lego tower following certain constraints:
- Each player has access to a subset of the constraints
- Each player could only move certain-colored blocks
- But also some blocks that everyone could move
- Two players co-located, one player remote
- Co-located users wearing HMDs, remote user viewing laptop
- The same group also played it physically (with randomized order)
- Asked AR presence questionnaire:
- Interaction/immersion
- Interference/distraction
- Audio/tactile experience
- Moving in environment
- Results:
- Mental demand not significantly different
- Physical demand in AR higher
- Finger has to hover in midair
- Slower in AR
- Presence:
- Interaction in AR much more difficult, and impacts concentration
- Difficult for remote player to understand and foresee the other people’s actions
- Co-located AR players reported tactile experience (even though it was completely virtual)
CSI The Hague:
- Collaboration with The Hague police, circa 2009
- Special skills required to secure evidence
- Need to capture evidence early on, but collector is likely not an expert
- Expert could remotely help the on-site person
- Video see-through HMD
- Two webcams used for SLAM:
- 3D pose estimation
- Dense 3D map
- Remote user could explore the space in VR
- Bare hand tracking for gesture-based interaction
- Evaluation:
- Lack of protocol for collaboration
- High mutual understanding
- Picture-oriented information exchange
- High consensus: both parties can see the same video stream
- Data integrity: how do you ensure the evidence has not been modified?
- Responsibility: if the crime scene gets messed up, who is responsible - the local person or the expert?
Burkhardt et al., 2009: seven dimensions of collaboration:
- Fluidity of collaboration: verbal turns (cues?)/actions
- Sustaining mutual understanding
- Information exchange for problem-solving
- Argumentation and reaching consensus
- Task/time management
- Cooperative/collaborative spirit in the team
- Awareness of their individual tasks and contribution
09. Creating Multiple-Sensory VR Experiences
Part 1: Yuanjie Wu
Yuanjie Wu, post-doc researcher at HIT Lab.
(currently in Auckland).
Senses
Creating a realistic experience must provide a multi-sensory experience and create a sense of presence.
A VR system can be modeled as a loop of:
- Input: data coming into the system from the user
- Application: physics simulation, user interaction
- Rendering: transforming data from a computer-friendly format into a human-friendly format - visual, aural, haptic, olfactory, gustatory
- Output: feedback perceived by the user
What is ‘input’ and ‘output’ depends on the point of view: the system or the human.
Subjective reality: the way an individual experiences and perceives the external world in their own mind.
Brains consciously and sub-consciously find patterns. The sub-conscious can be thought of as a filter that only allows information that does not conform to the patterns to pass through.
Perceptual illusions provide insight into some of the shortcuts the brain makes:
- Jastrow and Ponzo railroad illusion: brain can misinterpret size
- Moon illusion: moon appears larger when on the horizon (compared to high in the sky) as there are foreground items that can be used as a frame of reference.
- Ouchi illusion: rectangles appear to move
Mental models: NLP (neuro-linguistic programming)
- External stimuli (senses) pass through
- Filters, which delete, distort and generalize the information
- Based on meta programs, values, beliefs, attitudes, memories, decisions
- Which consciously and unconsciously impacts the person’s
- Internal state:
- Mental model
- Emotional state
- Physiology
VR research problems:
- Avatars
- Tracking
- Cybersickness
- Locomotion
- Navigation
- Perception/cognition
- Social dynamics
- Safety
- Ethics
- Sensory delivery:
- Tactile (e.g. force feedback, temperature, pressure)
- Olfactory/gustatory
- Evaluation metrics
- Interaction/manipulation
- Latency/FOV
- Fatigue
Multi-sensory VR systems
- Sub-systems:
- Stimulation of the senses
- Requires specific hardware, software and protocols
- Data processing
- Pre-processing:
- Filtering
- Serialization
- Transmission
- Integration: combining all data into one rendering system
- Data fusion
- Application
Subject wearing HMD in a cage:
- Enough space to walk around a little
- External cameras track position
- Fans mounted on cage used to direct wind
- Aroma diffusers using multiple scent bottles
- Speakers attached to the floor used for vibration
- e.g. simulating off-road driving
Avatar system:
- Control system
- Full body tracking with multiple Kinect cameras
- Needed to estimate orientation - Kinects could not determine if they were looking at the person’s front or back
- Leap motion attached to the headset for natural hand tracking
- Limited tracking range: users had to put their hands directly in front of them
- Fix: stick 5 Leap motion sensors onto the headset
- HTC Vive lighthouse used for HMD positioning?
Realism:
- Appearance realism
- Behavior realism
- Verbal behavior
- Non-verbal behavior
- Body movement, facial expressions etc.
Part 2: Rory Clifford
Dr. Rory Clifford, post-doc research fellow at HIT Lab.
Focus on training simulations, cultural preservation.
What creates a profound VR experience?
- Emotion
- Sound
- Movement
- Makes users feel present and localized within the space
In the first 30 seconds, you must:
- Grab the person’s attention
- e.g. flashing light to grab the user’s attention
- Provide affordances to navigate the environment
- They may be going the wrong way
- Although both diegetic and non-diegetic affordances can be used, diegetic cues keep the user more immersed
- Provide a natural and intuitive method of interaction
Sound:
- Induces mood
- Deepens the presence
- Adds believability
- 3D spatial sound especially deepens immersion
- Can also help with UI problems like navigation and discovery
- Don’t overdo it
Movement:
- Movement types:
- Teleportation
- 360 video:
- Quick and easy way to produce VR content
- Can only teleport to pre-defined positions
- Gaze-based
- Physical controls
- e.g. replica of steering wheel
Smell:
- Olfactory sensory system
- Direct connection to the brain through cranial nerves: most other sensory input passes through the thalamus - an additional step of processing
- Must limit amount of smell to prevent simulator sickness
- Can trigger memories
- Theory: help users remember VR training when in actual scenarios
Vibro-tactile feedback:
- e.g. jolts, earthquakes, engine vibration
- Low-frequency audio passing through subwoofers or audio transducers
- Can be external (e.g. floor or other hard surface) or fitted (e.g. vest)
- Vests: portable, but users are aware that the vest is there, reducing immersion
Haptics:
- Independent of the sound channel
- Assists with spatial awareness and helping anchor the user in virtual space
- More control over the vibration (supported in game engines)
Fire Emergency NZ (FENZ):
- Aerial firefighting training
- Can only train once a year before fire training
- Can’t exactly start fires for training
- Expensive: requires several aircraft
- Projector-based windows
- Headsets with multiple simulated audio channels mimicking real headsets
- Vibro-tactile feedback in chairs
Modeling the real world:
- Photogrammetry:
- Low accuracy but provides good textures
- Requires cleanup to reduce number of polygons
- LiDAR:
- High-accuracy, high-polygon count
- Camera used for texturing, but not great - should be combined with photogrammetry
10. Human Perception and Presence in MR
Rob Lindeman, Director HITLab NZ.
In popular media:
- UI about complementing the character: their personality, proficiency in technology, basic scene state (e.g. blaring red lights when bad information is coming).
- Impressions matter
- Flow is a good concept to study
- Popular media can give us good ideas
Terms:
- Presence: sense of ‘being there’
- Immersion: being surrounded
- Flow: heightened state of awareness/action
- Situation awareness: clear understanding of surroundings
- Natural interaction: interaction that recedes into the background
- Low cognitive load
‘Being there’
What does it mean to ‘be here’?
- Experience of going through some process to get to a place (e.g. walking through the door)
What does it mean to be together?
- Eye contact with others, talking, shaking hands
How can we re-create these using technology?
In a real environment, we can use:
- Hand-held mobile device
- Phones/tablets
- In-vehicle system
- Navigation/traffic
- Augmented reality
- There++: augmenting reality
For a remote physical environment:
- Phone
- Video conference
- Eye contact difficult: looking at the camera (for eye contact) means you can’t see others
- Teleoperated robots
- Allows movement and possibly even manipulating the environment
- Drones
In virtual environments:
- Video games: FPS, MMOs
- Can be present even without VR
- Multiplayer games mimic physical co-presence
- Immersive learning environments
- e.g. immersive chemistry
- Surgical simulations
- Allows more precision and manipulators
- Allows training on simulated data
In described environments:
- Movies
- Books
- As long as you have the essence, the brain is able to fill in the blanks through their imagination
- However, everyone imagines a different scene: this can lead to disappointment when a book is adapted into a movie
Game Design
What makes a good game?
A great game is a series of interesting and meaningful choices made by the player in pursuit of a clear and compelling goal
- Sid Meier
‘Natural Funativity’:
- Survival-skill training
- Needs to have the player develop a set of skills with increasing levels of difficulty
- Putting them to the test: missions, quests, levels etc.
- Prize at the end (or in the middle)
- e.g. unlocking items, badges, leaderboards
Game structure:
- Movies:
- (typically) have a linear structure
- Are fixed - controlled by the writer/director/cinematographer
- In comparison, games must provide ‘interesting and meaningful choices’
- User must be in control
- Not fun to die due to circumstances outside your control
- Choices must make sense in the context of the story
Flow
Mihály Csíkszentmihályi, Flow: The Psychology of Optimal Experience (1990):
- Heightened sense of perception
- Highly focused on the primary task
- In the ‘sweet spot’ between frustration and boredom
- Occurs in athletes, writers, video gamers, programmers
For game design:
- The ‘sweet spot’ for difficulty is relatively large
- Game difficulty must match the player skill (and increase over time)
- But if it matches exactly, the player will get bored. Hence, difficulty should oscillate slightly
Convexity of game play:
- Provide a choke point: all paths, regardless of what the player chose, should lead to a single result
- e.g. bosses at the end of game stages, story progression
- The number of choices available after every iteration should increase
- e.g. unlocked items, skills, regions to explore
- This addresses the narrative paradox: writers can create a complete story while providing players with (the perception of) choice
Flow and convexity can be combined:
- The choke point should be at the higher end of difficulty
- After the choke point, choices can be provided to the user and the difficulty slightly decreased (relative to the player’s skill)
- By Jenova Chen (Thatgamecompany)
- Adaptive difficulty: game tries to determine player skill and adaptively change the difficulty level to match (see the sketch below)
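A minimal sketch of adaptive difficulty in the spirit described above: difficulty tracks an estimate of player skill but oscillates slightly around it so the match is never exact. The update rule and constants are illustrative assumptions, not Chen's actual method.

```python
import math

def next_difficulty(difficulty, estimated_skill, t,
                    tracking_rate=0.1, wobble=0.15, period=60.0):
    """Move difficulty toward the player's estimated skill, plus a slow oscillation.

    t is elapsed play time in seconds; wobble is the oscillation amplitude.
    """
    target = estimated_skill + wobble * math.sin(2 * math.pi * t / period)
    return difficulty + tracking_rate * (target - difficulty)

d, skill = 1.0, 3.0
for t in range(0, 300, 30):
    d = next_difficulty(d, skill, t)
    print(f"t={t:3d}s  difficulty={d:.2f}")
```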
Characterizing Flow:
- A challenge activity that requires skills
- The merging of action and awareness
- Tight coupling between actions and responses
- Clear goals
- Direct feedback
- Concentration on the task at hand
- Sense of control
- Loss of self-consciousness
- Transformation of time
Immersion
Immersion:
- To completely surround/envelop the user
- e.g. swimming, intensive language course
- Affects all the senses
- Sound can be as important as the visuals
- Also need to consider touch and smell
- How can we immerse MR users?
Haptic ChairIO (Feng et al., 2016):
- Chair that looks like a joystick
- And acts like a joystick: it leans, and tilt sensors can be used as input
- HITLab added vibration floor, pan-tilt fan units:
- Combined with VR headset for audio/video
- Footstep vibrations and fans (wind from the ‘motion’) provide movement cues
- Non-fatiguing: sitting down, hands free to do other work
- Clear mapping of seat movement to camera movement
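A minimal sketch of a seat-movement-to-camera-movement mapping like the ChairIO's: chair tilt becomes a travel velocity, with a dead zone so sitting upright means standing still. Thresholds, units and names are assumptions for illustration.

```python
def chair_to_velocity(pitch_deg, roll_deg, dead_zone=2.0, max_tilt=15.0, max_speed=2.0):
    """Map chair tilt angles (degrees) to a (forward, sideways) velocity in m/s."""
    def axis(tilt):
        if abs(tilt) < dead_zone:                 # ignore tiny postural shifts
            return 0.0
        scale = min(abs(tilt), max_tilt) / max_tilt
        return max_speed * scale * (1 if tilt > 0 else -1)
    return axis(pitch_deg), axis(roll_deg)        # lean forward -> move forward, etc.

print(chair_to_velocity(8.0, -1.0))   # gentle forward lean: (~1.07, 0.0)
```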
Natural interaction:
- Recedes into the background:
- Low cognitive load for interaction techniques
- Stimuli/feedback can be easily digested
- Low cumber
- Multi-sensory feedback
- Multi-modal user input
- e.g. ‘put that over there’: combines pointing (gesture) and voice
- Hybrid ways of executing commands
- Interactions should evolve with the user
- Provide scaffolding to novices
- Provide fast and efficient interactions for experts
Personal experiences:
- We all filter our senses
- Variations in eyesight, hearing etc.
- Different childhood experiences
- Different moods
Presence
Types of presence:
- Presence: sense of ‘being there’
- How virtual characters react to you
- The depth of the interactions with the environment
- Can you turn on the tap? Open a cupboard? Pick up a cat?
- Every interaction has a cost, both in terms of development and performance
- The invisibility/naturalness of the interface
- The lack of distractions (e.g. cables)
- Co-presence: ‘being there together’
- Multiple people can be in the same shared space without feeling ‘together’
- Tele-presence: ‘being over there’
- Remotely present in a partially physical space
- Tele-co-presence: ‘being over there together’?
Measuring presence:
- How can we measure whether someone feels ‘present’ in a game or other virtual environment?
- How can we measure the depth of presence?
- Methods:
- Questionnaires
- Slater Usoh Steed
- Witmer & Singer
- Questions must be written carefully and validated
- Ensure they are unambiguous
- Measurement is done after the fact
- Behaviors
- Watch the user and see how they react
- If you throw something at them, do they duck?
- If they get hit, do they scream?
- Will they refuse to walk off a ledge?
- Hard to measure the depth of presence (but easy to see it)
- Issue: you may need to invent/incorporate events
- Watch the user and see how they react
- Physiological measures
- Possible metrics:
- Heart rate
- Sweat (galvanic skin response or skin conductance)
- Breathing rate/regularity
- Hard to fake
- Issues:
- Some measures take time to settle
- May need to calibrate to a baseline
- Need to wear sensors
The Real World
The real world is great:
- Fast update rate
- Multi-modal rendering
- Really good physics
- Nearly infinite fidelity
- Can handle massive numbers of objects and players
- Realistic crowd behavior
- Minimal lag
Hence, it is useful to use existing things from the real world: this makes AR easier than VR in terms of fidelity.
But beyond perceptual, there is:
- Anticipation
- Expectations
- Previous experiences
We can tap into experiences already anchored in the mind of the user: provide the essence and let the brain fill in the details, or plant new experiences: seeds that can grow and become scaffolding for future experiences.
To do this:
- Prime the user to expect what you are about to show
- A VR experience starts long before the physical experience:
- Advertising
- Word of mouth
- To plant the seed, give them some specific information: this reduces variability between users.
- e.g. while you wait in line at a Disney park, you are shown videos, newspaper clips describing the backstory etc. which immerse you and reduce perceived wait time
- Remove all distractions
- Non-interactable objects (e.g. cupboards that you can’t open)
- Lack of interaction precision
- Fatigue
- Bumping into cables
- Wearing a lot of gear
The myth of technical immersion:
- Technology is not necessary to achieve immersion
- Books are very low-tech but can still transport us to fantastic places
- Our ‘high-fidelity’ technology is still relatively low-fidelity:
- Leverage the mind to fill in the blank
- e.g. in Alien, you don’t see the alien until the end
- e.g. reading a ghost story at night with a window open: the environment and story are matched
- Tasks should be:
- Easy to learn
- Easy to carry out
- Not fatiguing
- Require appropriate precision
- e.g. movement/velocity control: need both very fine and large movements
- Support appropriate expressiveness
Impossible spaces:
- Use a non one-to-one rotation mapping to redirect walking and effectively increase the size of the virtual space (see the sketch after this list)
- Change blindness for redirected walking: modify/reconfigure the virtual space when they are looking the other way
- Redirection also works with reaching, touching
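A minimal sketch of a non one-to-one rotation mapping for redirected walking: real head rotation is scaled by a gain before being applied to the virtual camera, so a smaller physical rotation covers a larger virtual one. The gain value is illustrative.

```python
def redirect_rotation(virtual_yaw, real_yaw_delta, gain=1.2):
    """Apply a rotation gain: the virtual view turns `gain` times the real turn."""
    return virtual_yaw + gain * real_yaw_delta

# A user physically turning 300 degrees experiences a full 360-degree virtual turn.
yaw = 0.0
for _ in range(10):
    yaw = redirect_rotation(yaw, real_yaw_delta=30.0)
print(yaw)   # 360.0
```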
11. Data Visualization in Mixed Reality
Master in Human Interface Technology (MHIT)
HITLab NZ:
- Founded in 2002
- Research focuses: VR, AR, applied immersive gaming
- Philosophy:
We put people before technology, start with the person, look at all the tasks they are trying to perform, TODO
MHIT:
- Application, development and evaluation of HIT
- Learn:
- Interface design principles
- Describe/evaluate interface hardware/technology
- Research/development skills
- Engage with industry
- 3 months of course work:
- HITD602 design & evaluation
- Relationship between aesthetics, function, UX
- Evaluation of design/experience
- HITD603 prototyping and projects
- Requirement analysis, engaging with clients/problem owners
- 9 month thesis project
- Develop prototype
- Run user study
- Write thesis
- Requirements:
- BEHons
- Min. B+ grade
- Scholarships available: more or less certain that you could get a fees-only scholarship
- One student getting stipend from industry
- 22% of MHIT students remain in academia (enrolled in PhD program)
Data Visualization in Mixed Reality
Immersive analytics (Immersive Analytics, Springer, 2018):
- Coping with the ever-increasing amount and complexity of data around us that surpasses our ability to understand/utilize in decision-making:
- Business analysis
- Science
- Policy making
- General public (e.g. personalized health data)
- Removing barriers between people, their data and tools used for analysis
- Support data understanding and decision-making everywhere by everyone
- Allows both individual and collaborative works
- Engagement helps support data understanding and decision-making
- Builds upon:
- Data visualization
- Visual analytics
- VR/AR
- Computer graphics
- HCI
Very dependent on availability of immersive technologies:
- HMDs for AR/VR
- Large wall-mounted, hand-held or wearable displays
- ML to interpret user gestures/utterance
Immersive analytics allows engagement:
- With wider audience through tools/technologies that more fully engage the senses
- With a new generation whose primary input device is not the mouse/keyboard
- In situations where desktop computing is impossible
- In groups where all participants are equally empowered
Opportunities:
- Situated analytics
- User-controlled data analytics linked with objects in the physical world
- Energy consumption
- Construction progress
- Supermarket (e.g. nutritional value of foods, comparison)
- Instruments in a lab
- Embodied data exploration
- Touch/gesture/voice/TUI for more intuitive/engaging data exploration
- Computer becomes invisible to the user
- Collaboration: colocated or remote; synchronous or asynchronous
- Spatial immersion: 3D (or 2.5D) rather than 2D visualization
- Multi-sensory presentation
- Beyond visual/audio (e.g. haptics)
- Augmented cognition
- Engagement in data-informed decision-making
- Involve the general public/other stakeholders
- Allows immersive interactive narrative visualizations (e.g. climate change, carbon footprint)
Possible Values of 3D for Data Visualizations
Additional visual channel (3rd spatial dimension) for data visualization:
- Prone to occlusion, depth disparity, foreshortening
- Studies demonstrate some benefits to this channel
Immersive display technologies have advanced considerably: higher resolution, lower latency, wider range of interaction technologies
Immersive workspaces:
- Use the space around you as a workspace
- Place data visualizations where you want, anchored to the physical space (or relative to your position)
- Beyond task effectiveness:
- Focus not on accuracy/speed
- Does spatial immersion support deeper collaboration, greater engagement, or a more memorable experience?
Depth Cues and Display Technology
- Linear perspective:
- Consequence of the projective properties of the eye as a sensor:
- Occlusion: objects closer in space prevent us from seeing objects behind it
- Foreshortening
- Relative size: two objects of the same size at different distances from the observer project to different retinal sizes (see the worked sketch after this list)
- Relative density: spatial patterns of objects/visual features appear denser as the distance to the pattern increases
- Height in visual fields:
- Objects are bound to rest on the ground
- Bottom of objects can be used as a reference
- Aerial perspective: changes in color properties of objects at large distances
- Motion perspective: moving object/observers provide information about 3D structure
- Binocular disparity/stereopsis: small differences in the images received by the left/right eye
- Accommodation (depth of field):
- Effects of dynamic physiological changes in the shape of each eye
- Amount of blur of the background and other objects provides information about their relative distance
- Dependent on the lighting of the scene
- Depth cues:
- Shadows
- Cue for judging the height of an object above the plane
- Useful for floating objects
- Convergence
- Reflex of the visual system: change in rotation of the eyes that takes place to align the object/region of interest in the center of the eyes’ fovea
- Eye orientation/angle (and differences between the two eyes) can be used to infer short distances
- Controlled point of view
- Ability to manipulate the point of view in a virtual space (without physically moving)
- User knows positional changes, expects visual changes
- Relies on touch/proprioception
- Complementary to visual cues
- e.g. moving joystick to move your avatar/camera
- Subjective motion
- Actual physical motion in the space of the observer
- Information through the vestibular system (balance, movement detection)
- Complementary to visual cues
- Object manipulation
- Change position of objects with respect to the observer
- Trigger motion perspective, changes in other cues
- Does not trigger vestibular signals; uses touch (somatic), motor and proprioceptive signals
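A small worked sketch of two of the cues above, using standard viewing geometry: the visual angle an object subtends (relative size) and the vergence angle needed to fixate it both shrink with distance, the latter very quickly, which is why convergence mainly helps at short range. Object size and inter-pupillary distance are example values.

```python
import math

def angular_size_deg(object_size, distance):
    """Visual angle subtended by an object of the given size at the given distance."""
    return math.degrees(2 * math.atan(object_size / (2 * distance)))

def vergence_angle_deg(distance, ipd=0.065):
    """Angle between the two eyes' lines of sight when fixating at `distance` (m)."""
    return math.degrees(2 * math.atan(ipd / (2 * distance)))

for d in (0.5, 2.0, 10.0):
    print(f"{d:>5} m: object of 0.2 m subtends {angular_size_deg(0.2, d):5.2f} deg, "
          f"vergence {vergence_angle_deg(d):5.2f} deg")
```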
Limitations of depth perception:
- 30% of population may experience binocular deficiency
- Binocular acuity decreases with age
- Line-of-sight ambiguity: rays can only intersect once (occlusion)
- Text legibility
- Low resolution of HMDs
- Foreshortening, 3D orientation
Comparing 2D with 3D Representations - Potential Benefits of Immersive Visualization
Cone Trees:
- Indented lists/tree structures in 3D, where nodes are arranged in a cone that you can rotate
- Linear perspective provides a focus+context view of the tree
- 3D cues of perspective, lighting, shadows help with understanding
- More effective use of display space
- Interactive animation reduces cognitive load
- Study results:
- Poor representation for hierarchical data: occlusion, slow tree rotation
- May help in improving understanding of the underlying structure
Data mountains:
- Arrange documents on a virtual 3D desktop
- More objects fit on the desktop
- Linear perspective provides focus + context view
- Natural metaphor for grouping
- Leverages 3D spatial memory
- Study results:
- 2D data mountains outperformed 3D, although participants thought otherwise
- 2.5D data (2D + linear perspective) outperform 2D
- i.e. 3D < 2D < 2.5D
Aviation:
- Show position and predicted flight path in 3D
- Study results:
- Better for lateral/altitude flight path tracking
- Worse for accurate measurement of airspeed
- ATC found it worse for everything other than collision avoidance
3D shapes/landscapes:
- 3D better for:
- Understanding the overall shape
- Approximate navigation and relative positioning
- 2D better for precise manipulation
Network visualization:
- 3D better for judging if there is a path between highlighted nodes
- Motion cues beneficial for:
- Path following in 3D mazes
- Viewing graphs in AR
- Egocentric spherical layout of 3D graph with HMD outperforms 2D for:
- Finding common neighbors
- Finding paths
- Recalling node location
Multivariate data visualization:
- 3D scatter plots better for:
- Distance comparisons
- Outlier detection
- Cluster identification and shape identification
- Answering integrative questions
Spatial and spatio-temporal data visualization:
- 2D vs 3D representations in VR:
- Exocentric: globe in front of view
- Egocentric: standing inside globe
- Flat map
- Curved map around the user
- Exocentric globe more accurate for distance comparison and estimation
- More time required for task completion compared to maps
Overall:
- Clusters/other structures may be clearer in 3D
- Sufficient depth cues required for the viewer to see clusters
- 3D may benefit path following
- Binocular ‘pop-out’ may be beneficial for highlighting elements
- Using the 3rd dimension to show time is a successful idiom
Summary:
- 3D not generally better than 2D
- 3D may show overall structures in multi-dimensional spaces better
- 2D preferable for precise manipulation or accurate data value measurement
- Choice of technology and depth cues can make a significant difference to the effectiveness:
- Binocular presentation, head-tracking increased spatial judgment accuracy
- Binocular 3D beneficial for depth-related tasks: spatial understanding and manipulation
Data Visualization in AR - Situated Analytics
- Data visualizations integrated into the physical environment
- Needs to take into account the existence of the physical world
- Examples:
- Supermarket (e.g. viewing detailed product information, price comparison)
- Attendees at a conference (e.g. displaying name, affiliation)
- Machinery in a lab (e.g. showing progress)
- Objects at a building site
Conceptual model:
- The raw data and the visualization pipeline exist in a logical world
- Raw data is turned into a visual form fit for human consumption
- Data is brought into the physical world through a physical presentation
- A physical referent (real-world items) may be present
Physically vs perceptually-situated visualizations:
- Physical distance separating a physical presentation and its physical referent may not necessarily match the perceived distance (e.g. visualizing microchip vs mountain)
- Spatial situatedness needs to be refined:
- Physically situated in space: if its physical presentation is physically close to the data’s physical referent
- Perceptually situated in space: if its physical/virtual presentation appears to be close to the data’s physical referent (e.g. mountain and its data visualization)
Embedded vs non-embedded visualizations:
- Embedded visualizations are deeply integrated within their physical environment
- Different virtual sub-elements align with their related physical sub-elements
Interaction:
- By altering its pipeline (e.g. filtering data)
- By altering the physical presentation (e.g. moving around, re-arranging elements)
- Using insights to take immediate action
12. Evaluating Immersive Experiences
Can simply ask the player for their opinion, but these statements are qualitative.
Through validated instruments that use questionnaires, you can get quantitative data (e.g. on situational awareness, workload).
There are many methods to achieve this:
- Usability (System Usability Scale (SUS))
- Game Experience Questionnaire (GEQ)
- Situational Awareness (Overview and SART)
- NASA Task Load Index (TLX)
- Simulation Workload measure (SIM-TLX)
- Immersive Tendencies Questionnaire (ITQ)
- iGroup Presence Questionnaire (IPQ)
- User Experience Questionnaire (UEQ)
- Game Engagement Questionnaire (GEQ)
- Revised Game Engagement Model (R-GEM)
- Revised Personal Involvement Inventory (PII)
- Flow Short Scale
Engagement
What is engagement?
- Some disagreement between academics in what it is and how you quantify it
- Benyon et al. 2005:
- Must be accessible, usable and acceptable
- Should provide experiences that pull people in to create experiences that are:
- memorable,
- satisfying,
- enjoyable,
- rewarding
- IJsselsteijn et al. 2008:
- Sensory and imaginative immersion
- Tension
- Competence that is asked of the user
- Flow
- Negative/positive effect on the user
- Challenge
Elements of Flow (Csikszentmihalyi):
- Be feasible for the user to complete the task
- Allow the user to concentrate on the task
- Have clearly defined goals
- Provide feedback on the user’s actions
- Feel involved in the situation
- Give the user control over the situation and goals
- Allow for a loss of self-consciousness: stop being aware of themselves
- Transformation of time: forget about time passing by
- Autotelic experience: activities should be intrinsically rewarding
O’Brien & Toms:
- Point of engagement: user decides to use the system based on factors such as:
- Aesthetics
- Novelty
- Interest
- Personal motivations
- Specific/experimental goals
- Engagement:
- Aesthetics and sensory appeal
- Attention
- Awareness
- Control
- Interactivity
- Novelty
- Challenge
- Feedback
- Interest
- Positive
- Disengagement attributes which prevent users from re-engaging with the system:
- Usability
- Challenge
- Positive affect
- Negative affect
- Perceived time
- Interruptions
Situational Awareness
AR promises the ability to provide additional information to the environment you are in.
Many jobs require high situational awareness to make effective and timely decisions.
Situational awareness (Endsley, 1995):
- Level 1: the perception of elements in the environment
- Level 2: the comprehension of their meaning
- Level 3: the projection of their status in the near future
- People make decisions based on their situational awareness, and their actions change the state of the environment, creating a feedback loop
- Situational awareness, decision-making and the performance of their actions can be influenced by:
- System capability
- Interface design
- Stress/workload
- Complexity
- Automation
- As well as more individual factors:
- Abilities, experience, training
- Their ability to process information
- Situational awareness and decision-making can be affected by:
- The user’s goals/objectives
- Their preconceptions and expectations
Assessing situational awareness:
- Self-rating:
- Non-intrusive: ask questions post-trial
- Subjective
- Situation Awareness Rating Technique (SART):
- Most well-known self rating system
- 10 dimensions on a Likert scale from 1 to 7
- Applicable when:
- The task is dynamic, collaborative and changeable (e.g. long tasks that can’t be frozen)
- Task outcome is not known (e.g. real world task)
- U - (D - S) score (see the worked sketch after this list):
- U: summed understanding
- Information quantity/quality
- Familiarity with the situation
- D: summed attentional demand
- Instability, variability and complexity of the situation
- S: summed attentional supply
- Arousal (alert/ready for activity or low alertness level)
- Spare mental capacity
- Concentration of attention
- Division of attention
- Freeze probe:
- Task randomly frozen: questions asked about the current or recent state of the system
- May negatively affect performance
- Real-time probe:
- Experts ask questions during the experiment
- No task freeze
- Response time is an indicator of situational awareness
- Observer rating:
- Experts observe participants while they do the task and rate their situational awareness
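A minimal worked sketch of the SART score SA = U - (D - S), using the dimension groupings above; the ratings are made-up example values.

```python
# Ratings on a 1-7 scale for the ten SART dimensions (example values).
understanding = {"information quantity": 5, "information quality": 6, "familiarity": 4}
demand        = {"instability": 3, "variability": 4, "complexity": 5}
supply        = {"arousal": 6, "spare capacity": 4, "concentration": 5, "division of attention": 4}

U, D, S = sum(understanding.values()), sum(demand.values()), sum(supply.values())
sa = U - (D - S)
print(f"U={U}, D={D}, S={S}, SART SA = U - (D - S) = {sa}")   # 15 - (12 - 19) = 22
```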
Metrics
NASA Task Load Index (TLX):
- Subjective workload assessment tool designed for human-machine systems
- Users rate workload on six dimensions:
- Mental demands
- Physical demands
- Temporal demands (how hurried/rushed was the pacing?)
- Performance: success in achieving the task
- Effort: how much effort did they put into the task
- Frustration: how insecure/discouraged/irritated/stressed/annoyed were they
- Overall workload score takes a weighted average, with the weights defined by the experimenters based on their expert judgment (see the sketch below)
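A minimal sketch of an overall TLX-style score as a weighted average of the six dimension ratings. How the weights are obtained is outside the sketch (the notes say expert judgment; the original procedure uses pairwise comparisons), and all numbers are example values.

```python
ratings = {"mental": 70, "physical": 30, "temporal": 55,
           "performance": 40, "effort": 60, "frustration": 45}   # 0-100 scales
weights = {"mental": 5, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 3, "frustration": 1}      # example weights

overall = sum(ratings[d] * weights[d] for d in ratings) / sum(weights.values())
print(f"Overall workload: {overall:.1f}")
```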
Simulation workload measure (SIM-TLX):
- Based on NASA-TLX
- Released 2020
- Considers degree of immersion, perceptual difficulties, novel methods of controlling the environment
- In addition to mental, physical, temporal demands, and frustration, asks:
- Task complexity: how complex was the task
- Situational stress: how stressed were they while performing the task
- Distraction: how distracting was the task environment
- Perceptual strain: how uncomfortable/irritating were the visual/auditory aspects
- Task control: how difficult was control/navigation
System Usability Scale (SUS):
- 10 questions on a Likert scale from 1 to 5:
- I think that I would like to use this system frequently
- I found the system unnecessarily complex
- I thought the system was easy to use
- I think that I would need the support of a technical person to be able to use this system
- I found the various functions in this system were well integrated
- I thought there was too much inconsistency in this system
- I would imagine that most people would learn to use this system very quickly
- I found the system very cumbersome to use
- I felt very confident using the system
- I needed to learn a lot of things before I could get going with this system
- Items 1, 3, 5, 7, 9: take sum of val - 1
- Items 2, 4, 6, 8, 10: take sum of 5 - val
- Multiply sums by 2.5
- ‘Good’ usability: average SUS value > 68
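A minimal sketch of the SUS scoring rule above (odd items contribute value - 1, even items contribute 5 - value, and the total is multiplied by 2.5):

```python
def sus_score(responses):
    """Compute the SUS score from ten 1-5 Likert responses (item 1 first)."""
    total = 0
    for i, value in enumerate(responses, start=1):
        total += (value - 1) if i % 2 == 1 else (5 - value)
    return total * 2.5

print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))   # 80.0 -> above the 68 'good' threshold
```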
Game Experience Questionnaire (GEQ (another also has the same acronym)):
- Modular structure with core, social presence, and post-game modules
- Likert scale from 0-4; each question assesses one of seven components:
- Competence
- Immersion
- Flow
- Tension
- Challenge
- Positive and negative affect
- Slightly controversial: heavily used and can provide some insights, but was never validated
- If GEQ score is low while the usability score is high, it likely means the game is bad
- Bad usability will usually lead to a bad game experience
Igroup Presence Questionnaire (IPQ):
- How much an individual believes they are really in the virtual environment (VE)
- Constructed with ~500 participants
- Three subscales:
- Spatial presence: sense of being physically present in the VE
- Involvement: attention devoted to the VE and the involvement experience
- Awareness of real world surroundings (e.g. sound, room temperature, other people etc.)
- Experienced realism: subjective experience of realism
- 14 questions, including a general question that does not belong to any of the subscales
- Answered on a Likert scale from 0 to 6 (-3 to 3)
- Answers for each sub-scale summed together (with a few questions inverted)
- Results in a 3D scale
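A minimal sketch of IPQ-style subscale scoring. The item-to-subscale mapping and the set of reversed items below are made up purely to show the scoring shape; the real questionnaire defines its own 14-item mapping.

```python
# Hypothetical layout: item index -> (subscale, reversed?). The real IPQ
# assignment and reversed items differ; this only illustrates the scoring shape.
ITEMS = {
    1: ("general", False),
    2: ("spatial", False), 3: ("spatial", False), 4: ("spatial", True),
    5: ("involvement", False), 6: ("involvement", True),
    7: ("realism", False), 8: ("realism", False),
}

def ipq_subscales(answers, scale_max=6):
    """answers: {item: value on 0..6}. Returns summed score per subscale."""
    scores = {}
    for item, value in answers.items():
        subscale, reverse = ITEMS[item]
        if reverse:
            value = scale_max - value          # invert reversed items
        scores[subscale] = scores.get(subscale, 0) + value
    return scores

print(ipq_subscales({1: 4, 2: 5, 3: 4, 4: 1, 5: 3, 6: 2, 7: 4, 8: 3}))
```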
Immersive Tendency Questionnaire (ITQ):
- Measuring the tendency of individuals to be involved/immersed: how much of the immersion comes from the experience you created versus the participant’s tendencies?
- Participants group may be biased - people taking part in VR studies likely to have more experience with VR compared to the general population
- 7 point scale per item
- Three subscales:
- INVOL: tendency to become involved in activities:
- Difficulties with people getting your attention/being aware of surroundings when watching tv/movie/reading book
- Identifying closely with characters
- Becoming scared/apprehensive/fearful after watching TV show/movie
- FOCUS: tendency to maintain focus on current activities:
- How physically fit/mentally alert they feel currently
- How well they can block out external distractions
- Losing track of time
- GAMES: tendency to play games
- Feeling like they are inside the game rather than controlling it through a controller
- How often do they play video games
User Experience Questionnaire (UEQ):
- Measure UX of interactive projects
- 7-step Likert scale from -3 to 3
- Fully validated (in multiple languages)
- 6 scales, 26 items:
- Attractiveness: overall impression of the product
- Perspicuity: how quickly/easily they can learn to use the product
- Efficiency: how fast/efficient the interaction (and feedback) is; amount of perceived ‘unnecessarily’ effort
- Dependability: how in-control they feel; can the user predict system behavior and feel ‘safe’ while using the product
- Stimulation: how exciting/fun is the product
- Novelty: how innovative/creative is the product
Game Engagement Questionnaire (GEQ):
- Uses engagement as an indicator of game involvement
- Attempts to quantify absorption, flow, presence and immersion
- Questions on a no/maybe/yes scale with each question having a unique mapping to a numeric scale
Revised Game Engagement Model (R-GEM):
- Evaluates subjective gameplay experience
- Extends the GEQ
- At some point users shift from low-level to high-level engagement
- Low level:
- Immersion: feeling of being enveloped by the game’s stimuli/experiences
- Involvement: motivation to play
- High level:
- Presence: feeling of being physically located within the game
- Flow: optimal experience of intrinsically-motivated enjoyment
- Questionnaire based on SUS, ITQ, PQ, Flow Short Scale (FSS), Personal Involvement Inventory (PII), Technology Acceptance Model (TAM)
Case Studies
AR game to assess upper extremity motor dysfunctions:
- Existing validated tools to assess motor performance for e.g. stroke, Alzheimer, Parkinson’s patients
- Instead of moving physical items, use AR and motion capture to assess health
- Goals:
- Evaluate usability/game experience
- Compare characteristics of movements in AR versus real world
- Used NASA-TLX, SUS, GEQ and Kinect motion capture to collect data
- Results:
- More engaging than standardized tests
- Motion capture was not accurate enough (at least not for initial assessment)
- Technology is probably good enough today
Human augmentation for distributed situational awareness:
- Collaboration with Dutch police and fire department
- Virtual co-location of local and remote experts
- Person at crime scene wears HMD (and backpack laptop), remote expert can annotate crime scene
- Remote expert also has audio connection
- Results:
- AR increased workload and situational awareness
- Remote colleague appreciated: acted as advisor
- Local user wanted avatar for the remote colleague for more presence, not just voice
Aerial wildfire firefighting training:
- Air attack supervisor (AAS):
- Coordinates fire crews fighting wildfires
- Communicates hazards and gives advice
- Stressful and dangerous
- Traditional AAS training is expensive and rarely done
- Conditions:
- Cylindrical projection display with 270 degree field of regard
- AAS and pilot can see and interact with each other
- HMD with 360 degree field of regard (but limited by headset’s FoV)
- AAS can see an avatar of the pilot
- Methods:
- SART for situational awareness
- Non-significant difference
- NASA TLX for workload
- HMD had slightly lower workload
- IPQ for presence
- HMD had slightly higher presence
Superhuman Sports
Designing MR games that motivate and engage users in physical activity.
You can:
- Augment the senses
- Extra-sensory perception:
- X-ray vision
- ‘Spider sense’
- Clairvoyance
- Sensory augmentation:
- Map an ‘invisible play world’ onto existing senses (substitution)
- Change properties of one sensory modality into stimuli for another
- Extra-sensory perception:
- Augment the body
- Laser tag: vest with PGM force feedback that constricted the wearer’s movements the more they got shot
- Mechanical tail: affected balance as the player moved around
- MetaArmS: remap feet to mechanical arms
- Augment the playing field:
- Adding virtual elements:
- New physics
- New equipment
- New opponents
- Train in a safe environment:
- Climbing treadmill:
- Circular wall which rotates as you move up, allowing users to climb endlessly
- Users always ~50 cm above the ground, but there is a button which causes a ‘platform’ to appear
- Physical exertion adds immersion to the experience?
- Technology is still not quite ready, so interactions should be kept simple