Testing strategies != testing
Debate: developers should not test their own code/program
Developers should develop, testers should test
Negative: developers should develop and test
Positive:
- Separation of concerns:
- One team makes, one team breaks
- Specialization
- Developers are not end users; testers can have a better understanding of the domain and of how users will use the software
- Developers know what their own code does: they may only write tests they know will pass
- Developer may misinterpret requirements and write tests accordingly; a tester will have their own understanding of the requirements
- Gives developer more confidence in their code; experienced tester there to catch bugs
- Will write code that is easier to test as you know that someone else will be looking through it
Negative:
- Latency involved in the back-and-forth between developers and testers
- Counterpoint to 'writing code that is easier to test when there is a dedicated tester':
- Should be writing easy-to-test code anyway
- Projects cycle between heavy development and heavy testing; if developers also test, the develop-test workload can be shifted between the two, which is not really possible with dedicated testers
- Understanding existing tests helps when writing newer features
- Can’t really use TDD if dedicated testers are involved; TDD is iterative, which is hard to do when there are separate teams
Counterpoints against positive:
- Developers should have good understanding of users and problem domain anyway
- Code review process should catch requirements being misinterpreted
- Having dedicated testers may lead to complacency in code quality and review process
Counterpoints against negative:
- Some industries have strict regulations, require dedicated testers
- Domain knowledge: some is expected, but unrealistic to expect deep domain knowledge from every tester
- Bus factor: both the developer and the tester need to understand the domain, so knowledge is not held by a single person
Quality
- Who creates quality? The developers or the testers?
- Who is responsible for (maintaining) quality?
- When is quality created?
Quality is created by the developer - so what is testing for?
Testing isn’t about unit testing or integration testing. It is a mindset; a systematic process of:
- Poking and prodding at a system to see how it behaves
- Understanding the limits of a system
- Determining if it behaves as expected
- Determining if it does what it is meant to do; whether it is fit for purpose
Testing is about how a user experiences the system and how it compares to our expectations.
In what contexts is testing not required?
- When making a one-off thing (a prototype)
- When it doesn’t matter if it works right
- Zero impact on people’s lives or livelihoods
- Small programs
Hypothesis Testing
The broad steps:
- Conjecture
- Some sort of expectation informed by your model of the system/world
- Hypothesis (and null hypothesis)
- A testable conjecture
- Conducting systematic testing of the hypothesis, possibly in multiple ways
- Supporting/rejecting the null hypothesis
Example:
- Model + Conjecture:
- Logging in is a difficult feature to create securely
- I have a feeling there is a flaw in the login logic
- Hypothesis:
- Insecure logins are possible
- Testing:
- Use ‘back button’ after logging out
- Refresh the page
- Checking if passwords are plain text
- Sending information as a GET request
- Logging in as ‘admin’ with password ‘password’
- Attempting an SQL injection attack
- Attempting a login with no password
- etc.
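As a sketch of how one of the probes above could be automated (the app fixture and its login method are hypothetical), a falsification-style check might look like:

# Hypothetical falsification test: the hypothesis 'insecure logins are possible'
# is supported if this probe manages to create a session.
def test_login_with_no_password_is_rejected(app):
    # Assumed interface: app.login returns a session object on success, None on failure.
    session = app.login(username="admin", password="")
    assert session is None, "login succeeded without a password"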
Verifiability vs Falsifiability
What will it take for us to be able to claim that there are no bugs in the system?
You must test every conceivable avenue and every single branch; verify the system. This is almost impossible, although formal proofs are possible in limited domains.
Karl Popper - The Logic of Scientific Discovery, 1934.
Verifiability: every single branch can be tested
Falsifiability: at least one example that contradicts the hypothesis can be found
Hence, there is a large asymmetry between the two: when making scientific hypotheses, we find evidence to support or disprove the hypothesis but we can never prove the hypothesis is true.
Testing vs. Automation
Automation helps make the testing process easier; it is not testing itself.
Testing is the human process of thinking about how to verify/falsify.
Testing is done in context; humans must intelligently evaluate the results taking this into account.
Biases
Confirmation Bias
The tendency to interpret information in a manner that confirms your own beliefs:
- x is secure
- In what way? How does it need to be used? What are its limits?
- 100% test coverage means there are no bugs
- Documentation being used to confirm a tester’s belief about the SUT (system under test)
- Assumes that the SUT’s documentation is completely correct
- Positive test bias
- Testing positive outcomes is verifying; instead you should be attempting to falsify by choosing tests and data that may lead to negative outcomes
Congruence Bias
Subset of confirmation bias, in which people over-rely on their initial hypothesis and neglect to consider alternatives (which may indirectly test the hypothesis).
In testing, this occurs if the tester has strategies that they use all the time and do not consider alternative approaches.
Anchoring Bias
Once a baseline is provided, people unconsciously use it as a reference point.
Irrelevant information affects the decision making/testing process.
The tester is already anchored in what the system does, perhaps from docs, user stories, talks with management etc., and may not consider alternative branches.
Functional fixedness: a tendency to only test in the way the system is meant to be used and not think laterally.
Law of the Instrument Bias
Believing and relying on an instrument to a fault.
Reliance on the testing tool/methodology e.g. acceptance/unit/integration testing: we use x therefore y must be true.
The way the language is written can affect it as well. e.g. the constrained syntax of user stories leads to complex information and constraints being compressed and relevant information being lost.
Resemblance Bias
The toy duck looks like a duck so it must act like a duck: judging a situation based on a similar previous situation
e.g. if you have experience in a similar framework, you may make assumptions about how the current framework works based on your prior experience. This may lead to ‘obvious’ things being missed or mistaken.
Halo Effect Bias
Brilliant people/organizations never make mistakes. Hence, their work does not need to be tested (or this bug I found is a feature, not a bug).
Authoritative Bias
- Appealing to authority
- Testers feeling a power level difference when talking to developers
- Listening to what management wants rather than what should be tested
- Management should be told about the consequences of any steps that are skipped
Types of Testing Techniques
Static testing:
- Looking at the static code or document
- Static code analysis, cross-document traceability analysis, reviews
Dynamic testing:
- Forcing failures in executable items
Scripted vs unscripted tests; compared to unscripted tests, scripted tests:
- Are repeatable, providing auditability and verification and validation
- Unscripted tests generally have little to no records and are not repeatable
- Allow test cases to be explicitly traced back to requirements; test coverage can be documented
- Allow test cases to be retained as reusable artifacts for current and future projects, saving time in the future
- Are more time-consuming and costly, although this may be mitigated by automating the tests
- Have test cases defined prior to execution, making them less adaptable to the system as it presents itself and more prone to cognitive biases
- Unscripted tests allow testers to follow ideas and change their behavior based on the system’s behavior
- Are boring; testers may lose focus and miss details during test execution
- Unscripted testing requires more thought and hence is less prone to biases
Testing Toolbox
Three main classes:
- Black box testing:
  - Specification-based testing: does it meet the user-facing requirements?
  - No access to internals
- White box testing:
  - Structure-based testing
  - Full access to the implementation
- Grey box testing:
  - A combination of black and white box testing
Unit testing:
- White box testing
- Test individual units
Integration testing:
- Testing the interface between two modules
- API testing
- Grey box
System testing:
- Testing the system; does the system do what it is meant to?
- Black box test
- Many types of tests: regression, performance, sanity, smoke, installation etc.
Smoke testing:
- AKA build verification/acceptance testing
- Pumping smoke into the pipe and seeing if any smoke comes out of cracks
- Testing to see the critical, core functionality works (e.g. can it boot)
- A time saving measure: is the system stable enough that we can go into the main testing phase?
Sanity testing:
- Very high-level regression test, similar to smoke testing
- Testing if it is sane; does the system perform rationally and do what it is meant to do?
Regression testing:
- Verifying that the system continues to behave as expected after something has been modified
- Each test targets a specific small operation
Acceptance testing:
- Formal tests; used during validation
- Checking if the system satisfies requirements
- Customer decides if it is accepted
- Types:
- End-user acceptance testing (UAT)
- People simulating end-users test the system
- Business acceptance testing (BAT)
- Checking that the system meets the requirements of the business
- Regulations/standards acceptance testing (RAT)
- Alpha/beta testing
- Accessibility testing
- Accessible by the target audience
- Text contrast, colors, highlighting
- Magnifications
- Screen readers
- UI hierarchy
- Special keyboards
- User guides, training, documentation
- Performance testing
- Non-functional requirements
- Is the system fast enough?
- Load testing (at the expected load)
- e.g. UC network was tested and found to perform great, but many students would log in to lab machines at the start of the hour and overload the system
- Stress testing (under max load or beyond for long periods)
- Data transfer rates, throughput
- CPU/memory utilization
- Running the system on a client with limited resources
- Or on networks where certain resources may be blocked (e.g. China)
- Are the devices you are testing on representative of what clients will be using?
- ‘Service level agreements’
End-to-end testing:
- Scenario-based testing: testing a real scenario a user may run into, from the beginning to the end
- Uses actual data and simulated ‘real’ settings
- Expensive and cannot usually be fully automated
Security testing:
- Access, authentication and authorization
- Roles, permissions
- Vulnerabilities, threats, risks
- Present and future: think about what may happen in the future
- Attacks
- Data storage (security), encryption
- Types:
- Penetration testing
- Security audit
Test/Behavior Driven Development (TDD/BDD)
Development, NOT testing strategies.
Tests made in this process are prototypes.
TDD tests are blue-sky, verification tests rather than falsifiability tests. Additionally, because they are prototypes, TDD tests should (in theory) be thrown away and rewritten (beware the sunk-cost fallacy).
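A minimal TDD-style sketch (the slugify function is hypothetical): the test is written first, fails, and then just enough code is written to make it pass.

# Step 1 (red): write the test before the implementation exists - it fails.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Hello World") == "hello-world"

# Step 2 (green): write just enough code to make the test pass, then refactor.
def slugify(text):
    return text.strip().lower().replace(" ", "-")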
Audits
How will you test the system?
Look at the tests, not the techniques.
James Bach - The Test Design Starting Line: Hypotheses - Keynote PeakIT004
Testing Certifications
Standards:
- Condensed experience, knowledge and wisdom from the domain experts that wrote the standards
- Provides confidence for management, customers, the development team and the government
- Standards != quality
International Software Testing Qualifications Board (ISTQB):
- Most popular testing certification
- Multiple-choice exams
- Teaches testing techniques, not how to test
- Testing is done by humans; testing techniques help humans do the testing
- Always take the context of the system under test (SUT) into account
In the exam:
- Four questions which provide scenarios
- Don’t just vomit out testing techniques
ISO/IEC/IEEE 29119-4 Test Techniques
Split into three different high-level types:
- Black/specification-based testing
- White/clear/structure-based testing
- Grey: combination
Specification
Equivalence Class Partitioning (ECP)
Partition test conditions, usually inputs, into sets: equivalence partitions/classes. Be careful of sub-partitions.
Only one test per partition is required.
e.g. alphabetical characters, alphanumeric, ASCII, emoji, SQL injection.
e.g. a square root function could have num >= 0, negative ints, and negative floats as equivalence classes
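A sketch of the square root example using Python's math.sqrt: one representative value per equivalence class is enough.

import math
import pytest

# One representative value from each partition.
@pytest.mark.parametrize("value", [0, 4, 2.25])      # num >= 0
def test_sqrt_of_non_negative_numbers(value):
    assert math.isclose(math.sqrt(value) ** 2, value)

@pytest.mark.parametrize("value", [-1, -7.5])        # negative ints and floats
def test_sqrt_of_negative_numbers_raises(value):
    with pytest.raises(ValueError):
        math.sqrt(value)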
Classification Tree Method
Grimm/Grochtmann, 1993:
- Find all classifications/aspects
- Divide the input domain into subsets/classes
- Select as many test cases as are needed for a thorough test
e.g. DBMS:
- Classification aspects are:
- Privilege: regular, admin
- Operations: read, write, delete
- Access method: CLI, browser, API
- For each test, pick one value from each class
- Make enough test cases for ‘thorough’ coverage: do not need to have tests for every permutation
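A sketch of the DBMS example: each classification is written down as a set of classes, a handful of test cases is picked, and a check confirms every class appears in at least one case (without enumerating all 18 permutations).

privileges = ["regular", "admin"]
operations = ["read", "write", "delete"]
access_methods = ["CLI", "browser", "API"]

# Hand-picked cases: far fewer than the 2 * 3 * 3 = 18 permutations.
test_cases = [
    ("regular", "read", "CLI"),
    ("admin", "write", "browser"),
    ("admin", "delete", "API"),
]

# Every class in every classification should appear in at least one test case.
for i, classification in enumerate([privileges, operations, access_methods]):
    assert set(classification) <= {case[i] for case in test_cases}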
Boundary Value Analysis
Test along the boundary:
- Allows you to catch errors such as off-by-one errors
- Equivalence partitioning usually used to find the boundaries
- Check to ensure you have found all boundaries
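A sketch for a hypothetical is_valid_percentage(n) whose valid range is 0-100 inclusive: the test values sit on and immediately either side of each boundary.

import pytest

def is_valid_percentage(n):
    # Hypothetical function under test: valid range is 0..100 inclusive.
    return 0 <= n <= 100

# Values on and just outside each boundary catch off-by-one errors.
@pytest.mark.parametrize("value, expected", [
    (-1, False), (0, True), (1, True),       # lower boundary
    (99, True), (100, True), (101, False),   # upper boundary
])
def test_percentage_boundaries(value, expected):
    assert is_valid_percentage(value) == expected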
Syntax Testing
Tests the language’s grammar by testing the syntax of all inputs in the input domain.
Requires a very large number of tests. Usually automated and may use a pre-processor.
Note that a correct syntax does not mean correct functionality.
Process:
- Identify the target language/format
- Define the syntax in formal notation
- Test and debug the syntax
- Use the syntax graph to test normal and invalid conditions
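A small sketch: if the target format is a YYYY-MM-DD date, the syntax can be written down (here a regular expression stands in for the formal grammar) and then probed with conforming and non-conforming inputs.

import re

# The target 'grammar': YYYY-MM-DD. This checks syntax only.
DATE_SYNTAX = re.compile(r"^\d{4}-\d{2}-\d{2}$")

valid_inputs = ["2024-01-31", "1999-12-01"]
invalid_inputs = ["31-01-2024", "2024/01/31", "2024-1-31", "", "2024-01-31x"]

for text in valid_inputs:
    assert DATE_SYNTAX.match(text), text
for text in invalid_inputs:
    assert not DATE_SYNTAX.match(text), text

# "2024-13-99" still passes: correct syntax does not mean correct functionality.
assert DATE_SYNTAX.match("2024-13-99")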
Combinatorial Test Techniques
When there are several parameters/variables. TODO
Reduce the test space using other techniques:
- Pair-wise testing
- Each choice testing
- Base choice testing
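A sketch of pair-wise testing with three hypothetical two-valued parameters: four hand-picked cases cover every pair of values across any two parameters, compared with eight for the full cross product; the loop verifies the pair coverage.

from itertools import combinations, product

parameters = {
    "browser": ["chrome", "firefox"],
    "os": ["windows", "linux"],
    "locale": ["en", "fr"],
}

# Four cases instead of the full 2 * 2 * 2 = 8 cross product.
cases = [
    {"browser": "chrome",  "os": "windows", "locale": "en"},
    {"browser": "chrome",  "os": "linux",   "locale": "fr"},
    {"browser": "firefox", "os": "windows", "locale": "fr"},
    {"browser": "firefox", "os": "linux",   "locale": "en"},
]

# Every pair of values, for every pair of parameters, appears in at least one case.
for (p1, values1), (p2, values2) in combinations(parameters.items(), 2):
    for v1, v2 in product(values1, values2):
        assert any(c[p1] == v1 and c[p2] == v2 for c in cases), (p1, v1, p2, v2)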
Decision Table Testing
AKA cause-effect table testing
Software makes different decisions based on a variety of factors:
- State
- Input
- Rules
Decision table testing tests decision paths: different outputs triggered by the above conditions.
Decision tables help to document complex logic and business rules. They have CONDITIONS (e.g. user logged in or not) and ACTIONS (performed by the user and/or system) that are run when the conditions are met.
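A sketch of a decision table for a hypothetical discount rule (conditions: logged in, has coupon; action: discount applied), where each rule in the table becomes one test case.

import pytest

def discount_applied(logged_in, has_coupon):
    # Hypothetical system under test: discount only for logged-in users with a coupon.
    return logged_in and has_coupon

# Each row is one rule: (condition: logged_in, condition: has_coupon, action: discount?)
DECISION_TABLE = [
    (True,  True,  True),
    (True,  False, False),
    (False, True,  False),
    (False, False, False),
]

@pytest.mark.parametrize("logged_in, has_coupon, expected", DECISION_TABLE)
def test_discount_rules(logged_in, has_coupon, expected):
    assert discount_applied(logged_in, has_coupon) == expected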
Cause-Effect Graphs
AKA Ishikawa diagram, fish bone diagram.
Document dependencies.
Syntax:
- Cause (c) and effect (e) nodes, both inside a circle
- Intermediary nodes (e.g. AND joining two causes) have no label
- Lines connecting the causes to effects
- NOT: a ~ on the line
- OR: an arc intersecting the lines between the causes and the effect, with a v next to the arc
- AND: an arc intersecting the lines between the causes and the effect, with a ^ next to the arc
Example
If the user clicking the ‘save’ button is an administrator or a moderator, then they are allowed to save. When the ‘save’ button is clicked, it should call the ‘save’ functionality.
If the user is not an admin or moderator, then the message in the troubleshooter/CLI should say so.
If the ‘save’ functionality is not hooked up to the ‘save’ button, then there should be a message about this when the button is clicked.
C1: the user is an admin
C2: the user is a moderator
C3: the save functionality is called
E1: the information is saved
E2: the message ‘you need to be an authenticated user’ is shown
E3: the message ‘the save functionality has not been called’ is shown
(Diagram: a cause-effect graph for this example, combining C1, C2 and C3 with OR, AND and NOT arcs to produce E1, E2 and E3.)
More complex diagrams should use fishbone diagrams.
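One reading of the example above, sketched as boolean cause-effect rules with one check per effect (the effects function is hypothetical, and the exact rules are an interpretation of the prose):

# Causes: C1 = admin, C2 = moderator, C3 = save functionality is called.
def effects(c1_admin, c2_moderator, c3_save_called):
    authorised = c1_admin or c2_moderator            # C1 OR C2
    return {
        "E1_saved": authorised and c3_save_called,   # (C1 OR C2) AND C3
        "E2_needs_auth_message": not authorised,     # NOT (C1 OR C2)
        "E3_not_called_message": not c3_save_called, # NOT C3
    }

# One test per effect, choosing causes that should trigger it.
assert effects(True, False, True)["E1_saved"]
assert effects(False, False, True)["E2_needs_auth_message"]
assert effects(False, True, False)["E3_not_called_message"]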
State Transition Graphs
- States
- Transitions between states
- Events (that trigger transitions)
- Actions (resulting from transitions)
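A sketch of state transition testing for a hypothetical document workflow: the transition table lists states and the events that move between them, and the tests exercise both valid transitions and an event that should be rejected.

# {state: {event: next_state}}
TRANSITIONS = {
    "draft":     {"submit": "review"},
    "review":    {"approve": "published", "reject": "draft"},
    "published": {"archive": "archived"},
    "archived":  {},
}

def next_state(state, event):
    allowed = TRANSITIONS[state]
    if event not in allowed:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
    return allowed[event]

# Valid transitions...
assert next_state("draft", "submit") == "review"
assert next_state("review", "reject") == "draft"
# ...and an event that should be rejected in this state.
try:
    next_state("archived", "submit")
    raise AssertionError("expected the invalid transition to be rejected")
except ValueError:
    pass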
Scenario Testing
Scenarios are a sequence of interactions (between systems, users etc.).
Scenarios should be credible and replicate an end-user’s experience. They should be based on a story/description.
Scenario tests test the end-to-end functionality and business flows, both blue-sky and error cases. However, scenario tests should not need to be exhaustive - these are expensive and heavily-documented tests.
Scenario tests also test usability from the user’s perspective, not just business requirements.
Random/Monkey Testing
Using random input to test; used when the time required to write and run the directed test is too long, too complex or impossible.
Heuristics could be used to generate tests, but care should be taken to ensure there is still enough randomness to cover the specification.
There needs to be some mechanism to determine when a test fails, and the ability to reproduce the failing test.
Monkey testing is useful for preventing tunnel vision and for when you cannot think laterally.
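A sketch of monkey testing Python's int() with random printable strings: the seed is recorded so that a failing run can be reproduced, and the only 'oracle' here is that nothing other than the documented ValueError is raised.

import random
import string

def run_monkey_tests(function_under_test, iterations=1000, seed=None):
    seed = seed if seed is not None else random.randrange(2**32)
    rng = random.Random(seed)  # keeping the seed makes failures reproducible
    for _ in range(iterations):
        text = "".join(rng.choice(string.printable)
                       for _ in range(rng.randint(0, 30)))
        try:
            function_under_test(text)
        except ValueError:
            pass  # documented, expected failure mode
        except Exception as exc:
            raise AssertionError(
                f"unexpected {exc!r} for input {text!r}; reproduce with seed={seed}")

run_monkey_tests(int)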
Structure-Based Techniques
Structure and data.
Statement Testing
AKA line/segment coverage.
Test checks/verifies each line of code and the flow of different paths in the program.
Conditions that are always false cannot be tested.
Similar to BVA except it is focused more on the paths rather than the input.
Branch/Decision Testing
Test each branch where decisions are made.
Branch coverage:
- The minimum number of paths which will ensure all branches are covered
- Measures which decision outcomes have been tested
All branches are validated.
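A sketch of branch testing for a hypothetical grade function: the test cases are chosen so that both the true and false outcome of every decision is exercised at least once.

import pytest

def grade(score):
    if score < 0 or score > 100:   # decision 1
        raise ValueError("score out of range")
    if score >= 50:                # decision 2
        return "pass"
    return "fail"

# These cases cover both outcomes of both decisions.
@pytest.mark.parametrize("score, expected", [(75, "pass"), (30, "fail")])
def test_grade_in_range(score, expected):
    assert grade(score) == expected

@pytest.mark.parametrize("score", [-1, 101])
def test_grade_out_of_range(score):
    with pytest.raises(ValueError):
        grade(score)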
Data Flow Testing
Tests data flows and detects improper use of data in a program, such as:
- Variables that are declared but never used
- Variables that are used but never declared
- Variables that are defined multiple times before being used
- Variables that are deallocated before being used
It creates a control flow graph and a data flow graph; the latter represents data dependencies between operations.
Static data flow testing analyzes source code without executing it, while dynamic data flow testing does the analysis during execution.
(e.g. data just passing through a class without being used directly by it?).
Experience-based Testing
Error guessing: get an experienced tester to think of situations that may break the program.
Error guessing:
- Depends on the skill, experience, and intuition of the tester: no explicit rules or testing methods
- Can be somewhat systematic: list possible defects/failures and design tests to produce them
- Can be effective and save time