Testing strategies != testing
Debate: developers should not test their own code/program
Developers should develop, testers should test
Negative: developers should develop and test
Positive:
- Separation of concerns:
- One team makes, one team breaks
- Specialization
- Developers are not end users; testers can have a better understanding of the domain and of how users will use the software
- Developers know what their own code does: they may only write tests they know will pass
- Developer may misinterpret requirements and write tests accordingly; a tester will have their own understanding of the requirements
- Gives developer more confidence in their code; experienced tester there to catch bugs
- Will write code that is easier to test as you know that someone else will be looking through it
Negative:
- Latency involved in the back-and-forth between developers and testers
- Counterpoint to 'writing code that is easier to test when there is a dedicated tester':
- Should be writing easy-to-test code anyway
- Projects cycle between heavy development and heavy testing; if developers also test, the develop-test workload can be shifted between the two, which is not really possible with dedicated testers
- Understanding existing tests helps when writing newer features
- Can’t really use TDD if dedicated testers are involved; TDD is iterative, which is hard to do when there are separate teams
Counterpoints against positive:
- Developers should have good understanding of users and problem domain anyway
- Code review process should catch requirements being misinterpreted
- Having dedicated testers may lead to complacency in code quality and review process
Counterpoints against negative:
- Some industries have strict regulations, require dedicated testers
- Domain knowledge: some is expected, but unrealistic to expect deep domain knowledge from every tester
- Bus factor: both the developer and the tester need to understand the domain, so knowledge is not held by a single person
Quality
- Who creates quality? The developers or the testers?
- Who is responsible for (maintaining) quality?
- When is quality created?
Quality is created by the developer - so what is testing for?
Testing isn’t about unit testing or integration testing. It is a mindset; a systematic process of:
- Poking and prodding at a system to see how it behaves
- Understanding the limits of a system
- Determining if it behaves as expected
- Determining if it does what it is meant to do; whether it is fit for purpose
Testing is about how a user experiences the system and how it compares to our expectations.
In what contexts is testing not required?
- When making a one-off thing (a prototype)
- When it doesn’t matter if it works right
- Zero impact on people’s lives or livelihoods
- Small programs
Hypothesis Testing
The broad steps:
- Conjecture
- Some sort of expectation informed by your model of the system/world
- Hypothesis (and null hypothesis)
- A testable conjecture
- Conducting systematic testing of the hypothesis, possibly in multiple ways
- Supporting/rejecting the null hypothesis
Example:
- Model + Conjecture:
- Logging in is a difficult feature to create securely
- I have a feeling there is a flaw in the login logic
- Hypothesis:
- Insecure logins are possible
- Testing:
- Use ‘back button’ after logging out
- Refresh the page
- Checking if passwords are plain text
- Sending information as a GET request
- Logging in as ‘admin’ with password ‘password’
- Attempting an SQL injection attack
- Attempting a login with no password
- etc.
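As a sketch of how one of the probes above could be automated (the app fixture and its login method are hypothetical), a falsification-style check might look like:

# Hypothetical falsification test: the hypothesis 'insecure logins are possible'
# is supported if this probe manages to create a session.
def test_login_with_no_password_is_rejected(app):
    # Assumed interface: app.login returns a session object on success, None on failure.
    session = app.login(username="admin", password="")
    assert session is None, "login succeeded without a password"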
Verifiability vs Falsifiability
What will it take for us to be able to claim that there are no bugs in the system?
You must test every conceivable avenue and every single branch; verify the system. This is almost impossible, although formal proofs are possible in limited domains.
Karl Popper - The Logic of Scientific Discovery, 1934.
Verifiability: every single branch can be tested
Falsifiability: at least one example that contradicts the hypothesis can be found
Hence, there is a large asymmetry between the two: when making scientific hypotheses, we find evidence to support or disprove the hypothesis but we can never prove the hypothesis is true.
Testing vs. Automation
Automation helps make the testing process easier; it is not testing itself.
Testing is the human process of thinking about how to verify/falsify.
Testing is done in context; humans must intelligently evaluate the results taking this into account.
Biases
Confirmation Bias
The tendency to interpret information in a manner that confirms your own beliefs:
- x is secure
- In what way? How does it need to be used? What are its limits?
- 100% test coverage means there are no bugs
- Documentation being used to confirm a tester’s belief about the SUT (system under test)
- Assumes that the SUT’s documentation is completely correct
- Positive test bias
- Testing positive outcomes is verifying; instead you should be attempting to falsify by choosing tests and data that may lead to negative outcomes
Congruence Bias
Subset of confirmation bias, in which people over-rely on their initial hypothesis and neglect to consider alternatives (which may indirectly test the hypothesis).
In testing, this occurs if the tester has strategies that they use all the time and do not consider alternative approaches.
Anchoring Bias
Once a baseline is provided, people unconsciously use it as a reference point.
Irrelevant information affects the decision making/testing process.
The tester is already anchored in what the system does, perhaps from docs, user stories, talks with management etc., and may not consider alternative branches.
Functional fixedness: a tendency to only test in the way the system is meant to be used and not think laterally.
Law of the Instrument Bias
Believing and relying on an instrument to a fault.
Reliance on the testing tool/methodology e.g. acceptance/unit/integration testing: we use x therefore y must be true.
The way the language is written can affect it as well. e.g. the constrained syntax of user stories leads to complex information and constraints being compressed and relevant information being lost.
Resemblance Bias
The toy duck looks like a duck so it must act like a duck: judging a situation based on a similar previous situation
e.g. if you have experience in a similar framework, you may make assumptions about how the current framework works based on your prior experience. This may lead to ‘obvious’ things being missed or mistaken.
Halo Effect Bias
Brilliant people/organizations never make mistakes. Hence, their work does not need to be tested (or this bug I found is a feature, not a bug).
Authoritative Bias
- Appealing to authority
- Testers feeling a power level difference when talking to developers
- Listening to what management wants rather than what should be tested
- Management should be told about the consequences of any steps that are skipped
Types of Testing Techniques
Static testing:
- Looking at the static code or document
- Static code analysis, cross-document traceability analysis, reviews
Dynamic testing:
- Forcing failures in executable items
Scripted vs unscripted tests; compared to unscripted tests, scripted tests:
- Are repeatable, providing auditability and verification and validation
- Unscripted tests generally have little to no records and are not repeatable
- Allow test cases to be explicitly traced back to requirements; test coverage can be documented
- Allow test cases to be retained as reusable artifacts for current and future projects, saving time in the future
- Are more time-consuming and costly, although this may be mitigated by automating the tests
- Have test cases defined prior to execution, making them less adaptable to the system as it presents itself and more prone to cognitive biases
- Unscripted tests allow testers to follow ideas and change their behavior based on the system’s behavior
- Are boring; testers may lose focus and miss details during test execution
- Unscripted testing requires more thought and hence is less prone to biases
Testing Toolbox
Three main classes:
- Black box testing:
  - Specification-based testing: does it meet the user-facing requirements?
  - No access to internals
- White box testing:
  - Structure-based testing
  - Full access to the implementation
- Grey box testing:
  - A combination of black and white box testing
Unit testing:
- White box testing
- Test individual units
Integration testing:
- Testing the interface between two modules
- API testing
- Grey box
System testing:
- Testing the system; does the system do what it is meant to?
- Black box test
- Many types of tests: regression, performance, sanity, smoke, installation etc.
Smoke testing:
- AKA build verification/acceptance testing
- Pumping smoke into the pipe and seeing if any smoke comes out of cracks
- Testing to see the critical, core functionality works (e.g. can it boot)
- A time saving measure: is the system stable enough that we can go into the main testing phase?
Sanity testing:
- Very high-level regression test, similar to smoke testing
- Testing if it is sane; does the system perform rationally and do what it is meant to do?
Regression testing:
- Verifying that the system continues to behave as expected after something has been modified
- Each test targets a specific small operation
Acceptance testing:
- Formal tests; used during validation
- Checking if the system satisfies requirements
- Customer decides if it is accepted
- Types:
- End-user acceptance testing (UAT)
- People simulating end-users test the system
- Business acceptance testing (BAT)
- Checking that the system meets the requirements of the business
- Regulations/standards acceptance testing (RAT)
- Alpha/beta testing
- Accessibility testing
- Accessible by the target audience
- Text contrast, colors, highlighting
- Magnifications
- Screen readers
- UI hierarchy
- Special keyboards
- User guides, training, documentation
- Performance testing
- Non-functional requirements
- Is the system fast enough?
- Load testing (at the expected load)
- e.g. UC network was tested and found to perform great, but many students would log in to lab machines at the start of the hour and overload the system
- Stress testing (under max load or beyond for long periods)
- Data transfer rates, throughput
- CPU/memory utilization
- Running the system on a client with limited resources
- Or on networks where certain resources may be blocked (e.g. China)
- Are the devices you are testing on representative of what clients will be using?
- ‘Service level agreements’
End-to-end testing:
- Scenario-based testing: testing a real scenario a user may run into, from the beginning to the end
- Uses actual data and simulated ‘real’ settings
- Expensive and cannot usually be fully automated
Security testing:
- Access, authentication and authorization
- Roles, permissions
- Vulnerabilities, threats, risks
- Present and future: think about what may happen in the future
- Attacks
- Data storage (security), encryption
- Types:
- Penetration testing
- Security audit
Test/Behavior Driven Development (TDD/BDD)
Development, NOT testing strategies.
Tests made in this process are prototypes.
TDD tests are blue-sky, verification tests rather than falsifiability tests. Additionally, because they are prototypes, TDD tests should (in theory) be thrown away and rewritten (beware the sunk-cost fallacy).
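A minimal TDD-style sketch (the slugify function is hypothetical): the test is written first, fails, and then just enough code is written to make it pass.

# Step 1 (red): write the test before the implementation exists - it fails.
def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("Hello World") == "hello-world"

# Step 2 (green): write just enough code to make the test pass, then refactor.
def slugify(text):
    return text.strip().lower().replace(" ", "-")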
Audits
How will you test the system?
Look at the tests, not the techniques.
James Bach - The Test Design Starting Line: Hypotheses - Keynote PeakIT004
Testing Certifications
Standards:
- Condensed experience, knowledge and wisdom from the domain experts that wrote the standards
- Provides confidence for management, customers, the development team and the government
- Standards != quality
International Software Testing Qualifications Board (ISTQB):
- Most popular testing certification
- Multiple-choice exams
- Teaches testing techniques, not how to test
- Testing is done by humans; testing techniques help humans do the testing
- Always take the context of the system under test (SUT) into account
In the exam:
- Four questions which provide scenarios
- Don’t just vomit out testing techniques
ISO/IEC/IEEE 29119-4 Test Techniques
Split into three different high-level types:
- Black/specification-based testing
- White/clear/structure-based testing
- Grey: combination
Specification
Equivalence Class Partitioning (ECP)
Partition test conditions, usually inputs, into sets: equivalence partitions/classes. Be careful of sub-partitions.
Only one test per partition is required.
e.g. alphabetical characters, alphanumeric, ASCII, emoji, SQL injection.
e.g. a square root function could have num >= 0, negative ints, and negative floats as equivalence classes
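A sketch of the square root example using Python's math.sqrt: one representative value per equivalence class is enough.

import math
import pytest

# One representative value from each partition.
@pytest.mark.parametrize("value", [0, 4, 2.25])      # num >= 0
def test_sqrt_of_non_negative_numbers(value):
    assert math.isclose(math.sqrt(value) ** 2, value)

@pytest.mark.parametrize("value", [-1, -7.5])        # negative ints and floats
def test_sqrt_of_negative_numbers_raises(value):
    with pytest.raises(ValueError):
        math.sqrt(value)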
Classification Tree Method
Grimm/Grochtmann, 1993:
- Find all classifications/aspects
- Divide the input domain into subsets/classes
- Select as many test cases as are needed for a thorough test
e.g. DBMS:
- Classification aspects are:
- Privilege: regular, admin
- Operations: read, write, delete
- Access method: CLI, browser, API
- For each test, pick one value from each class
- Make enough test cases for ‘thorough’ coverage: do not need to have tests for every permutation
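A sketch of the DBMS example: each classification is written down as a set of classes, a handful of test cases is picked, and a check confirms every class appears in at least one case (without enumerating all 18 permutations).

privileges = ["regular", "admin"]
operations = ["read", "write", "delete"]
access_methods = ["CLI", "browser", "API"]

# Hand-picked cases: far fewer than the 2 * 3 * 3 = 18 permutations.
test_cases = [
    ("regular", "read", "CLI"),
    ("admin", "write", "browser"),
    ("admin", "delete", "API"),
]

# Every class in every classification should appear in at least one test case.
for i, classification in enumerate([privileges, operations, access_methods]):
    assert set(classification) <= {case[i] for case in test_cases}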
Boundary Value Analysis
Test along the boundary:
- Allows you to catch errors such as off-by-one errors
- Equivalence partitioning usually used to find the boundaries
- Check to ensure you have found all boundaries
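A sketch for a hypothetical is_valid_percentage(n) whose valid range is 0-100 inclusive: the test values sit on and immediately either side of each boundary.

import pytest

def is_valid_percentage(n):
    # Hypothetical function under test: valid range is 0..100 inclusive.
    return 0 <= n <= 100

# Values on and just outside each boundary catch off-by-one errors.
@pytest.mark.parametrize("value, expected", [
    (-1, False), (0, True), (1, True),       # lower boundary
    (99, True), (100, True), (101, False),   # upper boundary
])
def test_percentage_boundaries(value, expected):
    assert is_valid_percentage(value) == expected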
Syntax Testing
Tests the language’s grammar by testing the syntax of all inputs in the input domain.
Requires a very large number of tests. Usually automated and may use a pre-processor.
Note that a correct syntax does not mean correct functionality.
Process:
- Identify the target language/format
- Define the syntax in formal notation
- Test and debug the syntax
- Use the syntax graph to test normal and invalid conditions
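A small sketch: if the target format is a YYYY-MM-DD date, the syntax can be written down (here a regular expression stands in for the formal grammar) and then probed with conforming and non-conforming inputs.

import re

# The target 'grammar': YYYY-MM-DD. This checks syntax only.
DATE_SYNTAX = re.compile(r"^\d{4}-\d{2}-\d{2}$")

valid_inputs = ["2024-01-31", "1999-12-01"]
invalid_inputs = ["31-01-2024", "2024/01/31", "2024-1-31", "", "2024-01-31x"]

for text in valid_inputs:
    assert DATE_SYNTAX.match(text), text
for text in invalid_inputs:
    assert not DATE_SYNTAX.match(text), text

# "2024-13-99" still passes: correct syntax does not mean correct functionality.
assert DATE_SYNTAX.match("2024-13-99")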
Combinatorial Test Techniques
When there are several parameters/variables. TODO
Reduce the test space using other techniques:
- Pair-wise testing
- Each choice testing
- Base choice testing
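A sketch of pair-wise testing with three hypothetical two-valued parameters: four hand-picked cases cover every pair of values across any two parameters, compared with eight for the full cross product; the loop verifies the pair coverage.

from itertools import combinations, product

parameters = {
    "browser": ["chrome", "firefox"],
    "os": ["windows", "linux"],
    "locale": ["en", "fr"],
}

# Four cases instead of the full 2 * 2 * 2 = 8 cross product.
cases = [
    {"browser": "chrome",  "os": "windows", "locale": "en"},
    {"browser": "chrome",  "os": "linux",   "locale": "fr"},
    {"browser": "firefox", "os": "windows", "locale": "fr"},
    {"browser": "firefox", "os": "linux",   "locale": "en"},
]

# Every pair of values, for every pair of parameters, appears in at least one case.
for (p1, values1), (p2, values2) in combinations(parameters.items(), 2):
    for v1, v2 in product(values1, values2):
        assert any(c[p1] == v1 and c[p2] == v2 for c in cases), (p1, v1, p2, v2)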
Decision Table Testing
AKA cause-effect table testing
Software makes different decisions based on a variety of factors:
- State
- Input
- Rules
Decision table testing tests decision paths: different outputs triggered by the above conditions.
Decision tables help to document complex logic and business rules. They have CONDITIONS (e.g. user logged in or not) and ACTIONS (performed by the user and/or system) that are run when the conditions are met.
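A sketch of a decision table for a hypothetical discount rule (conditions: logged in, has coupon; action: discount applied), where each rule in the table becomes one test case.

import pytest

def discount_applied(logged_in, has_coupon):
    # Hypothetical system under test: discount only for logged-in users with a coupon.
    return logged_in and has_coupon

# Each row is one rule: (condition: logged_in, condition: has_coupon, action: discount?)
DECISION_TABLE = [
    (True,  True,  True),
    (True,  False, False),
    (False, True,  False),
    (False, False, False),
]

@pytest.mark.parametrize("logged_in, has_coupon, expected", DECISION_TABLE)
def test_discount_rules(logged_in, has_coupon, expected):
    assert discount_applied(logged_in, has_coupon) == expected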
Cause-Effect Graphs
AKA Ishikawa diagram, fish bone diagram.
Document dependencies.
Syntax:
- Cause (c) and effect (e) nodes, both inside a circle
- Intermediary nodes (e.g. AND joining two causes) have no label
- Lines connecting the causes to effects
- NOT: a ~ on the line
- OR: an arc intersecting the lines between the causes and the effect, with a v next to the arc
- AND: an arc intersecting the lines between the causes and the effect, with a ^ next to the arc
Example
If the user clicking the ‘save’ button is an administrator or a moderator, then they are allowed to save. When the ‘save’ button is clicked, it should call the ‘save’ functionality.
If the user is not an admin or moderator, then the message in the troubleshooter/CLI should say so.
If the ‘save’ functionality is not hooked up to the ‘save’ button, then there should be a message about this when the button is clicked.
C1: the user is an admin
C2: the user is a moderator
C3: the save functionality is called
E1: the information is saved
E2: the message ‘you need to be an authenticated user’ is shown
E3: the message ‘the save functionality has not been called’ is shown
(Diagram: a cause-effect graph for this example, combining C1, C2 and C3 with OR, AND and NOT arcs to produce E1, E2 and E3.)
More complex diagrams should use fishbone diagrams.
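One reading of the example above, sketched as boolean cause-effect rules with one check per effect (the effects function is hypothetical, and the exact rules are an interpretation of the prose):

# Causes: C1 = admin, C2 = moderator, C3 = save functionality is called.
def effects(c1_admin, c2_moderator, c3_save_called):
    authorised = c1_admin or c2_moderator            # C1 OR C2
    return {
        "E1_saved": authorised and c3_save_called,   # (C1 OR C2) AND C3
        "E2_needs_auth_message": not authorised,     # NOT (C1 OR C2)
        "E3_not_called_message": not c3_save_called, # NOT C3
    }

# One test per effect, choosing causes that should trigger it.
assert effects(True, False, True)["E1_saved"]
assert effects(False, False, True)["E2_needs_auth_message"]
assert effects(False, True, False)["E3_not_called_message"]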
State Transition Graphs
- States
- Transitions between states
- Events (that trigger transitions)
- Actions (resulting from transitions)
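A sketch of state transition testing for a hypothetical document workflow: the transition table lists states and the events that move between them, and the tests exercise both valid transitions and an event that should be rejected.

# {state: {event: next_state}}
TRANSITIONS = {
    "draft":     {"submit": "review"},
    "review":    {"approve": "published", "reject": "draft"},
    "published": {"archive": "archived"},
    "archived":  {},
}

def next_state(state, event):
    allowed = TRANSITIONS[state]
    if event not in allowed:
        raise ValueError(f"event {event!r} not allowed in state {state!r}")
    return allowed[event]

# Valid transitions...
assert next_state("draft", "submit") == "review"
assert next_state("review", "reject") == "draft"
# ...and an event that should be rejected in this state.
try:
    next_state("archived", "submit")
    raise AssertionError("expected the invalid transition to be rejected")
except ValueError:
    pass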
Scenario Testing
Scenarios are a sequence of interactions (between systems, users etc.).
Scenarios should be credible and replicate an end-user’s experience. They should be based on a story/description.
Scenario tests test the end-to-end functionality and business flows, both blue-sky and error cases. However, scenario tests should not need to be exhaustive - these are expensive and heavily-documented tests.
Scenario tests also test usability from the user’s perspective, not just business requirements.
Random/Monkey Testing
Using random input to test; used when the time required to write and run the directed test is too long, too complex or impossible.
Heuristics could be used to generate tests, but care should be taken to ensure there is still enough randomness to cover the specification.
There needs to be some mechanism to determine when a test fails, and the ability to reproduce the failing test.
Monkey testing is useful for preventing tunnel vision and for when you cannot think laterally.
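A sketch of monkey testing Python's int() with random printable strings: the seed is recorded so that a failing run can be reproduced, and the only 'oracle' here is that nothing other than the documented ValueError is raised.

import random
import string

def run_monkey_tests(function_under_test, iterations=1000, seed=None):
    seed = seed if seed is not None else random.randrange(2**32)
    rng = random.Random(seed)  # keeping the seed makes failures reproducible
    for _ in range(iterations):
        text = "".join(rng.choice(string.printable)
                       for _ in range(rng.randint(0, 30)))
        try:
            function_under_test(text)
        except ValueError:
            pass  # documented, expected failure mode
        except Exception as exc:
            raise AssertionError(
                f"unexpected {exc!r} for input {text!r}; reproduce with seed={seed}")

run_monkey_tests(int)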
Structure-Based Techniques
Structure and data.
Statement Testing
AKA line/segment coverage.
Test checks/verifies each line of code and the flow of different paths in the program.
Conditions that are always false cannot be tested.
Similar to BVA except it is focused more on the paths rather than the input.
Branch/Decision Testing
Test each branch where decisions are made.
Branch coverage:
- The minimum number of paths which will ensure all branches are covered
- Measures which decision outcomes have been tested
All branches are validated.
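A sketch of branch testing for a hypothetical grade function: the test cases are chosen so that both the true and false outcome of every decision is exercised at least once.

import pytest

def grade(score):
    if score < 0 or score > 100:   # decision 1
        raise ValueError("score out of range")
    if score >= 50:                # decision 2
        return "pass"
    return "fail"

# These cases cover both outcomes of both decisions.
@pytest.mark.parametrize("score, expected", [(75, "pass"), (30, "fail")])
def test_grade_in_range(score, expected):
    assert grade(score) == expected

@pytest.mark.parametrize("score", [-1, 101])
def test_grade_out_of_range(score):
    with pytest.raises(ValueError):
        grade(score)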
Data Flow Testing
Tests data flows and detects improper use of data in a program, such as:
- Variables that are declared but never used
- Variables that are used but never declared
- Variables that are defined multiple times before being used
- Variables that are deallocated before being used
It creates a control flow graph and a data flow graph; the latter represents data dependencies between operations.
Static data flow testing analyzes source code without executing it, while dynamic data flow testing does the analysis during execution.
(e.g. data just passing through a class without being used directly by it?).
Experience-based Testing
Error guessing: get an experienced tester to think of situations that may break the program.
Error guessing:
- Depends on the skill, experience, and intuition of the tester: no explicit rules or testing methods
- Can be somewhat systematic: list possible defects/failures and design tests to produce them
- Can be effective and save time