Ethics
Equality: same treatment for everybody
Equity: customized treatment to ensure everyone has the same opportunity
Algorithms and AI: garbage in, garbage out. If the dataset it is fed is biased, the output will be biased.
ACM code of ethics. TL;DR: respect everyone + make mistakes and reflect on your mistakes.
Reliability and Resilience
Faults, errors and failures:
- Human error: invalid input data causing the system to misbehave
- System fault: something that leads to an error
- System error: visible effects of misbehavior
- System failure: when system returns bad results
To improve reliability:
- Fault avoidance: design, development process, tools and guidelines
- Detection and correction: test, debug, validate
- Fault tolerance: designing system to handle/recover from failures
Availability and reliability:
- Availability: probability of being able to successfully access the system
- Reliability: probability of failure-free operation
Works-as-designed problem:
- System specification may be wrong (not reflecting user truth)
- System specification erroneous (typos, not proof-read etc.)
Reliability can be subjective, affecting only a subset of users:
- Errors can be concentrated in a specific part of the system
- Responses may be slow at a specific time or location
Capacity management:
- Thing carefully about software architecture, especially for I/O
- Ues threads carefully (starvation and deadlocks)
- Write dedicated tests and monitor production systems
Architectural strategies:
- Protection systems
- Systems that monitor the execution of others
- Trigger alarms or automatically correct the behavior
- Multiversion programming
- Concurrent computation
- Hardware with different items/providers
- Software with different development teams
- Voting systems e.g. triple-modular-redundancy
- Concurrent computation
Reliability Guidelines
- Visibility: need-to-know principle; if variables/methods don’t need to be exposed, don’t expose them
- Validity:
- Check format and domain of input values (including boundaries)
- Use if statements or regression-test-enabled assert statements
- Avoid errors becoming system failures by capturing them; never send back an error message with the stack trace
- Erring:
- Avoid untyped languages
- Encapsulate ‘nasty’ stuff
- Restart: provide recoverable milestones so that it can restart into a good state
- Constants: express fixed or real-world values with meaningful names
Unicorns:
Security treats come from:
- Ignorance: unknown risks
- Design: security is disregarded
- Carelessness: bad design
- Trade-offs: lack of emphasis on security
The Four Rs of a Resilience Engineering Plan
- Recognition: how an attacker may target an asset
- Resistance: possible strategies to resist each threat
- Recovery: plan data, software and hardware recovery procedures
- And test the restore procedure
- Reinstatement: define the process to bring the system back
Resilience planning:
- Identify resilience requirements: minimum functionality while being attacked
- Backup and reinstatement procedures
- Classifying critical assets; how should they work in degraded mode
- Test it! An untested backup is as bad as no backup
Checklist:
- Permissions: ensure file, user permissions are correct
- Session: terminate long-running, inactive user sessions
- Overflows: take care around memory overflows
- Password: require strong passwords
- Input: sanitize inputs