Intrusion detection systems: IDS
In recent news: Microsoft Teams GIFShell Attack
Convince user to install a stager. Once done:
- Command execution:
- Attacker sends GIF with embedded commands
- Teams logs the message in publicly accessible logs
- Stager can then extract commands from GIF and execute them
- Exfiltration:
- Teams survey card filename has no length limit
- Stager submits card to the attacker’s public webhook
- Initial infection:
- Sharepoint link generated for any files that are uploaded
- When a message is sent, the POST request contains the SharePoint link to that file (e.g. image)
- Attacker can replace that URL and Teams will still display it as the original file type
- e.g. send an executable disguised as an image: when the user clicks it, it will be downloaded
- Can also use deep links to Excel etc., which may have vulnerabilities that allow RCE
- By default, Teams messages can be received from people outside the organization
Hardware
Hardware-level Protection Mechanisms
Intel requires that privilege level can only be changed by kernel processes:
- However, the instruction set was not designed for virtualization, so workarounds are needed
- Software Guard eXtension (SGX) creates an enclave: an encrypted, trusted zone
ARM uses TrustZone to:
- Support hardware-level cryptographic functions
- Lock phones to networks
- Run licensed/critical code (e.g. fingerprint, SIM operations)
Hardware sandboxing with CHERI (ARM):
- Allows fine-grained support to isolate processes at the CPU level
- Enables sandboxed memory allocations
- e.g. each browser tab runs in a separate process
Issues with enclaves:
- Places a very high level of trust in the manufacturer
- DRM
- TODO:
- IN THE EXAM!!!
- In relation to protection rings:
- Protecting access to hardware via software access control (e.g. kernel/userspace barrier)
- Or running completely separate (specialized?) hardware
- e.g. bank apps on Android can use Secure Environment
- Downsides: any vulnerabilities in the apps/endpoints using them can allow very low-level access to the hardware (e.g. cryptographic keys used for device boot)
- Intel SGX:
- Untrusted section can create one or more enclaves in encrypted memory
- Enclaves cannot be modified after they are built
- Untrusted sections can later call functions in the enclave
- SGX deprecated in 11th/12th gen Core processors
- 4K Blu-rays require it; users won’t be able to view content they bought at the highest quality in the future
- Intel SGX Explained
- Runs at ring 3 only (-1/hypervisor, 0/kernel, 1-2/drivers (not really used), 3/application)
- ARM TrustZone
- ARM: has multiple processor modes (e.g. user, supervisor, system, …, hypervisor)
- TrustZone: secure and non-secure states (i.e. orthogonal to rings)
- Can partition SoC peripherals (e.g. areas of RAM only used by secure mode)
Mobile Platforms
OS:
- iOS: simplified BSD with separate secure enclave
- Android: simplified Linux with SELinux features
App management:
- iOS: walled garden, with all apps being reviewed by Apple
- Android: signed by developers, with some being ‘Play Protect Verified’
Permissions:
- Android:
- ‘Dangerous’ permissions must be accepted by users at run-time (Android 6.0 onwards)
- Previously, would be shown during install and people would just click yes
- Now users allow/deny one by one: allows them to make more informed decisions
- Vendors can define their own permissions: can lead to fragmentation
- SoK: Lessons Learned from Android Security Research for Appified Software Platforms:
- Users cannot associate privacy risks with permissions; may underestimate or overestimate risks
- Insecure IPC (exposed activities?): other apps can use this for privilege escalation
- Web views: web to app/app to web for privilege escalation and data leakage
- Permissions:
- Over-privileged applications
- Ad and other libraries running in same process and inheriting same privileges
- Only shown on install/first use, not whenever the permissions are actually used
- No mandatory access control (until SELinux)
- APIs
- Lack of a good, secure API for remote code loading led to unsigned implementations
- MITM attacks in 95% of cases where developers customised TLS certificate validation
Monitoring and Response
MAPE-K control loop
(Monitor, Analyze, Plan, Execute), Knowledge.
Circa 2003, need for autonomic managers overlooking the functioning of running systems:
- Self-configuring: system can deploy nodes on-demand
- Self-healing: the system can handle failing components
- Self-optimizing: able to manage the workload dynamically
- Important for cloud workloads where you pay by CPU hours etc.
Using a knowledge source (log files, system events):
- Monitor a collection of ‘interesting’ events; pass problematic ones onto the next step
- Analyze the collected data (and predictions) and evaluate the issue
- Plan a change in accordance with policies
- Execute the plan (multiple actions/steps)
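The four steps and the shared knowledge source can be sketched as a single loop iteration. This is only a minimal illustration: the event format, the 80% CPU policy, and the `add_replica` action are all hypothetical, and the execute step merely records actions instead of touching a real system.

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge: policy thresholds plus a history of actions taken."""
    max_cpu: float = 0.8          # hypothetical policy: act above 80% CPU
    history: list = field(default_factory=list)


def monitor(events, knowledge):
    """Keep only the 'interesting' events (here: CPU above the policy limit)."""
    return [e for e in events if e["cpu"] > knowledge.max_cpu]


def analyze(symptoms):
    """Evaluate the issue: which nodes are overloaded?"""
    return sorted({e["node"] for e in symptoms})


def plan(overloaded_nodes):
    """Turn the diagnosis into an ordered list of actions."""
    return [("add_replica", node) for node in overloaded_nodes]


def execute(actions, knowledge):
    """Carry out the plan; here we only record it so it could be rolled back."""
    knowledge.history.extend(actions)
    return actions


def mape_k_iteration(events, knowledge):
    """One pass through Monitor, Analyze, Plan, Execute over shared Knowledge."""
    return execute(plan(analyze(monitor(events, knowledge))), knowledge)


events = [
    {"node": "web-1", "cpu": 0.95},
    {"node": "web-2", "cpu": 0.30},
]
kb = Knowledge()
actions = mape_k_iteration(events, kb)
print(actions)
```

Keeping the history in the knowledge object is one way to support the rollbacks that logging "actions taken" is meant to enable.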
Exercise: MAPE-K on Assignment 2 Codebase
- What aspects of the system should be logged?
- System load (e.g. requests/minute, CPU usage)
- Actions taken to the system (to allow rollbacks)
- What data do you need to be captured? Why?
- For each request, IP addresses, user agents etc.
- POST requests, uploads etc.: non-sensitive data
- Usernames probably not sensitive, although users may accidentally type their passwords into their username fields
- Access to admin panels
- Request response times
- CPU, RAM, disk usage
- Database queries, number of rows returned, processing time
- Non-standard requests (e.g. unused ports, unsupported protocols)
- What safety measures can you apply automatically?
- Append-only logs or cloned logs
- Notify admin (e.g. through email) when anomalous events occur (e.g. admin panel login/password change, low disk, high CPU, long response times)
- IP throttling/bans (e.g. fail2ban)
- reCAPTCHA
- Disabling pings, unused protocols etc. (or isolating them onto a different machine)
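As one example of an automatic safety measure, IP throttling in the spirit of fail2ban can be sketched as a sliding-window counter of failed logins. This is not how fail2ban itself is implemented (it works from log files and firewall rules); the class name, thresholds, and addresses are assumptions for illustration.

```python
from collections import defaultdict, deque


class LoginThrottle:
    """Ban an IP after too many failed logins inside a sliding time window."""

    def __init__(self, max_failures=3, window_seconds=60):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = defaultdict(deque)   # ip -> timestamps of recent failures
        self.banned = set()

    def record_failure(self, ip, now):
        """Record a failed login at time `now` (seconds); return True if banned."""
        recent = self.failures[ip]
        recent.append(now)
        # Drop failures that have fallen out of the window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.banned.add(ip)
        return ip in self.banned

    def is_allowed(self, ip):
        return ip not in self.banned


throttle = LoginThrottle(max_failures=3, window_seconds=60)
for t in (0, 10, 20):                       # three failures within 20 seconds
    throttle.record_failure("203.0.113.7", now=t)
throttle.record_failure("198.51.100.2", now=0)
throttle.record_failure("198.51.100.2", now=120)   # two failures, far apart

print(throttle.is_allowed("203.0.113.7"))
print(throttle.is_allowed("198.51.100.2"))
```

A real deployment would also need ban expiry and an allowlist so that an admin cannot be locked out.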
Quality attributes:
- How can you achieve self-configuration?
- What part of the system should self-heal?
- Is managing the workload simply a matter of increasing resources?
- Or is this a symptom of an issue?
Base Rate Fallacy
Assuming that ‘interesting’ events are uncommon:
- The base rate is the ratio of ‘interesting’ events to total events
- Even a small false-positive rate produces a large absolute number of false positives when the population of events is large
- As a result, the vast majority of flagged events may be false positives
People cannot go through a thousand events to find the one true positive.
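The arithmetic behind the fallacy can be worked through directly. The numbers below are illustrative assumptions, not measurements: one event in 100,000 is an attack, the detector catches 99% of attacks, and it mislabels 1% of benign events.

```python
def alert_breakdown(base_rate, true_positive_rate, false_positive_rate, total_events):
    """Return (true alerts, false alerts, fraction of alerts that are real)."""
    interesting = total_events * base_rate
    benign = total_events - interesting
    true_alerts = interesting * true_positive_rate
    false_alerts = benign * false_positive_rate
    return true_alerts, false_alerts, true_alerts / (true_alerts + false_alerts)


true_alerts, false_alerts, real_fraction = alert_breakdown(
    base_rate=1e-5,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
    total_events=1_000_000,
)
print(f"true alerts:  {true_alerts:.1f}")
print(f"false alerts: {false_alerts:.1f}")
print(f"fraction of alerts that are real: {real_fraction:.5f}")
```

With these assumed rates there are about 10 genuine alerts buried among roughly 10,000 false ones, so only about one alert in a thousand is real even though the detector looks accurate on paper.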
Intrusion Detection Systems
These can be categorized into three main techniques:
- Signature-based:
- Identify specific patterns that match known-bad patterns
- Fast and low false-positive rate
- Can only detect known attacks
- Need examples of malicious traffic
- Rules are defined at a low level
- The more rules you define, the more resources are required to monitor the traffic occurring in real-time
- Specification-based:
- Events deviate from per-application specifications of legitimate actions
- Model the normal trends; anything outside that raises an alarm
- Uses manually-developed, system- or protocol-specific specifications
- Deep understanding of the system required
- Behavioral model of the system required
- Needs a working implementation, possibly in active use, or a similar application to model this
- Can detect new attacks
- Not flexible to changes to the system - requires security-oriented testing during development
- Anomaly-based:
- Machine learning used to record the steady state: requires recording of current system behavior
- Cannot detect anomalies that were present during the training stage
- Can detect new attacks
- May have a higher false-positive rate compared to other techniques
- If expected behavior wasn’t captured during training
- Changes to the system will require re-training
- Difficult to model what is normal behavior - what if it is currently being attacked?
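To make the contrast concrete, here is a minimal sketch of a signature-based matcher next to a very simple anomaly-based detector. The two regular expressions are hypothetical hand-written signatures, and the anomaly detector uses a plain mean and standard deviation as a stand-in for the machine-learning model; neither is taken from a real IDS rule set.

```python
import re
from statistics import mean, stdev

# Signature-based: hypothetical known-bad patterns, written by hand.
SIGNATURES = {
    "sql-injection": re.compile(r"(?i)union\s+select"),
    "path-traversal": re.compile(r"\.\./"),
}


def match_signatures(payload):
    """Return the names of all signatures that the payload matches."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


class RateAnomalyDetector:
    """Anomaly-based: learn the normal request rate, flag large deviations."""

    def __init__(self, training_rates, threshold=3.0):
        self.mean = mean(training_rates)
        self.stdev = stdev(training_rates)
        self.threshold = threshold   # number of standard deviations tolerated

    def is_anomalous(self, rate):
        return abs(rate - self.mean) > self.threshold * self.stdev


print(match_signatures("GET /items?id=1 UNION SELECT password FROM users"))
print(match_signatures("GET /index.html"))

detector = RateAnomalyDetector(training_rates=[98, 102, 100, 97, 103])
print(detector.is_anomalous(101))
print(detector.is_anomalous(500))
```

The signature matcher can only flag what its patterns describe, while the anomaly detector flags anything far from its training data, including legitimate traffic spikes. If the training rates had been recorded during an attack, the attack rate would be learned as normal.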
Factors to consider:
- Initial setup and deployment
- Types of detected events
- Flexibility to new events
- Extensibility by practitioners
- Low-level IDS
- Packet sent to decoder:
- Determines packet protocol (e.g. IP, TCP)
- Checks for malformed packets, anomalies in the header
- Preprocessors:
- Checks for IPs that are banned etc.
- Deals with IP defragmentation (reassembling fragmented packets) etc.
- Processes and normalizes the data into a standardized format
- Detection:
- Snort rules, custom detectors applied
- Log and verdict:
- Discard and log the packet, or send it to the downstream machine
All IDSes have a pipe-and-filter architecture, with the fastest, most basic rules being applied first to remove the most obvious bad packets.
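The pipe-and-filter idea can be sketched as an ordered tuple of stage functions, each of which may reject a packet before the more expensive stages run. The packet dictionaries, the blocklist, and the payload rule are hypothetical stand-ins; this is not how Snort represents packets or rules.

```python
BANNED_IPS = {"203.0.113.7"}          # hypothetical blocklist
BAD_PAYLOADS = (b"/etc/passwd",)      # hypothetical stand-in for detection rules


def decode(packet):
    """Decoder: reject packets whose header is malformed."""
    if packet.get("protocol") not in {"TCP", "UDP"}:
        return "malformed header"
    return None


def preprocess(packet):
    """Preprocessor: cheap checks such as the banned-IP list."""
    if packet["src"] in BANNED_IPS:
        return "banned source"
    return None


def detect(packet):
    """Detection: the more expensive payload rules run last."""
    if any(bad in packet["payload"] for bad in BAD_PAYLOADS):
        return "rule match"
    return None


PIPELINE = (decode, preprocess, detect)   # cheapest filters first


def verdict(packet):
    """Run the packet through each stage; stop at the first stage that objects."""
    for stage in PIPELINE:
        reason = stage(packet)
        if reason is not None:
            return ("drop", reason)
    return ("forward", None)


print(verdict({"protocol": "TCP", "src": "198.51.100.2", "payload": b"GET /"}))
print(verdict({"protocol": "TCP", "src": "203.0.113.7", "payload": b"GET /"}))
print(verdict({"protocol": "TCP", "src": "198.51.100.2", "payload": b"GET /etc/passwd"}))
```

Because the loop returns at the first objection, a packet from a banned source never reaches the payload rules, which is the resource saving the ordering is meant to provide.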
Networking
LANs:
- Hubs: broadcast packets to all connected devices
- An infected device can monitor all traffic
- Switches: send packets only to the intended receiver
- Isolates traffic and may allow firewall rules to be applied
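The difference in exposure can be sketched with two toy forwarding models. The classes, port numbers, and MAC strings are hypothetical; the switch model assumes the usual learning behaviour, where frames for a not-yet-learned destination are still flooded to all other ports.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def __init__(self, ports):
        self.ports = set(ports)

    def deliver(self, in_port, src_mac, dst_mac):
        return sorted(self.ports - {in_port})


class Switch:
    """Learns which MAC address lives on which port and forwards only there."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}   # MAC address -> port

    def deliver(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port          # learn the sender's port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        return sorted(self.ports - {in_port})      # unknown destination: flood


hub = Hub(ports=[1, 2, 3])
switch = Switch(ports=[1, 2, 3])

print(hub.deliver(1, "aa", "bb"))      # every other port sees the frame
print(switch.deliver(2, "bb", "aa"))   # "aa" not learned yet, so flooded
print(switch.deliver(1, "aa", "bb"))   # "bb" was learned on port 2
```

A device on port 3 sees all traffic through the hub, but through the switch it only sees frames sent before the destination has been learned.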
Ethernet:
- Packet sniffing
- Hardware-supported sniffing
- Port mirror: switch duplicates traffic to another device
- Test access port (TAP): port which reads the traffic going through the device
TCP:
- Three-way handshake
- SYN, SYN/ACK, ACK
- SYN: send initial sequence number
- ACK: acknowledge receipt by returning the peer’s sequence number plus one
- Sequence numbers then advance by the number of bytes sent (SYN and FIN each count as one)
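The sequence and acknowledgement numbers exchanged during the handshake can be traced with a small model. It only computes the numbers and sends nothing on the network; the function name and the initial sequence numbers are arbitrary illustrative values.

```python
MOD = 2**32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as (flags, seq, ack) tuples."""
    syn = ("SYN", client_isn, None)
    # A SYN consumes one sequence number, so the server acknowledges ISN + 1.
    syn_ack = ("SYN/ACK", server_isn, (client_isn + 1) % MOD)
    # The client likewise acknowledges the server's ISN + 1.
    ack = ("ACK", (client_isn + 1) % MOD, (server_isn + 1) % MOD)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Real stacks choose initial sequence numbers unpredictably, since an attacker who can guess them can forge segments that fit into the connection.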
DDoS:
- IP doesn’t validate the sender’s address: an attacker can spoof the victim’s address so that servers flood the victim with replies
IDSs contain rules to detect suspicious activities:
- Ingress: filter packets entering the network/machine
- Egress: filter packets leaving the machine/network
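One common pair of ingress and egress rules targets spoofed source addresses. The sketch below assumes a hypothetical internal range of `10.0.0.0/8` and checks only the source address; real rule sets cover far more than this.

```python
from ipaddress import ip_address, ip_network

INTERNAL = ip_network("10.0.0.0/8")   # hypothetical internal address range


def ingress_ok(src):
    """A packet arriving from outside must not claim an internal source."""
    return ip_address(src) not in INTERNAL


def egress_ok(src):
    """A packet leaving the network must carry an internal source address."""
    return ip_address(src) in INTERNAL


print(ingress_ok("198.51.100.2"))   # external source arriving: plausible
print(ingress_ok("10.1.2.3"))       # internal source arriving from outside: spoofed
print(egress_ok("10.1.2.3"))        # internal source leaving: plausible
print(egress_ok("198.51.100.2"))    # foreign source leaving: spoofed
```

The egress check is the one that stops machines inside the network from taking part in spoofing-based floods against others.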
DNS poisoning:
- Inject a fake DNS entry into a DNS server
- DNS servers synchronize with each other, so the poisoned entry spreads
- https://www.eweek.com/cloud/dns-poisoning-suspected-cause-of-huge-internet-outage-in-china/