08. Monitoring and Detecting Intrusions

Simplicity is the ultimate sophistication

Intrusion detection systems: IDS

In recent news: Microsoft Teams GIFShell Attack

Convince user to install a stager. Once done:

Command execution:
- Attacker sends GIF with embedded commands
- Teams logs the message in publicly accessible logs
- Stager can then extract commands from GIF and execute them
Exfiltration:
- Teams survey card filename has no length limit
- Stager submits card to the attacker’s public webhook
Initial infection:
- A SharePoint link is generated for any file that is uploaded
- When a message is sent, the POST request contains the SharePoint link to that file (e.g. an image)
- Attacker can replace that URL and Teams will display it as the original file type
  - e.g. send an executable disguised as an image: when the user clicks it, the executable is downloaded
  - Can also use deep links to Excel etc., which may have vulnerabilities that allow RCE
- By default, Teams messages can be received from people outside the organization

Hardware

Hardware-level Protection Mechanisms

Intel requires that privilege level can only be changed by kernel processes:

However, the instruction set was not designed for virtualization, so workarounds are needed
Software Guard eXtension (SGX) creates an enclave: an encrypted, trusted zone

ARM uses TrustZone to:

Support hardware-level cryptographic functions
Lock phones to networks
Run licensed/critical code (e.g. fingerprint, SIM operations)

Hardware sandboxing with CHERI (ARM):

Allows fine-grained support to isolate processes at the CPU level
Enables sandboxed memory allocations
e.g. each browser tab runs in a separate process

Issues with enclaves:

Puts a very high level of trust on the manufacturer
DRM
TODO:
- IN THE EXAM!!!
- In relation to protection rings:
  - Protecting access to hardware via software access control (e.g. kernel/userspace barrier)
  - Or running completely separate (specialized?) hardware
  - e.g. bank apps on Android can use Secure Environment
Downsides: any vulnerabilities in the apps/endpoints using them can allow very low-level access to the hardware (e.g. cryptographic keys used for device boot)
Intel SGX:
- Untrusted section can create one or more enclaves in encrypted memory
  - Enclaves cannot be modified after they are built
- Untrusted sections can later call functions in the enclave
- SGX deprecated in 11th/12th gen Core processors
  - 4K Blu-rays require it; users won’t be able to view content they bought at the highest quality in the future
- Intel SGX Explained
  - Runs at ring 3 only (-1/hypervisor, 0/kernel, 1-2/drivers (not really used), 3/application)
ARM TrustZone
- ARM: has multiple processor modes (e.g. user, supervisor, system, …, hypervisor)
- TrustZone: secure and non-secure states (i.e. orthogonal to rings)
- Can partition SoC peripherals (e.g. areas of RAM only used by secure mode)

Mobile Platforms

OS:

iOS: simplified BSD with separate secure enclave
Android: simplified Linux with SELinux features

App management:

iOS: walled garden, with all apps being reviewed by Apple
Android: signed by developers, with some being ‘Play Protect Verified’

Permissions:

Android:
- ‘Dangerous’ permissions must be accepted by users at run-time (Android 6.0 onwards)
  - Previously, would be shown during install and people would just click yes
  - Now users allow/deny one by one: allows them to make more informed decisions
- Vendors can define their own permissions: can lead to fragmentation
- SoK: Lessons Learned from Android Security Research for Appified Software Platforms:
  - Users cannot associate privacy risks with permissions; may underestimate or overestimate risks
  - Insecure IPC (exposed activities?): other apps can use this for privilege escalation
  - Web views: web to app/app to web for privilege escalation and data leakage
  - Permissions:
    - Over-privileged applications
    - Ad and other libraries running in same process and inheriting same privileges
    - Only shown on install/first use, not whenever the permissions are actually used
    - No mandatory access control (until SELinux)
  - APIs
    - Lack of a good secure remote code loading API led to unsigned implementations
    - MITM attacks in 95% of cases where developers customised TLS certificate validation

Monitoring and Response

MAPE-K control loop

(Monitor, Analyze, Plan, Execute), Knowledge.

Circa 2003, a need was identified for autonomic managers overseeing the functioning of running systems:

Self-configuring: system can deploy nodes on-demand
Self-healing: the system can handle failing components
Self-optimizing: able to manage the workload dynamically
- Important for cloud workloads where you pay by CPU hours etc.

Using a knowledge source (log files, system events):

Monitor a collection of ‘interesting’ events; pass problematic ones on to the next step
Analyze the collected data (and predictions) and evaluate the issue
Plan a change in accordance with policies
Execute the plan (multiple actions/steps)
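
A minimal sketch of one pass through the four steps above, assuming CPU readings as the monitored events and a single policy threshold. The names `Knowledge`, `max_cpu_percent` and the `add_replica` action are hypothetical and only there for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Knowledge:
    """Shared knowledge: a policy threshold plus a history of executed actions."""
    max_cpu_percent: float = 80.0              # hypothetical policy threshold
    history: list = field(default_factory=list)


def monitor(events, knowledge):
    """Keep only the 'interesting' events (CPU readings above the threshold)."""
    return [e for e in events if e["cpu"] > knowledge.max_cpu_percent]


def analyze(problems):
    """Evaluate the issue: worst CPU reading seen for each affected node."""
    worst = {}
    for e in problems:
        worst[e["node"]] = max(worst.get(e["node"], 0.0), e["cpu"])
    return worst


def plan(diagnosis, knowledge):
    """Turn the diagnosis into actions (the policy here: one replica per hot node)."""
    return [("add_replica", node) for node in sorted(diagnosis)]


def execute(actions, knowledge):
    """Carry out the plan; this sketch only records each action for later rollback."""
    for action in actions:
        knowledge.history.append(action)
    return actions


def mape_k_iteration(events, knowledge):
    return execute(plan(analyze(monitor(events, knowledge)), knowledge), knowledge)


events = [
    {"node": "web-1", "cpu": 35.0},
    {"node": "web-2", "cpu": 93.5},
    {"node": "web-2", "cpu": 97.0},
]
knowledge = Knowledge()
actions = mape_k_iteration(events, knowledge)
print(actions)
```

Recording executed actions in the knowledge source is what makes rollbacks possible later on.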

Exercise: MAPE-K on Assignment 2 Codebase

What aspects of the system should be logged?
- System load (e.g. requests/minute, CPU usage)
- Actions taken on the system (to allow rollbacks)
What data needs to be captured? Why?
- For each request, IP addresses, user agents etc.
- POST requests, uploads etc.: non-sensitive data
  - Usernames probably not sensitive, although users may accidentally type their passwords into their username fields
- Access to admin panels
- Request response times
- CPU, RAM, disk usage
- Database queries, number of rows returned, processing time
- Non-standard requests (e.g. unused ports, unsupported protocols)
What safety measures can you apply automatically?
- Append-only logs or cloned logs
- Notify admin (e.g. through email) when anomalous events occur (e.g. admin panel login/password change, low disk, high CPU, long response times)
- IP throttling/bans (e.g. fail2ban)
- reCAPTCHA
- Disabling pings, unused protocols etc. (or isolating them onto a different machine)
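
A rough sketch of the IP throttling/ban idea, in the spirit of fail2ban but not its actual implementation. The class name `LoginThrottle` and the default thresholds are assumptions; timestamps are passed in explicitly so the behaviour is easy to check:

```python
from collections import defaultdict, deque


class LoginThrottle:
    """Ban an IP after too many failed logins inside a sliding time window."""

    def __init__(self, max_failures=5, window_seconds=60.0, ban_seconds=600.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self.failures = defaultdict(deque)   # ip -> timestamps of recent failures
        self.banned_until = {}               # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, 0.0) > now

    def record_failure(self, ip, now):
        """Record a failed login; return True if this failure triggers a ban."""
        recent = self.failures[ip]
        recent.append(now)
        # Forget failures that have fallen out of the sliding window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.banned_until[ip] = now + self.ban_seconds
            recent.clear()
            return True
        return False


throttle = LoginThrottle(max_failures=3, window_seconds=10.0, ban_seconds=60.0)
results = [throttle.record_failure("203.0.113.7", t) for t in (0.0, 1.0, 2.0)]
print(results, throttle.is_banned("203.0.113.7", now=30.0))
```

A temporary ban that expires limits the damage if a legitimate user is caught by the rule.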

Quality attributes:

How can you achieve self-configuration?
What part of the system should self-heal?
Is managing the workload simply a matter of increasing resources?
- Or is this a symptom of an issue?

Base Rate Fallacy

Assuming that ‘interesting’ events are uncommon:

The base rate is the ratio of ‘interesting’ events to total events
Even a small false-positive rate produces a large number of false positives when the population of events is large
This may lead to a vast majority of identified events being false positives

People cannot go through a thousand events to find the one true positive.
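
A worked example of the arithmetic, using Bayes' theorem. The numbers are made up for illustration: 1 in 100,000 events is a real intrusion, the detector catches 99% of them, and it raises a false alarm on 1% of benign events:

```python
def alert_precision(base_rate, detection_rate, false_positive_rate):
    """Probability that an alert is a real intrusion: P(intrusion | alert)."""
    true_alerts = base_rate * detection_rate
    false_alerts = (1.0 - base_rate) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


precision = alert_precision(
    base_rate=1e-5,            # assumed: 1 intrusion per 100,000 events
    detection_rate=0.99,       # assumed: 99% of intrusions raise an alert
    false_positive_rate=0.01,  # assumed: 1% of benign events raise an alert
)
print(f"{precision:.4%} of alerts are real intrusions")
```

With these assumed rates only about one alert in a thousand is a true positive, even though the detector looks accurate on paper.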

Intrusion Detection Systems

These can be categorized into three main techniques:

Signature-based:
- Identify specific patterns that match known-bad patterns
- Fast and low false-positive rate
- Can only detect known attacks
- Need examples of malicious traffic
- Rules are defined at a low level
- The more rules you define, the more resources are required to monitor the traffic occurring in real-time
Specification-based:
- Raises an alarm when events deviate from per-application specifications of legitimate actions
  - Model the normal trends; anything outside that raises an alarm
- Uses manually-developed, system- or protocol-specific specifications
- Deep understanding of the system required
- Behavioral model of the system required
  - Needs a working implementation, possibly in active use, or a similar application to model this
- Can detect new attacks
- Not flexible to changes to the system - requires security-oriented testing during development
Anomaly-based:
- Machine learning used to record the steady state: requires recording of current system behavior
- Cannot detect anomalies that were present during the training stage
- Can detect new attacks
- May have a higher false-positive rate compared to other techniques
  - If expected behavior wasn’t captured during training
  - Changes to the system will require re-training
    - Difficult to model what is normal behavior - what if it is currently being attacked?
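
To contrast two of the techniques above, here is a small sketch: a signature check that matches known-bad patterns, and an anomaly check that learns a steady state from training data. The patterns, the request-rate feature and the three-standard-deviation tolerance are all assumptions chosen for illustration, not rules from any real IDS:

```python
import re
from statistics import mean, stdev

# Signature-based: hypothetical known-bad patterns, checked against each request.
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select"),
    "path_traversal": re.compile(r"\.\./"),
}


def match_signatures(decoded_request):
    """Return the names of all signatures that the request matches."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern.search(decoded_request)]


class RateBaseline:
    """Anomaly-based: learn the steady-state request rate, flag large deviations."""

    def __init__(self, training_rates, tolerance=3.0):
        self.mean = mean(training_rates)
        self.stdev = stdev(training_rates)
        self.tolerance = tolerance   # allowed deviation, in standard deviations

    def is_anomalous(self, rate):
        return abs(rate - self.mean) > self.tolerance * self.stdev


print(match_signatures("GET /items?id=1 UNION SELECT password FROM users"))
baseline = RateBaseline([100, 110, 95, 105, 90])   # requests/minute in training
print(baseline.is_anomalous(104), baseline.is_anomalous(400))
```

The signature check can only flag what is already in `SIGNATURES`, while the baseline flags anything unusual, including harmless spikes that were not seen during training.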

Factors to consider:

Initial setup and deployment
Types of detected events
Flexibility to new events
Extensibility by practitioners

Snort:

Low-level IDS
Packet sent to decoder:
- Determines packet protocol (e.g. IP, TCP)
- Checks for malformed packets, anomalies in the header
Preprocessors:
- Checks for IPs that are banned etc.
- Deals with IP defragmentation (reassembling fragmented packets) etc.
- Processes and normalizes the data into a standardized format
Detection:
- Snort rules, custom detectors applied
Log and verdict:
- Discard and log the packet, or send it to the downstream machine

IDSes typically have a pipe-and-filter architecture, with the fastest, most basic rules being applied first to remove the most obvious bad packets.
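
The pipe-and-filter idea can be sketched as a list of stages, each of which either passes the packet on or drops it with a reason. This is not Snort's API: the packet dictionaries, the blocklist and the single rule are hypothetical stand-ins for the decoder, preprocessor and detection stages:

```python
BANNED_SOURCES = {"203.0.113.7"}            # hypothetical blocklist
RULES = [("shell_download", b"/bin/sh")]    # hypothetical (name, byte pattern) rules


def decode(raw):
    """Decoder: identify the protocol and reject malformed packets."""
    if raw.get("protocol") not in {"TCP", "UDP"} or "src" not in raw:
        return None, "malformed"
    return raw, None


def preprocess(packet):
    """Preprocessor: drop banned sources, then normalize the payload."""
    if packet["src"] in BANNED_SOURCES:
        return None, "banned source"
    return dict(packet, payload=packet.get("payload", b"").lower()), None


def detect(packet):
    """Detection: apply the (more expensive) content rules last."""
    for name, needle in RULES:
        if needle in packet["payload"]:
            return None, f"rule {name}"
    return packet, None


PIPELINE = [decode, preprocess, detect]     # cheapest checks first


def inspect(raw, log):
    """Run a packet through every stage; log and drop at the first failure."""
    packet = raw
    for stage in PIPELINE:
        packet, reason = stage(packet)
        if packet is None:
            log.append((stage.__name__, reason))
            return "drop"
    return "forward"


log = []
print(inspect({"protocol": "TCP", "src": "198.51.100.2", "payload": b"GET /"}, log))
print(inspect({"protocol": "TCP", "src": "198.51.100.2", "payload": b"x /BIN/SH"}, log))
print(log)
```

Because a packet stops at the first stage that rejects it, obviously bad traffic never reaches the slower content rules.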

Networking

LANs:

Hubs: broadcast packets to all connected devices
- An infected device can monitor all traffic
Switches: send packets only to the receiver
- Isolates traffic and may allow firewall rules to be applied
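
A toy comparison of the two forwarding behaviours. Ports are plain integers and MAC addresses are short strings; the MAC-learning table with flooding for unknown destinations is how switches are commonly described, simplified here:

```python
def hub_forward(ports, in_port):
    """A hub repeats the frame on every port except the one it arrived on."""
    return [p for p in ports if p != in_port]


class Switch:
    """A learning switch: remembers which port each source MAC was seen on."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}   # MAC address -> port

    def forward(self, in_port, src_mac, dst_mac):
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]                # only the receiver
        return [p for p in self.ports if p != in_port]      # unknown: flood


switch = Switch([1, 2, 3])
first = switch.forward(1, "aa", "bb")    # "bb" not yet learned
reply = switch.forward(2, "bb", "aa")    # "aa" was learned from the first frame
later = switch.forward(1, "aa", "bb")    # "bb" is now known
print(hub_forward([1, 2, 3], 1), first, reply, later)
```

Once the table is populated, a device on port 3 no longer sees traffic between ports 1 and 2, which is the isolation the notes refer to.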

Ethernet:

Packet sniffing
Hardware-supported sniffing
- Port mirror: switch duplicates traffic to another device
- Test access port (TAP): port which reads the traffic going through the device

TCP:

Three-way handshake
SYN, SYN/ACK, ACK
SYN: send initial sequence number
ACK: acknowledge receipt by returning the received sequence number plus one
After the handshake, the sequence number advances by the number of bytes sent
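
The sequence and acknowledgement numbers exchanged during the handshake can be traced with a short sketch. The segments are plain dictionaries rather than real packets, and the initial sequence numbers are arbitrary example values:

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake segments as dictionaries."""
    syn = {"flags": "SYN", "seq": client_isn}
    syn_ack = {
        "flags": "SYN/ACK",
        "seq": server_isn,
        "ack": (syn["seq"] + 1) % SEQ_SPACE,       # the SYN consumes one number
    }
    ack = {
        "flags": "ACK",
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,   # so does the server's SYN
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
for segment in segments:
    print(segment)
```

In practice the initial sequence numbers are chosen to be hard to predict, so that an off-path attacker cannot easily forge the final ACK.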

DDoS:

The source IP address of a packet is not validated: an attacker can spoof the victim’s address to make servers flood the victim with packets

IDSs contain rules to detect suspicious activities:

Ingress: filter packets entering the network/machine
Egress: filter packets leaving the machine/network
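
One common use of ingress and egress rules is to catch spoofed source addresses. A minimal sketch using the standard `ipaddress` module, assuming the internal network is `10.0.0.0/8` (a hypothetical choice) and that only the source address is checked:

```python
import ipaddress

INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")   # hypothetical internal range


def ingress_allowed(src_ip):
    """Packets arriving from outside must not claim an internal source address."""
    return ipaddress.ip_address(src_ip) not in INTERNAL_NET


def egress_allowed(src_ip):
    """Packets leaving the network must carry one of our own source addresses."""
    return ipaddress.ip_address(src_ip) in INTERNAL_NET


print(ingress_allowed("198.51.100.9"), ingress_allowed("10.1.2.3"))
print(egress_allowed("10.1.2.3"), egress_allowed("203.0.113.7"))
```

The egress check stops an infected internal machine from sending packets with a forged source address, which is what the spoofing-based flood described above relies on.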

DNS poisoning:

Inject a fake DNS entry into a DNS server
DNS servers cache and pass on each other’s answers, so the poisoned entry spreads
https://www.eweek.com/cloud/dns-poisoning-suspected-cause-of-huge-internet-outage-in-china/