Lecture 3

Last Lecture Recap

In our previous lecture, we delved into the concept of binary instrumentation, with a particular focus on dynamic binary instrumentation, the foundation on which dynamic taint analysis is built. We explored how this approach was first proposed in a pioneering paper aimed at detecting exploits. The methodology not only detected attacks but also leveraged the data gathered during dynamic analysis to create simple signatures capable of thwarting further exploitation attempts.

The core mechanism of dynamic taint analysis involves marking untrusted input as "tainted" and then tracking how this taint propagates throughout the execution of software or binaries. The key is to monitor whether this tainted data reaches critical points or "dangerous sinks" within the system. These sinks could include:

  • Control flow addresses
  • Format strings
  • Specific system calls

This detection technique is notably generic and does not depend on pre-defined patterns or signatures, thus allowing it to identify previously unknown or zero-day attacks.
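
As a concrete illustration, here is a minimal sketch (Python, with hypothetical names, not any specific tool's API) of the policy applied at one such sink, an indirect control transfer: if any bit of the branch-target value carries taint, an alert is raised.

    # Minimal sketch of a "dangerous sink" check.  A real tracker would
    # invoke this from the instrumented ret / indirect call / jmp paths.

    class SecurityAlert(Exception):
        pass

    def on_indirect_branch(target_value: int, target_taint: int) -> None:
        """target_taint is the shadow bitmask of the branch-target value;
        any nonzero bit means untrusted input influenced this transfer."""
        if target_taint != 0:
            raise SecurityAlert(f"tainted control transfer to {target_value:#x}")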

Challenges in Dynamic Taint Analysis

While the high-level concept is straightforward, the implementation is fraught with challenges—the details of which can be quite intricate.

Pointer Tainting

One significant challenge is "pointer tainting." Consider the following scenario: you have a move instruction, MOV EAX, [EBX + 4], where the memory source operand is [EBX + 4]. If EBX is tainted, should EAX be tainted as well?

  • Argument Against Tainting EAX: The specific memory location being accessed is not tainted, so there’s no immediate reason to taint the destination register, EAX.

  • Argument For Tainting EAX: Since EBX is tainted, the attacker potentially controls the memory address being accessed. This implies that the data fetched could be influenced by the attacker, justifying the tainting of EAX.

A legitimate use case that illustrates the necessity of pointer tainting is character encoding conversions, such as converting ASCII to Unicode or other encoding schemes like Base64. These conversions often use lookup tables or conversion tables, where the index is used to retrieve the corresponding character. In such scenarios, allowing pointer tainting is essential to accurately track the flow of tainted data.
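
To make the trade-off concrete, here is a minimal sketch (Python, illustrative names only) of the taint rule for a memory load, with pointer tainting as a policy switch: the loaded value is tainted if the memory cell itself is tainted and, when pointer tainting is enabled, also if the address was computed from tainted data.

    # Sketch of the load rule for an instruction like MOV EAX, [EBX + 4].
    # shadow_mem maps addresses to taint flags; names are illustrative.

    POINTER_TAINTING = True   # the policy switch discussed above

    def taint_of_load(addr: int, addr_taint: bool, shadow_mem: dict) -> bool:
        value_taint = shadow_mem.get(addr, False)   # taint of the cell itself
        if POINTER_TAINTING and addr_taint:
            # a tainted index means the attacker chose which entry was read
            return True
        return value_taint

With the switch on, the ASCII-to-Unicode table lookup above is tracked correctly; with it off, the taint is silently dropped at the lookup, an instance of undertainting.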

Pointer tainting can lead to "taint explosion," where taint spreads to data that was only indirectly influenced by the input. This happens, for example, when a tainted value is used as an index into one table, whose result then indexes another, spreading the influence further. Over time this taint can propagate extensively throughout a system if not properly managed.

This phenomenon poses a significant challenge, particularly when undertaking full-system taint analysis. This type of analysis doesn't just focus on a single process but rather examines how influence spreads across an entire system. Without adequate control measures, tainted data can become pervasive, affecting nearly all parts of the system.

Moreover, the issues of overtainting and undertainting further complicate matters. Pointer tainting can inadvertently lead to overtainting, where data items become tainted when, according to the intended policy, they should not be. Conversely, undertainting occurs when data that should be tainted is not.

Granularity of Taint Tracking

When dealing with these challenges, one crucial consideration is the granularity at which taint tracking occurs. The precision of taint tracking can be at the bit, byte, or word level:

  1. Bit-level precision offers the highest precision, identifying precisely which bits are tainted. However, it requires more data storage and computational resources, making it complex to implement.

  2. Byte-level precision strikes a balance, where any tainted bit within a byte marks the entire byte as tainted. This is a common choice due to its moderate resource demands and relatively straightforward implementation.

  3. Word-level precision is the coarsest, where if any byte or bit in a word is tainted, the whole word is considered tainted. This can result in more overtainting due to its broad classification.

In my experience, byte-level precision is often the most practical choice. It provides a reasonable compromise between precision and resource usage. Bit-level precision, while the most accurate, entails significant overhead in terms of data storage and management. The choice of precision directly impacts the effectiveness and efficiency of the taint analysis, underscoring the importance of a well-thought-out approach to managing data influence within software systems.
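
The storage trade-off is easy to see in a small sketch (Python; illustrative containers, where a real tool would use flat arrays or page-indexed tables): byte-level tracking stores one flag per guest byte, while bit-level tracking stores a full 8-bit mask per guest byte.

    class ByteShadow:
        """One taint flag per byte: tainted or not."""
        def __init__(self):
            self.tainted = set()              # set of tainted addresses
        def taint(self, addr):
            self.tainted.add(addr)
        def is_tainted(self, addr):
            return addr in self.tainted

    class BitShadow:
        """Eight taint flags per byte: a mask saying which bits are tainted."""
        def __init__(self):
            self.masks = {}                   # addr -> 8-bit taint mask
        def taint(self, addr, mask):
            self.masks[addr] = self.masks.get(addr, 0) | mask
        def taint_mask(self, addr):
            return self.masks.get(addr, 0)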

Investigating Bit-Level Tainting

By increasing precision to the bit level, the problem of taint propagation can be tackled more precisely. In considering logical operations, such as the OR operation, we need to account for taint propagation rules. The simplest strategy is in-place propagation, where the taint status of input operands directly affects the output.

Under in-place propagation, a shadow register indicates which bits are tainted: a value of 1 means tainted, while 0 means not tainted. During an operation, this shadow register is effectively copied to the destination. However, it is crucial to consider the actual values involved in these operations, especially when dealing with constants. Two extreme cases arise when performing an OR operation:

  1. OR with Zero: The operation does not change the value, behaving like a simple assignment. In this case, the taint status remains unaffected.

  2. OR with Minus One: Every output bit is forced to one regardless of the input, so the output no longer depends on the tainted input and its taint status is effectively cleared.

When developing taint rules, it is essential to consider both the taint status and the concrete value of bits. This dual consideration allows for a more precise calculation of the taint result (a sketch follows the list below):

  • If a bit's concrete value is 1, and it is ORed with another value, that bit should not be considered tainted.
  • Conversely, if the concrete value is 0, the resulting taint status depends on the other operand's taint status.
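
Combining the two observations gives a bit-precise rule for OR. In this sketch (Python, hypothetical helper names), a result bit is tainted only when at least one input bit is tainted and neither input bit is a known, untainted 1:

    def taint_or(val_a, taint_a, val_b, taint_b):
        """Bit-precise taint rule for OR over value/taint bitmasks."""
        known_one_a = val_a & ~taint_a        # bits that are certainly 1 in a
        known_one_b = val_b & ~taint_b        # bits that are certainly 1 in b
        some_taint  = taint_a | taint_b       # bits tainted in either input
        # a known 1 forces the result bit to 1, so no taint flows through it
        return some_taint & ~(known_one_a | known_one_b)

Note how the two extreme cases fall out: ORing with an untainted zero returns the other operand's taint unchanged, and ORing with an untainted minus one returns no taint at all.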

Detailed Instructions and Taint Propagation

Memcheck operates at the bit level to ensure precision. It employs various rules to track how taints affect data, making it highly accurate in its analysis. Operations like subtraction can have intricate taint propagation rules. For instance, subtracting a constant from a tainted value might affect the upper bits more than the lower ones. Instructions involving bit shifts can alter the direction of taint propagation. For example, a left shift might propagate taints upwards, while a right shift might do the opposite.
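
For shifts with an untainted, constant count, the direction-aware rules are simple, as in this sketch (Python; a tainted shift count would require a far more conservative rule, such as tainting the entire result):

    WORD = (1 << 32) - 1                      # 32-bit word mask

    def taint_shl(taint_in, n):
        return (taint_in << n) & WORD         # taint moves toward the high bits

    def taint_shr(taint_in, n):
        return taint_in >> n                  # taint moves toward the low bits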

Different tainting systems have been developed, each with its own set of rules:

  • General Rules:

    • "Up" signifies taints moving to higher bits.
    • "Down" indicates a downward taint propagation.
    • "All Around" means that if any part of a byte is tainted, the entire byte is considered tainted.
    • "In Place" suggests that taints remain localized without affecting neighboring bits.
  • Examples of Tainting Systems:

    • Simple Tainting: My initial approach was a straightforward system where tainting was applied broadly; if any bit of a byte was tainted, the entire byte was considered tainted.
    • Memcheck: Known for its precision, Memcheck uses a variety of rules including special conditions that take into account the specific values of data, allowing for more nuanced taint propagation.
    • DECAF: My second attempt, named "DECAF," focuses on optimizing and refining bit-precise taint rules. The approach starts by reviewing all of Memcheck's rules, identifying those that can be improved, and uses formal verification to ensure both soundness and precision.

Formal Verification of Taint Rules

Key Concepts:

  • Soundness: This ensures that if a bit should be tainted, it must indeed be tainted.
  • Precision: This ensures that if a bit should not be tainted, it remains untainted.

A rule is deemed perfect when it is both sound and precise, indicating no further improvements are necessary. Through this verification process, we discovered certain instructions with room for improvement.

Example of Rule Optimization:

One such instruction is the decrement operation (DEC). Memcheck originally handled it with a generic propagation rule, but given that a decrement merely subtracts one, we devised a more precise rule specific to this operation. While these improvements may seem minor, formal verification of rule correctness is essential for ensuring overall system reliability.
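
The flavor of this verification can be reproduced in miniature. The sketch below (Python, 8-bit words; hypothetical code, not DECAF's actual rules) defines a bit-precise rule for DEC and checks it exhaustively against the ground truth obtained by enumerating every concrete input consistent with a given taint mask. Soundness means the rule never misses a result bit that can vary; precision means it never marks a bit that cannot.

    # Bit i of x-1 equals x_i XOR borrow_i, where borrow_i is 1 exactly when
    # all bits of x below i are 0.  So result bit i is tainted iff x_i is
    # tainted or the borrow into bit i is uncertain.

    def dec_taint(val, taint, bits=8):
        out = 0
        known_one_below = False   # a known 1 below bit i kills the borrow
        taint_below = False       # a tainted bit below i may flip the borrow
        for i in range(bits):
            if (taint >> i) & 1 or (taint_below and not known_one_below):
                out |= 1 << i
            if (taint >> i) & 1:
                taint_below = True
            elif (val >> i) & 1:
                known_one_below = True
        return out

    def exact_dec_taint(val, taint, bits=8):
        """Ground truth: which result bits vary across all consistent inputs."""
        mask = (1 << bits) - 1
        inputs = [x for x in range(1 << bits)
                  if x & ~taint & mask == val & ~taint & mask]
        outputs = [(x - 1) & mask for x in inputs]
        varying = 0
        for o in outputs:
            varying |= o ^ outputs[0]
        return varying

    for taint in range(256):
        for val in range(256):
            if val & taint:                    # tainted bits normalized to 0
                continue
            rule, exact = dec_taint(val, taint), exact_dec_taint(val, taint)
            assert rule & exact == exact       # sound: no under-tainting
            assert rule == exact               # precise: no over-tainting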

Implementation Details:

The rules we enhanced are represented using pseudo code, illustrating the logic and calculations involved. For instance, an addition with a carry flag involves several computations, like calculating a mask, to ensure precision at the bit level.

Practical Application:

To demonstrate the implementation within DECAF, consider a simple example involving TCG IR (Tiny Code Generator Intermediate Representation) instructions. These include operations such as moving immediate values into temporaries, logical AND operations, and storing values to memory. Although the original snippet consists of only six TCG instructions, instrumenting it for taint tracking inserts numerous additional instructions, highlighting the complexity and the necessity of careful rule optimization.

Overall, the DECAF system exemplifies how bit-level precision can be achieved and verified through refined rules and rigorous formal verification, ensuring the robustness of the taint-tracking process.

Instruction Insertion and Shadowing

When dealing with low-level instructions such as MOV or AND, it is often necessary to insert additional instructions before the original ones. This is done to preserve the concrete values of registers before executing the primary instruction, ensuring that subsequent operations have the correct data to work with. For example:

  • Shadowing: A temporary register may be used to store the value of another register, like EAX, before it is modified by an instruction. This technique is critical when the instruction needs the original value that might change during execution.

  • Example with AND Instruction: For an AND operation, nine additional instructions may need to be inserted beforehand to correctly manage the data dependencies and compute the result's taint; a sketch of such an expansion follows below.
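
As a sketch of what this insertion looks like, the listing below (Python, with hypothetical op names loosely modeled on TCG; not DECAF's actual code, and the exact instruction count differs) expands a single guest AND into the shadow operations that compute its taint, using the same known-value reasoning as the OR rule earlier: a known 0 in either operand forces that result bit to 0, so no taint flows through it.

    # Expand "dst = a AND b" into taint-computing ops plus the original op.
    # Each tuple is (opcode, output, inputs...).  "andc" is and-complement:
    # out = x & ~y.

    def instrument_and(dst, a, b):
        return [
            ("not",  "va_inv",    a),                       # ~value(a)
            ("andc", "known0_a",  "va_inv", f"taint_{a}"),  # known-0 bits of a
            ("not",  "vb_inv",    b),                       # ~value(b)
            ("andc", "known0_b",  "vb_inv", f"taint_{b}"),  # known-0 bits of b
            ("or",   "any_taint", f"taint_{a}", f"taint_{b}"),
            ("or",   "known0",    "known0_a", "known0_b"),
            ("andc", f"taint_{dst}", "any_taint", "known0"),
            ("and",  dst, a, b),                            # original instruction
        ]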

Comparison of Implementations

Comparing different implementations can highlight improvements and trade-offs. The later implementation, DECAF, was compared against the earlier one, TEMU. Such comparisons are crucial for understanding the impact of the changes and optimizations made in the codebase.

The interaction between the guest system and the shell involves sending keystrokes that propagate through the system, affecting all layers from the kernel to applications. In testing environments for both Windows and Linux systems, commands such as DIR and CD are sent to the command console. These commands traverse the system layers, illustrating the flow and handling of input.

  • DIR Command: The DIR command, used for listing files in the current directory, produced 207 tainted bytes in DECAF versus approximately 600 in TEMU, a significant reduction.
  • Find Command: The find command saw a reduction from around 6000 tainted bytes to roughly 1000, about a sixfold decrease, highlighting the effectiveness of the refined propagation rules.

A critical aspect of our analysis involved monitoring tainted EIP (Extended Instruction Pointer). Initially, we acknowledged that if untrusted input influences the EIP or the program counter, it signifies a potential attack vector. However, in practice, this is not always a definitive indicator. In DECAF, we noted 10 instances where the EIP was influenced by untrusted input, specifically keystrokes. Conversely, in TEMU, no such instances were observed.

In TEMU, I had implemented controls limiting how many levels of pointer tainting were allowed; DECAF imposes no such restrictions. Upon deeper investigation into the cases where EIPs were influenced by data, these instances turned out to be legitimate: certain characters were used to select function pointers within the Linux kernel, so the tainted EIPs were correct in context, not malicious. This means TEMU suffered from both over-tainting (far more tainted bytes) and under-tainting (missing these legitimately tainted EIPs).

Limitations of Taint Analysis

Taint analysis, while helpful, is not infallible. It can yield false positives and negatives, indicating that it's not a silver bullet for detecting exploits.

Full-System Dynamic Binary Analysis

Motivation

Moving on, our focus will shift to full-system dynamic binary analysis, particularly full-system dynamic taint analysis. Previously, our discussions revolved around dynamic binary instrumentation and dynamic taint analysis at the user-program level. Those techniques involve monitoring and analyzing the behavior of individual programs. Our new focus is analyzing entire systems: not just individual programs but the whole operating system, including the kernel and all running applications.

The necessity for full system analysis arises from various needs, such as malware analysis. Malware often operates beyond the scope of a single program. It can attack kernel space, load malicious kernel modules, or inject code into other processes. This complexity requires a more holistic approach to analysis.

For example, when performing dynamic analysis in a sandbox environment, we must consider that the environment itself is a virtual machine. Malware doesn't behave like typical applications; it often targets the kernel directly, sometimes through mechanisms like kernel rootkits. These rootkits are designed to compromise the kernel, illustrating the need for a broader analysis approach that includes not just the application layer but also the underlying system infrastructure.

Analyzing kernel malware, often referred to as "kernel rootkits," requires comprehensive system monitoring, not just the examination of individual user applications. This is because these rootkits operate within the kernel space, remaining invisible to both users and applications. When conducting vulnerability analysis, it's crucial to focus not only on user-level applications but also on the operating system's kernel. While compromising applications can have severe consequences, compromising the kernel can be even more catastrophic, as it allows attackers to fully control the system.

Traditional binary instrumentation tools like Pin or Valgrind are ineffective for the kernel, necessitating the use of virtual machine-based approaches to perform kernel analysis. This method provides the necessary environment to examine how malware impacts the system at the kernel level.

Embedded systems, which can either have their proprietary operating systems or be based on Linux or real-time operating systems, require similar analysis approaches as desktop or server systems. These systems include both a kernel and user-level applications. However, due to the customization of many components in embedded systems, it is imperative to analyze the entire system rather than individual components. Simply extracting a Linux binary from an embedded system and running it on a desktop is often impractical, as it may not function correctly due to system-specific customizations.

For example, in an embedded system like a router, the web server component might be intricately tied to the system's architecture. Attempting to run this component on a different architecture (e.g., ARM on an x86 system) through emulation often fails because of unique customizations. These might include direct mappings of ROM into the user space, which allows components to access configurations directly from memory locations that are unavailable when isolated and run on a different system.

To effectively emulate or run such customized embedded systems, one must ensure that the entire environment, including its specific architecture and memory mappings, is accurately replicated. This approach maximizes the chances of successful emulation and analysis.

When it comes to emulating firmware, a comprehensive approach is to emulate the entire system. This is particularly relevant for applications requiring in-depth analysis, such as malware detection. Tools like QEMU, which offers both full system emulation and user mode, are instrumental in this process. They allow for the emulation of entire systems, providing an environment conducive to thorough instrumentation and analysis.

Malware Analysis

One notable application of system emulation is in the field of malware analysis. This concept is detailed in my paper, "Panorama: Capturing System-wide Information Flow for Malware Detection and Analysis," published in 2007. The core idea revolves around monitoring and analyzing the behavior of a system to detect malicious activities.

Example: Keylogger Detection

Let's consider a practical scenario involving a Windows system placed in a virtual machine environment with QEMU. By doing so, we can monitor all execution behaviors within this system. Imagine sending keystrokes, such as entering passwords into an authentication window. In a normal situation, these keystrokes pass through the authentication process without issues.

However, if malware like a keylogger is present, it could capture these keystrokes through various methods. Keyloggers might inject themselves into the kernel space, integrate with the authentication process, or use certain Windows APIs to intercept keystrokes from another process. Each of these methods presents a potential breach.

By employing taint analysis, we can track the propagation of these keystrokes within the system. This analysis helps in identifying any irregularities in the keystroke flow, indicating the presence of a keylogger. The approach allows for the detection of such malware by observing suspicious behaviors in the propagation path.

Application to Kernel Rootkits

The same principles can be applied to detect other types of malware, such as kernel rootkits. Some rootkits attempt to conceal files or folders within the file system. By monitoring file system access, forensic tools can identify the inodes and other filesystem structures, revealing hidden activities.

Emulating the full system and utilizing comprehensive analysis tools enable us to detect and understand malware behavior at a granular level, enhancing our ability to secure systems against such threats.

Virtual Machine Introspection

Virtual machine introspection centers on the idea of observing a virtual machine's internal operations from an external vantage point, specifically the hypervisor. This approach allows us to monitor the behavior of the virtual machine without altering or interfering directly with the guest operating system. Let's break down how this can be effectively implemented and why it's beneficial.

The primary goal of employing such techniques as virtual machine introspection (VMI) is to gain a comprehensive understanding of the system's behavior, including the interactions between various software components and the underlying hardware. This is crucial for analyzing and detecting anomalies, such as those introduced by malicious software like rootkits, which may attempt to obscure their presence by manipulating system data or intercepting system operations.

Key Considerations in VMI Implementation

  • Intercepting Events: By monitoring critical events like system calls, context switches, page faults, and breakpoints, you can gather essential data about the system's operations. These events are pivotal as they represent the interaction points with the system's resources and can reveal suspicious or unauthorized activities.

  • Mapping Hardware to Operating System Level: Understanding the system at the hardware level is not enough. For meaningful analysis, you need to map these low-level operations to higher-level processes, such as identifying which API calls are made by specific applications or kernel modules. This involves determining whether a process is running in kernel space or user space and identifying the nature of these processes.

  • Identifying System Components: In an operating system like Windows, distinguishing between various components such as the NT kernel (ntoskrnl.exe), the windowing system (win32k.sys), and other device drivers is crucial. Similarly, recognizing user-level applications like winlogon.exe or a web browser helps in understanding the flow of operations and potential attack vectors.

Benefits of VMI

  • Non-Intrusive Monitoring: VMI enables the monitoring of virtual machines without the need for intrusive modifications, allowing for the unobtrusive observation of system behavior. This is particularly advantageous for analyzing systems in a production environment where uptime and stability are critical.

  • Enhanced Security Analysis: By observing the system from the hypervisor level, VMI provides a vantage point that is typically out of reach of malicious software operating within the guest OS, making it a powerful tool for security analysis and threat detection.

  • Versatility: The ability to observe a wide range of events and system interactions makes VMI a versatile tool for various applications, from debugging and performance monitoring to security analysis and digital forensics.

In essence, VMI is a powerful technique that leverages the hypervisor's capabilities to provide deep insights into a virtual machine's operations without the need for direct interaction with the guest OS. This method not only enhances the security posture by uncovering hidden threats but also enables a more comprehensive understanding of system dynamics.

In virtualization, whether using tools like KVM, Hyper-V, or VirtualBox, the hypervisor can be configured to intercept specific events. This interception allows us to monitor and manage virtualized environments effectively. When using QEMU in emulation mode, we engage in dynamic binary translation and instrumentation, enabling us to monitor all executed instructions, albeit at the cost of speed.

Understanding the underlying data structures in the operating system's memory is crucial. Each process within an OS has its dedicated page directory base address, a concept familiar to those who have studied operating systems and virtual memory. This address is stored in a specific register, such as CR3 on x86 architectures or TTBR on ARM architectures. By mapping the CR3 value to a process's data structure, we can parse this structure to discern the processes currently running on the system.

For different operating systems, different data structures are used to manage processes. In Windows, this is typically the EPROCESS structure, while in Linux, it's the task_struct. Modern tools for virtual machine introspection (VMI) and memory forensics provide profiles detailing these structures and their offsets, helping us identify fields such as process names, page table base addresses, and more.

The current-process pointer, a vital element, is often located at a known offset relative to the kernel stack on Windows, or reachable through the GS segment register on Linux. Once the current process structure is identified, it can be parsed to extract critical information such as the process name, process ID (PID), page table base address, and details about loaded modules, which encompass the main executable and shared libraries.
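
A minimal sketch of this parsing for a Linux guest is shown below (Python; the memory-read callbacks and all structure offsets are assumptions standing in for a real VMI or memory-forensics profile). Starting from a known task_struct, it walks the kernel's circular task list and extracts each process's name, PID, and memory descriptor:

    # read_u64(vaddr) and read_bytes(vaddr, n) are assumed to be provided by
    # the hypervisor / emulator for reading guest virtual memory.

    OFFSETS = {               # illustrative values; real ones vary per kernel
        "comm":  0x550,       # 16-byte process name
        "pid":   0x398,       # process ID
        "tasks": 0x2f8,       # list_head linking all task_structs
        "mm":    0x3c0,       # -> mm_struct, which holds the page table base
    }

    def list_processes(read_u64, read_bytes, init_task):
        procs, task = [], init_task
        while True:
            name = read_bytes(task + OFFSETS["comm"], 16).split(b"\0")[0]
            pid  = read_u64(task + OFFSETS["pid"]) & 0xFFFFFFFF
            mm   = read_u64(task + OFFSETS["mm"])     # 0 for kernel threads
            procs.append((pid, name.decode(errors="replace"), mm))
            # tasks.next points at the next task's list_head; subtracting the
            # offset recovers the enclosing task_struct
            task = read_u64(task + OFFSETS["tasks"]) - OFFSETS["tasks"]
            if task == init_task:
                break
        return procs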

By leveraging this information, we can determine the origin of specific instructions, whether they stem from shared libraries or kernel modules, thus providing a comprehensive view of the system's operation. This capability is particularly valuable for security and debugging purposes, as it allows us to peek into the virtual machine and understand its behavior in detail.

An Example: Google Desktop

Google Desktop is an example of how certain applications can collect and process private information from users. This application consists of multiple modules and processes rather than functioning as a standalone entity. These include:

  • Processes and Libraries: Google Desktop includes various processes like a crawler and shared libraries such as GLEP.dll, which is used for data compression.
  • Data Collection: When a user visits a webpage using a browser, Google Desktop can collect data through these processes and libraries, which then send information back to a Google server. This is confirmed by resolving IP addresses to domain names owned by Google.
  • Information Flow: While not necessarily malicious, this behavior confirms the flow of information and highlights how different components process and propagate this data.

Dynamic Binary Analysis for Android Systems

The concept of dynamic binary analysis, particularly in Android systems, was explored in a research paper published in 2012 titled "DroidScope." This paper focused on reconstructing the semantics of Android systems to analyze malware dynamically.

  • Android Architecture: The Android operating system is built on a Linux kernel. Above this kernel are system services, including a key process called Zygote, which acts as a template for Android applications.
  • Application Layer: Android applications are primarily written in Java, although other languages such as Kotlin are also used today. During the period discussed, Java was predominant, and applications were compiled to Dalvik bytecode instructions to run within the Android environment.
  • Dalvik Virtual Machine (VM): This VM was specifically designed to run Java applications on Android devices, bypassing certain intellectual property restrictions from Oracle. At build time, Java bytecode is translated into Dalvik bytecode, which this VM executes.
  • Native and Java Components: Applications can consist of both Java components and native components. The Java Native Interface (JNI) is used to allow Java applications to call native code components, which can enhance performance or provide access to system resources.

Importance of Multi-Layer Architecture

  • Holistic View: By examining both the native and Java level views, we gain a complete understanding of how applications execute within the Android system and interact with other components.

  • Tool Utilization: Tools such as ADB (Android Debug Bridge) and static analysis tools can analyze Java bytecode. However, a multi-layered approach provides deeper insights, offering a more comprehensive view of application behavior.

Dynamic Instrumentation and Analysis

Dynamic instrumentation allows us to perform various analyses within the Android environment:

  • API Tracing: This enables the monitoring of API calls made by the application.
  • Instruction Trace Collection: We can collect traces of both native and Dalvik instructions.
  • Taint Analysis: Conducted at the hardware instruction level, this analysis helps track the flow of sensitive data and can be mapped back to both OS and Java levels.

Practical Demonstrations

To illustrate these concepts, consider the task identification in an Android environment:

  • Process Identification: By using commands in QEMU (Quick Emulator), such as ps, you can list all running processes, including their process IDs (PIDs) and thread IDs (TIDs). This provides a detailed view of the threads active within each process.

  • Application Names: A challenge arises with application names because the comm field in the Linux task_struct is limited to 16 characters. This means you might not obtain full application names (such as full Java package names) from the kernel-level view alone; instead, the full name needs to be fetched from the Java components directly.

Java and Dalvik Views

The Dalvik instruction set, which Android applications compile into, consists of 256 opcodes. Parsing Dalvik bytecode is essential for understanding application execution, although the specifics of this process are complex and specialized.

Offset addressing is a technique employed to map opcodes to their corresponding implementations. Understanding this mapping is crucial for recovering the bytecode, which is the instruction set for virtual machines like Dalvik in Android.
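
Concretely, if each opcode handler occupies a fixed-size slot in the interpreter's dispatch table, as in Dalvik's assembly interpreter, the opcode being executed can be recovered from the native program counter alone. A sketch, with the slot size as an assumption:

    HANDLER_SIZE = 64             # assumed fixed slot size per opcode handler

    def opcode_from_pc(native_pc, dispatch_base):
        """Map the native PC inside the interpreter back to a Dalvik opcode."""
        off = native_pc - dispatch_base
        if 0 <= off < 256 * HANDLER_SIZE:     # 256 Dalvik opcodes
            return off // HANDLER_SIZE        # which handler, hence which opcode
        return None                           # PC is outside the handler table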

Just-In-Time Compiler and Bytecode Recovery

  • Just-In-Time (JIT) Compiler: Android uses a JIT compiler to enhance the efficiency of application execution. However, JIT compilation can obscure the original bytecode, making it challenging to perform detailed analysis.
  • Disabling JIT: For accurate bytecode recovery, JIT can be disabled. This forces the system to revert to interpretation mode, where the bytecode is consistently available. Although this may degrade performance, it provides a clearer view for analysis.

Ahead-Of-Time Compilation

  • ART (Android Runtime): Modern Android versions utilize ART, which includes ahead-of-time (AOT) compilation. Similar to JIT, ART can complicate bytecode analysis. However, disabling ART allows the bytecode to remain accessible, facilitating comprehensive analysis.

Analyzing Android Malware

To illustrate the importance of dynamic analysis in malware detection, we examined a case study of an old Android malware family known as DroidKungFu.

  • Dynamic Analysis: DroidKungFu contains encrypted payloads that cannot be detected without running the application in an emulated environment. Dynamic analysis allows these hidden payloads to be observed.
  • Components: The malware includes native components alongside Java components, requiring robust tools for effective analysis.

Data Flow and Taint Propagation

A critical aspect of malware analysis is understanding taint propagation, which tracks the flow of sensitive information through the application.

  • Source and Propagation: In the case of DroidKungFu, the source of sensitive data is the getDeviceId API, commonly targeted by malware to steal device identifiers.
  • Tracking: By marking the return value of this API, analysts can trace how the data propagates through the application:
    • The data enters a string object.
    • It's processed by various methods within the app.
    • Eventually, it forms part of a network request, indicating an attempt to exfiltrate data via HTTP.
    • The taint ultimately reaches a system call (SYS_WRITE), confirming the data exfiltration attempt.

This comprehensive analysis shows how full-system examination can reveal the entire lifecycle of data within a malicious application, offering a detailed understanding of its operation and intent.

Android Malware Example: DroidDream

  • Malware Characteristics: This malware, similar to DroidKungFu, employs simple XOR encryption to conceal private data before it is leaked.

  • Data Flow: The malware retrieves sensitive information, such as the subscriber ID, which is processed through various Java methods and eventually converted into raw bytes. This data is then encrypted and sent out using system calls (see the XOR sketch below).
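
Because XOR is its own inverse, this obfuscation is trivial to undo once the key is known. A minimal sketch, with a made-up key and subscriber ID:

    def xor_bytes(data: bytes, key: bytes) -> bytes:
        """Repeating-key XOR of the kind used by such malware."""
        return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

    imsi   = b"310260000000000"               # hypothetical subscriber ID
    leaked = xor_bytes(imsi, b"\x6b")         # what the malware would send
    assert xor_bytes(leaked, b"\x6b") == imsi # same key recovers the plaintext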

Analytical Techniques

  • Dynamic Analysis: This approach is crucial for understanding the full system behavior of malware. By collecting instruction traces, we can gain insights into how malware interacts with the system at a low level.

  • Instruction Tracing: Through tracing, we can identify specific instructions, like XOR operations, and analyze how memory operations are performed. This helps in understanding how malware encrypts data and executes its payload.

Case Study: Real Exploit Analysis

  • Exploit Understanding: The lecture touched upon analyzing a real exploit that can potentially root an Android device. This involves observing how encrypted data is used and how the exploit manipulates system privileges.

  • Privilege Escalation: An example discussed was abuse of the setuid system call, which can be leveraged to escalate privileges by forcing it to fail in system processes that do not check its return value.

Challenges and Opportunities

  • Encrypted Malware: Encrypted malware poses a significant challenge, as it requires execution to fully understand its behavior. Dynamic analysis becomes a key tool in such scenarios.

  • Tracing and Analysis: By tracing instructions and system calls, we can uncover detailed behaviors and patterns that are not immediately obvious through static analysis.

Symbol Information

In the realm of software systems, understanding symbol information is crucial for analyzing and understanding how different components interact within a program. On operating systems like Windows or Linux, symbol information is embedded directly in the binary files. These symbols include exported and imported functions, and their names are stored in symbol tables, which allow you to track and identify them.

For Java programs, a similar concept applies. In Java bytecode, strings and method names are embedded within the bytecode itself. By parsing this bytecode, you can extract these strings and store them in a symbol database. This database helps identify Java methods and object types, making it easier to understand the program's structure and behavior.

While I intended to show a video demonstration to illustrate these concepts, technical difficulties prevented it from playing. The video was meant to provide a concrete example of how tools like DroidScope operate, which is instrumental in system analysis and security research.

Overall, understanding symbol information and how to extract it is essential for system security research, enabling you to analyze and tackle various problems more effectively. With this foundational knowledge, you will be better equipped to apply these concepts in practical scenarios and future research endeavors.