Repository: cslarsen/stack-machine Branch: master Commit: da2c5b325784 Files: 31 Total size: 49.1 KB Directory structure: gitextract_6bwf0lp8/ ├── .gitignore ├── Makefile ├── README.md ├── compiler.cpp ├── error.cpp ├── fileptr.cpp ├── include/ │ ├── compiler.hpp │ ├── error.hpp │ ├── fileptr.hpp │ ├── instructions.hpp │ ├── label.hpp │ ├── machine.hpp │ ├── parser.hpp │ ├── upper.hpp │ └── version.hpp ├── instructions.cpp ├── machine.cpp ├── parser.cpp ├── sm.cpp ├── smc.cpp ├── smd.cpp ├── smr.cpp ├── tests/ │ ├── core-test.src │ ├── core.src │ ├── fib.src │ ├── forward-goto.src │ ├── func.src │ ├── hello.src │ ├── todo-print.src │ └── yo.src └── upper.cpp ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ *.o *.sm sm smc smd smr ================================================ FILE: Makefile ================================================ CXXFLAGS = -g -W -Wall -Weffc++ -Iinclude LINK.o = $(LINK.cc) TARGETS = instructions.o parser.o error.o upper.o fileptr.o machine.o compiler.o sm.o smr.o smc.o smd.o sm smr smc smd all: $(TARGETS) @echo Run \"make check\" to test package %.sm: tests/%.src ./smc $< smr: instructions.o machine.o upper.o fileptr.o smr.o smc: instructions.o machine.o upper.o error.o fileptr.o parser.o compiler.o smc.o smd: instructions.o machine.o upper.o error.o fileptr.o smd.o sm: instructions.o machine.o upper.o error.o fileptr.o parser.o compiler.o sm.o check: all ./sm tests/fib.src ./smc tests/fib.src ./smr tests/fib.sm ./smc tests/hello.src ./smr tests/hello.sm ./smc tests/forward-goto.src ./smr tests/forward-goto.sm ./sm tests/yo.src ./sm tests/func.src cat tests/core-test.src tests/core.src | ./sm - clean: rm -f $(TARGETS) *.stackdump tests/*.sm ================================================ FILE: README.md ================================================ 
Stack-Machine
=============

This project contains

  * A simple, stack-based virtual machine for executing low-level instructions
  * An assembler supporting a Forth / PostScript like language
  * An interpreter able to run compiled programs

Architecture and design
-----------------------

The instructions are fixed-width at 32 bits and so are the arithmetic
operands.

By default, programs have 1 million cells available for both program text and
data. This means that a virtual machine memory takes up 4MB plus the data and
instruction stacks.

The text and data regions are overlapped, so you can easily write
self-modifying code (early versions actually required self-modification to be
able to return from subroutine calls, just like Knuth's MIX, but I've since
taken the liberty to add such a modern convenience into the core instruction
set).

There are no registers. This _is_ a stack machine, after all.

As we know from theoretical computer science, a pushdown automaton needs _two_
stacks to be Turing equivalent. Therefore we employ two as well; one for the
instruction pointer and one for the data. They live separately from the text
and data region, and are only limited by the host process heap size.

The machine contains no special facilities besides this: It's inherently
single-threaded and has no protection mechanisms. Its operation is completely
sandboxed, though, except for access to standard input and output.

Aim
---

The project aim was to create a simple machine and language to play around
with. You can benefit from it by reading the source code, playing with a
language similar to Forth, but conceptually simpler, and finally by seeing how
easy it is to build your own system.

The programming language
========================

The language is very similar to Forth and PostScript: You basically write in
RPN --- reverse Polish notation.
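
If you haven't used RPN before, the idea can be sketched in a few lines of
C++. This is only an illustration and not how the machine in this project is
implemented: `eval_rpn` is a hypothetical helper that knows just `+` and `*`,
treats every other token as a decimal number, and assumes well-formed input.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of this repository): evaluates a
// whitespace-separated RPN expression using a single data stack.
// Assumes well-formed input; there is no error handling.
int32_t eval_rpn(const std::string& source)
{
  std::vector<int32_t> stack;
  std::istringstream in(source);
  std::string token;

  while ( in >> token ) {
    if ( token == "+" || token == "*" ) {
      // An operator pops its two operands and pushes the result
      int32_t b = stack.back(); stack.pop_back();
      int32_t a = stack.back(); stack.pop_back();
      stack.push_back(token == "+" ? a + b : a * b);
    } else {
      // Anything else is assumed to be a number and is pushed
      stack.push_back(static_cast<int32_t>(std::stoi(token)));
    }
  }

  return stack.back(); // the result is left on top of the stack
}
```

For example, `eval_rpn("3 2 *")` pushes 3 and 2, then replaces them with
their product.
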
Anything not recognized as an instruction is put on the data stack, so to put
the numbers 3 and 2 on the stack, just write

    3 2

To multiply them, just append an asterisk:

    3 2 * ; multiplication

This operation pops the topmost two numbers on the stack and replaces them
with the result of the multiplication. To run such a program, you'd need to
include the core library first, since multiplication is defined as a function:

    $ cat tests/core.src your-file.src | sm
    6

Labels, addresses and their values
----------------------------------

Labels are identifiers ending with a colon. They refer to a particular cell in
the machine, and you can access their position, value or execute code from
that cell location:

    label:        ; create a label for the cell at this location
    &label        ; put ADDRESS of label on top of stack
    &label LOAD   ; put VALUE of label's cell "label" on top-of-stack
    label         ; EXECUTE code from label position

So, to put the _address_ of a label on the top of the data stack, just prepend
the label name with an ampersand.

If you want the _value_ of an address, put the address on the TOS (top of
stack) and use the `LOAD` instruction to replace the TOS with the value at the
given cell location.

When executing code at a given label position, the machine first puts the
address of the next instruction on top of the instruction stack. This way you
can return from a function call by using the instruction `POPIP`:

    main: ; program start
      print-dot
      print-dot
      HALT

    print-dot:
      '.' OUT
      '\n' OUT
      POPIP ; return from "function"

Variables and subroutines
-------------------------

An idiom for creating variables is to create a label and put a `NOP` at that
location, which reserves one memory cell to hold the variable. An example of
using a counter variable to implement a loop is given below.

    counter: NOP ; reserve 1 word for the variable "counter"

    program:
      2 &counter STOR ; set counter to two
      &counter LOAD 1 ADD &counter STOR ; increment counter by one
      ; loop counter+1 times

    display:
      '\n' '*' OUT OUT ; print an asterisk
      1 &counter LOAD SUB &counter STOR ; decrement counter by one
      &display &counter LOAD JNZ ; jump to display if not zero

The output of the above program is three stars:

    $ ./sm foo.src
    *
    *
    *

You can forward-reference labels. In fact, another idiom is to jump to the
main part of the program at the start of the source.

Hello, world!
-------------

You can do `72 OUT` to print the letter "H" (72 is the ASCII code for "H").
Cutting to the chase, a program to print "Hello!" would be:

    ; Labels are written as a name without whitespace
    ; and a colon at the end.

    main:
      72 out ; "H"
      101 out ; "e"
      108 dup out out ; "ll"
      111 out ; "o"
      33 out ; "!"

      ; newline
      '\n' out

      42 outnum ; print a number
      '\n' out ; and newline

      ; stop program
      halt

Notice the use of the `HALT` instruction to stop the program.

Multiplication and core library
-------------------------------

I've implemented a multiplication function in the core library in
`tests/core.src`:

    mul: ; ( a b -- (a*b) )
      mul-res: nop ; placeholder for result
      mul-cnt: nop ; placeholder for counter
      mul-num: nop

      &mul-cnt stor ; b to cnt
      dup &mul-res stor ; a to res
      &mul-num stor ; and to num

    mul-loop:
      ; calculate res += a
      &mul-res load
      &mul-num load +
      &mul-res stor

      ; decrement counter
      &mul-cnt load -1
      &mul-cnt stor

      ; loop until counter is zero
      &mul-cnt load
      &mul-loop swap -1 jnz

      &mul-res load
      popip

    ; ...

    *: ; alias for mul
      mul popip

Note that this function needs definitions for the functions `+` and `-1`.
Recall the program to multiply two numbers.
Put the following in a file `hey.src`:

    3 2 * outnum
    '\n' out
    halt

If we concatenate the core library with our program, we get:

    $ cat tests/core.src hey.src | ./sm
    6

You could implement the whole program without depending on the core library:

    ; semi-obfuscated multiply and print
    ; does not depend on any libraries
    ; re-inventing the wheel can be very educational!

    main:
      12345 67890 * outnum
      '\n' out
      halt

    ; multiplication function w/inner loop
    *:
      R: nop C: nop N: nop
      &C stor dup &R stor &N stor
    *-loop:
      &R load &N load add &R stor
      1 &C load sub &C stor
      &C load &*-loop swap 1 swap sub jnz
      &R load popip

While implementing the Karatsuba algorithm should be quite easy, Toom-Cook
multiplication is left as an exercise for the reader.

It's not a joke
---------------

I think I need to clarify that this project is actually not a joke. Fun,
absolutely, but not a joke. I just wanted to create a simple virtual machine
and from that I grew a language. It's very similar to Forth and PostScript,
and we all know those are extremely powerful --- particularly Forth!

Building stuff yourself is a powerful way of learning.

A Fibonacci program
-------------------

The following is a program to generate and print Fibonacci numbers, taken from
`tests/fib.src`:

    ; Print the well-known Fibonacci sequence
    ;
    ; Our word size is only 32-bits, so we can't
    ; count very far.

    ; Program starts at main, so jump there
    &main jmp

    ; Create label 'count', which refers to this memory
    ; address.
    ;
    ; The NOP (no operation; do nothing) is only used
    ; to reserve memory space for a variable.

    count: nop

    ; Initialize the counter by storing 46 at the address of 'count'.
    ;
    ; POPIP will pop the instruction pointer, effectively jumping to
    ; the next location (probably the caller).

    count-init:
      46 &count stor
      popip

    ; Shorthand for loading the number at 'count' onto the top of the stack.
    ;
    ; The "( -- counter)" comment is similar to Forth's comments, explaining
    ; that no number is expected on the stack, and after running this function,
    ; a number ("counter") will be on the stack.

    count-get: ; ( -- counter )
      &count load ; load number
      popip

    ; Shorthand for decrementing the number on the stack
    dec: ; ( a -- a-1 )
      1 swap sub
      popip

    ; Store top of stack to 'count', do not alter stack
    count-set: ; ( counter -- counter )
      dup &count stor
      popip

    ; Decrement counter and return it
    count-dec: ; ( -- counter )
      count-get dec count-set
      popip

    ; Print number with a newline without altering stack
    show: ; ( number -- number )
      dup outnum
      '\n' out
      popip

    ; Duplicate two top-most numbers on stack
    dup2: ; ( a b -- a b a b )
      swap ; b a
      dup  ; b a a
      rol3 ; a a b
      dup  ; a a b b
      rol3 ; a b b a
      swap ; a b a b
      popip

    jump-if-nonzero: ; ( predicate dest_address -- )
      swap jnz
      popip

    ; The start of our Fibonacci printing program
    main:
      count-init

      0 show ; first Fibonacci number
      1      ; second Fibonacci number

    loop:
      ; add top numbers and show
      ; a b -> a b a b -> a b (a + b)
      dup2 add show

      ; decrement, loop if non-zero
      count-dec &loop jump-if-nonzero

Convenience features
--------------------

I've added a `HALT` instruction. This replaces the old idiom of looping
forever to signal that a program was finished:

    stop: stop        ; form 1
    stop: &stop jmp   ; form 2
    halt              ; convenience form

Originally, I left out a halt instruction for the sake of minimalism.

Secondly, I've added a `POPIP` instruction, and the address of the next
instruction is now stored automatically before a jump to a function. This
effectively lets you call and return from subroutines:

    boot:
      &main jmp
      halt

    foo: bar: baz:
      '\n' '!' 'e' 'c' 'i' 'u' 'j' 'e' 'l' 't' 'e' 'e' 'B'
      out out out out out out out out out out out out out
      popip

    main:
      foo bar baz

Third, I never bothered to write my own print number function, because it
would require me to write both division and modulus functions in source first.
So I implemented `OUTNUM` that prints a number to the output:

    123 OUTNUM '\n' OUT ; prints "123\n"

Proper string handling is lacking. One could say that string handling is not
this language's strongest point.

Compiling the project
=====================

To compile and run the examples:

    $ make all check

To see the low-level machine instructions:

    $ ./smr -h

To execute source code on-the-fly:

    $ ./sm filename

To compile source to bytecode:

    $ ./smc filename

The assembly language is not documented other than in code, because I'm
actively playing with it.

Although the interpreter is slow, it should be possible to convert stack
operations to a register machine. In fact, it should be trivial to compile
programs to native machine code, e.g. x86.

Instruction set
---------------

The instructions are found in `include/instructions.hpp`:

    VALUE       OPCODE  EXPLANATION
    0x00000000  NOP     do nothing
    0x00000001  ADD     pop a, pop b, push a + b
    0x00000002  SUB     pop a, pop b, push a - b
    0x00000003  AND     pop a, pop b, push a & b
    0x00000004  OR      pop a, pop b, push a | b
    0x00000005  XOR     pop a, pop b, push a ^ b
    0x00000006  NOT     pop a, push !a
    0x00000007  IN      read one byte from stdin, push as word on stack
    0x00000008  OUT     pop one word and write to stream as one byte
    0x00000009  LOAD    pop a, push word read from address a
    0x0000000A  STOR    pop a, pop b, write b to address a
    0x0000000B  JMP     pop a, goto a
    0x0000000C  JZ      pop a, pop b, if a == 0 goto b
    0x0000000D  PUSH    push next word
    0x0000000E  DUP     duplicate word on stack
    0x0000000F  SWAP    swap top two words on stack
    0x00000010  ROL3    rotate top three words on stack once left, (a b c) -> (b c a)
    0x00000011  OUTNUM  pop one word and write to stream as number
    0x00000012  JNZ     pop a, pop b, if a != 0 goto b
    0x00000013  DROP    remove top of stack
    0x00000014  PUSHIP  push next word onto the IP stack
    0x00000015  POPIP   pop IP stack to current IP, effectively performing a jump
    0x00000016  DROPIP  pop IP, but do not jump
    0x00000017  COMPL   pop a, push the complement of a

The instruction set could easily be more minimal,
even more so if we allowed registers. Also, we have taken absolutely no care about the machine code values for each instruction. A good design would do something cool with that. License and author ================== Placed in the public domain in 2010 by the author, Christian Stigen Larsen http://csl.sublevel3.org ================================================ FILE: compiler.cpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include #include "compiler.hpp" #include "parser.hpp" #include "machine.hpp" #include "label.hpp" #include "upper.hpp" void compiler::error(const std::string& s) { if ( callback ) callback(s.c_str()); } bool compiler::islabel(const std::string& s) { size_t l = s.length(); return l<1? false : s[l-1] == ':'; } bool compiler::iscomment(const std::string& s) { return s[0] == ';'; } Op compiler::tok2op(const std::string& s) { return from_s(s.c_str()); } bool compiler::isliteral(const std::string& s) { if ( islabel(s) ) return false; return tok2op(s) == NOP_END; } bool compiler::isnumber(const char* s) { while ( *s ) if ( !isdigit(*s++) ) return false; return true; } bool compiler::ischar(const std::string& s) { size_t l = s.length(); if ( l==3 && s[0]=='\'' && s[2]=='\'' && s[1]!='\\' ) return true; if ( l==4 && s[0]=='\'' && s[3]=='\'' && s[1]=='\\' && (s[2]=='t' || s[2]=='r' || s[2]=='n' || s[2]=='0') ) return true; return false; } char compiler::to_ord(const std::string& s) { size_t l = s.length(); if ( l == 3 ) // 'x' return s[1]; if ( l == 4 ) // '\x' switch ( s[2] ) { case 't': return '\t'; case 'r': return '\r'; case 'n': return '\n'; case '0': return '\0'; } error("Unknown character literal: " + s); return '\0'; } bool compiler::islabel_ref(const std::string& s) { return s[0] == '&'; } int32_t compiler::to_literal(const std::string& s) { if ( isnumber(s.c_str()) ) return atoi(s.c_str()); if ( ischar(s) ) return 
to_ord(s); return -1; } bool compiler::ishalt(const std::string& s) { return s.empty() || upper(s)=="HALT"; } void compiler::check_label_name(const std::string& label) { if ( upper(label) == "HERE" ) error("Label is reserved: HERE"); } compiler::compiler(void (*cb)(const char*)) : m(cb), forwards(), callback(cb) { } void compiler::set_error_callback(void (*error_callback)(const char* message)) { callback = error_callback; } void compiler::compile_label(const std::string& label) { int32_t address = m.get_label_address(label); m.load(PUSH); // if label not found, mark it for update if ( address == -1 ) { check_label_name(label); forwards.push_back(label_t(label, m.pos())); } m.load(address); } void compiler::compile_function_call(const std::string& function) { // Return address is here plus four instructions m.load(PUSHIP); m.load(m.pos() + 4*m.wordsize()); // Push function destination address -- update it later m.load(PUSH); forwards.push_back(label_t(function, m.pos())); m.load(-1); // just push an arbitrary number // Jump to function m.load(JMP); // This is the return point } void compiler::compile_literal(const std::string& token) { if ( islabel_ref(token) ) { compile_label(token.substr(1)); return; } int32_t literal = to_literal(token); // Literals are pushed on to the stack if ( literal != -1 ) { m.load(PUSH); m.load(literal); return; } // Unknown literals are treated as forward function calls compile_function_call(token); } void compiler::resolve_forwards() { for ( size_t n=0; n #include #include "error.hpp" void error(const char* s) { fprintf(stderr, "\n%s\n", s); exit(1); } ================================================ FILE: fileptr.cpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. 
* */ #include #include "fileptr.hpp" fileptr::fileptr(FILE *file) : f(file) { if ( f == NULL ) throw std::runtime_error("Could not open file"); } fileptr::~fileptr() { fclose(f); } fileptr::operator FILE*() const { return f; } ================================================ FILE: include/compiler.hpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include "instructions.hpp" #include "parser.hpp" #include "machine.hpp" #ifndef INC_COMPILER_HPP #define INC_COMPILER_HPP class compiler { machine_t m; std::vector forwards; void (*callback)(const char*); void error(const std::string& s); char to_ord(const std::string& s); int32_t to_literal(const std::string& s); void check_label_name(const std::string& label); static bool islabel(const std::string& s); static bool iscomment(const std::string& s); static Op tok2op(const std::string& s); static bool isliteral(const std::string& s); static bool isnumber(const char* s); static bool ischar(const std::string& s); static bool islabel_ref(const std::string& s); static bool ishalt(const std::string& s); public: compiler(void (*error_callback)(const char* message) = NULL); compiler(parser& p, void (*error_callback)(const char* message) = NULL); void set_error_callback(void (*error_callback)(const char* message)); void compile_label(const std::string& label); void compile_function_call(const std::string& function); void compile_literal(const std::string& token); void resolve_forwards(); bool compile_token(const std::string& s, parser& p); machine_t& get_program(); }; #endif ================================================ FILE: include/error.hpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. 
* */ void error(const char* s); ================================================ FILE: include/fileptr.hpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include #ifndef INC_FILEPTR_HPP #define INC_FILEPTR_HPP class fileptr { FILE* f; fileptr(const fileptr&); // deny fileptr& operator=(const fileptr&); // deny public: fileptr(FILE *file); ~fileptr(); operator FILE*() const; }; #endif ================================================ FILE: include/instructions.hpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #ifndef INC_SMCORE_H #define INC_SMCORE_H enum Op { NOP, // do nothing ADD, // pop a, pop b, push a + b SUB, // pop a, pop b, push a - b AND, // pop a, pop b, push a & b OR, // pop a, pop b, push a | b XOR, // pop a, pop b, push a ^ b NOT, // pop a, push !a IN, // push one byte read from stream OUT, // pop one byte and write to stream LOAD, // pop a, push byte read from address a STOR, // pop a, pop b, write b to address a JMP, // pop a, goto a JZ, // pop a, pop b, if a == 0 goto b PUSH, // push next word DUP, // duplicate word on stack SWAP, // swap top two words on stack ROL3, // rotate top three words on stack once left, (a b c) -> (b c a) OUTNUM, // pop one byte and write to stream as number JNZ, // pop a, pop b, if a != 0 goto b DROP, // remove top of stack PUSHIP, // push a in IP stack POPIP, // pop IP stack to current IP, effectively performing a jump DROPIP, // pop IP, but do not jump COMPL, // pop a, push the complement of a NOP_END // placeholder for end of enum; MUST BE LAST }; extern const char* OpStr[]; const char* to_s(Op op); Op from_s(const char* s); #endif ================================================ FILE: include/label.hpp ================================================ /* * Made in 2010 by 
Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include #include #ifndef INC_LABEL_HPP #define INC_LABEL_HPP struct label_t { std::string name; int32_t pos; label_t(const std::string& name_, int32_t position) : name(name_), pos(position) { } }; #endif ================================================ FILE: include/machine.hpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include #include #include #include "instructions.hpp" #include "label.hpp" #ifndef INC_MACHINE_HPP #define INC_MACHINE_HPP class machine_t { std::vector stack; std::vector stackip; std::vector labels; size_t memsize; int32_t *memory; int32_t ip; // instruction pointer FILE* fin; FILE* fout; bool running; void (*error_cb)(const char*); public: machine_t(void (*error_callback)(const char* msg)); machine_t( const size_t memory_size = 1024*1000/sizeof(int32_t), FILE* out = stdout, FILE* in = stdin, void (*error_callback)(const char* msg) = NULL); machine_t(const machine_t& p, void (*error_callback)(const char* msg) = NULL); machine_t& operator=(const machine_t& p); ~machine_t(); void reset(); void error(const char* s) const; void push(const int32_t& n); int32_t pop(); void puship(const int32_t&); int32_t popip(); void check_bounds(int32_t n, const char* msg) const; void next(); void prev(); void load(Op); void load(int32_t n); int run(int32_t start_address = 0); void exec(Op); int32_t* find_end() const; void load_image(FILE* f); void save_image(FILE* f) const; void load_halt(); void showstack() const; size_t size() const; int32_t cur() const; int32_t pos() const; int32_t get_label_address(const std::string& label) const; void addlabel(const char* name, int32_t pos, int lineno = -1); bool isrunning() const; void set_fout(FILE*); void set_fin(FILE*); void set_mem(int32_t adr, int32_t val); int32_t get_mem(int32_t adr) const; 
int32_t wordsize() const; // instructions void instr_nop(); void instr_add(); void instr_sub(); void instr_and(); void instr_or(); void instr_xor(); void instr_not(); void instr_in(); void instr_out(); void instr_outnum(); void instr_load(); void instr_stor(); void instr_jmp(); void instr_jz(); void instr_drop(); void instr_popip(); void instr_dropip(); void instr_jnz(); void instr_push(); void instr_puship(); void instr_dup(); void instr_swap(); void instr_rol3(); void instr_compl(); }; #endif ================================================ FILE: include/parser.hpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include #ifndef INC_PARSER_HPP #define INC_PARSER_HPP class parser { FILE* f; int lineno; int update_lineno(int c); int fgetchar(); void move_back(int c); void skip_whitespace(); public: parser(FILE* f); int get_lineno() const; std::string next_token(); void skip_line(); }; #endif ================================================ FILE: include/upper.hpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include std::string upper(const std::string& s); ================================================ FILE: include/version.hpp ================================================ #define VERSION "Public domain, 2010-2011 by Christian Stigen Larsen" ================================================ FILE: instructions.cpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. 
* */ #include #include "instructions.hpp" #include "machine.hpp" #include "upper.hpp" const char* OpStr[] = { "NOP", "ADD", "SUB", "AND", "OR", "XOR", "NOT", "IN", "OUT", "LOAD", "STOR", "JMP", "JZ", "PUSH", "DUP", "SWAP", "ROL3", "OUTNUM", "JNZ", "DROP", "PUSHIP", "POPIP", "DROPIP", "COMPL", "NOP_END" }; const char* to_s(Op op) { if ( op >= NOP && op < NOP_END ) return OpStr[op]; return ""; } Op from_s(const char* str) { std::string s(upper(str)); // slow, O(n/2) seek... :-) for ( int n=0; n(n); return NOP_END; } ================================================ FILE: machine.cpp ================================================ /* * Made in 2010 by Christian Stigen Larsen * http://csl.sublevel3.org * * Placed in the public domain by the author. * */ #include #include #include "machine.hpp" #include "label.hpp" #include "upper.hpp" machine_t::machine_t( const machine_t& p, void (*error_callback)(const char*)) : stack(p.stack), stackip(p.stackip), labels(p.labels), memsize(p.memsize), memory(new int32_t[p.memsize]), ip(p.ip), fin(p.fin), fout(p.fout), running(p.running), error_cb(error_callback) { memmove(memory, p.memory, memsize*sizeof(int32_t)); } machine_t::machine_t(const size_t memory_size, FILE* out, FILE* in, void (*error_callback)(const char*)) : stack(), stackip(), labels(), memsize(memory_size), memory(new int32_t[memory_size]), ip(0), fin(in), fout(out), running(true), error_cb(error_callback) { reset(); } machine_t::machine_t(void (*error_callback)(const char*)) : stack(), stackip(), labels(), memsize(1000*1024*sizeof(int32_t)), memory(new int32_t[memsize]), ip(0), fin(stdin), fout(stdout), running(true), error_cb(error_callback) { reset(); } machine_t& machine_t::operator=(const machine_t& p) { if ( &p == this ) return *this; delete[](memory); stack = p.stack; stackip = p.stackip; labels = p.labels; memsize = p.memsize; memory = new int32_t[memsize]; memcpy(memory, p.memory, memsize*sizeof(int32_t)); ip = p.ip; fin = p.fin; fout = p.fout; running = 
    p.running;
  error_cb = p.error_cb;

  return *this;
}

void machine_t::reset()
{
  memset(memory, NOP, memsize*sizeof(int32_t));
  stack.clear();
  ip = 0;
}

machine_t::~machine_t()
{
  delete[](memory);
}

void machine_t::error(const char* s) const
{
  if ( error_cb )
    error_cb(s);
}

void machine_t::push(const int32_t& n)
{
  stack.push_back(n);
}

void machine_t::puship(const int32_t& n)
{
  stackip.push_back(n);
}

int32_t machine_t::popip()
{
  if ( stackip.empty() ) {
    error("POP empty IP stack");
    return 0;
  }

  int32_t n = stackip.back();
  stackip.pop_back();
  return n;
}

int32_t machine_t::pop()
{
  // Return early like popip() does: calling back() on an empty
  // vector is undefined if the error callback returns or is NULL
  if ( stack.empty() ) {
    error("POP empty stack");
    return 0;
  }

  int32_t n = stack.back();
  stack.pop_back();
  return n;
}

void machine_t::check_bounds(int32_t n, const char* msg) const
{
  if ( n < 0 || static_cast<size_t>(n) >= memsize )
    error(msg);
}

void machine_t::next()
{
  ip += sizeof(int32_t);

  if ( ip < 0 )
    error("IP < 0");

  if ( static_cast<size_t>(ip) >= memsize )
    ip = 0; // TODO: Halt instead of wrap-around?
}

void machine_t::prev()
{
  if ( ip == 0 )
    error("prev() reached zero");

  ip -= sizeof(int32_t);
}

void machine_t::load(Op op)
{
  memory[ip] = op;
  next();
}

void machine_t::load(int32_t n)
{
  memory[ip] = n;
  next();
}

int machine_t::run(int32_t start_address)
{
  ip = start_address;

  while(running)
    exec(static_cast<Op>(memory[ip]));

  return 0; // TODO: exit-code ?
}

void machine_t::instr_nop()
{
  next();
}

void machine_t::instr_add()
{
  push(pop() + pop());
  next();
}

void machine_t::instr_sub()
{
  /*
   * This operation is not primitive.  It can
   * be implemented by adding the minuend to
   * the two's complement of the subtrahend:
   *
   * SUB: ; ( a b -- (b-a))
   *   swap   ; b a
   *   compl  ; b ~a
   *   1 add  ; b (~a+1), or b -a
   *   add    ; b-a
   *   popip
   *
   * The problem is that IF the underlying
   * architecture does not use two's complement
   * to represent negative values, stuff like
   * printing will fail miserably (at least in
   * the current implementation on top of C).
*/ // TODO: Consider reversing the operands for SUB // (it's currently unnatural) int32_t tos = pop(); push(tos - pop()); next(); } void machine_t::instr_and() { push(pop() & pop()); next(); } void machine_t::instr_or() { push(pop() | pop()); next(); } void machine_t::instr_xor() { push(pop() ^ pop()); next(); } void machine_t::instr_not() { // TODO: this probably does not work as intended push(!pop()); next(); } void machine_t::instr_compl() { push(~pop()); next(); } void machine_t::instr_in() { /* * The IN/OUT functions should be implemented * using something akin to x86 INT or SYSCALL or * similar. E.g.: * * 123 SYSCALL ; exec system call 123 * */ push(getc(fin)); next(); } void machine_t::instr_out() { putc(pop(), fout); fflush(fout); next(); } void machine_t::instr_outnum() { fprintf(fout, "%u", pop()); next(); } void machine_t::instr_load() { int32_t a = pop(); check_bounds(a, "LOAD"); push(memory[a]); next(); } void machine_t::instr_stor() { int32_t a = pop(); check_bounds(a, "STOR"); memory[a] = pop(); next(); } void machine_t::instr_jmp() { /* * This function is not primitive. * If we have e.g. JZ, we can always * do "0 JZ" to perform the jump. * * (Note that this will break the * HALT-idiom) * */ // TODO: Implement as library function //push(0); //instr_jz(); int32_t a = pop(); check_bounds(a, "JMP"); // check if we are halting, i.e. jumping to current // address -- if so, quit if ( a == ip ) running = false; else ip = a; } void machine_t::instr_jz() { int32_t a = pop(); int32_t b = pop(); if ( a != 0 ) next(); else { check_bounds(b, "JZ"); ip = b; // perform jump } } void machine_t::instr_drop() { pop(); next(); } void machine_t::instr_popip() { int32_t a = popip(); check_bounds(a, "POPIP"); ip = a; } void machine_t::instr_dropip() { popip(); next(); } void machine_t::instr_jnz() { /* * Only one of JNZ and JZ is needed as * a primitive -- one can be implemented * in terms of the other with a negation * of the TOS. 
* * (Note that this will break the HALT-idiom) */ /* instr_puship(); instr_compl(); instr_popip(); instr_jz(); */ int32_t a = pop(); int32_t b = pop(); if ( a == 0 ) next(); else { check_bounds(b, "JNZ"); ip = b; // jump } } void machine_t::instr_push() { next(); push(memory[ip]); next(); } void machine_t::instr_puship() { next(); puship(memory[ip]); next(); } void machine_t::instr_dup() { /* * This function is not primitive. * It can be replaced with a "function": * * ; ( a -- a a ) * dup: nop ; placeholder <- nop * &dup stor ; placeholder <- a * &dup load ; tos <- a * &dup load ; tos <- a * popip */ // TODO: Implement as library function int32_t a = pop(); push(a); push(a); next(); } void machine_t::instr_swap() { /* * This function is not primitive. * It can be replaced with a "function", * something like: * * ; ( a b -- b a ) * swap: * swap-b: nop ; placeholder * swap-a: nop ; placeholder * &swap-b stor ; swap-b <- b * &swap-a stor ; swap-a <- a * &swap-b load ; tos <- a * &swap-a load ; tos <- b * popip * */ // TODO: Implement as library function // a, b -- b, a int32_t b = pop(); int32_t a = pop(); push(b); push(a); next(); } void machine_t::instr_rol3() { /* * This function is not primitive. 
* It can be replaced with "functions", * something like: * * rol3: * rol3-var: nop ; stack = a b c * &rol3-var stor ; stack = a b, var = c * swap ; stack = b a, var = c * &rol3-var load ; stack = b a c * swap ; stack = b c a * popip * */ // TODO: Implement as library function // abc -> bca int32_t c = pop(); // TOS int32_t b = pop(); int32_t a = pop(); push(b); push(c); push(a); next(); } void machine_t::exec(Op operation) { switch(operation) { default: error("Unknown instruction"); break; case NOP: instr_nop(); break; // Strictly speaking, SUB can be implemented // by ADDing the minuend with the two's complement // of the subtrahend -- but that's not necessarily // portable down to native code case ADD: instr_add(); break; case SUB: instr_sub(); break; // non-primitive // Strictly speaking, all but NOT and AND are // non-primitive (or some other combination of // two operations) case AND: instr_and(); break; case OR: instr_or(); break; case XOR: instr_xor(); break; case NOT: instr_not(); break; case COMPL: instr_compl(); break; // Should be replaced with x86 INT-like operations case IN: instr_in(); break; case OUT: instr_out(); break; case LOAD: instr_load(); break; case STOR: instr_stor(); break; case PUSH: instr_push(); break; case DROP: instr_drop(); break; case PUSHIP: instr_puship(); break; case POPIP: instr_popip(); break; case DROPIP: instr_dropip(); break; case JZ: instr_jz(); break; case JMP: instr_jmp(); break; // non-primitive case JNZ: instr_jnz(); break; // non-primitive case DUP: instr_dup(); break; // non-primitive case SWAP: instr_swap(); break; // non-primitive case ROL3: instr_rol3(); break; // non-primitive case OUTNUM: instr_outnum(); break; // non-primitive } } int32_t* machine_t::find_end() const { // find end of program by scanning // backwards until non-NOP is found int32_t *p = &memory[memsize-1]; while ( *p == NOP ) --p; return p; } void machine_t::load_image(FILE* f) { reset(); while ( !feof(f) ) { Op op = NOP; fread(&op, sizeof(Op), 1, 
      f);
    load(op);
  }

  ip = 0;
}

void machine_t::save_image(FILE* f) const
{
  int32_t *start = memory;
  int32_t *end = find_end() + sizeof(int32_t);

  while ( start != end ) {
    fwrite(start, sizeof(Op), 1, f);
    start += sizeof(int32_t);
  }
}

void machine_t::load_halt()
{
  load(PUSH);
  load(ip + sizeof(int32_t));
  load(JMP);
}

size_t machine_t::size() const
{
  return find_end() - &memory[0];
}

int32_t machine_t::cur() const
{
  return memory[ip];
}

int32_t machine_t::pos() const
{
  return ip;
}

void machine_t::addlabel(const char* name, int32_t pos, int)
{
  std::string n = upper(name);

  if ( n.empty() )
    error("Empty label");
  else {
    n.erase(n.length()-1, 1); // remove ":"
    labels.push_back(label_t(n.c_str(), pos));
  }
}

int32_t machine_t::get_label_address(const std::string& s) const
{
  std::string p(upper(s));

  // special label address "here" returns current position
  if ( p == "HERE" )
    return ip;

  for ( size_t n=0; n < labels.size(); ++n )
    if ( upper(labels[n].name.c_str()) == p )
      return labels[n].pos;

  return -1; // not found
}

bool machine_t::isrunning() const
{
  return running;
}

void machine_t::set_fout(FILE* f)
{
  fout = f;
}

void machine_t::set_fin(FILE* f)
{
  fin = f;
}

void machine_t::set_mem(int32_t adr, int32_t val)
{
  check_bounds(adr, "set_mem out of bounds");
  memory[adr] = val;
}

int32_t machine_t::get_mem(int32_t adr) const
{
  check_bounds(adr, "get_mem out of bounds");
  return memory[adr];
}

int32_t machine_t::wordsize() const
{
  return sizeof(int32_t);
}

================================================
FILE: parser.cpp
================================================
/*
 * Made in 2010 by Christian Stigen Larsen
 * http://csl.sublevel3.org
 *
 * Placed in the public domain by the author.
 *
 */

#include <stdio.h>
#include <ctype.h>
#include "parser.hpp"

int parser::update_lineno(int c)
{
  if ( c == '\n' )
    ++lineno;

  return c;
}

int parser::fgetchar()
{
  return update_lineno(fgetc(f));
}

void parser::move_back(int c)
{
  if ( c == '\n' )
    --lineno;

  ungetc(c, f);
}

void parser::skip_whitespace()
{
  int c;

  while ( (c = fgetchar()) != EOF && isspace(c) )
    ;

  move_back(c);
}

parser::parser(FILE* file) : f(file), lineno(1)
{
}

int parser::get_lineno() const
{
  return lineno;
}

std::string parser::next_token()
{
  int c;
  std::string s;

  skip_whitespace();

  while ( (c = fgetchar()) != EOF && !isspace(c) )
    s += c;

  return s;
}

void parser::skip_line()
{
  int c;

  while ( (c = fgetchar()) != EOF && c != '\n' )
    ;
}

================================================
FILE: sm.cpp
================================================
/*
 * Made in 2010 by Christian Stigen Larsen
 * http://csl.sublevel3.org
 *
 * Placed in the public domain by the author.
 *
 * Synopsis: Compile and run code on-the-fly.
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include "instructions.hpp"
#include "fileptr.hpp"
#include "compiler.hpp"
#include "error.hpp"
#include "upper.hpp"

void compile_and_run(FILE* f)
{
  parser p(f);
  compiler c(p, error);
  c.get_program().run();
}

void help()
{
  printf("Usage: sm [ file(s) ]\n");
  printf("Compiles and runs source files on the fly.\n\n");
  exit(1);
}

int main(int argc, char** argv)
{
  try {
    if ( argc == 1 ) // by default, read standard input
      compile_and_run(stdin);

    for ( int n=1; n
#include
#include
#include "version.hpp"
#include "instructions.hpp"
#include "fileptr.hpp"
#include "compiler.hpp"
#include "error.hpp"

const char* file = "";
parser *p = NULL;

// Return the filename without its last extension
// (everything before the final '.')
static std::string sbasename(const std::string& s)
{
  using namespace std;
  const string::size_type p = s.rfind('.');
  return p == string::npos ?
    s : s.substr(0, p);
}

static void compile_error(const char* msg)
{
  fprintf(stderr, "%s:%d:%s\n", file, p->get_lineno(), msg);
  exit(1);
}

void compile(FILE* f, const std::string& out)
{
  delete(p);
  p = new parser(f);

  compiler c(*p, compile_error);
  c.get_program().save_image(
    fileptr(fopen(out.c_str(), "wb")));
}

int main(int argc, char** argv)
{
  try {
    if ( argc < 2 )
      error("Usage: smc [ filename(s) | - ]\n" VERSION);

    for ( int n=1; n
#include "instructions.hpp"
#include "machine.hpp"
#include "fileptr.hpp"
#include "error.hpp"

static bool isprintable(int c)
{
  return (c>=32 && c<=127)
    || c=='\n' || c=='\r' || c=='\t';
}

static const char* to_s(char c)
{
  static char buf[2];
  buf[0] = c;
  buf[1] = '\0';

  switch ( c ) {
  default: return buf;
  case '\t': return "\\t";
  case '\n': return "\\n";
  case '\r': return "\\r";
  }
}

static void disassemble(machine_t &m)
{
  int32_t end = m.size();

  while ( m.pos() <= end ) {
    Op op = static_cast<Op>(m.cur());
    printf("0x%x %s", m.pos(), to_s(op));

    // PUSH and PUSHIP are followed by an operand word
    if ( (op==PUSH || op==PUSHIP) && m.pos()<=end ) {
      m.next();
      printf(" 0x%x", m.cur());

      if ( isprintable(m.cur()) )
        printf(" ('%s')", to_s(m.cur()));
    }

    printf("\n");
    m.next();
  }
}

int help()
{
  printf("Usage: smd [ file(s) ]\n\n");
  printf("Disassembles compiled bytecode files.\n");
  exit(1);
}

int main(int argc, char** argv)
{
  try {
    for ( int n=1; n
#include
#include "version.hpp"
#include "instructions.hpp"
#include "machine.hpp"
#include "fileptr.hpp"

static void help()
{
  printf("smr -- stack-machine run\n");
  printf("%s\n\n", VERSION);

  printf("Opcodes:\n\n");

  Op op=NOP;
  do {
    printf("0x%x = %s\n", op, to_s(op));
    op = static_cast<Op>(op+1);
  } while ( op != NOP_END );

  printf("\nTo halt program, jump to current position:\n\n");
  printf("0x0 PUSH 0x%x\n", (unsigned int)sizeof(int32_t));
  printf("0x%x JMP\n\n", (unsigned int)sizeof(int32_t));

  printf("Word size is %lu bytes\n", (unsigned long)sizeof(int32_t));
  exit(0);
}

int main(int argc, char** argv)
{
  try {
    bool found_file = false;

    for ( int n=1; n
: ; ( a -- )
  out ; write 8-bit to
      ; LSB of 32-bit value to output stream
  popip

@: ; ( address -- value at address )
  load
  popip

inc-core: nop

================================================
FILE: tests/fib.src
================================================
; Print the well-known Fibonacci sequence
;
; Our word size is only 32-bits, so we can't
; count very far.

; Program starts at main, so jump there
&main jmp

; Create label 'count', which refers to this memory
; address.
;
; The NOP (no operation; do nothing) is only used
; to reserve memory space for a variable.

count: nop

; Initialize the counter by storing 46 at the address of 'count'.
;
; POPIP will pop the instruction pointer, effectively jumping to
; the next location (probably the caller).

count-init:
  46 &count stor
  popip

; Shorthand for loading the number at 'count' onto the top of the stack.
;
; The "( -- counter)" comment is similar to Forth's comments, explaining
; that no number is expected on the stack, and after running this function,
; a number ("counter") will be on the stack.
count-get: ; ( -- counter )
  &count load ; load number
  popip

; Shorthand for decrementing the number on the stack
dec: ; ( a -- a-1 )
  1 swap sub
  popip

; Store top of stack to 'count', do not alter stack
count-set: ; ( counter -- counter )
  dup &count stor
  popip

; Decrement counter and return it
count-dec: ; ( -- counter )
  count-get dec count-set
  popip

; Print number with a newline without altering stack
show: ; ( number -- number )
  dup outnum
  '\n' out
  popip

; Duplicate two top-most numbers on stack
dup2: ; ( a b -- a b a b )
  swap ; b a
  dup  ; b a a
  rol3 ; a a b
  dup  ; a a b b
  rol3 ; a b b a
  swap ; a b a b
  popip

jump-if-nonzero: ; ( dest_address predicate -- )
  swap jnz
  popip

; The start of our Fibonacci printing program
main:
  count-init

  0 show ; first Fibonacci number
  1      ; second Fibonacci number

loop:
  ; add top numbers and show
  ; a b -> a b a b -> a b (a + b)
  dup2 add show

  ; decrement, loop if non-zero
  count-dec &loop jump-if-nonzero

================================================
FILE: tests/forward-goto.src
================================================
; simple test of forward labels

start:
  &cause jmp

effect:
  'e' out 'f' out 'f' out 'e' out 'c' out 't' out '\n' out
  halt

cause:
  'c' out 'a' out 'u' out 's' out 'e' out
  32 out '-' out '>' out 32 out
  &effect jmp

================================================
FILE: tests/func.src
================================================
; program starts at main program
main
'4' out '\n' out
halt

three:
  '3' out '\n' out
  popip

main:
  one two three
  popip

one:
  '1' out '\n' out
  popip

two:
  '2' out '\n' out
  popip

================================================
FILE: tests/hello.src
================================================
; Labels are written as a name without whitespace
; and a colon at the end

main:
  72 out          ; "H"
  101 out         ; "e"
  108 dup out out ; "ll"
  111 out         ; "o"
  33 out          ; "!"

  ; newline
  10 13 out out

  42 outnum     ; print a number
  10 13 out out ; ...
                ; and CRLF

  ; stop program
  halt

================================================
FILE: tests/todo-print.src
================================================
; This is a suggestion for a new "embed" keyword,
; as well as support for parsing strings.

&main jmp

msg: embed "Hello, world!\n"
num: embed 40

printstr: ; ( adr -- )
  print-src: nop            ; placeholder
  &print-src stor           ; store src address to print-src
print-loop:
  &print-src load           ; get ptr
  dup +1 &print-src stor    ; save ptr + 1
  load                      ; get char
  dup &print-exit jz        ; stop if '\0'
  out                       ; print character
  &print-loop jmp           ; loop
print-exit:
  popip

main:
  &msg printstr              ; print "Hello, world!\n"
  &num +1 +1 outnum '\n' out ; print "42"

================================================
FILE: tests/yo.src
================================================
; simple code to demonstrate compile-and-run
'y' out 'o' out '!' out '\n' out

================================================
FILE: upper.cpp
================================================
/*
 * Made in 2010 by Christian Stigen Larsen
 * http://csl.sublevel3.org
 *
 * Placed in the public domain by the author.
 *
 */

#include
#include "upper.hpp"

std::string upper(const std::string& s)
{
  std::string r(s);

  for ( int n=0, l=s.length(); n